Imagine you're searching for a specific word in a book. Without any index or table of contents, what would you do? You'd start at the beginning, check if the first few characters match your word, and if not, move forward slightly and try again. This intuitive process—checking every possible position systematically—is exactly what the brute force pattern matching algorithm does.
The brute force approach (also called the naive algorithm) is the simplest conceivable solution to the pattern matching problem. It requires no preprocessing, no auxiliary data structures, and no mathematical cleverness. It simply answers the question: "Does the pattern occur here? No? Then what about here? And here?"
Yet this simplicity is precisely what makes it so important. The brute force algorithm serves as the conceptual baseline against which all optimized pattern matching algorithms are measured. You cannot truly appreciate why KMP achieves linear time or why Rabin-Karp uses hashing without first understanding what naive matching does—and what it does wrong.
By the end of this page, you will deeply understand the brute force pattern matching algorithm—its mechanics, its intuition, its implementation, and most importantly, the fundamental inefficiency that motivates every advanced string algorithm you'll encounter.
Before diving into the algorithm, let's crystallize the problem we're solving:
Given:

- A text T of length n
- A pattern P of length m
- m ≤ n (the pattern is no longer than the text)

Find:

- All positions i in T where P occurs as a substring
- An occurrence at position i means T[i..i+m-1] = P[0..m-1]

Example:

- Text: "ABABDABACDABABCABAB"
- Pattern: "ABABCABAB"

The brute force algorithm attacks this problem in the most direct way possible: try every starting position in the text, and for each, check if the pattern matches character by character.
Throughout this module, we use 0-indexed strings. Position 0 is the first character. When we say the pattern occurs at position i, we mean T[i] is the first character of the occurrence, and T[i+m-1] is the last.
The brute force algorithm embodies a simple principle: exhaustive verification at every valid position.
Valid positions are those where the pattern could potentially fit within the text. If the text has length n and the pattern has length m, then:

- The last valid starting position is n - m
- There are n - m + 1 valid starting positions in total
- At position n - m, the pattern extends from T[n-m] to T[n-1], using exactly the last m characters

The algorithm in plain English:

1. Align the pattern at the current starting position i in the text
2. Compare P[0] with T[i], then P[1] with T[i+1], and so on
3. If all m characters match, record this as an occurrence
4. Move to the next position and repeat for all n - m + 1 positions

The term 'brute force' doesn't mean 'stupid'—it means 'complete and direct.' The algorithm makes no assumptions, uses no shortcuts, and guarantees correctness through exhaustive checking. This is often the right first approach: get something correct, then optimize.
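The per-position check described above can be sketched as a small helper. This is an illustrative sketch, and `occursAt` is a name introduced here, not from the original text:

```typescript
// Hypothetical helper: does `pattern` occur in `text` starting at index i?
// Assumes 0 <= i <= text.length - pattern.length.
function occursAt(text: string, pattern: string, i: number): boolean {
  for (let j = 0; j < pattern.length; j++) {
    if (text[i + j] !== pattern[j]) return false; // mismatch: this position fails
  }
  return true; // all m characters matched
}
```

The full algorithm is then just a loop calling this check for every i from 0 to n - m.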
Why "at every valid position"?
Consider searching for pattern "ABC" in text "AABCD" (n=5, m=3):
| Position | Substring Checked | Match? |
|---|---|---|
| 0 | AAB | No |
| 1 | ABC | Yes ✓ |
| 2 | BCD | No |
We can't start at position 3 or beyond because the remaining text from T[3] onward has only two characters, too few to hold a length-3 pattern. Hence, we check positions 0 through 5 - 3 = 2 (inclusive).
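The table above can be reproduced with a short loop over the valid positions. This is a sketch, and `validPositionChecks` is a name chosen here for illustration:

```typescript
// Enumerate every valid starting position and whether the pattern matches there.
function validPositionChecks(text: string, pattern: string): { pos: number; match: boolean }[] {
  const n = text.length;
  const m = pattern.length;
  const results: { pos: number; match: boolean }[] = [];
  // Only positions 0 .. n - m leave room for all m pattern characters.
  for (let i = 0; i <= n - m; i++) {
    results.push({ pos: i, match: text.substring(i, i + m) === pattern });
  }
  return results;
}

// "ABC" in "AABCD": positions 0..2 are checked; only position 1 matches.
```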
This exhaustive checking is the algorithm's strength (it never misses an occurrence) and weakness (it never skips an unnecessary check).
Let's formalize the brute force algorithm with precise pseudocode and then implement it in code:
Pseudocode:
NAIVE-PATTERN-MATCH(T, P)
n ← length(T)
m ← length(P)
occurrences ← empty list
for i ← 0 to n - m do
j ← 0
while j < m and T[i + j] = P[j] do
j ← j + 1
end while
if j = m then
append i to occurrences
end if
end for
return occurrences
Explanation of the pseudocode:
- Outer loop (for i): iterates through all valid starting positions in the text (0 to n - m inclusive)
- Inner loop (while j): compares pattern characters with text characters starting at position i
- Match check (if j = m): if we successfully matched all m characters, j will equal m
- Otherwise, the inner loop stopped because T[i + j] ≠ P[j], meaning a mismatch occurred at pattern position j
```typescript
/**
 * Naive (Brute Force) Pattern Matching Algorithm
 *
 * Searches for all occurrences of pattern P in text T.
 * Returns an array of starting indices where P is found.
 *
 * Time Complexity: O(n * m) worst case, O(n) best case
 * Space Complexity: O(1) auxiliary (excluding output storage)
 *
 * @param text - The text string to search within
 * @param pattern - The pattern string to search for
 * @returns Array of starting positions (0-indexed) where pattern occurs
 */
function naivePatternMatch(text: string, pattern: string): number[] {
  const n = text.length;
  const m = pattern.length;
  const occurrences: number[] = [];

  // Edge cases
  if (m === 0) {
    // Empty pattern matches at every position (including after last char)
    return Array.from({ length: n + 1 }, (_, i) => i);
  }
  if (m > n) {
    // Pattern longer than text: no possible match
    return occurrences;
  }

  // Try each valid starting position
  for (let i = 0; i <= n - m; i++) {
    let j = 0;
    // Compare pattern with text starting at position i
    while (j < m && text[i + j] === pattern[j]) {
      j++;
    }
    // If we matched all m characters, record this occurrence
    if (j === m) {
      occurrences.push(i);
    }
  }

  return occurrences;
}

// Example usage
const text = "ABABDABACDABABCABAB";
const pattern = "ABABCABAB";
console.log(`Text: "${text}"`);
console.log(`Pattern: "${pattern}"`);
console.log(`Occurrences at: [${naivePatternMatch(text, pattern)}]`);
// Output: Occurrences at: [10]
```

Understanding the algorithm deeply requires tracing through an example step by step. Let's work through a complete example that reveals the algorithm's behavior:
Text: T = "AABCAAB"
Pattern: P = "AAB"
Here, n = 7 and m = 3, so we'll check positions 0 through 4 (inclusive).
| Position i | Substring T[i..i+2] | Comparison Steps | Result |
|---|---|---|---|
| 0 | AAB | A=A ✓, A=A ✓, B=B ✓ → j reaches 3 | MATCH at position 0 |
| 1 | ABC | A=A ✓, B≠A ✗ → mismatch at j=1 | No match |
| 2 | BCA | B≠A ✗ → mismatch at j=0 | No match |
| 3 | CAA | C≠A ✗ → mismatch at j=0 | No match |
| 4 | AAB | A=A ✓, A=A ✓, B=B ✓ → j reaches 3 | MATCH at position 4 |
Result: Pattern "AAB" occurs at positions 0 and 4.
Observations from this trace:
Total comparisons performed: 3 + 2 + 1 + 1 + 3 = 10 comparisons
Matches found: 2 occurrences where all 3 characters matched
Early termination: Positions 2 and 3 terminated after just 1 comparison (first character mismatch)
No position skipped: Even though we found a match at position 0, we still checked position 1, 2, 3, and 4 independently
This last observation is crucial. The naive algorithm treats every position independently. It doesn't "remember" anything from previous comparisons. We'll see in later pages how this lack of memory creates the opportunity for optimization.
At position 0, we verified that T[0..2] = "AAB". This tells us T[1..2] = "AB". When we move to position 1, we then compare T[1] with P[0] = 'A'—but we already know T[1] = 'A' from the previous match! The naive algorithm discards this information. Smarter algorithms exploit it.
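One way to see this wasted work concretely is to instrument the inner loop with a comparison counter. This is a sketch, and `countComparisons` is a name introduced here; every character comparison, including the one that fails, increments the counter:

```typescript
// Instrumented brute force: returns occurrences plus total character comparisons.
function countComparisons(text: string, pattern: string): { occurrences: number[]; comparisons: number } {
  const n = text.length;
  const m = pattern.length;
  const occurrences: number[] = [];
  let comparisons = 0;
  for (let i = 0; i <= n - m; i++) {
    let j = 0;
    while (j < m) {
      comparisons++; // count every character comparison, successful or not
      if (text[i + j] !== pattern[j]) break;
      j++;
    }
    if (j === m) occurrences.push(i);
  }
  return { occurrences, comparisons };
}

// For text "AABCAAB" and pattern "AAB", this reports the 10 comparisons
// (3 + 2 + 1 + 1 + 3) counted in the trace, and occurrences at 0 and 4.
```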
The inner loop of the naive algorithm deserves careful examination because its behavior determines both correctness and efficiency.
The inner loop:
while j < m and T[i + j] = P[j] do
j ← j + 1
end while
Two-condition exit:
The loop can exit for exactly two reasons:
j = m (complete match): We've compared all m characters of the pattern, and every single one matched. This means P occurs at position i in T.
T[i + j] ≠ P[j] (mismatch): We found a character that doesn't match. No point continuing—this position cannot be a match.
Why this two-condition design is elegant:
- The j < m check comes first, preventing out-of-bounds access
- On exit, j tells us exactly how many characters matched (useful for debugging and optimization)

Visualizing match depth:
Let's denote the match depth at position i as the value of j when the inner loop exits. This tells us how far into the pattern we successfully matched.
Example: Text = "ABAAC", Pattern = "AAC"
| Position i | Characters compared | Match depth j | Why loop stopped |
|---|---|---|---|
| 0 | A=A | 1 | B≠A (T[1]='B', P[1]='A') |
| 1 | B≠A | 0 | B≠A (mismatch on first character) |
| 2 | A=A, A=A, C=C | 3 | j=m (full match!) |
At position 0, we matched 1 character before failing. At position 1, we matched 0 characters. At position 2, we matched all 3 characters—a complete match.
Key insight: The match depth varies dramatically across positions. When many characters match before a failure, we've done significant work that is then "thrown away" when we move to the next position. This wasted work is the fundamental inefficiency we'll address in later algorithms.
When implementing pattern matching, tracking the match depth at each position is invaluable for debugging. If you're getting wrong answers, print the match depth at each position—it immediately reveals where comparisons are going wrong.
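The debugging tip above can be packaged as a small helper that returns the match depth at every valid position. This is a sketch, and `matchDepths` is a name chosen here for illustration:

```typescript
// Match depth at each valid position: how many pattern characters matched
// before the inner loop stopped. A depth equal to pattern.length is a full match.
function matchDepths(text: string, pattern: string): number[] {
  const n = text.length;
  const m = pattern.length;
  const depths: number[] = [];
  for (let i = 0; i <= n - m; i++) {
    let j = 0;
    while (j < m && text[i + j] === pattern[j]) j++;
    depths.push(j); // positions where j === m are occurrences
  }
  return depths;
}

// Text "ABAAC", pattern "AAC" → depths [1, 0, 3], matching the trace table.
```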
Why is the naive algorithm correct? This might seem obvious, but stating it precisely builds intuition for more complex algorithms.
Theorem: The naive pattern matching algorithm finds all and only the positions where pattern P occurs in text T.
Proof (informal but rigorous):
Completeness (finds all occurrences):
- An occurrence of P at position i means T[i+j] = P[j] for all j ∈ [0, m-1]
- The outer loop tries every candidate position i from 0 to n-m
- If P occurs at position i, the inner loop will match all m characters
- Therefore j will reach m, and we record position i

Soundness (only finds actual occurrences):

- We record position i only when j = m after the inner loop
- j = m means the inner loop executed m iterations without breaking
- Each iteration verified T[i+j] = P[j] for one value of j
- Therefore T[i..i+m-1] = P[0..m-1] (the definition of occurrence)

Edge cases:

- If P is empty (m = 0), we define it to match at every position (including position n)
- If m > n, there are no valid positions, and we correctly return an empty list

When we later study optimized algorithms like KMP, we'll need to prove they're complete—that they don't accidentally skip valid occurrences while avoiding redundant work. The naive algorithm's completeness is trivial because it checks everything. For KMP, proving completeness requires careful reasoning about the failure function.
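The two edge cases can be exercised directly against a minimal implementation. This sketch restates the algorithm so the snippet is self-contained:

```typescript
// Minimal brute force matcher including both edge cases.
function naiveMatch(text: string, pattern: string): number[] {
  const n = text.length;
  const m = pattern.length;
  if (m === 0) return Array.from({ length: n + 1 }, (_, i) => i); // empty pattern matches everywhere
  if (m > n) return [];                                           // pattern longer than text
  const occurrences: number[] = [];
  for (let i = 0; i <= n - m; i++) {
    let j = 0;
    while (j < m && text[i + j] === pattern[j]) j++;
    if (j === m) occurrences.push(i);
  }
  return occurrences;
}

// Empty pattern in "AB": matches at positions 0, 1, and 2 (after the last char).
// Pattern "ABC" in text "AB": no valid positions, so the result is empty.
```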
The naive algorithm can be implemented in several equivalent ways. Understanding these alternatives deepens comprehension and prepares you for variations in interview problems.
Formulation 1: Using Substring Comparison
Instead of comparing character by character, compare entire substrings:
```typescript
/**
 * Naive pattern matching using substring comparison.
 * Conceptually cleaner but typically slower due to substring creation.
 */
function naiveWithSubstring(text: string, pattern: string): number[] {
  const n = text.length;
  const m = pattern.length;
  const occurrences: number[] = [];

  for (let i = 0; i <= n - m; i++) {
    // Extract substring and compare
    if (text.substring(i, i + m) === pattern) {
      occurrences.push(i);
    }
  }

  return occurrences;
}

// Note: substring() creates a new string each iteration,
// which adds O(m) time and space per position.
// Total: O(n * m) time, O(m) extra space per iteration.
```

Formulation 2: Right-to-Left Comparison
Compare characters from the end of the pattern instead of the beginning:
```typescript
/**
 * Naive pattern matching comparing right-to-left.
 * Same complexity, but different behavior on mismatches.
 * This direction choice inspires the Boyer-Moore algorithm.
 */
function naiveRightToLeft(text: string, pattern: string): number[] {
  const n = text.length;
  const m = pattern.length;
  const occurrences: number[] = [];

  for (let i = 0; i <= n - m; i++) {
    let j = m - 1; // Start from the last character

    // Compare from right to left
    while (j >= 0 && text[i + j] === pattern[j]) {
      j--;
    }

    // If j went below 0, all characters matched
    if (j < 0) {
      occurrences.push(i);
    }
  }

  return occurrences;
}

// This direction doesn't change worst-case complexity,
// but it's the foundation for the Boyer-Moore algorithm,
// which uses the mismatch position to make smarter skips.
```

Right-to-left comparison is the foundation of the Boyer-Moore algorithm. When we find a mismatch at position j, Boyer-Moore uses information about which character caused the mismatch to skip multiple positions at once. This is harder to exploit with left-to-right comparison.
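To make the Boyer-Moore connection concrete, here is a sketch of the Horspool simplification of Boyer-Moore (not the full algorithm, which the source does not cover). It keeps the right-to-left comparison but, after each window, shifts by an amount determined by the text character aligned with the pattern's last position:

```typescript
// Boyer-Moore-Horspool sketch: for each pattern character except the last,
// precompute how far the window can shift when that character is the one
// aligned with the pattern's final position.
function horspoolSearch(text: string, pattern: string): number[] {
  const n = text.length;
  const m = pattern.length;
  const occurrences: number[] = [];
  if (m === 0 || m > n) return occurrences;

  // Bad-character shift table; characters absent from the pattern shift by m.
  const shift = new Map<string, number>();
  for (let k = 0; k < m - 1; k++) {
    shift.set(pattern[k], m - 1 - k);
  }

  let i = 0;
  while (i <= n - m) {
    let j = m - 1;
    while (j >= 0 && text[i + j] === pattern[j]) j--; // compare right to left
    if (j < 0) occurrences.push(i);                   // full match at i
    i += shift.get(text[i + m - 1]) ?? m;             // skip based on last aligned char
  }
  return occurrences;
}
```

Unlike the naive algorithm, which always advances by one, this version can jump past several positions at once, while still finding every occurrence.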
We've now established a complete understanding of the brute force pattern matching algorithm. Let's consolidate the key points:

- The algorithm checks all n - m + 1 valid starting positions, comparing up to m characters at each, for O(n * m) worst-case time
- It requires no preprocessing and only O(1) auxiliary space, and its exhaustive checking makes correctness trivial to argue
- It treats every position independently, discarding information from previous comparisons; this wasted work is what KMP, Rabin-Karp, and Boyer-Moore exploit
What's next:
In the next page, we'll visualize how the pattern "slides" over the text during the algorithm's execution. This sliding window perspective is crucial for understanding both the algorithm's behavior and the optimization opportunities that exist.
You now deeply understand the mechanics of brute force pattern matching. You can implement it, trace through examples, and explain why it's correct. Next, we'll visualize the sliding window behavior that characterizes this algorithm.