The brute force pattern matching algorithm can be understood through many lenses, but perhaps the most intuitive is the sliding window perspective. Imagine you have a transparent sheet with the pattern printed on it. You place this sheet over the text and check if the characters align. If they don't match perfectly, you slide the sheet one position to the right and check again.
This mental model isn't just a pedagogical convenience—it's the actual computational behavior of the algorithm. The "window" is the section of text currently being compared to the pattern, and it slides from position 0 to position n - m, checking for matches at every stop.
In this page, we'll dive deep into the mechanics of this sliding behavior, visualize it step by step, and understand the implications for both performance and correctness.
By the end of this page, you will have a vivid mental model of how the naive algorithm explores the text. You'll understand exactly which characters are compared, why we shift by exactly one position, and what patterns in the text cause best-case versus worst-case behavior.
Let's formalize what we mean by a "window" in the context of pattern matching.
Definition: Alignment Window
At position i, the alignment window is the substring T[i..i+m-1]—the m consecutive characters of the text starting at position i. The naive algorithm compares this window against the pattern P[0..m-1].
Properties of the window:
- Each window contains exactly m characters (the length of the pattern)
- The first window starts at position 0; the last starts at position n - m (inclusive)
- There are n - m + 1 windows to check in total
- Consecutive windows overlap by m - 1 characters

Example: Text = "ALGORITHM" (n=9), Pattern = "GOR" (m=3)
| Window # | Position i | Window Contents | Characters |
|---|---|---|---|
| 1 | 0 | T[0..2] | ALG |
| 2 | 1 | T[1..3] | LGO |
| 3 | 2 | T[2..4] | GOR |
| 4 | 3 | T[3..5] | ORI |
| 5 | 4 | T[4..6] | RIT |
| 6 | 5 | T[5..7] | ITH |
| 7 | 6 | T[6..8] | THM |
The pattern "GOR" matches window #3 at position 2.
Consecutive windows overlap by m-1 characters. In the example above, windows 1 and 2 share 'LG', windows 2 and 3 share 'GO', etc. This overlap represents information that the naive algorithm ignores but optimized algorithms exploit.
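The window enumeration above can be sketched directly in code. The helper below is our own illustrative function (the name `enumerateWindows` is not from the lesson); it lists every window and the m - 1 characters it shares with its predecessor:

```typescript
// Illustrative sketch: enumerate every alignment window of `text` for a
// pattern of length m, plus the overlap each window shares with the previous one.
function enumerateWindows(
  text: string,
  m: number
): { pos: number; window: string; overlapWithPrev: string }[] {
  const n = text.length;
  const result: { pos: number; window: string; overlapWithPrev: string }[] = [];
  for (let i = 0; i <= n - m; i++) {
    result.push({
      pos: i,
      window: text.substring(i, i + m),
      // Window i shares T[i..i+m-2] with window i-1 (m - 1 characters).
      overlapWithPrev: i === 0 ? "" : text.substring(i, i + m - 1),
    });
  }
  return result;
}

// Reproduces the table above: 7 windows for "ALGORITHM" (n=9) with m=3.
for (const w of enumerateWindows("ALGORITHM", 3)) {
  console.log(`Window at ${w.pos}: ${w.window} (shares "${w.overlapWithPrev}" with previous)`);
}
```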
Let's trace through a complete example with a visual representation of the window sliding across the text.
Text: T = "ABCABCD"
Pattern: P = "BCD"
We'll show the text, the current window position (highlighted), and what comparisons occur at each step.
Step 0: Window at position 0
Text: A B C A B C D
─────
Window: [A B C]
Pattern: B C D
Compare: A vs B → MISMATCH at j=0
Result: No match. Slide right.
Step 1: Window at position 1
Text: A B C A B C D
─────
Window: [B C A]
Pattern: B C D
Compare: B=B ✓, C=C ✓, A vs D → MISMATCH at j=2
Result: Near match (2/3 chars). Slide right.
Step 2: Window at position 2
Text: A B C A B C D
─────
Window: [C A B]
Pattern: B C D
Compare: C vs B → MISMATCH at j=0
Result: No match. Slide right.
Step 3: Window at position 3
Text: A B C A B C D
─────
Window: [A B C]
Pattern: B C D
Compare: A vs B → MISMATCH at j=0
Result: No match. Slide right.
Step 4: Window at position 4
Text: A B C A B C D
─────
Window: [B C D]
Pattern: B C D
Compare: B=B ✓, C=C ✓, D=D ✓ → MATCH!
Result: Pattern found at position 4!
Summary of this trace:
| Position | Comparisons Made | Result |
|---|---|---|
| 0 | 1 | Mismatch at first character |
| 1 | 3 | Near-match, failed at third character |
| 2 | 1 | Mismatch at first character |
| 3 | 1 | Mismatch at first character |
| 4 | 3 | Complete match |
Total comparisons: 1 + 3 + 1 + 1 + 3 = 9 comparisons
Notice how the work varies dramatically by position. Position 1 required 3 comparisons to discover a mismatch that position 0 found with just 1 comparison. This variability is key to understanding the algorithm's performance characteristics.
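The per-position counts in the table can be verified with a small helper (our own sketch, not part of the lesson's code) that counts the comparisons the naive algorithm makes at each window position:

```typescript
// Sketch: count the character comparisons the naive algorithm performs
// at each window position before it stops (mismatch or full match).
function comparisonsPerPosition(text: string, pattern: string): number[] {
  const n = text.length;
  const m = pattern.length;
  const counts: number[] = [];
  for (let i = 0; i <= n - m; i++) {
    let j = 0;
    // Advance while characters match; each successful test is one comparison.
    while (j < m && text[i + j] === pattern[j]) j++;
    // A mismatch costs one extra comparison; a full match costs exactly m.
    counts.push(j < m ? j + 1 : m);
  }
  return counts;
}

const counts = comparisonsPerPosition("ABCABCD", "BCD");
console.log(counts);                             // → [1, 3, 1, 1, 3]
console.log(counts.reduce((a, b) => a + b, 0));  // → 9, matching the trace above
```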
A natural question arises: Why does the naive algorithm shift the window by exactly one position after each comparison? Why not shift by more?
The answer is correctness through exhaustiveness. Shifting by one guarantees that no potential match position is skipped.
Proof by counterexample:
Suppose we tried to be "clever" and shift by 2 positions instead of 1:
Text: "ABAAB"
Pattern: "AAB"
With a shift of 2:
- ABA vs AAB → mismatch at j=1
- AAB vs AAB → match!

We found the match at position 2. But what if the match was at position 1?
Text: "AAABB"
Pattern: "AAB"
With a shift of 2:
- AAA vs AAB → mismatch at j=2
- ABB vs AAB → mismatch at j=1

We missed the match at position 1 ("AAB" at T[1..3])!
Conclusion: Shifting by more than 1 can miss occurrences. The naive algorithm's shift of exactly 1 ensures completeness.
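The counterexample is easy to run. In the sketch below (`naiveSearch` and its `step` parameter are our own illustrative additions), a hypothetical shift of 2 silently skips the occurrence at position 1:

```typescript
// Sketch: naive search with a configurable window shift, to show why
// only a shift of 1 is safe in general.
function naiveSearch(text: string, pattern: string, step: number): number[] {
  const n = text.length;
  const m = pattern.length;
  const found: number[] = [];
  for (let i = 0; i <= n - m; i += step) {
    let j = 0;
    while (j < m && text[i + j] === pattern[j]) j++;
    if (j === m) found.push(i); // full match at this window
  }
  return found;
}

console.log(naiveSearch("AAABB", "AAB", 1)); // → [1]  shift of 1 finds the match
console.log(naiveSearch("AAABB", "AAB", 2)); // → []   shift of 2 misses it
```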
The Knuth-Morris-Pratt algorithm's genius is proving that certain shifts are safe—guaranteed not to skip any matches—based on analyzing the pattern's internal structure. This allows larger shifts in specific situations without sacrificing correctness.
The conservative approach:
The naive algorithm takes the maximally conservative approach: assume nothing, skip nothing. This guarantees correctness but at the cost of efficiency.
Mathematical formulation:
Let match(i) be true if the pattern occurs at position i. The naive algorithm computes:
for each i from 0 to n-m:
if match(i): record i
This is a brute force search over the space of possible positions. Every element of the search space is examined. No heuristics, no pruning, no shortcuts.
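The pseudocode above translates almost line for line into TypeScript. This is a minimal sketch of the formulation (the names `matchAt` and `bruteForceSearch` are ours):

```typescript
// match(i): does the pattern occur at position i of the text?
function matchAt(text: string, pattern: string, i: number): boolean {
  for (let j = 0; j < pattern.length; j++) {
    if (text[i + j] !== pattern[j]) return false;
  }
  return true;
}

// Brute force: examine every position 0..n-m — no pruning, no shortcuts.
function bruteForceSearch(text: string, pattern: string): number[] {
  const positions: number[] = [];
  for (let i = 0; i <= text.length - pattern.length; i++) {
    if (matchAt(text, pattern, i)) positions.push(i);
  }
  return positions;
}

console.log(bruteForceSearch("ABCABCD", "BCD")); // → [4]
```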
At each window position, the naive algorithm performs a character-by-character comparison between the pattern and the text window. Let's analyze this comparison process in detail.
The comparison sequence at position i:
- T[i+0] with P[0]
- T[i+1] with P[1]
- T[i+2] with P[2]
- ...and so on, until a mismatch occurs or all m characters are compared

Terminology:

- Match depth: the number of characters that matched before the first mismatch (or m if all matched)
- Mismatch position: the index j where T[i+j] ≠ P[j]

Example analysis:
Text = "ABABABC", Pattern = "ABABC"
| Position i | Comparisons | Match depth | Mismatch character |
|---|---|---|---|
| 0 | A=A, B=B, A=A, B=B, A≠C | 4 | C (P[4]) |
| 1 | B≠A | 0 | A (P[0]) |
| 2 | A=A, B=B, A=A, B=B, C=C | 5 (complete) | None (match!) |
At position 0, we matched 4 characters before discovering the pattern doesn't fit. Those 4 successful comparisons represent work that didn't lead to a match.
At position 0, we verified that T[0..3] = 'ABAB' = P[0..3]. When we move to position 2, we compare T[2..3] = 'AB' with P[0..1] = 'AB'. But we already know T[2..3] = 'AB' from the previous comparison! We're re-verifying information we already have. This redundancy is the core inefficiency of the naive algorithm.
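The match depth described above is straightforward to compute. A minimal sketch (the function name `matchDepth` is our own):

```typescript
// Sketch: compute the match depth at position i — the number of characters
// that match before the first mismatch, or m if the whole pattern matches.
function matchDepth(text: string, pattern: string, i: number): number {
  let j = 0;
  while (j < pattern.length && text[i + j] === pattern[j]) j++;
  return j;
}

// Reproduces the table above for Text = "ABABABC", Pattern = "ABABC":
console.log(matchDepth("ABABABC", "ABABC", 0)); // → 4
console.log(matchDepth("ABABABC", "ABABC", 1)); // → 0
console.log(matchDepth("ABABABC", "ABABC", 2)); // → 5 (complete match)
```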
The overlap between consecutive windows is the source of the naive algorithm's inefficiency. Let's quantify this precisely.
Overlap quantity:
Window at position i covers T[i..i+m-1].
Window at position i+1 covers T[i+1..i+m].
The overlap is T[i+1..i+m-1], which has m - 1 characters.
Percentage overlap:
For a pattern of length m, consecutive windows share (m-1)/m of their characters.
As patterns get longer, the overlap percentage approaches 100%.
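A quick sketch makes the trend concrete (the helper `overlapFraction` is our own):

```typescript
// Fraction of characters shared by consecutive windows: (m - 1) / m.
const overlapFraction = (m: number): number => (m - 1) / m;

for (const m of [2, 4, 10, 100]) {
  console.log(`m = ${m}: ${(overlapFraction(m) * 100).toFixed(1)}% overlap`);
}
// m = 2 → 50.0%, m = 4 → 75.0%, m = 10 → 90.0%, m = 100 → 99.0%
```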
Redundant comparison example:
Consider matching pattern "ABCD" starting at position 0, where we get a match depth of 3 (ABC matches, D doesn't).
What we learn from position 0:
- T[0] = A
- T[1] = B
- T[2] = C
- T[3] ≠ D (but we don't know what T[3] actually is, just that it's not D)

What we need to check at position 1:

- T[1] with P[0] = A: Is T[1] = A?
- T[2] with P[1] = B: Is T[2] = B?

The redundancy:
At position 1, we compare T[1] with A. But we already know T[1] = B from position 0! Since B ≠ A, position 1 will immediately fail—and we could have known this without any comparison.
Similarly, if both T[1] = A and T[2] = B (which we might know from position 0's partial match), we could skip directly to comparing new characters.
| Scenario | Info from Position i | Reused at Position i+1 | Info Reused |
|---|---|---|---|
| Match depth 0 | T[i] ≠ P[0] | Nothing useful | 0% |
| Match depth 1 | T[i] = P[0], T[i+1] ≠ P[1] | Only that T[i+1] ≠ P[1]; useful only if P[0] = P[1] | Usually 0% |
| Match depth k | T[i..i+k-1] = P[0..k-1] | Could know T[i+1..i+k-1] | Depends on pattern |
| Complete match (k=m) | Window is pattern | Overlap definitely known | 100% of overlap |
If a prefix of the matched portion P[0..k-1] also occurs as a suffix of it, then we can deduce facts about T[i+1..] without re-comparing. This self-similarity in the pattern is exactly what the KMP failure function captures.
Let's visualize which characters of the text are accessed during a complete run of the naive algorithm.
Example: Text = "ABCDE" (n=5), Pattern = "CD" (m=2)
Windows checked: positions 0, 1, 2, 3 (four windows)
Position 0: compare T[0] and T[1] with P
A B C D E
^ ^
Position 1: compare T[1] and T[2] with P
A B C D E
^ ^
Position 2: compare T[2] and T[3] with P (MATCH!)
A B C D E
^ ^
Position 3: compare T[3] and T[4] with P
A B C D E
^ ^
Character access counts:
| Character | Position | Times Accessed |
|---|---|---|
| A | 0 | 1 |
| B | 1 | 2 |
| C | 2 | 2 |
| D | 3 | 2 |
| E | 4 | 1 |
Middle characters are accessed more often than edge characters because they appear in more windows.
General pattern:
For a text of length n and pattern of length m, each character in the "middle" of the text appears in m different windows, so in the worst case it is accessed m times (once per window it appears in). This multi-access pattern is the root cause of the O(n×m) worst-case complexity.
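These window-membership counts can be computed directly. A small sketch (the helper name `windowMembership` is ours); note it counts how many windows *contain* each character, which equals actual accesses only in the worst case where every window is compared in full:

```typescript
// Sketch: for each text index, count how many alignment windows contain it.
// This is the worst-case access count per character.
function windowMembership(n: number, m: number): number[] {
  const counts: number[] = new Array(n).fill(0);
  for (let i = 0; i <= n - m; i++) {
    for (let j = 0; j < m; j++) counts[i + j]++;
  }
  return counts;
}

// Reproduces the table above for n = 5, m = 2:
console.log(windowMembership(5, 2)); // → [1, 2, 2, 2, 1]
```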
Another powerful way to visualize the naive algorithm is through an alignment grid showing all possible (position, pattern index) pairs.
Example: Text = "AABA", Pattern = "ABA"
The alignment grid is a 2D matrix where:

- Each row corresponds to a window position i
- Each column corresponds to a pattern index j
- Cell (i, j) represents the comparison of T[i+j] with P[j]
Pattern index j
0 1 2
┌───┬───┬───┐
Pos i=0 │T[0]│T[1]│T[2]│ A A B
│ A │ A │ B │ vs vs vs
│=A?│=B?│=A?│ A B A
├───┼───┼───┤
Pos i=1 │T[1]│T[2]│T[3]│ A B A
│ A │ B │ A │ vs vs vs
│=A?│=B?│=A?│ A B A
└───┴───┴───┘
Naive algorithm's traversal:
Row 0: Compare (0,0): A=A ✓ → (0,1): A≠B ✗ → Stop row 0
Row 1: Compare (1,0): A=A ✓ → (1,1): B=B ✓ → (1,2): A=A ✓ → Match at position 1!
The grid reveals:
The alignment grid shows that cells in the same column compare the same pattern character (P[j]) with different text characters (T[i+j] for various i). Cells in the same diagonal share the same text character. Optimized algorithms exploit these relationships to avoid redundant cell visits.
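Which cells the naive algorithm actually visits can be computed with a small sketch (the function name `visitedCells` is ours): each row is scanned left to right and abandoned at the first mismatch.

```typescript
// Sketch: mark which cells (i, j) of the alignment grid the naive
// algorithm visits before abandoning each row.
function visitedCells(text: string, pattern: string): boolean[][] {
  const n = text.length;
  const m = pattern.length;
  const grid: boolean[][] = [];
  for (let i = 0; i <= n - m; i++) {
    const row: boolean[] = new Array(m).fill(false);
    for (let j = 0; j < m; j++) {
      row[j] = true;                          // this cell is visited
      if (text[i + j] !== pattern[j]) break;  // mismatch: abandon the row
    }
    grid.push(row);
  }
  return grid;
}

// For Text = "AABA", Pattern = "ABA" (the grid above):
// row 0 visits cells (0,0) and (0,1); row 1 visits all three cells.
console.log(visitedCells("AABA", "ABA"));
```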
Let's implement a version of the naive algorithm that explicitly tracks and reports the sliding behavior for educational purposes:
```typescript
/**
 * Verbose Naive Pattern Matching with detailed sliding visualization.
 * This version prints the state at each window position for educational purposes.
 */
function verboseNaiveMatch(text: string, pattern: string): number[] {
  const n = text.length;
  const m = pattern.length;
  const occurrences: number[] = [];
  let totalComparisons = 0;

  console.log(`Text:    "${text}" (length ${n})`);
  console.log(`Pattern: "${pattern}" (length ${m})`);
  console.log(`Windows to check: ${n - m + 1} (positions 0 to ${n - m})`);
  console.log("\n" + "═".repeat(60) + "\n");

  for (let i = 0; i <= n - m; i++) {
    // Visual: show the sliding window
    const before = text.substring(0, i);
    const window = text.substring(i, i + m);
    const after = text.substring(i + m);

    console.log(`Position ${i}:`);
    console.log(`  Text:    ${before}[${window}]${after}`);
    console.log(`  Window:  "${window}"`);
    console.log(`  Pattern: "${pattern}"`);

    // Character-by-character comparison
    let j = 0;
    const comparisonLog: string[] = [];
    while (j < m && text[i + j] === pattern[j]) {
      comparisonLog.push(`${text[i + j]}=${pattern[j]} ✓`);
      totalComparisons++;
      j++;
    }

    if (j < m) {
      // Mismatch occurred
      comparisonLog.push(`${text[i + j]}≠${pattern[j]} ✗`);
      totalComparisons++;
      console.log(`  Comparisons: ${comparisonLog.join(", ")}`);
      console.log(`  Result: MISMATCH at pattern index ${j}`);
    } else {
      // Full match
      console.log(`  Comparisons: ${comparisonLog.join(", ")}`);
      console.log(`  Result: ★ MATCH FOUND at position ${i} ★`);
      occurrences.push(i);
    }
    console.log();
  }

  console.log("═".repeat(60));
  console.log(`Summary:`);
  console.log(`  Total windows checked: ${n - m + 1}`);
  console.log(`  Total character comparisons: ${totalComparisons}`);
  console.log(`  Matches found: ${occurrences.length}`);
  console.log(`  Match positions: [${occurrences.join(", ")}]`);

  return occurrences;
}

// Example usage
console.log("\n🔍 Example 1: Multiple Matches\n");
verboseNaiveMatch("AABAACAADAABAABA", "AABA");

console.log("\n🔍 Example 2: Worst Case Pattern\n");
verboseNaiveMatch("AAAAAB", "AAB");
```

The definitive mental model for naive pattern matching:
Imagine the following setup:
The Text Track: A long horizontal track with characters written on tiles, numbered 0, 1, 2, ..., n-1.
The Pattern Slider: A transparent ruler with m slots, each showing a pattern character. The slider can move along the track.
The Comparison Light: Each slot on the slider has a light. When the slider is positioned, each light checks if its pattern character matches the tile beneath it.
The Match Bell: If all m lights are green (all match), a bell rings, signaling a match at this position.
The algorithm's execution:
Why this model is powerful:
When debugging pattern matching code, simulate this mental model by hand. Ask: 'What position is my slider at? Which lights have I checked? Did I slide correctly after a mismatch?' This catches off-by-one errors and incorrect shift logic.
We've now developed a deep understanding of how the naive algorithm slides the pattern over the text. Let's consolidate:

- The alignment window at position i is T[i..i+m-1]; there are n - m + 1 windows in total
- The window shifts by exactly one position, because any larger shift can miss occurrences
- Consecutive windows overlap by m - 1 characters, and the naive algorithm re-examines that overlap from scratch
- This redundant work on overlapping windows is precisely what optimized algorithms like KMP eliminate
What's next:
Now that we understand how the algorithm works, we need to analyze its efficiency. The next page dives deep into the time complexity analysis, proving the O(n × m) worst-case bound and examining when this worst case actually occurs.
You now have a vivid mental model of the sliding window behavior. You understand exactly how the pattern moves across the text, why we shift by one, and where redundant work occurs. Next, we'll put numbers to this inefficiency.