The brute force pattern matching algorithm can be understood through many lenses, but perhaps the most intuitive is the sliding window perspective. Imagine you have a transparent sheet with the pattern printed on it. You place this sheet over the text and check if the characters align. If they don't match perfectly, you slide the sheet one position to the right and check again.
This mental model isn't just a pedagogical convenience—it's the actual computational behavior of the algorithm. The "window" is the section of text currently being compared to the pattern, and it slides from position 0 to position n - m, checking for matches at every stop.
In this page, we'll dive deep into the mechanics of this sliding behavior, visualize it step by step, and understand the implications for both performance and correctness.
By the end of this page, you will have a vivid mental model of how the naive algorithm explores the text. You'll understand exactly which characters are compared, why we shift by exactly one position, and what patterns in the text cause best-case versus worst-case behavior.
Let's formalize what we mean by a "window" in the context of pattern matching.
Definition: Alignment Window
At position i, the alignment window is the substring T[i..i+m-1]—the m consecutive characters of the text starting at position i. The naive algorithm compares this window against the pattern P[0..m-1].
Properties of the window:
- Each window contains exactly m characters (the length of the pattern)
- The first window starts at position 0; the last starts at position n - m (inclusive)
- There are n - m + 1 windows to check in total
- Consecutive windows overlap by m - 1 characters

Example: Text = "ALGORITHM" (n=9), Pattern = "GOR" (m=3)
| Window # | Position i | Window Contents | Characters |
|---|---|---|---|
| 1 | 0 | T[0..2] | ALG |
| 2 | 1 | T[1..3] | LGO |
| 3 | 2 | T[2..4] | GOR |
| 4 | 3 | T[3..5] | ORI |
| 5 | 4 | T[4..6] | RIT |
| 6 | 5 | T[5..7] | ITH |
| 7 | 6 | T[6..8] | THM |
The pattern "GOR" matches window #3 at position 2.
Consecutive windows overlap by m-1 characters. In the example above, windows 1 and 2 share 'LG', windows 2 and 3 share 'GO', etc. This overlap represents information that the naive algorithm ignores but optimized algorithms exploit.
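The window enumeration above can be sketched directly in code. The helper below is our own illustrative function (the name `enumerateWindows` is not from the lesson); it lists every window and the m - 1 characters it shares with its predecessor:

```typescript
// Illustrative sketch: enumerate every alignment window of `text` for a
// pattern of length m, plus the overlap each window shares with the previous one.
function enumerateWindows(
  text: string,
  m: number
): { pos: number; window: string; overlapWithPrev: string }[] {
  const n = text.length;
  const result: { pos: number; window: string; overlapWithPrev: string }[] = [];
  for (let i = 0; i <= n - m; i++) {
    result.push({
      pos: i,
      window: text.substring(i, i + m),
      // Window i shares T[i..i+m-2] with window i-1 (m - 1 characters).
      overlapWithPrev: i === 0 ? "" : text.substring(i, i + m - 1),
    });
  }
  return result;
}

// Reproduces the table above: 7 windows for "ALGORITHM" (n=9) with m=3.
for (const w of enumerateWindows("ALGORITHM", 3)) {
  console.log(`Window at ${w.pos}: ${w.window} (shares "${w.overlapWithPrev}" with previous)`);
}
```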
Let's trace through a complete example with a visual representation of the window sliding across the text.
Text: T = "ABCABCD"
Pattern: P = "BCD"
We'll show the text, the current window position (highlighted), and what comparisons occur at each step.
Step 0: Window at position 0
Text: A B C A B C D
─────
Window: [A B C]
Pattern: B C D
Compare: A vs B → MISMATCH at j=0
Result: No match. Slide right.
Step 1: Window at position 1
Text: A B C A B C D
─────
Window: [B C A]
Pattern: B C D
Compare: B=B ✓, C=C ✓, A vs D → MISMATCH at j=2
Result: Near match (2/3 chars). Slide right.
Step 2: Window at position 2
Text: A B C A B C D
─────
Window: [C A B]
Pattern: B C D
Compare: C vs B → MISMATCH at j=0
Result: No match. Slide right.
Step 3: Window at position 3
Text: A B C A B C D
─────
Window: [A B C]
Pattern: B C D
Compare: A vs B → MISMATCH at j=0
Result: No match. Slide right.
Step 4: Window at position 4
Text: A B C A B C D
─────
Window: [B C D]
Pattern: B C D
Compare: B=B ✓, C=C ✓, D=D ✓ → MATCH!
Result: Pattern found at position 4!
Summary of this trace:
| Position | Comparisons Made | Result |
|---|---|---|
| 0 | 1 | Mismatch at first character |
| 1 | 3 | Near-match, failed at third character |
| 2 | 1 | Mismatch at first character |
| 3 | 1 | Mismatch at first character |
| 4 | 3 | Complete match |
Total comparisons: 1 + 3 + 1 + 1 + 3 = 9 comparisons
Notice how the work varies dramatically by position. Position 1 required 3 comparisons to discover a mismatch that position 0 found with just 1 comparison. This variability is key to understanding the algorithm's performance characteristics.
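The per-position counts in the table can be verified with a small helper (our own sketch, not part of the lesson's code) that counts the comparisons the naive algorithm makes at each window position:

```typescript
// Sketch: count the character comparisons the naive algorithm performs
// at each window position before it stops (mismatch or full match).
function comparisonsPerPosition(text: string, pattern: string): number[] {
  const n = text.length;
  const m = pattern.length;
  const counts: number[] = [];
  for (let i = 0; i <= n - m; i++) {
    let j = 0;
    // Advance while characters match; each successful test is one comparison.
    while (j < m && text[i + j] === pattern[j]) j++;
    // A mismatch costs one extra comparison; a full match costs exactly m.
    counts.push(j < m ? j + 1 : m);
  }
  return counts;
}

const counts = comparisonsPerPosition("ABCABCD", "BCD");
console.log(counts);                             // → [1, 3, 1, 1, 3]
console.log(counts.reduce((a, b) => a + b, 0));  // → 9, matching the trace above
```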
A natural question arises: Why does the naive algorithm shift the window by exactly one position after each comparison? Why not shift by more?
The answer is correctness through exhaustiveness. Shifting by one guarantees that no potential match position is skipped.
Proof by counterexample:
Suppose we tried to be "clever" and shift by 2 positions instead of 1:
Text: "ABAAB"
Pattern: "AAB"
With a shift of 2:
- ABA vs AAB → mismatch at j=1
- AAB vs AAB → match!

We found the match at position 2. But what if the match was at position 1?
Text: "AAABB"
Pattern: "AAB"
With a shift of 2:
- AAA vs AAB → mismatch at j=2
- ABB vs AAB → mismatch at j=1

We missed the match at position 1 ("AAB" at T[1..3])!
Conclusion: Shifting by more than 1 can miss occurrences. The naive algorithm's shift of exactly 1 ensures completeness.
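The counterexample is easy to run. In the sketch below (`naiveSearch` and its `step` parameter are our own illustrative additions), a hypothetical shift of 2 silently skips the occurrence at position 1:

```typescript
// Sketch: naive search with a configurable window shift, to show why
// only a shift of 1 is safe in general.
function naiveSearch(text: string, pattern: string, step: number): number[] {
  const n = text.length;
  const m = pattern.length;
  const found: number[] = [];
  for (let i = 0; i <= n - m; i += step) {
    let j = 0;
    while (j < m && text[i + j] === pattern[j]) j++;
    if (j === m) found.push(i); // full match at this window
  }
  return found;
}

console.log(naiveSearch("AAABB", "AAB", 1)); // → [1]  shift of 1 finds the match
console.log(naiveSearch("AAABB", "AAB", 2)); // → []   shift of 2 misses it
```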
The Knuth-Morris-Pratt algorithm's genius is proving that certain shifts are safe—guaranteed not to skip any matches—based on analyzing the pattern's internal structure. This allows larger shifts in specific situations without sacrificing correctness.
The conservative approach:
The naive algorithm takes the maximally conservative approach: assume nothing, skip nothing. This guarantees correctness but at the cost of efficiency.
Mathematical formulation:
Let match(i) be true if the pattern occurs at position i. The naive algorithm computes:
for each i from 0 to n-m:
if match(i): record i
This is a brute force search over the space of possible positions. Every element of the search space is examined. No heuristics, no pruning, no shortcuts.
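The pseudocode above translates almost line for line into TypeScript. This is a minimal sketch of the formulation (the names `matchAt` and `bruteForceSearch` are ours):

```typescript
// match(i): does the pattern occur at position i of the text?
function matchAt(text: string, pattern: string, i: number): boolean {
  for (let j = 0; j < pattern.length; j++) {
    if (text[i + j] !== pattern[j]) return false;
  }
  return true;
}

// Brute force: examine every position 0..n-m — no pruning, no shortcuts.
function bruteForceSearch(text: string, pattern: string): number[] {
  const positions: number[] = [];
  for (let i = 0; i <= text.length - pattern.length; i++) {
    if (matchAt(text, pattern, i)) positions.push(i);
  }
  return positions;
}

console.log(bruteForceSearch("ABCABCD", "BCD")); // → [4]
```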
At each window position, the naive algorithm performs a character-by-character comparison between the pattern and the text window. Let's analyze this comparison process in detail.
The comparison sequence at position i:
- T[i+0] with P[0]
- T[i+1] with P[1]
- T[i+2] with P[2]
- ...and so on, until a mismatch occurs or all m characters are compared

Terminology:

- Match depth: the number of characters that matched before the first mismatch (or m if all matched)
- Mismatch position: the index j where T[i+j] ≠ P[j]

Example analysis:
Text = "ABABABC", Pattern = "ABABC"
| Position i | Comparisons | Match depth | Mismatch character |
|---|---|---|---|
| 0 | A=A, B=B, A=A, B=B, A≠C | 4 | C (P[4]) |
| 1 | B≠A | 0 | A (P[0]) |
| 2 | A=A, B=B, A=A, B=B, C=C | 5 (complete) | None (match!) |
At position 0, we matched 4 characters before discovering the pattern doesn't fit. Those 4 successful comparisons represent work that didn't lead to a match.
At position 0, we verified that T[0..3] = 'ABAB' = P[0..3]. When we move to position 2, we compare T[2..3] = 'AB' with P[0..1] = 'AB'. But we already know T[2..3] = 'AB' from the previous comparison! We're re-verifying information we already have. This redundancy is the core inefficiency of the naive algorithm.
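The match depth described above is straightforward to compute. A minimal sketch (the function name `matchDepth` is our own):

```typescript
// Sketch: compute the match depth at position i — the number of characters
// that match before the first mismatch, or m if the whole pattern matches.
function matchDepth(text: string, pattern: string, i: number): number {
  let j = 0;
  while (j < pattern.length && text[i + j] === pattern[j]) j++;
  return j;
}

// Reproduces the table above for Text = "ABABABC", Pattern = "ABABC":
console.log(matchDepth("ABABABC", "ABABC", 0)); // → 4
console.log(matchDepth("ABABABC", "ABABC", 1)); // → 0
console.log(matchDepth("ABABABC", "ABABC", 2)); // → 5 (complete match)
```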
The overlap between consecutive windows is the source of the naive algorithm's inefficiency. Let's quantify this precisely.
Overlap quantity:
Window at position i covers T[i..i+m-1].
Window at position i+1 covers T[i+1..i+m].
The overlap is T[i+1..i+m-1], which has m - 1 characters.
Percentage overlap:
For a pattern of length m, consecutive windows share (m-1)/m of their characters.
As patterns get longer, the overlap percentage approaches 100%.
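A quick sketch makes the trend concrete (the helper `overlapFraction` is our own):

```typescript
// Fraction of characters shared by consecutive windows: (m - 1) / m.
const overlapFraction = (m: number): number => (m - 1) / m;

for (const m of [2, 4, 10, 100]) {
  console.log(`m = ${m}: ${(overlapFraction(m) * 100).toFixed(1)}% overlap`);
}
// m = 2 → 50.0%, m = 4 → 75.0%, m = 10 → 90.0%, m = 100 → 99.0%
```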
Redundant comparison example:
Consider matching pattern "ABCD" starting at position 0, where we get a match depth of 3 (ABC matches, D doesn't).
What we learn from position 0:
- T[0] = A
- T[1] = B
- T[2] = C
- T[3] ≠ D (but we don't know what T[3] actually is, just that it's not D)

What we need to check at position 1:

- T[1] with P[0] = A: Is T[1] = A?
- T[2] with P[1] = B: Is T[2] = B?

The redundancy:
At position 1, we compare T[1] with A. But we already know T[1] = B from position 0! Since B ≠ A, position 1 will immediately fail—and we could have known this without any comparison.
Similarly, if both T[1] = A and T[2] = B (which we might know from position 0's partial match), we could skip directly to comparing new characters.
| Scenario | Info from Position i | Reused at Position i+1 | Info Reused |
|---|---|---|---|
| Match depth 0 | T[i] ≠ P[0] | Nothing useful | 0% |
| Match depth 1 | T[i] = P[0], T[i+1] ≠ P[1] | Only that T[i+1] ≠ P[1]; useful only if P[0] = P[1] | Usually 0% |
| Match depth k | T[i..i+k-1] = P[0..k-1] | Could know T[i+1..i+k-1] | Depends on pattern |
| Complete match (k=m) | Window is pattern | Overlap definitely known | 100% of overlap |
If a prefix of the matched portion P[0..k-1] also occurs as a suffix of it, then we can deduce facts about T[i+1..] without re-comparing. This self-similarity in the pattern is exactly what the KMP failure function captures.
Let's visualize which characters of the text are accessed during a complete run of the naive algorithm.
Example: Text = "ABCDE" (n=5), Pattern = "CD" (m=2)
Windows checked: positions 0, 1, 2, 3 (four windows)
Position 0: compare T[0] and T[1] with P
A B C D E
^ ^
Position 1: compare T[1] and T[2] with P
A B C D E
^ ^
Position 2: compare T[2] and T[3] with P (MATCH!)
A B C D E
^ ^
Position 3: compare T[3] and T[4] with P
A B C D E
^ ^
Character access counts:
| Character | Position | Times Accessed |
|---|---|---|
| A | 0 | 1 |
| B | 1 | 2 |
| C | 2 | 2 |
| D | 3 | 2 |
| E | 4 | 1 |
Middle characters are accessed more often than edge characters because they appear in more windows.
General pattern:
For a text of length n and pattern of length m, each character in the "middle" of the text appears in m different windows, so in the worst case it is accessed m times (once per window it appears in). This multi-access pattern is the root cause of the O(n×m) worst-case complexity.
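These window-membership counts can be computed directly. A small sketch (the helper name `windowMembership` is ours); note it counts how many windows *contain* each character, which equals actual accesses only in the worst case where every window is compared in full:

```typescript
// Sketch: for each text index, count how many alignment windows contain it.
// This is the worst-case access count per character.
function windowMembership(n: number, m: number): number[] {
  const counts: number[] = new Array(n).fill(0);
  for (let i = 0; i <= n - m; i++) {
    for (let j = 0; j < m; j++) counts[i + j]++;
  }
  return counts;
}

// Reproduces the table above for n = 5, m = 2:
console.log(windowMembership(5, 2)); // → [1, 2, 2, 2, 1]
```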
Another powerful way to visualize the naive algorithm is through an alignment grid showing all possible (position, pattern index) pairs.
Example: Text = "AABA", Pattern = "ABA"
The alignment grid is a 2D matrix where:

- Each row corresponds to a window position i
- Each column corresponds to a pattern index j
- Cell (i, j) represents the comparison of T[i+j] with P[j]
Pattern index j
0 1 2
┌───┬───┬───┐
Pos i=0 │T[0]│T[1]│T[2]│ A A B
│ A │ A │ B │ vs vs vs
│=A?│=B?│=A?│ A B A
├───┼───┼───┤
Pos i=1 │T[1]│T[2]│T[3]│ A B A
│ A │ B │ A │ vs vs vs
│=A?│=B?│=A?│ A B A
└───┴───┴───┘
Naive algorithm's traversal:
Row 0: Compare (0,0): A=A ✓ → (0,1): A≠B ✗ → Stop row 0
Row 1: Compare (1,0): A=A ✓ → (1,1): B=B ✓ → (1,2): A=A ✓ → Match at position 1!
The grid reveals:
The alignment grid shows that cells in the same column compare the same pattern character (P[j]) with different text characters (T[i+j] for various i). Cells in the same diagonal share the same text character. Optimized algorithms exploit these relationships to avoid redundant cell visits.
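Which cells the naive algorithm actually visits can be computed with a small sketch (the function name `visitedCells` is ours): each row is scanned left to right and abandoned at the first mismatch.

```typescript
// Sketch: mark which cells (i, j) of the alignment grid the naive
// algorithm visits before abandoning each row.
function visitedCells(text: string, pattern: string): boolean[][] {
  const n = text.length;
  const m = pattern.length;
  const grid: boolean[][] = [];
  for (let i = 0; i <= n - m; i++) {
    const row: boolean[] = new Array(m).fill(false);
    for (let j = 0; j < m; j++) {
      row[j] = true;                          // this cell is visited
      if (text[i + j] !== pattern[j]) break;  // mismatch: abandon the row
    }
    grid.push(row);
  }
  return grid;
}

// For Text = "AABA", Pattern = "ABA" (the grid above):
// row 0 visits cells (0,0) and (0,1); row 1 visits all three cells.
console.log(visitedCells("AABA", "ABA"));
```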
Let's implement a version of the naive algorithm that explicitly tracks and reports the sliding behavior for educational purposes:
```typescript
/**
 * Verbose Naive Pattern Matching with detailed sliding visualization.
 * This version prints the state at each window position for educational purposes.
 */
function verboseNaiveMatch(text: string, pattern: string): number[] {
  const n = text.length;
  const m = pattern.length;
  const occurrences: number[] = [];
  let totalComparisons = 0;

  console.log(`Text:    "${text}" (length ${n})`);
  console.log(`Pattern: "${pattern}" (length ${m})`);
  console.log(`Windows to check: ${n - m + 1} (positions 0 to ${n - m})`);
  console.log("\n" + "═".repeat(60) + "\n");

  for (let i = 0; i <= n - m; i++) {
    // Visual: show the sliding window
    const before = text.substring(0, i);
    const window = text.substring(i, i + m);
    const after = text.substring(i + m);

    console.log(`Position ${i}:`);
    console.log(`  Text:    ${before}[${window}]${after}`);
    console.log(`  Window:  "${window}"`);
    console.log(`  Pattern: "${pattern}"`);

    // Character-by-character comparison
    let j = 0;
    const comparisonLog: string[] = [];
    while (j < m && text[i + j] === pattern[j]) {
      comparisonLog.push(`${text[i + j]}=${pattern[j]} ✓`);
      totalComparisons++;
      j++;
    }

    if (j < m) {
      // Mismatch occurred
      comparisonLog.push(`${text[i + j]}≠${pattern[j]} ✗`);
      totalComparisons++;
      console.log(`  Comparisons: ${comparisonLog.join(", ")}`);
      console.log(`  Result: MISMATCH at pattern index ${j}`);
    } else {
      // Full match
      console.log(`  Comparisons: ${comparisonLog.join(", ")}`);
      console.log(`  Result: ★ MATCH FOUND at position ${i} ★`);
      occurrences.push(i);
    }
    console.log();
  }

  console.log("═".repeat(60));
  console.log(`Summary:`);
  console.log(`  Total windows checked: ${n - m + 1}`);
  console.log(`  Total character comparisons: ${totalComparisons}`);
  console.log(`  Matches found: ${occurrences.length}`);
  console.log(`  Match positions: [${occurrences.join(", ")}]`);

  return occurrences;
}

// Example usage
console.log("\n🔍 Example 1: Multiple Matches\n");
verboseNaiveMatch("AABAACAADAABAABA", "AABA");

console.log("\n🔍 Example 2: Worst Case Pattern\n");
verboseNaiveMatch("AAAAAB", "AAB");
```

The definitive mental model for naive pattern matching:
Imagine the following setup:
The Text Track: A long horizontal track with characters written on tiles, numbered 0, 1, 2, ..., n-1.
The Pattern Slider: A transparent ruler with m slots, each showing a pattern character. The slider can move along the track.
The Comparison Light: Each slot on the slider has a light. When the slider is positioned, each light checks if its pattern character matches the tile beneath it.
The Match Bell: If all m lights are green (all match), a bell rings, signaling a match at this position.
The algorithm's execution:
Why this model is powerful:
When debugging pattern matching code, simulate this mental model by hand. Ask: 'What position is my slider at? Which lights have I checked? Did I slide correctly after a mismatch?' This catches off-by-one errors and incorrect shift logic.
We've now developed a deep understanding of how the naive algorithm slides the pattern over the text. Let's consolidate:

- The alignment window at position i is T[i..i+m-1]; there are n - m + 1 windows in total
- The window shifts by exactly one position, because any larger shift can miss occurrences
- Consecutive windows overlap by m - 1 characters, and the naive algorithm re-examines that overlap from scratch
- This redundant work on overlapping windows is precisely what optimized algorithms like KMP eliminate
What's next:
Now that we understand how the algorithm works, we need to analyze its efficiency. The next page dives deep into the time complexity analysis, proving the O(n × m) worst-case bound and examining when this worst case actually occurs.
You now have a vivid mental model of the sliding window behavior. You understand exactly how the pattern moves across the text, why we shift by one, and where redundant work occurs. Next, we'll put numbers to this inefficiency.