Imagine you're searching for a specific word in a book. Without any index or table of contents, what would you do? You'd start at the beginning, check if the first few characters match your word, and if not, move forward slightly and try again. This intuitive process—checking every possible position systematically—is exactly what the brute force pattern matching algorithm does.
The brute force approach (also called the naive algorithm) is the simplest conceivable solution to the pattern matching problem. It requires no preprocessing, no auxiliary data structures, and no mathematical cleverness. It simply answers the question: "Does the pattern occur here? No? Then what about here? And here?"
Yet this simplicity is precisely what makes it so important. The brute force algorithm serves as the conceptual baseline against which all optimized pattern matching algorithms are measured. You cannot truly appreciate why KMP achieves linear time or why Rabin-Karp uses hashing without first understanding what naive matching does—and what it does wrong.
By the end of this page, you will deeply understand the brute force pattern matching algorithm—its mechanics, its intuition, its implementation, and most importantly, the fundamental inefficiency that motivates every advanced string algorithm you'll encounter.
Before diving into the algorithm, let's crystallize the problem we're solving:
Given:

- A text T of length n
- A pattern P of length m
- m ≤ n (the pattern is no longer than the text)

Find:

- All positions i in T where P occurs as a substring
- An occurrence at position i means T[i..i+m-1] = P[0..m-1]

Example:

- Text: "ABABDABACDABABCABAB"
- Pattern: "ABABCABAB"

The brute force algorithm attacks this problem in the most direct way possible: try every starting position in the text, and for each, check if the pattern matches character by character.
Throughout this module, we use 0-indexed strings. Position 0 is the first character. When we say the pattern occurs at position i, we mean T[i] is the first character of the occurrence, and T[i+m-1] is the last.
The brute force algorithm embodies a simple principle: exhaustive verification at every valid position.
Valid positions are those where the pattern could potentially fit within the text. If the text has length n and the pattern has length m, then:

- The last valid starting position is n - m
- There are n - m + 1 valid starting positions in total
- At position n - m, the pattern extends from T[n-m] to T[n-1], using exactly the last m characters

The algorithm in plain English:

1. Align the pattern at the current starting position i in the text
2. Compare P[0] with T[i], then P[1] with T[i+1], and so on
3. If all m characters match, record this as an occurrence
4. Move to the next position and repeat for all n - m + 1 positions

The term 'brute force' doesn't mean 'stupid'—it means 'complete and direct.' The algorithm makes no assumptions, uses no shortcuts, and guarantees correctness through exhaustive checking. This is often the right first approach: get something correct, then optimize.
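The per-position check described above can be sketched as a small helper. This is an illustrative sketch, and `occursAt` is a name introduced here, not from the original text:

```typescript
// Hypothetical helper: does `pattern` occur in `text` starting at index i?
// Assumes 0 <= i <= text.length - pattern.length.
function occursAt(text: string, pattern: string, i: number): boolean {
  for (let j = 0; j < pattern.length; j++) {
    if (text[i + j] !== pattern[j]) return false; // mismatch: this position fails
  }
  return true; // all m characters matched
}
```

The full algorithm is then just a loop calling this check for every i from 0 to n - m.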
Why "at every valid position"?
Consider searching for pattern "ABC" in text "AABCD" (n=5, m=3):
| Position | Substring Checked | Match? |
|---|---|---|
| 0 | AAB | No |
| 1 | ABC | Yes ✓ |
| 2 | BCD | No |
We can't start at position 3 or beyond because the remaining text from T[3] onward has only two characters, too few to hold a length-3 pattern. Hence, we check positions 0 through 5 - 3 = 2 (inclusive).
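The table above can be reproduced with a short loop over the valid positions. This is a sketch, and `validPositionChecks` is a name chosen here for illustration:

```typescript
// Enumerate every valid starting position and whether the pattern matches there.
function validPositionChecks(text: string, pattern: string): { pos: number; match: boolean }[] {
  const n = text.length;
  const m = pattern.length;
  const results: { pos: number; match: boolean }[] = [];
  // Only positions 0 .. n - m leave room for all m pattern characters.
  for (let i = 0; i <= n - m; i++) {
    results.push({ pos: i, match: text.substring(i, i + m) === pattern });
  }
  return results;
}

// "ABC" in "AABCD": positions 0..2 are checked; only position 1 matches.
```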
This exhaustive checking is the algorithm's strength (it never misses an occurrence) and weakness (it never skips an unnecessary check).
Let's formalize the brute force algorithm with precise pseudocode and then implement it in code:
Pseudocode:
NAIVE-PATTERN-MATCH(T, P)
n ← length(T)
m ← length(P)
occurrences ← empty list
for i ← 0 to n - m do
j ← 0
while j < m and T[i + j] = P[j] do
j ← j + 1
end while
if j = m then
append i to occurrences
end if
end for
return occurrences
Explanation of the pseudocode:
- Outer loop (for i): iterates through all valid starting positions in the text (0 to n - m inclusive)
- Inner loop (while j): compares pattern characters with text characters starting at position i
- Match check (if j = m): if we successfully matched all m characters, j will equal m
- Otherwise, the inner loop stopped because T[i + j] ≠ P[j], meaning a mismatch occurred at pattern position j
```typescript
/**
 * Naive (Brute Force) Pattern Matching Algorithm
 *
 * Searches for all occurrences of pattern P in text T.
 * Returns an array of starting indices where P is found.
 *
 * Time Complexity: O(n * m) worst case, O(n) best case
 * Space Complexity: O(1) auxiliary (excluding output storage)
 *
 * @param text - The text string to search within
 * @param pattern - The pattern string to search for
 * @returns Array of starting positions (0-indexed) where pattern occurs
 */
function naivePatternMatch(text: string, pattern: string): number[] {
  const n = text.length;
  const m = pattern.length;
  const occurrences: number[] = [];

  // Edge cases
  if (m === 0) {
    // Empty pattern matches at every position (including after last char)
    return Array.from({ length: n + 1 }, (_, i) => i);
  }
  if (m > n) {
    // Pattern longer than text: no possible match
    return occurrences;
  }

  // Try each valid starting position
  for (let i = 0; i <= n - m; i++) {
    let j = 0;
    // Compare pattern with text starting at position i
    while (j < m && text[i + j] === pattern[j]) {
      j++;
    }
    // If we matched all m characters, record this occurrence
    if (j === m) {
      occurrences.push(i);
    }
  }

  return occurrences;
}

// Example usage
const text = "ABABDABACDABABCABAB";
const pattern = "ABABCABAB";
console.log(`Text: "${text}"`);
console.log(`Pattern: "${pattern}"`);
console.log(`Occurrences at: [${naivePatternMatch(text, pattern)}]`);
// Output: Occurrences at: [10]
```

Understanding the algorithm deeply requires tracing through an example step by step. Let's work through a complete example that reveals the algorithm's behavior:
Text: T = "AABCAAB"
Pattern: P = "AAB"
Here, n = 7 and m = 3, so we'll check positions 0 through 4 (inclusive).
| Position i | Substring T[i..i+2] | Comparison Steps | Result |
|---|---|---|---|
| 0 | AAB | A=A ✓, A=A ✓, B=B ✓ → j reaches 3 | MATCH at position 0 |
| 1 | ABC | A=A ✓, B≠A ✗ → mismatch at j=1 | No match |
| 2 | BCA | B≠A ✗ → mismatch at j=0 | No match |
| 3 | CAA | C≠A ✗ → mismatch at j=0 | No match |
| 4 | AAB | A=A ✓, A=A ✓, B=B ✓ → j reaches 3 | MATCH at position 4 |
Result: Pattern "AAB" occurs at positions 0 and 4.
Observations from this trace:
Total comparisons performed: 3 + 2 + 1 + 1 + 3 = 10 comparisons
Matches found: 2 occurrences where all 3 characters matched
Early termination: Positions 2 and 3 terminated after just 1 comparison (first character mismatch)
No position skipped: Even though we found a match at position 0, we still checked position 1, 2, 3, and 4 independently
This last observation is crucial. The naive algorithm treats every position independently. It doesn't "remember" anything from previous comparisons. We'll see in later pages how this lack of memory creates the opportunity for optimization.
At position 0, we verified that T[0..2] = "AAB". This tells us T[1..2] = "AB". When we move to position 1, we then compare T[1] with P[0] = 'A'—but we already know T[1] = 'A' from the previous match! The naive algorithm discards this information. Smarter algorithms exploit it.
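One way to see this wasted work concretely is to instrument the inner loop with a comparison counter. This is a sketch, and `countComparisons` is a name introduced here; every character comparison, including the one that fails, increments the counter:

```typescript
// Instrumented brute force: returns occurrences plus total character comparisons.
function countComparisons(text: string, pattern: string): { occurrences: number[]; comparisons: number } {
  const n = text.length;
  const m = pattern.length;
  const occurrences: number[] = [];
  let comparisons = 0;
  for (let i = 0; i <= n - m; i++) {
    let j = 0;
    while (j < m) {
      comparisons++; // count every character comparison, successful or not
      if (text[i + j] !== pattern[j]) break;
      j++;
    }
    if (j === m) occurrences.push(i);
  }
  return { occurrences, comparisons };
}

// For text "AABCAAB" and pattern "AAB", this reports the 10 comparisons
// (3 + 2 + 1 + 1 + 3) counted in the trace, and occurrences at 0 and 4.
```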
The inner loop of the naive algorithm deserves careful examination because its behavior determines both correctness and efficiency.
The inner loop:
while j < m and T[i + j] = P[j] do
j ← j + 1
end while
Two-condition exit:
The loop can exit for exactly two reasons:
j = m (complete match): We've compared all m characters of the pattern, and every single one matched. This means P occurs at position i in T.
T[i + j] ≠ P[j] (mismatch): We found a character that doesn't match. No point continuing—this position cannot be a match.
Why this two-condition design is elegant:
- The j < m check comes first, preventing out-of-bounds access
- On exit, j tells us exactly how many characters matched (useful for debugging and optimization)

Visualizing match depth:
Let's denote the match depth at position i as the value of j when the inner loop exits. This tells us how far into the pattern we successfully matched.
Example: Text = "ABAAC", Pattern = "AAC"
| Position i | Characters compared | Match depth j | Why loop stopped |
|---|---|---|---|
| 0 | A=A | 1 | B≠A (T[1]='B', P[1]='A') |
| 1 | B≠A | 0 | B≠A (mismatch on first character) |
| 2 | A=A, A=A, C=C | 3 | j=m (full match!) |
At position 0, we matched 1 character before failing. At position 1, we matched 0 characters. At position 2, we matched all 3 characters—a complete match.
Key insight: The match depth varies dramatically across positions. When many characters match before a failure, we've done significant work that is then "thrown away" when we move to the next position. This wasted work is the fundamental inefficiency we'll address in later algorithms.
When implementing pattern matching, tracking the match depth at each position is invaluable for debugging. If you're getting wrong answers, print the match depth at each position—it immediately reveals where comparisons are going wrong.
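The debugging tip above can be packaged as a small helper that returns the match depth at every valid position. This is a sketch, and `matchDepths` is a name chosen here for illustration:

```typescript
// Match depth at each valid position: how many pattern characters matched
// before the inner loop stopped. A depth equal to pattern.length is a full match.
function matchDepths(text: string, pattern: string): number[] {
  const n = text.length;
  const m = pattern.length;
  const depths: number[] = [];
  for (let i = 0; i <= n - m; i++) {
    let j = 0;
    while (j < m && text[i + j] === pattern[j]) j++;
    depths.push(j); // positions where j === m are occurrences
  }
  return depths;
}

// Text "ABAAC", pattern "AAC" → depths [1, 0, 3], matching the trace table.
```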
Why is the naive algorithm correct? This might seem obvious, but stating it precisely builds intuition for more complex algorithms.
Theorem: The naive pattern matching algorithm finds all and only the positions where pattern P occurs in text T.
Proof (informal but rigorous):
Completeness (finds all occurrences):
- An occurrence of P at position i means T[i+j] = P[j] for all j ∈ [0, m-1]
- The outer loop tries every candidate position i from 0 to n-m
- If P occurs at position i, the inner loop will match all m characters
- Therefore j will reach m, and we record position i

Soundness (only finds actual occurrences):

- We record position i only when j = m after the inner loop
- j = m means the inner loop executed m iterations without breaking
- Each iteration verified T[i+j] = P[j] for one value of j
- Therefore T[i..i+m-1] = P[0..m-1] (the definition of occurrence)

Edge cases:

- If P is empty (m = 0), we define it to match at every position (including position n)
- If m > n, there are no valid positions, and we correctly return an empty list

When we later study optimized algorithms like KMP, we'll need to prove they're complete—that they don't accidentally skip valid occurrences while avoiding redundant work. The naive algorithm's completeness is trivial because it checks everything. For KMP, proving completeness requires careful reasoning about the failure function.
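The two edge cases can be exercised directly against a minimal implementation. This sketch restates the algorithm so the snippet is self-contained:

```typescript
// Minimal brute force matcher including both edge cases.
function naiveMatch(text: string, pattern: string): number[] {
  const n = text.length;
  const m = pattern.length;
  if (m === 0) return Array.from({ length: n + 1 }, (_, i) => i); // empty pattern matches everywhere
  if (m > n) return [];                                           // pattern longer than text
  const occurrences: number[] = [];
  for (let i = 0; i <= n - m; i++) {
    let j = 0;
    while (j < m && text[i + j] === pattern[j]) j++;
    if (j === m) occurrences.push(i);
  }
  return occurrences;
}

// Empty pattern in "AB": matches at positions 0, 1, and 2 (after the last char).
// Pattern "ABC" in text "AB": no valid positions, so the result is empty.
```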
The naive algorithm can be implemented in several equivalent ways. Understanding these alternatives deepens comprehension and prepares you for variations in interview problems.
Formulation 1: Using Substring Comparison
Instead of comparing character by character, compare entire substrings:
```typescript
/**
 * Naive pattern matching using substring comparison.
 * Conceptually cleaner but typically slower due to substring creation.
 */
function naiveWithSubstring(text: string, pattern: string): number[] {
  const n = text.length;
  const m = pattern.length;
  const occurrences: number[] = [];

  for (let i = 0; i <= n - m; i++) {
    // Extract substring and compare
    if (text.substring(i, i + m) === pattern) {
      occurrences.push(i);
    }
  }

  return occurrences;
}

// Note: substring() creates a new string each iteration,
// which adds O(m) time and space per position.
// Total: O(n * m) time, O(m) extra space per iteration.
```

Formulation 2: Right-to-Left Comparison
Compare characters from the end of the pattern instead of the beginning:
```typescript
/**
 * Naive pattern matching comparing right-to-left.
 * Same complexity, but different behavior on mismatches.
 * This direction choice inspires the Boyer-Moore algorithm.
 */
function naiveRightToLeft(text: string, pattern: string): number[] {
  const n = text.length;
  const m = pattern.length;
  const occurrences: number[] = [];

  for (let i = 0; i <= n - m; i++) {
    let j = m - 1; // Start from the last character

    // Compare from right to left
    while (j >= 0 && text[i + j] === pattern[j]) {
      j--;
    }

    // If j went below 0, all characters matched
    if (j < 0) {
      occurrences.push(i);
    }
  }

  return occurrences;
}

// This direction doesn't change worst-case complexity,
// but it's the foundation for the Boyer-Moore algorithm,
// which uses the mismatch position to make smarter skips.
```

Right-to-left comparison is the foundation of the Boyer-Moore algorithm. When we find a mismatch at position j, Boyer-Moore uses information about which character caused the mismatch to skip multiple positions at once. This is harder to exploit with left-to-right comparison.
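To make the Boyer-Moore connection concrete, here is a sketch of the Horspool simplification of Boyer-Moore (not the full algorithm, which the source does not cover). It keeps the right-to-left comparison but, after each window, shifts by an amount determined by the text character aligned with the pattern's last position:

```typescript
// Boyer-Moore-Horspool sketch: for each pattern character except the last,
// precompute how far the window can shift when that character is the one
// aligned with the pattern's final position.
function horspoolSearch(text: string, pattern: string): number[] {
  const n = text.length;
  const m = pattern.length;
  const occurrences: number[] = [];
  if (m === 0 || m > n) return occurrences;

  // Bad-character shift table; characters absent from the pattern shift by m.
  const shift = new Map<string, number>();
  for (let k = 0; k < m - 1; k++) {
    shift.set(pattern[k], m - 1 - k);
  }

  let i = 0;
  while (i <= n - m) {
    let j = m - 1;
    while (j >= 0 && text[i + j] === pattern[j]) j--; // compare right to left
    if (j < 0) occurrences.push(i);                   // full match at i
    i += shift.get(text[i + m - 1]) ?? m;             // skip based on last aligned char
  }
  return occurrences;
}
```

Unlike the naive algorithm, which always advances by one, this version can jump past several positions at once, while still finding every occurrence.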
We've now established a complete understanding of the brute force pattern matching algorithm. Let's consolidate the key points:

- The algorithm checks all n - m + 1 valid starting positions, comparing up to m characters at each, for O(n * m) worst-case time
- It requires no preprocessing and only O(1) auxiliary space, and its exhaustive checking makes correctness trivial to argue
- It treats every position independently, discarding information from previous comparisons; this wasted work is what KMP, Rabin-Karp, and Boyer-Moore exploit
What's next:
In the next page, we'll visualize how the pattern "slides" over the text during the algorithm's execution. This sliding window perspective is crucial for understanding both the algorithm's behavior and the optimization opportunities that exist.
You now deeply understand the mechanics of brute force pattern matching. You can implement it, trace through examples, and explain why it's correct. Next, we'll visualize the sliding window behavior that characterizes this algorithm.