In the 1970s, three computer scientists (Donald Knuth, James Morris, and Vaughan Pratt) independently discovered the same profound insight about string matching. Their discovery, now known as the KMP algorithm, doesn't just improve pattern matching by a constant factor or shave off a few operations. It fundamentally transforms how we think about the problem itself.
The KMP algorithm's genius lies in a single, elegant observation: every mismatch carries information we can exploit. When the naive algorithm encounters a mismatch, it discards everything it has learned about the text and starts fresh at the next position. This is spectacularly wasteful. KMP asks: What if we remembered what we've seen?
By the end of this page, you will understand the fundamental insight that powers the KMP algorithm: that mismatches during pattern matching aren't failures to be forgotten—they're information to be leveraged. You'll see exactly why the naive algorithm does redundant work and precisely how KMP avoids it.
Before we can appreciate KMP's insight, we need to deeply understand the problem it solves. The naive pattern matching algorithm is intuitive and easy to implement:
This approach works, but it has a fundamental inefficiency that becomes catastrophic in worst-case scenarios.
```python
def naive_search(text: str, pattern: str) -> list[int]:
    """
    Naive pattern matching algorithm.
    Returns all starting positions where pattern occurs in text.

    Time Complexity: O(n × m) in the worst case
    Space Complexity: O(1) excluding output
    """
    n, m = len(text), len(pattern)
    matches = []
    for i in range(n - m + 1):  # Try each starting position
        j = 0
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:  # Full match found
            matches.append(i)
        # If mismatch: we just move to the next position,
        # FORGETTING everything we learned about text[i+1..i+j-1]
    return matches
```

**The hidden catastrophe:**
Consider searching for the pattern "AAAAAAB" in the text "AAAAAAAAAAAAAAAAB". Watch what happens:
```
Text:     A A A A A A A A A A A A A A A A B
Pattern:  A A A A A A B
          ✓ ✓ ✓ ✓ ✓ ✓ ✗    → Mismatch at pattern index 6

Shift pattern by 1:
            A A A A A A B
            ✓ ✓ ✓ ✓ ✓ ✓ ✗  → Mismatch at pattern index 6 (again!)

Shift pattern by 1:
              A A A A A A B
              ✓ ✓ ✓ ✓ ✓ ✓ ✗ → Mismatch at pattern index 6 (again!)
```
Do you see the problem? We're comparing many of the same characters over and over. We already know that text[1..5] contains all As—we compared them in the first alignment! Yet we compare them again, and again, and again.
This redundancy leads to O(n × m) time complexity in the worst case, where n is the text length and m is the pattern length.
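To see this blowup concretely, here is a small instrumented sketch (the helper name `count_naive_comparisons` is ours, for illustration) that counts every character comparison the naive algorithm makes on the adversarial input family above:

```python
def count_naive_comparisons(text: str, pattern: str) -> int:
    """Count individual character comparisons made by naive search."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for i in range(n - m + 1):
        for j in range(m):
            comparisons += 1                 # every text[i+j] vs pattern[j] check
            if text[i + j] != pattern[j]:
                break
    return comparisons

# Highly self-similar pattern and text: the worst case for naive search.
for n in (100, 1_000, 10_000):
    text = "A" * n + "B"
    pattern = "A" * 6 + "B"
    print(n, count_naive_comparisons(text, pattern))
# The count grows roughly as 7 × n: almost every alignment re-checks
# six 'A's it has already seen before hitting the mismatch on 'B'.
```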
This isn't a theoretical concern. DNA sequences, repetitive log files, and network protocol parsing regularly encounter patterns with high self-similarity. A naive search for 'AAAAAAAT' in a megabyte of 'A's would perform millions of redundant character comparisons.
Let's analyze precisely what the naive algorithm forgets. Consider this scenario:
```
Text:     A B C A B C A B D
Pattern:  A B C A B D
          ✓ ✓ ✓ ✓ ✓ ✗    → Mismatch at index 5 (pattern 'D' vs text 'C')
```
At the moment of mismatch, we know the following:
- **We matched "ABCAB"**: the first 5 characters of the pattern matched the text at positions 0-4.
- **Therefore, text[0..4] = "ABCAB"**: this is certain; we just verified it character by character.
Now observe something crucial: The suffix "AB" of what we matched ("ABC**AB**") is also a prefix of the pattern ("**AB**CABD")!
This last observation is the key insight. When we shift the pattern right by one position and start comparing again from scratch, we're ignoring the fact that we already know some characters will match.
When a mismatch occurs after matching k characters, we don't need to start over from scratch. If the matched portion has a suffix that's also a prefix of the pattern, we can 'slide' the pattern forward to align that prefix with the matching suffix—and continue comparing from there.
Let's visualize this:
Step 1: After the mismatch at position 5

```
Text:     A B C A B C A B D
Pattern:  A B C A B D
          └─┬─┘ └─┬─┘ ✗
          prefix suffix    → the suffix "AB" of the matched part matches the prefix "AB"!
```

Step 2: SMART shift - align the matching prefix with the suffix

```
Text:     A B C A B C A B D
Pattern:        A B C A B D
                └─┬─┘
                These are already known to match!
```
Instead of starting comparison from the beginning of the pattern at the new position, we can resume from index 2 of the pattern (right after the prefix "AB" that we know matches). We skip 2 comparisons that we already know would succeed.
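Here is that single smart shift written out as a tiny, self-contained sketch (the variable names are ours, chosen to mirror the example above):

```python
text = "ABCABCABD"
pattern = "ABCABD"

# We matched "ABCAB" and then mismatched at text[5].
# Its suffix "AB" (text[3..4]) equals the pattern prefix "AB",
# so slide the pattern to start at text position 3 and resume
# comparing at pattern index 2 instead of 0.
resume_at = 3             # new alignment of pattern[0] in the text
j = 2                     # pattern index to resume from ("AB" already known)
while j < len(pattern) and text[resume_at + j] == pattern[j]:
    j += 1
print(j == len(pattern))  # True: "ABCABD" matches at text position 3
```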
| Approach | Action After Mismatch | Characters Re-compared | Efficient? |
|---|---|---|---|
| Naive | Shift pattern by 1, restart comparison from index 0 | All previous matches (potentially m-1) | No |
| Smart (KMP) | Shift pattern to align prefix with matching suffix, resume from alignment point | Zero | Yes |
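To make the "Smart (KMP)" row concrete, here is a minimal sketch of the KMP matching loop. It assumes the `lps` array for the pattern is already available; building that array is the subject of the next pages, so treat this as a preview rather than the finished algorithm:

```python
def kmp_search(text: str, pattern: str, lps: list[int]) -> list[int]:
    """
    KMP matching loop (pattern assumed non-empty).
    lps[j] = length of the longest proper prefix of pattern[0..j]
             that is also a suffix of pattern[0..j] (precomputed).
    """
    n, m = len(text), len(pattern)
    matches = []
    i = j = 0                    # i indexes text, j indexes pattern
    while i < n:
        if text[i] == pattern[j]:
            i += 1
            j += 1
            if j == m:           # full match ends at i-1
                matches.append(i - m)
                j = lps[j - 1]   # keep going: overlapping matches allowed
        elif j > 0:
            j = lps[j - 1]       # smart shift: reuse the matched prefix
        else:
            i += 1               # nothing matched here; advance in text
    return matches
```

Note how the text pointer `i` never decreases: every shift is expressed purely as a change to the pattern pointer `j`.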
The key to KMP is recognizing a relationship that exists purely within the pattern itself. We need to precompute, for each position in the pattern: "If a mismatch occurs after matching up to this position, how far back in the pattern can we safely resume?"
The answer depends on finding proper prefixes that are also suffixes of the matched portion.
Definition:

- A **prefix** of a string `s` is any substring starting at index 0: `s[0..k]` for some `k`.
- A **suffix** of `s` is any substring ending at the last index: `s[j..len-1]` for some `j`.
- A **proper** prefix or suffix is one that is not the entire string.

```
Pattern: "ABCAB"
Prefixes: "", "A", "AB", "ABC", "ABCA", "ABCAB"
Suffixes: "", "B", "AB", "CAB", "BCAB", "ABCAB"

Proper prefixes that are also suffixes:
- "" (empty string, trivial, length 0)
- "AB" (length 2) ✓ This is what we care about!

The LONGEST proper prefix that is also a suffix of "ABCAB" is "AB", length 2.

Pattern: "AAAA"
Proper prefixes that are also suffixes:
- "" (length 0)
- "A" (length 1)
- "AA" (length 2)
- "AAA" (length 3) ✓ Longest

Pattern: "ABCD"
Proper prefixes that are also suffixes:
- "" (length 0) ✓ Only the empty string qualifies

No non-trivial overlap, so if a mismatch occurs, we must restart from 0.
```

Why does this matter?
When we've matched pattern[0..j-1] and encounter a mismatch at pattern[j]:
1. `text[i-j..i-1]` equals `pattern[0..j-1]` (we just matched them).
2. If `pattern[0..j-1]` has a proper prefix of length `k` that equals its suffix of length `k`...
3. ...then `text[i-k..i-1]` (the last `k` characters we matched in the text) equals `pattern[0..k-1]` (the first `k` characters of the pattern).
4. So we can resume comparing at `pattern[k]`!

This length, the longest proper prefix that is also a suffix, is exactly what the 'failure function' (or 'LPS array', for Longest Proper prefix which is also Suffix) computes. For each position `j` in the pattern, it tells us: if you mismatch at position `j`, resume at position `lps[j-1]`. We'll build this array in detail on the next pages.
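The efficient construction comes later; for experimenting now, a brute-force version that follows the definition literally is enough (the name `compute_lps_bruteforce` is our own):

```python
def compute_lps_bruteforce(pattern: str) -> list[int]:
    """
    lps[j] = length of the longest proper prefix of pattern[0..j]
    that is also a suffix of pattern[0..j].
    Brute force, straight from the definition; fine for small inputs.
    """
    m = len(pattern)
    lps = [0] * m
    for j in range(m):
        chunk = pattern[: j + 1]
        for k in range(j, 0, -1):           # try the longest candidate first
            if chunk[:k] == chunk[-k:]:     # prefix of length k == suffix of length k
                lps[j] = k
                break
    return lps

print(compute_lps_bruteforce("ABCAB"))  # [0, 0, 0, 1, 2] -> longest for "ABCAB" is 2
print(compute_lps_bruteforce("AAAA"))   # [0, 1, 2, 3]    -> longest for "AAAA" is 3
print(compute_lps_bruteforce("ABCD"))   # [0, 0, 0, 0]    -> no non-trivial overlap
```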
Let's trace through a complete example to see how the smart shift strategy works. We'll search for "ABABAC" in "ABABABAC".
Pattern Analysis First:
```
Pattern: A B A B A C
Index:   0 1 2 3 4 5
```

For the prefix "ABABA" (indices 0-4), what's the longest proper prefix that is also a suffix?

```
Prefixes: A, AB, ABA, ABAB
Suffixes: A, BA, ABA, BABA
Common:   A (length 1), ABA (length 3)
Longest:  ABA (length 3)
```
So if we mismatch at index 5, we can resume at index 3!
Now the search:
Step 1:

```
Text:     A B A B A B A C
Pattern:  A B A B A C
          ✓ ✓ ✓ ✓ ✓ ✗
```

Mismatch at pattern index 5 (C ≠ B). We matched "ABABA" (indices 0-4). Longest prefix-suffix: "ABA" (length 3).

Step 2: SMART SHIFT

```
Text:     A B A B A B A C
Pattern:      A B A B A C
                    ↑
          Resume comparing from pattern index 3!
```

We DON'T compare indices 0, 1, 2 - we KNOW they match: text[2..4] = "ABA" = pattern[0..2].

Step 3: Continue

```
Text:     A B A B A B A C
Pattern:      A B A B A C
              - - - ✓ ✓ ✓
                    (we start here at j=3)
```

All remaining characters match! Match found at position 2.
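Assuming the `kmp_search` and `compute_lps_bruteforce` sketches from earlier, we can replay this trace in code:

```python
pattern = "ABABAC"
lps = compute_lps_bruteforce(pattern)
print(lps)                                   # [0, 0, 1, 2, 3, 0]
print(kmp_search("ABABABAC", pattern, lps))  # [2] -> match found at position 2
```

The value `lps[4] = 3` is exactly the "resume at index 3" step from the trace above.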
Notice that in KMP, the text pointer NEVER moves backward. Once we've compared text[i], we never compare it again. This single property guarantees O(n) text traversal. Combined with O(m) pattern preprocessing, we achieve O(n + m) total time—a dramatic improvement over O(n × m).
Let's formalize the insight to see why KMP is correct (the shift never skips a valid match position) and why the shift is maximal (we move as far as is safe, and no farther).
Theorem (Correctness): When KMP shifts the pattern after a mismatch, any starting positions skipped over cannot yield a valid match.
Proof Sketch:
Suppose we've matched `pattern[0..j-1]` against `text[i-j..i-1]`, and a mismatch occurs at `pattern[j]` vs `text[i]`. Let `k` be the LPS value of the matched portion, and consider any starting position `s` with `i-j < s < i-k` (the positions we skip). For a match at position `s`:

- `pattern[0]` must match `text[s]`
- `pattern[1]` must match `text[s+1]`
- ...
- `pattern[i-s-1]` must match `text[i-1]`

But this means `pattern[0..i-s-1]` would have to equal `text[s..i-1]`. Since we know `text[s..i-1]` equals `pattern[s-(i-j)..j-1]` (part of our matched substring), we'd need:

`pattern[0..i-s-1] = pattern[j-(i-s)..j-1]`

This says a prefix of the pattern (of length `i-s`) equals a suffix of `pattern[0..j-1]` of the same length. Because `s < i-k`, that length `i-s` exceeds `k`, our computed LPS value, which contradicts the definition of LPS as the longest such prefix-suffix!
The LPS (Longest Proper Prefix which is also Suffix) value is exactly the right amount: shifting any less would try a start position the proof above rules out; shifting any more would risk skipping a valid match position. LPS finds the perfect balance.
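A quick empirical check of the theorem, using the earlier sketches: on many random inputs, KMP reports exactly the same match positions as the naive algorithm, so the shifts never skip a valid start:

```python
import random

random.seed(42)
for _ in range(1_000):
    text = "".join(random.choice("AB") for _ in range(50))
    pattern = "".join(random.choice("AB") for _ in range(random.randint(1, 6)))
    lps = compute_lps_bruteforce(pattern)
    # Both functions return match positions in ascending order.
    assert kmp_search(text, pattern, lps) == naive_search(text, pattern)
print("KMP agreed with naive search on all random trials")
```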
The Key Invariant:
KMP maintains a crucial invariant throughout its execution:
> At every step, if we're comparing `pattern[j]` with `text[i]`, then `pattern[0..j-1]` is guaranteed to match `text[i-j..i-1]`.
This invariant means the text pointer i only moves forward (never backward). For each position of i, the pattern pointer j can decrease (via LPS lookups), but the total decrease is bounded by the total increase. This yields the amortized analysis showing O(n) text processing.
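The amortized bound is easy to observe directly. This instrumented variant of the matching loop (our own helper, mirroring `kmp_search` above) counts character comparisons; on the adversarial input from earlier, the total stays below 2n:

```python
def kmp_comparisons(text: str, pattern: str, lps: list[int]) -> int:
    """Count character comparisons made by the KMP matching loop."""
    n, m = len(text), len(pattern)
    i = j = comparisons = 0
    while i < n:
        comparisons += 1            # one text[i] vs pattern[j] check
        if text[i] == pattern[j]:
            i += 1
            j += 1
            if j == m:
                j = lps[j - 1]
        elif j > 0:
            j = lps[j - 1]          # j falls back; i stays put
        else:
            i += 1
    return comparisons

text = "A" * 10_000 + "B"           # n = 10,001
pattern = "AAAAAAB"
lps = compute_lps_bruteforce(pattern)
print(kmp_comparisons(text, pattern, lps))  # 19995, below the 2n bound of 20002
```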
| Property | What It Means | Why It Matters |
|---|---|---|
| No text backtracking | Text pointer only moves forward | Each text character examined at most once → O(n) |
| LPS maximality | LPS gives longest valid prefix-suffix overlap | We never skip valid match positions |
| LPS minimality | We shift as much as safely possible | No redundant comparisons with same text position |
| Invariant preservation | pattern[0..j-1] always matches text[i-j..i-1] | Algorithm state is always valid for correct matching |
KMP represents a fundamental paradigm shift in how we think about string matching. The naive algorithm treats the pattern and text symmetrically—just characters to compare. KMP recognizes a crucial asymmetry:
The pattern is known in advance. The text is not.
This asymmetry is the key. Since we know the pattern beforehand, we can precompute everything interesting about its structure. We can analyze which prefixes equal which suffixes, and encode this in a lookup table (the failure function / LPS array).
When we then scan the text, we're not just matching characters—we're leveraging our precomputed knowledge to make globally optimal decisions at each mismatch.
This 'precompute table from known data, then use table with unknown data' pattern appears throughout computer science: in dynamic programming, in compiler optimization, in database query planning. KMP is one of the cleanest examples of this principle in action.
Understanding why KMP works helps you make informed decisions about when to use it and how it fits into larger systems.
| Scenario | Pattern Characteristics | KMP Benefit |
|---|---|---|
| DNA sequence search | Long patterns with repetitive motifs (ATAT, CGCG) | Massive reduction in comparisons for similar sequences |
| Log file analysis | Searching for error codes, timestamps | Handles repetitive log line structures efficiently |
| Network protocol parsing | Fixed headers with repeated bytes | O(n) guarantee prevents timeout attacks |
| Text editor find | User-specified patterns, any structure | Consistent performance regardless of pattern type |
| Intrusion detection | Signature matching in traffic | Predictable, bounded time per byte scanned |
When KMP's insight is less impactful:

- **Patterns with no self-similarity**: for a pattern like "ABCDEF", the LPS array is all zeros, so every mismatch restarts from the beginning anyway. KMP degenerates to naive behavior (but never worse!).

However, even in such cases, KMP never performs worse than the naive algorithm asymptotically, and in practice the overhead is minimal. Many real-world implementations use KMP as the default for longer patterns.
In network and security applications, worst-case guarantees matter. An attacker could craft input to trigger naive algorithm's O(n×m) behavior, causing denial-of-service. KMP's O(n+m) guarantee is a security feature, not just a performance optimization.
We've established the fundamental insight that powers the KMP algorithm. Let's consolidate what we've learned:

- Every mismatch carries information: the characters we just matched tell us exactly what the text contains at those positions.
- The naive algorithm discards that information and re-compares characters, which leads to O(n × m) worst-case time.
- The longest proper prefix that is also a suffix (LPS) of the matched portion tells us exactly how far the pattern can safely shift.
- With this shift rule, the text pointer never moves backward, giving O(n + m) total time.
What's next:
Now that we understand why KMP works, we need to understand how it works in detail. The next page explores the failure function (LPS array)—the central data structure that encodes prefix-suffix relationships. We'll see exactly what values it contains, why each value is correct, and how to interpret it during matching.
You now understand the fundamental insight behind the KMP algorithm: that pattern matching can be made efficient by recognizing and exploiting the internal structure of the pattern. This insight—that mismatches provide information rather than just representing failure—is one of the most elegant ideas in algorithm design.