Data Structures & AlgorithmsKMP Algorithm

KMP Algorithm — Failure Function & Matching

LevelIntermediate

Duration75 mins

TopicKMP Algorithm

3 / 5

How the Failure Function Enables Skipping

From Theory to Mechanism

We've established what the LPS array contains and why prefix-suffix relationships matter. Now we need to understand the precise mechanism by which these relationships enable efficient skipping during pattern matching.

This page connects the abstract concept ("longest proper prefix = suffix") to concrete actions ("skip these positions, resume here"). By the end, you'll not only understand that KMP skips work, but why each skip is:

Safe — we don't miss any valid matches
Optimal — we skip as much as possible
Deterministic — exactly one correct action at each step

What You Will Learn

By the end of this page, you will understand the complete skipping mechanism: how LPS values translate to shift distances, why each shift is maximal yet safe, and how the pattern's structure determines the algorithm's behavior. You'll be able to trace KMP execution completely.

The Skip Distance Formula

When we mismatch at pattern position j (having matched positions 0 through j-1), how far do we shift the pattern? Let's derive this carefully.

Before mismatch:

Text positions [i-j, i-j+1, ..., i-1] match pattern[0, 1, ..., j-1]
Text position i does NOT match pattern[j]

After consulting LPS[j-1] = k:

We know that pattern[0..k-1] equals pattern[j-k..j-1]
Since pattern[j-k..j-1] matches text[i-k..i-1] (part of our matched region)...
...we have pattern[0..k-1] matching text[i-k..i-1]

New alignment:

Pattern now "starts" at text position i - k
We resume comparing at pattern[k] vs text[i]

skip_formula.txt
BEFORE MISMATCH (matched j characters, mismatch at pattern[j]):
 
Text:    ... X X X X X X X X X X X X ...
Position:    i-j     ...       i-1  i
              └─── matched ───┘    └─ mismatch
 
Pattern: P[0] P[1] ... P[j-1] P[j]
              └─── matched ──┘   └─ mismatch
 
AFTER LPS LOOKUP (LPS[j-1] = k):
 
Text:    ... X X X X X X X X X X X X ...
                        i-k ... i-1  i
                         └─ known match ─┘
                         
Pattern:            P[0] ... P[k-1] P[k]
                     └─ known match ─┘   └─ resume here
 
EFFECTIVE SHIFT = j - k positions
 
The pattern has moved right by (j - k) positions.
We've skipped (j - k) text positions that cannot be match starts.

The Key Formula

Shift distance = j - LPS[j-1]. When LPS[j-1] is large (pattern has long prefix-suffix overlap), we shift less. When LPS[j-1] is 0 (no overlap), we shift by j positions (maximum possible given what we matched).

Skip Examples for Pattern 'ABCAB' (LPS = [0,0,0,1,2])
Mismatch at j	Matched Chars	LPS[j-1]	Shift Distance	Resume at
j = 1	1 ('A')	LPS[0] = 0	1 - 0 = 1	pattern[0]
j = 2	2 ('AB')	LPS[1] = 0	2 - 0 = 2	pattern[0]
j = 3	3 ('ABC')	LPS[2] = 0	3 - 0 = 3	pattern[0]
j = 4	4 ('ABCA')	LPS[3] = 1	4 - 1 = 3	pattern[1]
j = 5	5 ('ABCAB')	LPS[4] = 2	5 - 2 = 3	pattern[2]

Proof of Safety — Why We Never Miss Matches

The shift might seem aggressive. How do we know we're not skipping over a valid match? Let's prove this rigorously.

Theorem: When KMP shifts the pattern after a mismatch at position j, any starting position skipped over cannot yield a complete match.

Proof:

Suppose we matched pattern[0..j-1] against text[i-j..i-1], then mismatched at position j. Let LPS[j-1] = k.

We shift from starting position (i-j) to starting position (i-k). This skips starting positions (i-j+1), (i-j+2), ..., (i-k-1).

Claim: None of these skipped positions can be the start of a valid match.

Proof of claim:

Consider any skipped starting position s where (i-j) < s < (i-k), i.e., s = (i-j+t) for some 0 < t < (j-k).

For the pattern to match starting at position s:

pattern[0] must equal text[s] = text[i-j+t]
pattern[1] must equal text[s+1] = text[i-j+t+1]
...
pattern[j-t-1] must equal text[i-1]

But we know text[i-j+t .. i-1] = pattern[t .. j-1] (since these text characters matched this part of the pattern).

So for a match at position s, we'd need:

pattern[0 .. j-t-1] = pattern[t .. j-1]

This says: a prefix of length (j-t) equals a suffix of pattern[0..j-1] of length (j-t).

But k = LPS[j-1] is the LONGEST such prefix-suffix! If there were a longer one of length (j-t) > k, the LPS value would be wrong.

Since t < (j-k), we have (j-t) > k, which would contradict LPS[j-1] = k being maximal.

Therefore, no skipped position can be a valid match start. ∎

LPS Guarantees Correctness

The proof shows that LPS being the LONGEST proper prefix-suffix is essential. Any longer overlap would mean we could safely skip less (but we don't need to). Any shorter overlap would risk missing matches. The LPS value is exactly right.

Proof of Optimality — Why We Can't Skip More

We've shown KMP's skip is safe. But is it the maximum safe skip? Yes—and here's why.

Theorem: The shift to position (i - LPS[j-1]) is the maximum safe shift.

Proof:

Suppose we tried to shift further, to starting position (i - k + 1) or beyond, for some valid k.

At the new starting position (i - k), we need pattern[0..k-1] to match text[i-k..i-1].

Since LPS[j-1] = k, by definition pattern[0..k-1] = pattern[j-k..j-1].

And we know pattern[j-k..j-1] matched text[i-k..i-1].

So pattern[0..k-1] DOES match text[i-k..i-1].

This alignment is VALID and CANNOT be skipped.

Therefore, we must land at position (i-k) and cannot go further. ∎

The key insight:

LPS[j-1] = k tells us there IS a prefix of length k that matches. We MUST check this position—it might lead to a complete match. So we can't skip it.

But we also know from the previous proof that anything between (i-j+1) and (i-k-1) CAN'T match. So we skip exactly those positions.

The skip is maximum (we skip everything that's provably useless) and safe (we don't skip anything that could match).

What We Skip

•Starting positions where no prefix could match
•Positions (i-j+1) through (i-k-1)
•Exactly (j - k - 1) positions
•These are PROVEN impossible

What We Don't Skip

•Position (i-k) where prefix matches
•Any position where LPS indicates possible match
•The current text position i
•These MIGHT lead to matches

Tracing Complete Examples

Let's trace through complete examples to see the skipping mechanism in action.

Example 1: Pattern "ABABAC" in Text "ABABABAC"

First, compute LPS for "ABABAC":

Pattern: A B A B A C
Index:   0 1 2 3 4 5
LPS:     0 0 1 2 3 0

trace_example_1.txt
Pattern: ABABAC (LPS = [0, 0, 1, 2, 3, 0])
Text:    ABABABAC
 
STEP 1: Compare from beginning
Text:    A B A B A B A C
Pattern: A B A B A C
         ✓ ✓ ✓ ✓ ✓ ✗
         
Mismatch at j=5: pattern[5]='C' ≠ text[5]='B'
Matched: "ABABA" (5 characters)
LPS[4] = 3, so shift by (5 - 3) = 2 positions
Resume at pattern[3]
 
STEP 2: After shift (resume at pattern[3])
Text:    A B A B A B A C
             └─────────→ Text pointer still at position 5
Pattern:     A B A B A C
             0 1 2 3 4 5
                   ↑
             Resume here (j=3)
             
Pattern[0..2] = "ABA" is ALREADY KNOWN to match text[2..4]
Compare pattern[3]='B' vs text[5]='B' ✓
Compare pattern[4]='A' vs text[6]='A' ✓
Compare pattern[5]='C' vs text[7]='C' ✓
 
j = 6 = m, MATCH FOUND at position 7 - 6 = 2
 
ANALYSIS:
- Naive would have tried positions 0, 1, 2 with fresh comparisons each time
- KMP tried position 0, then jumped directly to position 2
- Skipped position 1 entirely (proven impossible)
- Reused 3 comparisons from first attempt (the "ABA" prefix)
- Total comparisons: 6 + 3 = 9 instead of potentially 6 + 2 + 6 = 14+

Example 2: Pattern "AAAB" in Text "AAAAAAAB"

LPS for "AAAB":

Pattern: A A A B
Index:   0 1 2 3
LPS:     0 1 2 0

trace_example_2.txt
Pattern: AAAB (LPS = [0, 1, 2, 0])
Text:    AAAAAAAB
 
STEP 1:
Text:    A A A A A A A B
Pattern: A A A B
         ✓ ✓ ✓ ✗
         
Mismatch at j=3: 'B' ≠ 'A'
LPS[2] = 2, shift by (3 - 2) = 1
Resume at pattern[2]
 
STEP 2:
Text:    A A A A A A A B
           ↑ (text position 3)
Pattern:   A A A B
               ↑ (j=2)
               
Compare pattern[2]='A' vs text[3]='A' ✓
Compare pattern[3]='B' vs text[4]='A' ✗
 
Mismatch at j=4? No wait, j would be 3 after the match.
Actually: after matching pattern[2], j becomes 3, then:
pattern[3]='B' vs text[4]='A' ✗
 
LPS[2] = 2, shift by (3 - 2) = 1
Resume at pattern[2]
 
STEP 3:
Text:    A A A A A A A B
             ↑ (text position 4)  
Pattern:     A A A B
                 ↑ (j=2)
                 
pattern[2]='A' vs text[4]='A' ✓
pattern[3]='B' vs text[5]='A' ✗
 
LPS[2] = 2, shift again...
 
STEP 4:
Text:    A A A A A A A B
               ↑ (text position 5)
pattern[2]='A' vs text[5]='A' ✓
pattern[3]='B' vs text[6]='A' ✗
 
STEP 5:
Text:    A A A A A A A B
                 ↑ (text position 6)
pattern[2]='A' vs text[6]='A' ✓
pattern[3]='B' vs text[7]='B' ✓
 
MATCH FOUND at position 4
 
ANALYSIS:
- Text has 8 characters, pattern has 4
- Total comparisons: 3 + 2 + 2 + 2 + 2 = 11 (some of these are "reused")
- Actually analyzing carefully: we compare each text char at most once with 
  pattern[2] or [3], and the first 3 were normal comparisons
- Naive would do: 4 + 4 + 4 + 4 + 4 = 20 comparisons (5 starting positions, 
  each potentially matching 3 A's before failing)
- KMP's guarantee: at most n + m = 12 comparisons (we did 11)

The Worst Case for KMP is Benign

Even in the 'AAAB' in 'AAAAAAAB' case—which is adversarial for naive search—KMP maintains its O(n+m) guarantee. The text pointer never moves backward, and the pattern pointer's total movement is bounded by n+m.

The Cascade Effect — Multiple LPS Lookups

Sometimes a single LPS lookup isn't enough. After jumping to position k = LPS[j-1], we might immediately mismatch again. Do we need another LPS lookup?

Yes! And this is perfectly handled by the algorithm:

while j > 0 and text[i] != pattern[j]:
    j = lps[j-1]  # Keep falling back until we find a match or hit 0

This "cascading" of LPS lookups happens when the pattern has nested overlaps.

Example: Pattern "AABAAAB" searching in Text "AABAAACAABAAAB"

Pattern: A A B A A A B
Index:   0 1 2 3 4 5 6
LPS:     0 1 0 1 2 2 3

When we mismatch at position 6 after matching "AABAAA":

LPS[5] = 2 → try pattern[2]
If that mismatches too, LPS[1] = 1 → try pattern[1]
If that mismatches, LPS[0] = 0 → try pattern[0]
If that mismatches, increment text pointer

This cascade finds the next-best alignment opportunity at each step.

cascade_trace.txt
Pattern: AABAAAB (LPS = [0, 1, 0, 1, 2, 2, 3])
Text:    AABAAACAABAAAB
 
STEP 1:
Text:    A A B A A A C A A B A A A B
Pattern: A A B A A A B
         ✓ ✓ ✓ ✓ ✓ ✓ ✗
         
Mismatch at j=6: 'B' ≠ 'C'
LPS[5] = 2, so j becomes 2
 
STEP 2 (cascade check):
j = 2, compare pattern[2]='B' vs text[6]='C' ✗
Still mismatch! LPS[1] = 1, so j becomes 1
 
STEP 3 (cascade continues):
j = 1, compare pattern[1]='A' vs text[6]='C' ✗
Still mismatch! LPS[0] = 0, so j becomes 0
 
STEP 4:
j = 0, compare pattern[0]='A' vs text[6]='C' ✗
j = 0, so increment text pointer: i becomes 7
 
STEP 5:
Now at text[7] = 'A', j = 0
pattern[0]='A' vs text[7]='A' ✓
Continue matching...
 
Eventually finds match at position 7.
 
NOTE: The cascade at position 6 tried pattern positions 2, 1, 0 in sequence.
Each is a valid "fallback" alignment. We exhausted all possibilities before 
moving the text pointer.

Cascade Doesn't Hurt Complexity

You might worry that cascading LPS lookups could cause O(m) work per text character. But the amortized analysis shows this is fine: each decrease in j from cascading corresponds to a previous increase that we 'paid for' already. Total j increases ≤ n, so total j decreases ≤ n, giving O(n) overall for all cascades.

Skipping Visualized — Pattern Movement

Let's visualize exactly how the pattern "moves" and which starting positions are evaluated vs skipped.

Pattern: "ABAB" in Text: "ABABABAB"

LPS for ABAB = [0, 0, 1, 2]

movement_visual.txt
Text:    A B A B A B A B
Index:   0 1 2 3 4 5 6 7
 
═══════════════════════════════════════════════════════════════
 
ALIGNMENT 1: Start at position 0
Text:    A B A B A B A B
Pattern: A B A B
         ✓ ✓ ✓ ✓ → MATCH at position 0!
 
After match, j = LPS[3] = 2, keep text pointer at i = 4
 
═══════════════════════════════════════════════════════════════
 
ALIGNMENT 2: Implied start at position 2 (4 - 2 = 2)
Text:    A B A B A B A B
Pattern:     A B A B
             ↑ ↑
             │ └── resume comparing here (j=2)
             └──── these are known to match (AB = AB)
             
Compare pattern[2]='A' vs text[4]='A' ✓
Compare pattern[3]='B' vs text[5]='B' ✓
MATCH at position 2!
 
After match, j = LPS[3] = 2, keep text pointer at i = 6
 
═══════════════════════════════════════════════════════════════
 
ALIGNMENT 3: Implied start at position 4 (6 - 2 = 4)
Text:    A B A B A B A B
Pattern:         A B A B
         
Compare pattern[2]='A' vs text[6]='A' ✓
Compare pattern[3]='B' vs text[7]='B' ✓
MATCH at position 4!
 
After match, j = LPS[3] = 2, but i = 8 = n, so we're done.
 
═══════════════════════════════════════════════════════════════
 
SUMMARY:
Starting positions checked: 0, 2, 4
Starting positions SKIPPED: 1, 3  ← proven impossible by LPS structure
Matches found: 3 (at positions 0, 2, 4)
Pattern overlaps with itself in text at AB boundaries

Starting Position Analysis
Position	Status	Why
0	Checked ✓	Initial position, always checked
1	Skipped	No valid prefix-suffix enables alignment here
2	Checked ✓	LPS[3]=2 means pattern[0..1] matches text[2..3]
3	Skipped	No valid prefix-suffix enables alignment here
4	Checked ✓	LPS[3]=2 means pattern[0..1] matches text[4..5]

Overlapping Matches Found Efficiently

Notice how KMP naturally finds all three overlapping matches. After each match, the LPS value tells us exactly where the next potential match could start—which in this periodic pattern is every 2 positions.

The Invariant Maintained Throughout

KMP maintains a crucial invariant that makes reasoning about correctness straightforward:

Invariant: At every moment during execution, if the pattern pointer is at position j and the text pointer is at position i, then:

text[i-j .. i-1] = pattern[0 .. j-1]

In other words, the last j characters we've "confirmed" in the text exactly match the first j characters of the pattern.

This invariant is:

True initially: i=0, j=0, and the statement is vacuously true (no characters confirmed).
Preserved on match: When text[i] = pattern[j], we increment both. The new rightmost confirmed character extends the match.
Preserved on LPS fallback: When we set j = LPS[j-1], we're saying "the last k=LPS[j-1] characters of what we matched ARE the first k characters of the pattern." The invariant holds with the smaller j value.
Preserved on text advance: When j=0 and we mismatch, we increment i. The invariant is again vacuously true.

Why Invariants Matter

This invariant is the key to understanding KMP's correctness. At any point, the state (i, j) encodes complete information about what we've matched. We never 're-compare' text we've advanced past, and the pattern prefix we've 'accumulated' in j is always valid.

State as Accumulated Knowledge:

Think of j as representing accumulated knowledge about the text we've processed:

j = 0: "I know nothing about whether a pattern match is building"
j = 3: "I know the last 3 text characters were pattern[0..2], so if the next character is pattern[3], we're making progress"
j = m: "I know the last m text characters were pattern[0..m-1], i.e., a complete match!"

The LPS array tells us: "If you have knowledge level j and see a non-matching character, what's the best knowledge level you can preserve?" The answer is LPS[j-1]—the longest prefix of pattern that we can still claim matches a suffix of what we've seen.

invariant_trace.txt
Pattern: ABCAB (LPS = [0, 0, 0, 1, 2])
Text:    XYZABCABCABD
 
Trace with invariant verification:
 
State (i=0, j=0): Invariant: text[-0..-1] = "" = pattern[-0..-1] ✓ (vacuous)
Compare text[0]='X' vs pattern[0]='A': mismatch
j = 0, so i++ → (i=1, j=0)
 
State (i=1, j=0): Invariant holds (vacuous)
Compare text[1]='Y' vs pattern[0]='A': mismatch
i++ → (i=2, j=0)
 
State (i=2, j=0): Compare 'Z' vs 'A': mismatch
i++ → (i=3, j=0)
 
State (i=3, j=0): Compare 'A' vs 'A': match!
i++, j++ → (i=4, j=1)
Invariant: text[3..3] = "A" = pattern[0..0] ✓
 
State (i=4, j=1): Compare 'B' vs 'B': match!
→ (i=5, j=2)
Invariant: text[3..4] = "AB" = pattern[0..1] ✓
 
State (i=5, j=2): Compare 'C' vs 'C': match!
→ (i=6, j=3)
Invariant: text[3..5] = "ABC" = pattern[0..2] ✓
 
State (i=6, j=3): Compare 'A' vs 'A': match!
→ (i=7, j=4)
Invariant: text[3..6] = "ABCA" = pattern[0..3] ✓
 
State (i=7, j=4): Compare 'B' vs 'B': match!
→ (i=8, j=5)
j = 5 = m, MATCH FOUND at position 3! 
j = LPS[4] = 2 → (i=8, j=2)
Invariant: text[6..7] = "AB" = pattern[0..1] ✓
 
...and continues

Summary: How Skipping Works

We've now fully understood the mechanism by which the LPS array enables efficient skipping. Let's consolidate:

Key Takeaways

•Skip distance = j - LPS[j-1] — the larger the prefix-suffix overlap, the smaller the shift (because more is preserved).
•The skip is SAFE — LPS being maximal means no valid match exists at skipped positions (proven rigorously).
•The skip is OPTIMAL — we can't skip further because LPS[j-1] characters DO match at the new position.
•Cascading LPS lookups handle nested overlaps — the while loop finds the best fallback position.
•The invariant (text[i-j..i-1] = pattern[0..j-1]) is always maintained — our state encodes accumulated knowledge.
•Text pointer never retreats — this is the heart of O(n) text processing.

What's next:

We've seen how the LPS array enables efficient skipping during matching. But we haven't yet shown how to build the LPS array efficiently. The next page covers this—and the beautiful insight is that building the LPS array uses the same technique as matching, applied to the pattern matching against itself!

Page Complete

You now fully understand the skipping mechanism in KMP: how LPS values translate to shift distances, why each shift is simultaneously safe and maximal, how cascading handles complex patterns, and how the invariant ensures correctness. The matching algorithm is now completely transparent to you.

3 / 5

Loading learning content...

Data Structures & AlgorithmsKMP Algorithm

KMP Algorithm — Failure Function & Matching

LevelIntermediate

Duration75 mins

TopicKMP Algorithm

3 / 5

How the Failure Function Enables Skipping

From Theory to Mechanism

Safe — we don't miss any valid matches
Optimal — we skip as much as possible
Deterministic — exactly one correct action at each step

What You Will Learn

The Skip Distance Formula

When we mismatch at pattern position j (having matched positions 0 through j-1), how far do we shift the pattern? Let's derive this carefully.

Before mismatch:

Text positions [i-j, i-j+1, ..., i-1] match pattern[0, 1, ..., j-1]
Text position i does NOT match pattern[j]

After consulting LPS[j-1] = k:

We know that pattern[0..k-1] equals pattern[j-k..j-1]
Since pattern[j-k..j-1] matches text[i-k..i-1] (part of our matched region)...
...we have pattern[0..k-1] matching text[i-k..i-1]

New alignment:

Pattern now "starts" at text position i - k
We resume comparing at pattern[k] vs text[i]

skip_formula.txt
BEFORE MISMATCH (matched j characters, mismatch at pattern[j]):
 
Text:    ... X X X X X X X X X X X X ...
Position:    i-j     ...       i-1  i
              └─── matched ───┘    └─ mismatch
 
Pattern: P[0] P[1] ... P[j-1] P[j]
              └─── matched ──┘   └─ mismatch
 
AFTER LPS LOOKUP (LPS[j-1] = k):
 
Text:    ... X X X X X X X X X X X X ...
                        i-k ... i-1  i
                         └─ known match ─┘
                         
Pattern:            P[0] ... P[k-1] P[k]
                     └─ known match ─┘   └─ resume here
 
EFFECTIVE SHIFT = j - k positions
 
The pattern has moved right by (j - k) positions.
We've skipped (j - k) text positions that cannot be match starts.

The Key Formula

Skip Examples for Pattern 'ABCAB' (LPS = [0,0,0,1,2])
Mismatch at j	Matched Chars	LPS[j-1]	Shift Distance	Resume at
j = 1	1 ('A')	LPS[0] = 0	1 - 0 = 1	pattern[0]
j = 2	2 ('AB')	LPS[1] = 0	2 - 0 = 2	pattern[0]
j = 3	3 ('ABC')	LPS[2] = 0	3 - 0 = 3	pattern[0]
j = 4	4 ('ABCA')	LPS[3] = 1	4 - 1 = 3	pattern[1]
j = 5	5 ('ABCAB')	LPS[4] = 2	5 - 2 = 3	pattern[2]

Proof of Safety — Why We Never Miss Matches

The shift might seem aggressive. How do we know we're not skipping over a valid match? Let's prove this rigorously.

Theorem: When KMP shifts the pattern after a mismatch at position j, any starting position skipped over cannot yield a complete match.

Proof:

Suppose we matched pattern[0..j-1] against text[i-j..i-1], then mismatched at position j. Let LPS[j-1] = k.

We shift from starting position (i-j) to starting position (i-k). This skips starting positions (i-j+1), (i-j+2), ..., (i-k-1).

Claim: None of these skipped positions can be the start of a valid match.

Proof of claim:

Consider any skipped starting position s where (i-j) < s < (i-k), i.e., s = (i-j+t) for some 0 < t < (j-k).

For the pattern to match starting at position s:

pattern[0] must equal text[s] = text[i-j+t]
pattern[1] must equal text[s+1] = text[i-j+t+1]
...
pattern[j-t-1] must equal text[i-1]

But we know text[i-j+t .. i-1] = pattern[t .. j-1] (since these text characters matched this part of the pattern).

So for a match at position s, we'd need:

pattern[0 .. j-t-1] = pattern[t .. j-1]

This says: a prefix of length (j-t) equals a suffix of pattern[0..j-1] of length (j-t).

But k = LPS[j-1] is the LONGEST such prefix-suffix! If there were a longer one of length (j-t) > k, the LPS value would be wrong.

Since t < (j-k), we have (j-t) > k, which would contradict LPS[j-1] = k being maximal.

Therefore, no skipped position can be a valid match start. ∎

LPS Guarantees Correctness

Proof of Optimality — Why We Can't Skip More

We've shown KMP's skip is safe. But is it the maximum safe skip? Yes—and here's why.

Theorem: The shift to position (i - LPS[j-1]) is the maximum safe shift.

Proof:

Suppose we tried to shift further, to starting position (i - k + 1) or beyond, for some valid k.

At the new starting position (i - k), we need pattern[0..k-1] to match text[i-k..i-1].

Since LPS[j-1] = k, by definition pattern[0..k-1] = pattern[j-k..j-1].

And we know pattern[j-k..j-1] matched text[i-k..i-1].

So pattern[0..k-1] DOES match text[i-k..i-1].

This alignment is VALID and CANNOT be skipped.

Therefore, we must land at position (i-k) and cannot go further. ∎

The key insight:

LPS[j-1] = k tells us there IS a prefix of length k that matches. We MUST check this position—it might lead to a complete match. So we can't skip it.

But we also know from the previous proof that anything between (i-j+1) and (i-k-1) CAN'T match. So we skip exactly those positions.

The skip is maximum (we skip everything that's provably useless) and safe (we don't skip anything that could match).

What We Skip

•Starting positions where no prefix could match
•Positions (i-j+1) through (i-k-1)
•Exactly (j - k - 1) positions
•These are PROVEN impossible

What We Don't Skip

•Position (i-k) where prefix matches
•Any position where LPS indicates possible match
•The current text position i
•These MIGHT lead to matches

Tracing Complete Examples

Let's trace through complete examples to see the skipping mechanism in action.

Example 1: Pattern "ABABAC" in Text "ABABABAC"

First, compute LPS for "ABABAC":

Pattern: A B A B A C
Index:   0 1 2 3 4 5
LPS:     0 0 1 2 3 0

trace_example_1.txt
Pattern: ABABAC (LPS = [0, 0, 1, 2, 3, 0])
Text:    ABABABAC
 
STEP 1: Compare from beginning
Text:    A B A B A B A C
Pattern: A B A B A C
         ✓ ✓ ✓ ✓ ✓ ✗
         
Mismatch at j=5: pattern[5]='C' ≠ text[5]='B'
Matched: "ABABA" (5 characters)
LPS[4] = 3, so shift by (5 - 3) = 2 positions
Resume at pattern[3]
 
STEP 2: After shift (resume at pattern[3])
Text:    A B A B A B A C
             └─────────→ Text pointer still at position 5
Pattern:     A B A B A C
             0 1 2 3 4 5
                   ↑
             Resume here (j=3)
             
Pattern[0..2] = "ABA" is ALREADY KNOWN to match text[2..4]
Compare pattern[3]='B' vs text[5]='B' ✓
Compare pattern[4]='A' vs text[6]='A' ✓
Compare pattern[5]='C' vs text[7]='C' ✓
 
j = 6 = m, MATCH FOUND at position 7 - 6 = 2
 
ANALYSIS:
- Naive would have tried positions 0, 1, 2 with fresh comparisons each time
- KMP tried position 0, then jumped directly to position 2
- Skipped position 1 entirely (proven impossible)
- Reused 3 comparisons from first attempt (the "ABA" prefix)
- Total comparisons: 6 + 3 = 9 instead of potentially 6 + 2 + 6 = 14+

Example 2: Pattern "AAAB" in Text "AAAAAAAB"

LPS for "AAAB":

Pattern: A A A B
Index:   0 1 2 3
LPS:     0 1 2 0

trace_example_2.txt
Pattern: AAAB (LPS = [0, 1, 2, 0])
Text:    AAAAAAAB
 
STEP 1:
Text:    A A A A A A A B
Pattern: A A A B
         ✓ ✓ ✓ ✗
         
Mismatch at j=3: 'B' ≠ 'A'
LPS[2] = 2, shift by (3 - 2) = 1
Resume at pattern[2]
 
STEP 2:
Text:    A A A A A A A B
           ↑ (text position 3)
Pattern:   A A A B
               ↑ (j=2)
               
Compare pattern[2]='A' vs text[3]='A' ✓
Compare pattern[3]='B' vs text[4]='A' ✗
 
Mismatch at j=4? No wait, j would be 3 after the match.
Actually: after matching pattern[2], j becomes 3, then:
pattern[3]='B' vs text[4]='A' ✗
 
LPS[2] = 2, shift by (3 - 2) = 1
Resume at pattern[2]
 
STEP 3:
Text:    A A A A A A A B
             ↑ (text position 4)  
Pattern:     A A A B
                 ↑ (j=2)
                 
pattern[2]='A' vs text[4]='A' ✓
pattern[3]='B' vs text[5]='A' ✗
 
LPS[2] = 2, shift again...
 
STEP 4:
Text:    A A A A A A A B
               ↑ (text position 5)
pattern[2]='A' vs text[5]='A' ✓
pattern[3]='B' vs text[6]='A' ✗
 
STEP 5:
Text:    A A A A A A A B
                 ↑ (text position 6)
pattern[2]='A' vs text[6]='A' ✓
pattern[3]='B' vs text[7]='B' ✓
 
MATCH FOUND at position 4
 
ANALYSIS:
- Text has 8 characters, pattern has 4
- Total comparisons: 3 + 2 + 2 + 2 + 2 = 11 (some of these are "reused")
- Actually analyzing carefully: we compare each text char at most once with 
  pattern[2] or [3], and the first 3 were normal comparisons
- Naive would do: 4 + 4 + 4 + 4 + 4 = 20 comparisons (5 starting positions, 
  each potentially matching 3 A's before failing)
- KMP's guarantee: at most n + m = 12 comparisons (we did 11)

The Worst Case for KMP is Benign

The Cascade Effect — Multiple LPS Lookups

Sometimes a single LPS lookup isn't enough. After jumping to position k = LPS[j-1], we might immediately mismatch again. Do we need another LPS lookup?

Yes! And this is perfectly handled by the algorithm:

while j > 0 and text[i] != pattern[j]:
    j = lps[j-1]  # Keep falling back until we find a match or hit 0

This "cascading" of LPS lookups happens when the pattern has nested overlaps.

Example: Pattern "AABAAAB" searching in Text "AABAAACAABAAAB"

Pattern: A A B A A A B
Index:   0 1 2 3 4 5 6
LPS:     0 1 0 1 2 2 3

When we mismatch at position 6 after matching "AABAAA":

LPS[5] = 2 → try pattern[2]
If that mismatches too, LPS[1] = 1 → try pattern[1]
If that mismatches, LPS[0] = 0 → try pattern[0]
If that mismatches, increment text pointer

This cascade finds the next-best alignment opportunity at each step.

cascade_trace.txt
Pattern: AABAAAB (LPS = [0, 1, 0, 1, 2, 2, 3])
Text:    AABAAACAABAAAB
 
STEP 1:
Text:    A A B A A A C A A B A A A B
Pattern: A A B A A A B
         ✓ ✓ ✓ ✓ ✓ ✓ ✗
         
Mismatch at j=6: 'B' ≠ 'C'
LPS[5] = 2, so j becomes 2
 
STEP 2 (cascade check):
j = 2, compare pattern[2]='B' vs text[6]='C' ✗
Still mismatch! LPS[1] = 1, so j becomes 1
 
STEP 3 (cascade continues):
j = 1, compare pattern[1]='A' vs text[6]='C' ✗
Still mismatch! LPS[0] = 0, so j becomes 0
 
STEP 4:
j = 0, compare pattern[0]='A' vs text[6]='C' ✗
j = 0, so increment text pointer: i becomes 7
 
STEP 5:
Now at text[7] = 'A', j = 0
pattern[0]='A' vs text[7]='A' ✓
Continue matching...
 
Eventually finds match at position 7.
 
NOTE: The cascade at position 6 tried pattern positions 2, 1, 0 in sequence.
Each is a valid "fallback" alignment. We exhausted all possibilities before 
moving the text pointer.

Cascade Doesn't Hurt Complexity

Skipping Visualized — Pattern Movement

Let's visualize exactly how the pattern "moves" and which starting positions are evaluated vs skipped.

Pattern: "ABAB" in Text: "ABABABAB"

LPS for ABAB = [0, 0, 1, 2]

movement_visual.txt
Text:    A B A B A B A B
Index:   0 1 2 3 4 5 6 7
 
═══════════════════════════════════════════════════════════════
 
ALIGNMENT 1: Start at position 0
Text:    A B A B A B A B
Pattern: A B A B
         ✓ ✓ ✓ ✓ → MATCH at position 0!
 
After match, j = LPS[3] = 2, keep text pointer at i = 4
 
═══════════════════════════════════════════════════════════════
 
ALIGNMENT 2: Implied start at position 2 (4 - 2 = 2)
Text:    A B A B A B A B
Pattern:     A B A B
             ↑ ↑
             │ └── resume comparing here (j=2)
             └──── these are known to match (AB = AB)
             
Compare pattern[2]='A' vs text[4]='A' ✓
Compare pattern[3]='B' vs text[5]='B' ✓
MATCH at position 2!
 
After match, j = LPS[3] = 2, keep text pointer at i = 6
 
═══════════════════════════════════════════════════════════════
 
ALIGNMENT 3: Implied start at position 4 (6 - 2 = 4)
Text:    A B A B A B A B
Pattern:         A B A B
         
Compare pattern[2]='A' vs text[6]='A' ✓
Compare pattern[3]='B' vs text[7]='B' ✓
MATCH at position 4!
 
After match, j = LPS[3] = 2, but i = 8 = n, so we're done.
 
═══════════════════════════════════════════════════════════════
 
SUMMARY:
Starting positions checked: 0, 2, 4
Starting positions SKIPPED: 1, 3  ← proven impossible by LPS structure
Matches found: 3 (at positions 0, 2, 4)
Pattern overlaps with itself in text at AB boundaries

Starting Position Analysis
Position	Status	Why
0	Checked ✓	Initial position, always checked
1	Skipped	No valid prefix-suffix enables alignment here
2	Checked ✓	LPS[3]=2 means pattern[0..1] matches text[2..3]
3	Skipped	No valid prefix-suffix enables alignment here
4	Checked ✓	LPS[3]=2 means pattern[0..1] matches text[4..5]

Overlapping Matches Found Efficiently

The Invariant Maintained Throughout

KMP maintains a crucial invariant that makes reasoning about correctness straightforward:

Invariant: At every moment during execution, if the pattern pointer is at position j and the text pointer is at position i, then:

text[i-j .. i-1] = pattern[0 .. j-1]

In other words, the last j characters we've "confirmed" in the text exactly match the first j characters of the pattern.

This invariant is:

True initially: i=0, j=0, and the statement is vacuously true (no characters confirmed).
Preserved on match: When text[i] = pattern[j], we increment both. The new rightmost confirmed character extends the match.
Preserved on LPS fallback: When we set j = LPS[j-1], we're saying "the last k=LPS[j-1] characters of what we matched ARE the first k characters of the pattern." The invariant holds with the smaller j value.
Preserved on text advance: When j=0 and we mismatch, we increment i. The invariant is again vacuously true.

Why Invariants Matter

State as Accumulated Knowledge:

Think of j as representing accumulated knowledge about the text we've processed:

j = 0: "I know nothing about whether a pattern match is building"
j = 3: "I know the last 3 text characters were pattern[0..2], so if the next character is pattern[3], we're making progress"
j = m: "I know the last m text characters were pattern[0..m-1], i.e., a complete match!"

invariant_trace.txt
Pattern: ABCAB (LPS = [0, 0, 0, 1, 2])
Text:    XYZABCABCABD
 
Trace with invariant verification:
 
State (i=0, j=0): Invariant: text[-0..-1] = "" = pattern[-0..-1] ✓ (vacuous)
Compare text[0]='X' vs pattern[0]='A': mismatch
j = 0, so i++ → (i=1, j=0)
 
State (i=1, j=0): Invariant holds (vacuous)
Compare text[1]='Y' vs pattern[0]='A': mismatch
i++ → (i=2, j=0)
 
State (i=2, j=0): Compare 'Z' vs 'A': mismatch
i++ → (i=3, j=0)
 
State (i=3, j=0): Compare 'A' vs 'A': match!
i++, j++ → (i=4, j=1)
Invariant: text[3..3] = "A" = pattern[0..0] ✓
 
State (i=4, j=1): Compare 'B' vs 'B': match!
→ (i=5, j=2)
Invariant: text[3..4] = "AB" = pattern[0..1] ✓
 
State (i=5, j=2): Compare 'C' vs 'C': match!
→ (i=6, j=3)
Invariant: text[3..5] = "ABC" = pattern[0..2] ✓
 
State (i=6, j=3): Compare 'A' vs 'A': match!
→ (i=7, j=4)
Invariant: text[3..6] = "ABCA" = pattern[0..3] ✓
 
State (i=7, j=4): Compare 'B' vs 'B': match!
→ (i=8, j=5)
j = 5 = m, MATCH FOUND at position 3! 
j = LPS[4] = 2 → (i=8, j=2)
Invariant: text[6..7] = "AB" = pattern[0..1] ✓
 
...and continues

Summary: How Skipping Works

We've now fully understood the mechanism by which the LPS array enables efficient skipping. Let's consolidate:

Key Takeaways

•Skip distance = j - LPS[j-1] — the larger the prefix-suffix overlap, the smaller the shift (because more is preserved).
•The skip is SAFE — LPS being maximal means no valid match exists at skipped positions (proven rigorously).
•The skip is OPTIMAL — we can't skip further because LPS[j-1] characters DO match at the new position.
•Cascading LPS lookups handle nested overlaps — the while loop finds the best fallback position.
•The invariant (text[i-j..i-1] = pattern[0..j-1]) is always maintained — our state encodes accumulated knowledge.
•Text pointer never retreats — this is the heart of O(n) text processing.

What's next:

Page Complete

3 / 5