Loading learning content...
We've established what the LPS array contains and why prefix-suffix relationships matter. Now we need to understand the precise mechanism by which these relationships enable efficient skipping during pattern matching.
This page connects the abstract concept ("longest proper prefix = suffix") to concrete actions ("skip these positions, resume here"). By the end, you'll not only understand that KMP skips work, but why each skip is:
By the end of this page, you will understand the complete skipping mechanism: how LPS values translate to shift distances, why each shift is maximal yet safe, and how the pattern's structure determines the algorithm's behavior. You'll be able to trace KMP execution completely.
When we mismatch at pattern position j (having matched positions 0 through j-1), how far do we shift the pattern? Let's derive this carefully.
Before mismatch:
[i-j, i-j+1, ..., i-1] match pattern[0, 1, ..., j-1]i does NOT match pattern[j]After consulting LPS[j-1] = k:
pattern[0..k-1] equals pattern[j-k..j-1]pattern[j-k..j-1] matches text[i-k..i-1] (part of our matched region)...pattern[0..k-1] matching text[i-k..i-1]New alignment:
i - kpattern[k] vs text[i]BEFORE MISMATCH (matched j characters, mismatch at pattern[j]): Text: ... X X X X X X X X X X X X ...Position: i-j ... i-1 i └─── matched ───┘ └─ mismatch Pattern: P[0] P[1] ... P[j-1] P[j] └─── matched ──┘ └─ mismatch AFTER LPS LOOKUP (LPS[j-1] = k): Text: ... X X X X X X X X X X X X ... i-k ... i-1 i └─ known match ─┘ Pattern: P[0] ... P[k-1] P[k] └─ known match ─┘ └─ resume here EFFECTIVE SHIFT = j - k positions The pattern has moved right by (j - k) positions.We've skipped (j - k) text positions that cannot be match starts.Shift distance = j - LPS[j-1]. When LPS[j-1] is large (pattern has long prefix-suffix overlap), we shift less. When LPS[j-1] is 0 (no overlap), we shift by j positions (maximum possible given what we matched).
| Mismatch at j | Matched Chars | LPS[j-1] | Shift Distance | Resume at |
|---|---|---|---|---|
| j = 1 | 1 ('A') | LPS[0] = 0 | 1 - 0 = 1 | pattern[0] |
| j = 2 | 2 ('AB') | LPS[1] = 0 | 2 - 0 = 2 | pattern[0] |
| j = 3 | 3 ('ABC') | LPS[2] = 0 | 3 - 0 = 3 | pattern[0] |
| j = 4 | 4 ('ABCA') | LPS[3] = 1 | 4 - 1 = 3 | pattern[1] |
| j = 5 | 5 ('ABCAB') | LPS[4] = 2 | 5 - 2 = 3 | pattern[2] |
The shift might seem aggressive. How do we know we're not skipping over a valid match? Let's prove this rigorously.
Theorem: When KMP shifts the pattern after a mismatch at position j, any starting position skipped over cannot yield a complete match.
Proof:
Suppose we matched pattern[0..j-1] against text[i-j..i-1], then mismatched at position j. Let LPS[j-1] = k.
We shift from starting position (i-j) to starting position (i-k). This skips starting positions (i-j+1), (i-j+2), ..., (i-k-1).
Claim: None of these skipped positions can be the start of a valid match.
Proof of claim:
Consider any skipped starting position s where (i-j) < s < (i-k), i.e., s = (i-j+t) for some 0 < t < (j-k).
For the pattern to match starting at position s:
But we know text[i-j+t .. i-1] = pattern[t .. j-1] (since these text characters matched this part of the pattern).
So for a match at position s, we'd need:
pattern[0 .. j-t-1] = pattern[t .. j-1]
This says: a prefix of length (j-t) equals a suffix of pattern[0..j-1] of length (j-t).
But k = LPS[j-1] is the LONGEST such prefix-suffix! If there were a longer one of length (j-t) > k, the LPS value would be wrong.
Since t < (j-k), we have (j-t) > k, which would contradict LPS[j-1] = k being maximal.
Therefore, no skipped position can be a valid match start. ∎
The proof shows that LPS being the LONGEST proper prefix-suffix is essential. Any longer overlap would mean we could safely skip less (but we don't need to). Any shorter overlap would risk missing matches. The LPS value is exactly right.
We've shown KMP's skip is safe. But is it the maximum safe skip? Yes—and here's why.
Theorem: The shift to position (i - LPS[j-1]) is the maximum safe shift.
Proof:
Suppose we tried to shift further, to starting position (i - k + 1) or beyond, for some valid k.
At the new starting position (i - k), we need pattern[0..k-1] to match text[i-k..i-1].
Since LPS[j-1] = k, by definition pattern[0..k-1] = pattern[j-k..j-1].
And we know pattern[j-k..j-1] matched text[i-k..i-1].
So pattern[0..k-1] DOES match text[i-k..i-1].
This alignment is VALID and CANNOT be skipped.
Therefore, we must land at position (i-k) and cannot go further. ∎
The key insight:
LPS[j-1] = k tells us there IS a prefix of length k that matches. We MUST check this position—it might lead to a complete match. So we can't skip it.
But we also know from the previous proof that anything between (i-j+1) and (i-k-1) CAN'T match. So we skip exactly those positions.
The skip is maximum (we skip everything that's provably useless) and safe (we don't skip anything that could match).
Let's trace through complete examples to see the skipping mechanism in action.
Example 1: Pattern "ABABAC" in Text "ABABABAC"
First, compute LPS for "ABABAC":
Pattern: A B A B A C
Index: 0 1 2 3 4 5
LPS: 0 0 1 2 3 0
Pattern: ABABAC (LPS = [0, 0, 1, 2, 3, 0])Text: ABABABAC STEP 1: Compare from beginningText: A B A B A B A CPattern: A B A B A C ✓ ✓ ✓ ✓ ✓ ✗ Mismatch at j=5: pattern[5]='C' ≠ text[5]='B'Matched: "ABABA" (5 characters)LPS[4] = 3, so shift by (5 - 3) = 2 positionsResume at pattern[3] STEP 2: After shift (resume at pattern[3])Text: A B A B A B A C └─────────→ Text pointer still at position 5Pattern: A B A B A C 0 1 2 3 4 5 ↑ Resume here (j=3) Pattern[0..2] = "ABA" is ALREADY KNOWN to match text[2..4]Compare pattern[3]='B' vs text[5]='B' ✓Compare pattern[4]='A' vs text[6]='A' ✓Compare pattern[5]='C' vs text[7]='C' ✓ j = 6 = m, MATCH FOUND at position 7 - 6 = 2 ANALYSIS:- Naive would have tried positions 0, 1, 2 with fresh comparisons each time- KMP tried position 0, then jumped directly to position 2- Skipped position 1 entirely (proven impossible)- Reused 3 comparisons from first attempt (the "ABA" prefix)- Total comparisons: 6 + 3 = 9 instead of potentially 6 + 2 + 6 = 14+Example 2: Pattern "AAAB" in Text "AAAAAAAB"
LPS for "AAAB":
Pattern: A A A B
Index: 0 1 2 3
LPS: 0 1 2 0
Pattern: AAAB (LPS = [0, 1, 2, 0])Text: AAAAAAAB STEP 1:Text: A A A A A A A BPattern: A A A B ✓ ✓ ✓ ✗ Mismatch at j=3: 'B' ≠ 'A'LPS[2] = 2, shift by (3 - 2) = 1Resume at pattern[2] STEP 2:Text: A A A A A A A B ↑ (text position 3)Pattern: A A A B ↑ (j=2) Compare pattern[2]='A' vs text[3]='A' ✓Compare pattern[3]='B' vs text[4]='A' ✗ Mismatch at j=4? No wait, j would be 3 after the match.Actually: after matching pattern[2], j becomes 3, then:pattern[3]='B' vs text[4]='A' ✗ LPS[2] = 2, shift by (3 - 2) = 1Resume at pattern[2] STEP 3:Text: A A A A A A A B ↑ (text position 4) Pattern: A A A B ↑ (j=2) pattern[2]='A' vs text[4]='A' ✓pattern[3]='B' vs text[5]='A' ✗ LPS[2] = 2, shift again... STEP 4:Text: A A A A A A A B ↑ (text position 5)pattern[2]='A' vs text[5]='A' ✓pattern[3]='B' vs text[6]='A' ✗ STEP 5:Text: A A A A A A A B ↑ (text position 6)pattern[2]='A' vs text[6]='A' ✓pattern[3]='B' vs text[7]='B' ✓ MATCH FOUND at position 4 ANALYSIS:- Text has 8 characters, pattern has 4- Total comparisons: 3 + 2 + 2 + 2 + 2 = 11 (some of these are "reused")- Actually analyzing carefully: we compare each text char at most once with pattern[2] or [3], and the first 3 were normal comparisons- Naive would do: 4 + 4 + 4 + 4 + 4 = 20 comparisons (5 starting positions, each potentially matching 3 A's before failing)- KMP's guarantee: at most n + m = 12 comparisons (we did 11)Even in the 'AAAB' in 'AAAAAAAB' case—which is adversarial for naive search—KMP maintains its O(n+m) guarantee. The text pointer never moves backward, and the pattern pointer's total movement is bounded by n+m.
Sometimes a single LPS lookup isn't enough. After jumping to position k = LPS[j-1], we might immediately mismatch again. Do we need another LPS lookup?
Yes! And this is perfectly handled by the algorithm:
while j > 0 and text[i] != pattern[j]:
j = lps[j-1] # Keep falling back until we find a match or hit 0
This "cascading" of LPS lookups happens when the pattern has nested overlaps.
Example: Pattern "AABAAAB" searching in Text "AABAAACAABAAAB"
Pattern: A A B A A A B
Index: 0 1 2 3 4 5 6
LPS: 0 1 0 1 2 2 3
When we mismatch at position 6 after matching "AABAAA":
This cascade finds the next-best alignment opportunity at each step.
Pattern: AABAAAB (LPS = [0, 1, 0, 1, 2, 2, 3])Text: AABAAACAABAAAB STEP 1:Text: A A B A A A C A A B A A A BPattern: A A B A A A B ✓ ✓ ✓ ✓ ✓ ✓ ✗ Mismatch at j=6: 'B' ≠ 'C'LPS[5] = 2, so j becomes 2 STEP 2 (cascade check):j = 2, compare pattern[2]='B' vs text[6]='C' ✗Still mismatch! LPS[1] = 1, so j becomes 1 STEP 3 (cascade continues):j = 1, compare pattern[1]='A' vs text[6]='C' ✗Still mismatch! LPS[0] = 0, so j becomes 0 STEP 4:j = 0, compare pattern[0]='A' vs text[6]='C' ✗j = 0, so increment text pointer: i becomes 7 STEP 5:Now at text[7] = 'A', j = 0pattern[0]='A' vs text[7]='A' ✓Continue matching... Eventually finds match at position 7. NOTE: The cascade at position 6 tried pattern positions 2, 1, 0 in sequence.Each is a valid "fallback" alignment. We exhausted all possibilities before moving the text pointer.You might worry that cascading LPS lookups could cause O(m) work per text character. But the amortized analysis shows this is fine: each decrease in j from cascading corresponds to a previous increase that we 'paid for' already. Total j increases ≤ n, so total j decreases ≤ n, giving O(n) overall for all cascades.
Let's visualize exactly how the pattern "moves" and which starting positions are evaluated vs skipped.
Pattern: "ABAB" in Text: "ABABABAB"
LPS for ABAB = [0, 0, 1, 2]
Text: A B A B A B A BIndex: 0 1 2 3 4 5 6 7 ═══════════════════════════════════════════════════════════════ ALIGNMENT 1: Start at position 0Text: A B A B A B A BPattern: A B A B ✓ ✓ ✓ ✓ → MATCH at position 0! After match, j = LPS[3] = 2, keep text pointer at i = 4 ═══════════════════════════════════════════════════════════════ ALIGNMENT 2: Implied start at position 2 (4 - 2 = 2)Text: A B A B A B A BPattern: A B A B ↑ ↑ │ └── resume comparing here (j=2) └──── these are known to match (AB = AB) Compare pattern[2]='A' vs text[4]='A' ✓Compare pattern[3]='B' vs text[5]='B' ✓MATCH at position 2! After match, j = LPS[3] = 2, keep text pointer at i = 6 ═══════════════════════════════════════════════════════════════ ALIGNMENT 3: Implied start at position 4 (6 - 2 = 4)Text: A B A B A B A BPattern: A B A B Compare pattern[2]='A' vs text[6]='A' ✓Compare pattern[3]='B' vs text[7]='B' ✓MATCH at position 4! After match, j = LPS[3] = 2, but i = 8 = n, so we're done. ═══════════════════════════════════════════════════════════════ SUMMARY:Starting positions checked: 0, 2, 4Starting positions SKIPPED: 1, 3 ← proven impossible by LPS structureMatches found: 3 (at positions 0, 2, 4)Pattern overlaps with itself in text at AB boundaries| Position | Status | Why |
|---|---|---|
| 0 | Checked ✓ | Initial position, always checked |
| 1 | Skipped | No valid prefix-suffix enables alignment here |
| 2 | Checked ✓ | LPS[3]=2 means pattern[0..1] matches text[2..3] |
| 3 | Skipped | No valid prefix-suffix enables alignment here |
| 4 | Checked ✓ | LPS[3]=2 means pattern[0..1] matches text[4..5] |
Notice how KMP naturally finds all three overlapping matches. After each match, the LPS value tells us exactly where the next potential match could start—which in this periodic pattern is every 2 positions.
KMP maintains a crucial invariant that makes reasoning about correctness straightforward:
Invariant: At every moment during execution, if the pattern pointer is at position j and the text pointer is at position i, then:
text[i-j .. i-1] = pattern[0 .. j-1]In other words, the last j characters we've "confirmed" in the text exactly match the first j characters of the pattern.
This invariant is:
This invariant is the key to understanding KMP's correctness. At any point, the state (i, j) encodes complete information about what we've matched. We never 're-compare' text we've advanced past, and the pattern prefix we've 'accumulated' in j is always valid.
State as Accumulated Knowledge:
Think of j as representing accumulated knowledge about the text we've processed:
The LPS array tells us: "If you have knowledge level j and see a non-matching character, what's the best knowledge level you can preserve?" The answer is LPS[j-1]—the longest prefix of pattern that we can still claim matches a suffix of what we've seen.
Pattern: ABCAB (LPS = [0, 0, 0, 1, 2])Text: XYZABCABCABD Trace with invariant verification: State (i=0, j=0): Invariant: text[-0..-1] = "" = pattern[-0..-1] ✓ (vacuous)Compare text[0]='X' vs pattern[0]='A': mismatchj = 0, so i++ → (i=1, j=0) State (i=1, j=0): Invariant holds (vacuous)Compare text[1]='Y' vs pattern[0]='A': mismatchi++ → (i=2, j=0) State (i=2, j=0): Compare 'Z' vs 'A': mismatchi++ → (i=3, j=0) State (i=3, j=0): Compare 'A' vs 'A': match!i++, j++ → (i=4, j=1)Invariant: text[3..3] = "A" = pattern[0..0] ✓ State (i=4, j=1): Compare 'B' vs 'B': match!→ (i=5, j=2)Invariant: text[3..4] = "AB" = pattern[0..1] ✓ State (i=5, j=2): Compare 'C' vs 'C': match!→ (i=6, j=3)Invariant: text[3..5] = "ABC" = pattern[0..2] ✓ State (i=6, j=3): Compare 'A' vs 'A': match!→ (i=7, j=4)Invariant: text[3..6] = "ABCA" = pattern[0..3] ✓ State (i=7, j=4): Compare 'B' vs 'B': match!→ (i=8, j=5)j = 5 = m, MATCH FOUND at position 3! j = LPS[4] = 2 → (i=8, j=2)Invariant: text[6..7] = "AB" = pattern[0..1] ✓ ...and continuesWe've now fully understood the mechanism by which the LPS array enables efficient skipping. Let's consolidate:
What's next:
We've seen how the LPS array enables efficient skipping during matching. But we haven't yet shown how to build the LPS array efficiently. The next page covers this—and the beautiful insight is that building the LPS array uses the same technique as matching, applied to the pattern matching against itself!
You now fully understand the skipping mechanism in KMP: how LPS values translate to shift distances, why each shift is simultaneously safe and maximal, how cascading handles complex patterns, and how the invariant ensures correctness. The matching algorithm is now completely transparent to you.