The KMP algorithm's power is concentrated in a single data structure: the failure function, also known as the LPS array (Longest Proper Prefix which is also Suffix). This array, computed during preprocessing, encodes everything the algorithm needs to know about the pattern's internal structure.
Understanding the LPS array is essential: once you truly understand it, the entire KMP algorithm becomes transparent.
By the end of this page, you will understand exactly what the LPS array represents, how to compute its values by hand, and how to interpret its values during matching. You'll see multiple examples and develop intuition for the array's structure.
Let's establish precise definitions before we explore examples.
Definitions:

- A prefix of a string is any leading segment of it; a proper prefix is a prefix that is not the entire string.
- A suffix is any trailing segment; a proper suffix is a suffix that is not the entire string.

We require a 'proper' prefix/suffix (not the whole string) because any string is trivially both a prefix and a suffix of itself. If we allowed the improper match, LPS[i] would always equal i+1, which tells us nothing useful. The proper constraint forces us to find genuine internal repetition.
Formal Definition:
LPS[i] = max { k : 0 ≤ k ≤ i and P[0..k-1] = P[i-k+1..i] }
In words: LPS[i] is the length of the longest string that is both:
- a proper prefix of P[0..i], and
- a suffix of P[0..i].

If no non-trivial match exists, LPS[i] = 0.
| Property | Value | Explanation |
|---|---|---|
| Size | m (pattern length) | One entry for each pattern position |
| LPS[i] range | 0 to i | Can't exceed i (that would be the whole substring) |
| LPS[0] | Always 0 | Single character has no proper prefix-suffix |
| LPS[i] meaning | Resume position after mismatch | If mismatch at i+1, resume comparing at LPS[i] |
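Before the hand-worked examples, here is a minimal brute-force sketch of this definition: a hypothetical `brute_force_lps` helper (not the efficient algorithm, which comes on the next page) that simply tries every candidate border length, longest first.

```python
def brute_force_lps(pattern: str) -> list[int]:
    """Compute LPS values directly from the definition.

    Deliberately naive (roughly O(m^2) length checks, each up to O(m) characters);
    useful only for checking small examples by hand.
    """
    m = len(pattern)
    lps = [0] * m
    for i in range(m):
        # Try every proper border length k of pattern[0..i], longest first.
        for k in range(i, 0, -1):
            if pattern[:k] == pattern[i - k + 1 : i + 1]:
                lps[i] = k
                break
    return lps


print(brute_force_lps("ABCAB"))    # [0, 0, 0, 1, 2]
print(brute_force_lps("ABCDABD"))  # [0, 0, 0, 0, 1, 2, 0]
```

The two printed arrays match the hand computations worked out below.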
Let's compute LPS arrays for several patterns to build intuition. We'll be thorough and methodical.
Example 1: Pattern "ABCAB"
Index: 0 1 2 3 4
Pattern: A B C A B
LPS[0]: Consider P[0..0] = "A" → a single character has no proper prefix-suffix → LPS[0] = 0
LPS[1]: Consider P[0..1] = "AB" → prefix "A" ≠ suffix "B" → LPS[1] = 0
LPS[2]: Consider P[0..2] = "ABC" → no prefix equals a suffix → LPS[2] = 0
LPS[3]: Consider P[0..3] = "ABCA" → prefix "A" = suffix "A" → LPS[3] = 1
LPS[4]: Consider P[0..4] = "ABCAB" → prefix "AB" = suffix "AB" → LPS[4] = 2
Pattern: A B C A B
Index:   0 1 2 3 4
LPS:     0 0 0 1 2

Interpretation:
- LPS[3] = 1: "ABCA" has "A" as a proper prefix that equals its suffix
- LPS[4] = 2: "ABCAB" has "AB" as a proper prefix that equals its suffix

Example 2: Pattern "AAAA" (highly repetitive)
Index: 0 1 2 3
Pattern: A A A A
LPS[0]: P[0..0] = "A" → No proper prefix-suffix → LPS[0] = 0
LPS[1]: P[0..1] = "AA" → prefix "A" = suffix "A" → LPS[1] = 1
LPS[2]: P[0..2] = "AAA" → prefix "AA" = suffix "AA" → LPS[2] = 2
LPS[3]: P[0..3] = "AAAA" → prefix "AAA" = suffix "AAA" → LPS[3] = 3
Pattern: A A A A
Index:   0 1 2 3
LPS:     0 1 2 3

Interpretation:
This pattern has the maximum possible LPS values (except LPS[0], which is always 0). Each position i has LPS[i] = i, meaning:
- "AA" has 1-overlap
- "AAA" has 2-overlap
- "AAAA" has 3-overlap

This allows aggressive shifting during a mismatch: after matching k A's and hitting a mismatch, we can resume at position k-1 immediately!

Example 3: Pattern "ABCDABD"
Index: 0 1 2 3 4 5 6
Pattern: A B C D A B D
Computing each value:
LPS[0] = 0  "A" - no proper prefix-suffix
LPS[1] = 0  "AB" - prefix "A" ≠ suffix "B"
LPS[2] = 0  "ABC" - no prefix equals a suffix
LPS[3] = 0  "ABCD" - no prefix equals a suffix
LPS[4] = 1  "ABCDA" - prefix "A" = suffix "A"
LPS[5] = 2  "ABCDAB" - prefix "AB" = suffix "AB"
LPS[6] = 0  "ABCDABD" - prefix "ABC" ≠ suffix "ABD", and no shorter prefix works either
Pattern: A B C D A B D
Index:   0 1 2 3 4 5 6
LPS:     0 0 0 0 1 2 0

Interpretation:
- The pattern has no repetition until position 4, where "A" reappears
- Positions 4-5 build up an overlap ("A", then "AB")
- Position 6 breaks the pattern (extending the "AB" overlap would need a 'C', but we see 'D'), so the overlap resets to 0

This means: if we mismatch at pattern position 6, we resume comparing at position 2 (LPS[5] = 2); if we then mismatch again before rebuilding any overlap, we fall back to position 0 and restart from the beginning of the pattern.

The LPS array reveals the internal symmetry of a pattern. Let's visualize what LPS values tell us about pattern structure.
Pattern: "ABABAB" (periodic pattern)
Index: 0 1 2 3 4 5
Pattern: A B A B A B
┌───┬───┬───┬───┬───┬───┐
│ A │ B │ A │ B │ A │ B │
└───┴───┴───┴───┴───┴───┘
  0   1   2   3   4   5

- The A's at positions 0, 2, and 4 are "aligned": the prefix "A" reappears as a suffix.
- The AB's are "aligned" prefixes/suffixes: prefix "AB" (positions 0-1) reappears as suffix "AB" (positions 4-5).
- "ABAB" is aligned at positions 0-3 and 2-5: the prefix "ABAB" reappears as the suffix "ABAB".
LPS: 0 0 1 2 3 4
LPS[2]=1: prefix "A" matches suffix "A" of "ABA"
LPS[3]=2: prefix "AB" matches suffix "AB" of "ABAB"
LPS[4]=3: prefix "ABA" matches suffix "ABA" of "ABABA"
LPS[5]=4: prefix "ABAB" matches suffix "ABAB" of "ABABAB"
A pattern with period p (repeating every p characters) will have LPS[i] = i - p + 1 for i ≥ p. The LPS array effectively encodes the period length! For 'ABABAB', period = 2, and we see LPS[5] = 5 - 2 + 1 = 4 ✓
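As a small illustration of that relationship (reusing the hypothetical `brute_force_lps` helper sketched earlier), the shortest period of a pattern can be read directly off its last LPS entry:

```python
def shortest_period(pattern: str) -> int:
    """Shortest period p: the pattern consists of repetitions of its first p
    characters (the final repetition may be cut off)."""
    lps = brute_force_lps(pattern)   # any correct LPS computation works here
    return len(pattern) - lps[-1]


print(shortest_period("ABABAB"))   # 2  (length 6 minus LPS[5] = 4)
print(shortest_period("ABCDABD"))  # 7  (LPS[6] = 0, so no shorter period exists)
```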
Pattern: "AABAAAB" (mixed repetition)
Index: 0 1 2 3 4 5 6
Pattern: A A B A A A B
LPS[0] = 0 "A" - no proper prefix-suffix
LPS[1] = 1 "AA" - prefix "A" = suffix "A"
LPS[2] = 0 "AAB" - neither "A" nor "AA" equals "B" or "AB"
LPS[3] = 1 "AABA" - prefix "A" = suffix "A"
LPS[4] = 2 "AABAA" - prefix "AA" = suffix "AA"
LPS[5] = 2 "AABAAA" - prefix "AA" = suffix "AA"
(not "AAA" because prefix "AAA" doesn't exist - pattern is AAB...)
LPS[6] = 3 "AABAAAB" - prefix "AAB" = suffix "AAB"
A A B A A A B
└─┬─┘   └─┬─┘
 AAB  =  AAB  →  LPS[6] = 3
Pattern: A A B A A A B
Index:   0 1 2 3 4 5 6
LPS:     0 1 0 1 2 2 3

LPS Visualization (showing what overlaps):

Position 1 (LPS=1): [A|A]           prefix "A" = suffix "A"
Position 4 (LPS=2): [A A|B A A]     prefix "AA" = suffix "AA"
Position 6 (LPS=3): [A A B|A A A B] prefix "AAB" = suffix "AAB"

The LPS values form a kind of "contour" showing how structure builds up:

Index:   0 1 2 3 4 5 6
LPS:     0 1 0 1 2 2 3
         └┬┘ └┬┘ └─┬─┘
The pattern resets and rebuilds its overlap multiple times.

| LPS Pattern | Example Pattern | Interpretation |
|---|---|---|
| All zeros | ABCDEF | No internal repetition - every mismatch restarts at 0 |
| Increasing 0,1,2,3... | AAAA | Highly repetitive - aggressive shifts possible |
| Saw-tooth (up then down) | ABCABD | Partial repetition breaks mid-way |
| Periodic staircase | ABABAB | Regular periodic pattern detected |
| Sparse non-zeros | ABXABY | Limited overlap opportunities |
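To see these shapes concretely, a quick loop (again using the hypothetical `brute_force_lps` helper from the earlier sketch) prints the LPS array for each example pattern in the table:

```python
for p in ["ABCDEF", "AAAA", "ABCABD", "ABABAB", "ABXABY"]:
    print(f"{p:8} -> {brute_force_lps(p)}")

# ABCDEF   -> [0, 0, 0, 0, 0, 0]
# AAAA     -> [0, 1, 2, 3]
# ABCABD   -> [0, 0, 0, 1, 2, 0]
# ABABAB   -> [0, 0, 1, 2, 3, 4]
# ABXABY   -> [0, 0, 0, 1, 2, 0]
```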
Now let's see exactly how the LPS array is used during pattern matching. The rule is simple:
When you mismatch at pattern position j (after matching positions 0 through j-1), look up LPS[j-1]. That's your new pattern position to resume comparison.
Why LPS[j-1]? Because LPS[j-1] tells us the longest prefix of the pattern that we know still matches the text (since it equals a suffix of what we just matched). We resume comparing from position LPS[j-1] in the pattern.
Complete Trace Example:
Pattern: "ABCDABD" with LPS = [0,0,0,0,1,2,0]
Text: "ABCDABCDABD"
Step 1: Align at position 0
Text:    A B C D A B C D A B D
Pattern: A B C D A B D
         ✓ ✓ ✓ ✓ ✓ ✓ ✗   (mismatch at j=6: pattern[6]='D', text[6]='C')
j = 6, so we look up LPS[5] = 2
This means: the prefix pattern[0..1] = "AB" matches what we just saw at text[4..5]
Step 2: Resume with j = 2 (we don't rescan any text; the text pointer stays put and only the pattern position changes)
Text:    A B C D A B C D A B D
         - - - - - - ↑
                     │ we're still at text position 6
Pattern:         A B C D A B D
                     ↑ resume comparing at pattern position j = 2
Compare pattern[2]='C' vs text[6]='C' ✓
then: pattern[3]='D' vs text[7]='D' ✓
then: pattern[4]='A' vs text[8]='A' ✓
then: pattern[5]='B' vs text[9]='B' ✓
then: pattern[6]='D' vs text[10]='D' ✓
j reached 7 = m, MATCH FOUND at position 10 - 7 + 1 = 4
Crucially, notice that we never re-examine text positions 0-5. After the mismatch at text position 6, we stay at position 6 and just change our pattern position. The text pointer only moves forward, guaranteeing O(n) text traversal.
The Matching Algorithm Using LPS:
function kmp_search(text, pattern, lps):
    n = length(text)
    m = length(pattern)
    i = 0                       // text pointer
    j = 0                       // pattern pointer

    while i < n:
        if text[i] == pattern[j]:
            i += 1
            j += 1
            if j == m:          // full match!
                report_match(i - m)
                j = lps[j-1]    // look for the next (possibly overlapping) occurrence
        else:
            if j > 0:
                j = lps[j-1]    // use LPS to skip
            else:
                i += 1          // no prefix matches, move the text pointer
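For concreteness, here is a direct Python rendering of that pseudocode, offered as a sketch rather than a reference implementation: it returns a list of match positions instead of calling a `report_match` callback, and it assumes a non-empty pattern and a correct `lps` array supplied by the caller.

```python
def kmp_search(text: str, pattern: str, lps: list[int]) -> list[int]:
    """Return the start indices of all (possibly overlapping) matches."""
    n, m = len(text), len(pattern)
    matches = []
    i = j = 0                      # i: text pointer, j: pattern pointer
    while i < n:
        if text[i] == pattern[j]:
            i += 1
            j += 1
            if j == m:             # full match
                matches.append(i - m)
                j = lps[j - 1]     # keep looking for overlapping occurrences
        elif j > 0:
            j = lps[j - 1]         # fall back within the pattern, keep i where it is
        else:
            i += 1                 # no prefix matches, advance the text pointer
    return matches


# Reproduces the trace above: "ABCDABD" occurs in "ABCDABCDABD" starting at index 4.
print(kmp_search("ABCDABCDABD", "ABCDABD", [0, 0, 0, 0, 1, 2, 0]))  # [4]
```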
The elegance is in the j = lps[j-1] line. The table below summarizes what happens in each situation:
| When Mismatch At | LPS Value Used | Action |
|---|---|---|
| j = 0 | N/A | Increment text pointer (i++), pattern stays at 0 |
| j > 0 | LPS[j-1] | Set j = LPS[j-1], keep text pointer unchanged |
| After full match (j = m) | LPS[m-1] | Report match, then j = LPS[m-1] to find overlapping matches |
Understanding how LPS behaves in edge cases deepens our understanding and helps debug implementations.
Case 1: Pattern with no repetition ("ABCDEF")
Pattern: A B C D E F
LPS: 0 0 0 0 0 0
Behavior: Any mismatch at position j > 0 sends us to LPS[j-1] = 0.
The pattern always restarts from position 0, much like the naive algorithm, but the text pointer never moves backward, so matching remains O(n).
There is no benefit from the LPS values here, but no penalty either.
Case 2: Single character pattern ("A")
Pattern: A
LPS: 0
When matching: Compare text[i] with 'A'.
- Match: i++, j++ → j=1=m, report match, j = LPS[0] = 0
- Mismatch: j=0, so i++
This is just a linear scan - O(n) as expected.
Case 3: All same character ("AAAA")
Pattern: A A A A
LPS: 0 1 2 3
Searching in "AAAAAAAA":
Step 1: Match A A A A at positions 0-3. j=4=m, match at 0!
j = LPS[3] = 3 (we know text[1..3] = "AAA" = pattern[0..2])
Step 2: Compare pattern[3]='A' with text[4]='A' ✓
j=4=m, match at 1!
j = LPS[3] = 3
Step 3: Compare pattern[3]='A' with text[5]='A' ✓
j=4=m, match at 2!
...and so on
We find all 5 matches (at positions 0,1,2,3,4) in O(n) time!
Without KMP, naive would take O(n×m) to find overlapping matches.
The LPS array naturally handles overlapping matches. After finding a match, j = LPS[m-1] positions us to continue looking for a new match that might overlap with the one we just found. This is a major advantage over simpler algorithms that might miss overlapping occurrences.
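Running the `kmp_search` sketch from earlier makes this concrete (the LPS array for "AAAA" is [0, 1, 2, 3]):

```python
print(kmp_search("AAAAAAAA", "AAAA", [0, 1, 2, 3]))
# [0, 1, 2, 3, 4] - all five overlapping occurrences in one left-to-right pass
```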
Case 4: Pattern longer than text
Pattern: "ABCDEFGH" (length 8)
Text: "ABC" (length 3)
The algorithm simply never finds j = m = 8, so it reports no matches.
No special handling needed - the loop terminates when i reaches n = 3.
Case 5: Nested repetition ("AABAAB")
Pattern: A A B A A B
Index: 0 1 2 3 4 5
LPS[0] = 0 ("A")
LPS[1] = 1 ("AA" → "A" = "A")
LPS[2] = 0 ("AAB" → no match)
LPS[3] = 1 ("AABA" → "A" = "A")
LPS[4] = 2 ("AABAA" → "AA" = "AA")
LPS[5] = 3 ("AABAAB" → "AAB" = "AAB")
LPS: 0 1 0 1 2 3
Notice LPS[5] = 3, meaning half the pattern ("AAB") is a valid prefix-suffix.
After a full match, we can resume at position 3, effectively checking for
patterns that share "AAB" with the previous occurrence.
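With the `kmp_search` sketch from earlier, this overlap behavior is easy to observe: in the text "AABAABAAB", the two occurrences of "AABAAB" share the "AAB" in the middle.

```python
# LPS array for "AABAAB" is [0, 1, 0, 1, 2, 3]
print(kmp_search("AABAABAAB", "AABAAB", [0, 1, 0, 1, 2, 3]))  # [0, 3]
```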
There's another way to understand the LPS array that provides deeper insight: the pattern matcher is a finite automaton, and the LPS array encodes its failure transitions.
State Machine View:
Pattern: "ABABC" with LPS = [0, 0, 1, 2, 0] State Diagram: A B A B C[0] ───→ [1] ───→ [2] ───→ [3] ───→ [4] ───→ [5] (Accept!) │ │ │ │ │ │ │ │ │ └── on anything except C: → state LPS[4-1] = LPS[3] = 2 │ │ │ │ (we know "AB" still matches) │ │ │ │ │ │ │ └─────────── on anything except B: → state LPS[3-1] = LPS[2] = 1 │ │ │ (we know "A" still matches) │ │ │ │ │ └──────────────────── on anything except A: → state LPS[2-1] = LPS[1] = 0 │ │ (no prefix matches) │ │ │ └───────────────────────────── on anything except B: → state LPS[1-1] = LPS[0] = 0 │ └────────────────────────────────────── on anything except A: stay at state 0Why the State Machine View Matters:
Determinism: Once the failure links are expanded into explicit per-character transitions (see the transition-table sketch below), every state has exactly one outgoing transition for each possible input character: the KMP matcher is equivalent to a deterministic finite automaton (DFA).

Amortized O(1) Per Character: With the LPS representation, a single text character may follow several failure links, but each fallback undoes an earlier forward move, so the total work over the whole text is bounded by about 2n character comparisons. This is why KMP matching is O(n).
Streaming Capability: We can process the text as a stream. At any point, our state tells us how much of the pattern we've matched. This makes KMP ideal for real-time applications.
Pattern Independence: The state machine is determined entirely by the pattern. We can precompute it once and use it for multiple texts, or even in parallel.
Production regex engines use similar automaton-based approaches. The KMP automaton is a special case of a DFA for fixed string matching. More complex patterns require NFAs (non-deterministic automata), but the core idea—precompute pattern structure, then stream through text—remains the same.
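To make the automaton view concrete, here is a sketch (a hypothetical `build_dfa` helper with a small fixed alphabet assumed, reusing `brute_force_lps` from the earlier sketch) that expands the LPS array into an explicit transition table, so each text character costs exactly one table lookup:

```python
def build_dfa(pattern: str, alphabet: str) -> list[dict[str, int]]:
    """Expand the LPS array into explicit per-character transitions.

    State k means "the last k text characters equal pattern[0..k-1]";
    dfa[k][c] is the state reached after reading character c in state k.
    Reaching state m (= len(pattern)) signals a full match.
    """
    lps = brute_force_lps(pattern)
    m = len(pattern)
    dfa = []
    for state in range(m):
        row = {}
        for c in alphabet:
            k = state
            while k > 0 and pattern[k] != c:   # follow failure links until c extends a border
                k = lps[k - 1]
            row[c] = k + 1 if pattern[k] == c else 0
        dfa.append(row)
    return dfa


dfa = build_dfa("ABABC", "ABC")
# From state 4 (having matched "ABAB"): an 'A' falls back to the border "AB" and extends
# it to "ABA" (state 3), while a 'C' completes the match (state 5).
print(dfa[4])   # {'A': 3, 'B': 0, 'C': 5}
```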
The LPS array goes by several names in different textbooks and contexts. Understanding these variations helps when reading other resources.
Common Names for the Same Concept:
| Name | Full Form | Context/Source |
|---|---|---|
| LPS Array | Longest Proper Prefix which is also Suffix | Common in competitive programming |
| Failure Function | f(j) or fail[] | Original KMP paper, academic literature |
| Prefix Function | π (pi) function | CLRS textbook, theoretical CS |
| Partial Match Table | — | Some implementation guides |
| Border Array | — | String algorithmics literature |
| Next Array | next[] | Some implementations (offset by 1) |
Some sources define the failure function with different indexing: one convention uses f(j) = LPS[j-1], another defines next[j] = LPS[j] + 1, and so on. When reading code or papers, always check the exact definition being used. The core concept is the same, but indexing variations can cause subtle bugs.
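As a small, hedged illustration of one such variation (hypothetical names, not the exact convention of any particular source): a failure table indexed by "number of characters matched so far" is simply the LPS array shifted by one slot.

```python
lps = [0, 0, 0, 0, 1, 2, 0]   # LPS array for "ABCDABD"

# fail[j] = length of the longest border of the first j characters, for j = 0..m,
# so fail[j] == lps[j - 1] for j >= 1 (and fail[0] is conventionally 0 or -1).
fail = [0] + lps
print(fail)                    # [0, 0, 0, 0, 0, 1, 2, 0]
```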
The "Border" Terminology:
In formal string algorithmics, a border of string S is a string that is both a proper prefix and proper suffix of S. The LPS value at position i is the length of the longest border of pattern[0..i].
This terminology makes some statements cleaner: for example, a pattern of length L whose longest border has length b has shortest period L - b, which is exactly the periodicity observation we used for "ABABAB" above.
We'll use "LPS" in this course for consistency, but know that "failure function," "prefix function," and "border array" all refer to the same structure.
We've thoroughly examined the LPS (failure function) array, the central data structure of the KMP algorithm. Let's consolidate our understanding:

- LPS[i] is the length of the longest proper prefix of P[0..i] that is also a suffix of P[0..i]; LPS[0] is always 0.
- On a mismatch at pattern position j > 0, we resume comparing at position LPS[j-1]; the text pointer never moves backward.
- After a full match, setting j = LPS[m-1] lets the search continue and find overlapping occurrences.
- The array can equally be read as the failure transitions of a finite automaton built from the pattern.
What's next:
We've seen what the LPS array contains and how it's used during matching. But we haven't yet seen how to compute the LPS array efficiently. A naive approach (checking all prefixes against all suffixes) would take O(m²) or even O(m³) time, undermining the whole point of KMP.
The next page reveals the elegant algorithm for building the LPS array in O(m) time—using the same insight recursively that makes KMP matching efficient.
You now understand the LPS array deeply: what it represents, how to compute values by hand, how it guides the matching process, and how it can be viewed as encoding a finite automaton. This foundation prepares you to understand the elegant recursive algorithm for building the LPS array efficiently.