Loading content...
In the world of linear-time string matching, two algorithms stand as titans: the Z-algorithm and the Knuth-Morris-Pratt (KMP) algorithm. Both solve the pattern matching problem in O(n + m) time. Both are deterministic, exact, and elegant in their own ways. Yet they approach the problem from fundamentally different perspectives.
Understanding both algorithms—and knowing when to use each—marks a mature string algorithmist. This page provides a deep comparative analysis, helping you internalize not just how each works, but when each shines.
By the end of this page, you will understand the structural relationship between Z-array and KMP's failure function, see side-by-side comparisons for specific scenarios, know the practical tradeoffs (implementation complexity, debugging ease, performance nuances), and develop a decision framework for choosing between them.
Before diving into technical details, understanding the history illuminates why both algorithms exist.
KMP: The Pioneer (1977)
Donald Knuth, Vaughan Pratt, and James Morris independently discovered the algorithm that bears their initials. Published in 1977, KMP was groundbreaking—it proved that pattern matching could be done in linear time, a result that wasn't obvious at all. The algorithm builds a "failure function" (also called prefix function or π-function) that encodes how to "recover" from mismatches without re-examining characters.
Z-Algorithm: The Elegant Alternative
The Z-algorithm, while having roots in earlier string research, became widely popularized through competitive programming communities and modern algorithm education. Its conceptual simplicity—"how much of the prefix matches starting here?"—makes it easier to intuit than KMP's failure function. Both algorithms arrived at the same asymptotic complexity through different lenses.
The Z-array and KMP's prefix function contain equivalent information. Given one, you can compute the other in O(n) time. They are two representations of the same underlying structure—the combinatorial properties of prefix-suffix matches within a string.
Let's compare what each algorithm computes and how.
| Aspect | Z-Algorithm | KMP |
|---|---|---|
| Core Data Structure | Z-array: Z[i] = length of longest prefix match starting at i | Failure function: π[i] = longest proper prefix of P[0..i] that is also suffix |
| Question Answered | "How much of the prefix matches at position i?" | "What is the longest prefix-suffix overlap up to position i?" |
| Preprocessing Focus | Any position can reference the string start | Each position references its own prefix-suffix structure |
| Pattern Matching Approach | Concatenate P$T, look for Z[i] = m | Process text character by character, use π to handle mismatches |
| Conceptual Model | "Overlay" the string on itself | Finite automaton state transitions |
Visual Comparison for S = "aabcaab":
String S: a a b c a a b
Index: 0 1 2 3 4 5 6
Z-array: - 1 0 0 3 1 0
Z[1]=1: "a" matches S[0]
Z[4]=3: "aab" matches S[0..2]
KMP π-function (prefix function):
π: 0 1 0 0 1 2 3
π[1]=1: "aa" has prefix "a" = suffix "a"
π[5]=2: "aabc aa" has prefix "aa" = suffix "aa"
π[6]=3: "aabcaab" has prefix "aab" = suffix "aab"
Notice the different perspectives: Z asks "from here, how far matches the start?" while π asks "up to here, what's the longest overlap?"
Since Z and π encode equivalent information, we can convert between them. Understanding these conversions deepens insight into both structures.
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556
/** * Convert Z-array to KMP prefix function (π). * * Key insight: If Z[i] > 0, then positions i, i+1, ..., i+Z[i]-1 * all have some prefix-suffix overlap. Specifically, position i+Z[i]-1 * has a prefix of length Z[i] that is also a suffix. */function zToPrefix(Z: number[]): number[] { const n = Z.length; const pi: number[] = new Array(n).fill(0); for (let i = 1; i < n; i++) { if (Z[i] > 0) { // The Z-box [i, i + Z[i] - 1] tells us about prefix-suffix overlap // For each position j in this Z-box, we might set π[j] for (let j = Z[i] - 1; j >= 0; j--) { const targetIdx = i + j; if (pi[targetIdx] > 0) break; // Don't overwrite larger values pi[targetIdx] = j + 1; } } } return pi;} /** * Convert KMP prefix function (π) to Z-array. * * Key insight: If π[i] = k > 0, then S[i-k+1..i] = S[0..k-1]. * This means Z[i-k+1] >= k. */function prefixToZ(pi: number[]): number[] { const n = pi.length; const Z: number[] = new Array(n).fill(0); for (let i = 1; i < n; i++) { if (pi[i] > 0) { const startPos = i - pi[i] + 1; // We know at least pi[i] characters match at startPos if (Z[startPos] === 0) { Z[startPos] = pi[i]; } } } // Note: This gives a lower bound on Z. Full reconstruction needs refinement. return Z;} // Exampleconst s = "aabcaab";const Z = [0, 1, 0, 0, 3, 1, 0];console.log("Original Z:", Z);console.log("Converted π:", zToPrefix(Z));// Output: Converted π: [0, 1, 0, 0, 1, 2, 3]The conversions above are simplified. Full, correct conversion requires careful handling of overlapping Z-boxes and proper initialization. In practice, it's often cleaner to compute whichever structure you need directly rather than converting.
Let's compare the actual algorithms side by side.
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162
/** * KMP Pattern Matching for comparison with Z-algorithm. */function kmpSearch(text: string, pattern: string): number[] { if (pattern.length === 0) return []; if (pattern.length > text.length) return []; const m = pattern.length; const n = text.length; // Build failure function (prefix function) const pi = buildPrefixFunction(pattern); const result: number[] = []; let j = 0; // Current position in pattern for (let i = 0; i < n; i++) { // On mismatch, use π to find where to continue while (j > 0 && text[i] !== pattern[j]) { j = pi[j - 1]; } // Check for match at current position if (text[i] === pattern[j]) { j++; } // Full pattern match found if (j === m) { result.push(i - m + 1); j = pi[j - 1]; // Continue searching for more matches } } return result;} function buildPrefixFunction(pattern: string): number[] { const m = pattern.length; const pi: number[] = new Array(m).fill(0); let k = 0; // Length of previous longest prefix-suffix for (let i = 1; i < m; i++) { while (k > 0 && pattern[i] !== pattern[k]) { k = pi[k - 1]; } if (pattern[i] === pattern[k]) { k++; } pi[i] = k; } return pi;} // Comparisonconst text = "ababcabababd";const pattern = "abab"; console.log("Z-algorithm result:", findPattern(text, pattern));console.log("KMP result:", kmpSearch(text, pattern));// Both output: [0, 5, 7]Both algorithms achieve the same asymptotic complexity, but there are subtle differences worth understanding.
| Metric | Z-Algorithm | KMP |
|---|---|---|
| Preprocessing Time | O(m) for pattern | O(m) for pattern |
| Matching Time | O(n) | O(n) |
| Total Time | O(n + m) | O(n + m) |
| Preprocessing Space | O(m) for Z-array of P$T (or full O(n+m)) | O(m) for π |
| Character Comparisons (worst case) | ≤ 2(n + m) | ≤ 2n (after preprocessing) |
| Memory Access Pattern | Sequential + Z-box lookups | Sequential with state jumps |
Nuanced Differences:
Memory Usage:
For very long texts (n >> m), KMP has a memory advantage. However, Z-algorithm can be modified to compute only the relevant portion.
Cache Behavior:
Both have good cache behavior for typical inputs, but KMP's sequential text access can be slightly more cache-friendly.
Branch Prediction:
Both have similar branch complexity. Modern processors handle both patterns well.
In practice, both algorithms perform nearly identically for most inputs. Micro-optimizations and constant factors often matter more than the algorithmic structure. Profile on your specific workload if performance is critical.
Beyond asymptotic complexity, practical considerations often determine algorithm choice.
Common Implementation Pitfalls:
Z-Algorithm:
r = r - 1 after the while loopZ[k] < r - i + 1 vs <=)KMP:
π[i] (1-indexed convention) and π[i-1] (0-indexed)Both algorithms have subtle edge cases. Thorough testing is essential.
Here's a practical decision framework for choosing between Z and KMP.
For most practical purposes, either algorithm works. The best choice is often the one you can implement correctly and confidently. Master both, then default to whichever feels more natural for the specific problem.
Both algorithms extend beyond basic pattern matching, but they shine in different contexts.
| Application | Better Choice | Why |
|---|---|---|
| Find all occurrences of pattern in text | Either | Both are equally suitable; O(n+m) |
| Longest prefix that is also suffix | Z | Direct query: find i where i + Z[i] = n |
| String periodicity/minimal period | Z | Z-array structure reveals periods directly |
| Streaming/online matching | KMP | Processes text incrementally without concatenation |
| Multiple pattern matching | Neither (use Aho-Corasick) | Both are single-pattern algorithms |
| Approximate matching | Neither directly | Both need modifications; other algorithms may be better |
| Lexicographically smallest rotation | Z (Z-function) | Can be done with Z-array on S$S |
| Building failure automaton | KMP | Natural connection to finite automata |
Example: Finding the Minimal Period with Z-Array
A string S has period p if S[i] = S[i mod p] for all i. The minimal period is the smallest such p.
Using Z-array:
Minimal period p = smallest i > 0 such that:
i + Z[i] = n AND n mod i = 0
If no such i exists, the minimal period is n (the string is aperiodic).
Example: S = "abab", Z = [0, 0, 2, 0]
We've completed a comprehensive comparison of the Z-algorithm and KMP. Let's crystallize the key insights.
Your Learning Journey:
You've now completed the Z-Algorithm module. You understand:
This knowledge places you among the small percentage of developers who truly understand linear-time string matching, not just as a black box, but as a principled algorithmic technique you can adapt and extend.
Congratulations! You've mastered the Z-Algorithm—a powerful, elegant technique for linear-time string processing. Whether you're tackling competitive programming challenges, building search functionality, or analyzing biological sequences, this knowledge will serve you well. Your string algorithm toolkit is now significantly stronger.
Click to solve
Click to solve
Click to solve
Click to solve
Click to solve
Click to solve