Loading content...
We've built a powerful tool—the Z-array computed in linear time. But how do we wield it? The most fundamental application of the Z-algorithm is pattern matching: given a text T and a pattern P, find all positions where P occurs in T.
This is the problem that powers your Ctrl+F, enables compilers to tokenize source code, allows DNA sequencing to identify gene markers, and lets search engines locate relevant documents. It's perhaps the most important problem in string algorithmics.
The Z-algorithm solves this problem with an elegant insight: by constructing a special combined string, we transform pattern matching into a Z-array query problem.
By the end of this page, you will understand the concatenation technique for Z-based pattern matching, see complete working implementations, understand why the approach is correct, and know how to handle edge cases and special characters. You'll be able to find all pattern occurrences in optimal O(n + m) time.
Let's formally define the problem we're solving.
Problem Statement:
Given:
Find:
Output: A list of all such positions i.
Example:
| Approach | Time Complexity | Practical for n, m = 10⁶? |
|---|---|---|
| Naive (brute force) | O(n × m) | No: ~10¹² comparisons |
| KMP algorithm | O(n + m) | Yes: ~2 × 10⁶ comparisons |
| Z-algorithm | O(n + m) | Yes: ~2 × 10⁶ comparisons |
| Rabin-Karp (average) | O(n + m) | Yes: with hash collisions |
The Z-algorithm achieves the optimal O(n + m) complexity, matching KMP but with a different approach that many find more intuitive.
Here's the key insight that connects Z-arrays to pattern matching:
If we compute the Z-array of the combined string P$T (pattern + separator + text), then any position i where Z[i] = m indicates a pattern occurrence.
Why does this work?
The '$' (or any character not in P or T) serves as a boundary. It ensures that Z-values computed for positions in T won't artificially extend into the pattern prefix. Without a separator, matches could incorrectly "bridge" across P and T.
Visual Breakdown:
P = "aba", T = "abacaba"
Combined string S = P + '$' + T:
Index: 0 1 2 3 4 5 6 7 8 9 10
S: a b a $ a b a c a b a
└─P─┘ ↑ └────T────────┘
separator
Z-array:
Z[0] = 0 (by convention)
Z[1] = 0 (b ≠ a)
Z[2] = 1 (a matches, b ≠ b... wait, let's check: S[2]='a'=S[0], S[3]='$'≠S[1]='b', so Z[2]=1)
Z[3] = 0 ($ ≠ a)
Z[4] = 3 (aba matches!) ← Pattern found at T[0]
Z[5] = 0 (b ≠ a)
Z[6] = 1 (a matches, c ≠ b)
Z[7] = 0 (c ≠ a)
Z[8] = 3 (aba matches!) ← Pattern found at T[4]
Z[9] = 0 (b ≠ a)
Z[10] = 1 (a matches, but we're at end)
Positions where Z[i] = m = 3: indices 4 and 8
Corresponding positions in T: 4 - (3+1) = 0, and 8 - (3+1) = 4
Pattern "aba" occurs in text "abacaba" at positions 0 and 4.
Let's implement the Z-based pattern matching algorithm in full detail.
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879
/** * Finds all occurrences of pattern in text using the Z-algorithm. * Time complexity: O(n + m) where n = text length, m = pattern length. * * @param text - The text to search in * @param pattern - The pattern to search for * @returns Array of starting positions where pattern occurs in text */function findPattern(text: string, pattern: string): number[] { if (pattern.length === 0) return []; if (pattern.length > text.length) return []; const m = pattern.length; const n = text.length; // Create combined string: pattern + separator + text // The separator must not appear in either string const combined = pattern + '$' + text; // Compute Z-array of the combined string const Z = computeZArray(combined); // Find all positions where Z[i] equals pattern length const result: number[] = []; for (let i = m + 1; i < combined.length; i++) { if (Z[i] === m) { // Position in text is i - (m + 1) result.push(i - m - 1); } } return result;} /** * Computes the Z-array for a given string in O(n) time. */function computeZArray(s: string): number[] { const n = s.length; const Z: number[] = new Array(n).fill(0); if (n === 0) return Z; let l = 0, r = 0; for (let i = 1; i < n; i++) { if (i > r) { l = r = i; while (r < n && s[r - l] === s[r]) { r++; } Z[i] = r - l; r--; } else { const k = i - l; if (Z[k] < r - i + 1) { Z[i] = Z[k]; } else { l = i; while (r < n && s[r - l] === s[r]) { r++; } Z[i] = r - l; r--; } } } return Z;} // Example usageconst text = "ababcabababd";const pattern = "abab";const occurrences = findPattern(text, pattern);console.log(`Text: "${text}"`);console.log(`Pattern: "${pattern}"`);console.log(`Occurrences at positions: [${occurrences.join(', ')}]`);// Output: Occurrences at positions: [0, 5, 7]Why does this technique find exactly the pattern occurrences? Let's prove it rigorously.
Theorem: For the combined string S = P$T, position i (where i > m) has Z[i] = m if and only if P occurs in T starting at position i - m - 1.
Proof:
Part 1: If Z[i] = m, then P occurs at position i - m - 1 in T
Suppose Z[i] = m for some i > m.
Part 2: If P occurs at position j in T, then Z[m+1+j] = m
Suppose T[j..j+m-1] = P for some valid j.
QED
The separator character '$' ensures that no Z-value in the text portion can exceed m. Even if T[j..] looks like it continues matching P, the match would have to "cross" the '$' to extend beyond m characters, which is impossible since '$' only appears once and doesn't match itself.
Real-world pattern matching requires handling several special cases correctly.
The Separator Problem:
If the pattern or text might contain the '$' character, you have two options:
Option 1: Choose a different separator
Find a character not in either string. This requires O(n + m) time to scan for character frequencies, but is straightforward.
Option 2: Compute Z-array directly on P$T without storing the combined string
Modify the Z-algorithm to work on an implicit concatenation: when accessing position i, return P[i] for i < m, '$' for i = m, and T[i-m-1] for i > m. This avoids the separator issue entirely.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657
/** * Robust pattern matching that handles all edge cases, * including when the separator character exists in the input. */function findPatternRobust(text: string, pattern: string): number[] { if (pattern.length === 0) return []; if (pattern.length > text.length) return []; const m = pattern.length; const n = text.length; const total = m + 1 + n; // Virtual accessor: access the combined string without building it // This avoids the separator character problem function charAt(i: number): string { if (i < m) return pattern[i]; if (i === m) return '\0'; // Use null character as separator return text[i - m - 1]; } // Z-algorithm on virtual string const Z: number[] = new Array(total).fill(0); let l = 0, r = 0; for (let i = 1; i < total; i++) { if (i > r) { l = r = i; while (r < total && charAt(r - l) === charAt(r)) { r++; } Z[i] = r - l; r--; } else { const k = i - l; if (Z[k] < r - i + 1) { Z[i] = Z[k]; } else { l = i; while (r < total && charAt(r - l) === charAt(r)) { r++; } Z[i] = r - l; r--; } } } // Collect results const result: number[] = []; for (let i = m + 1; i < total; i++) { if (Z[i] === m) { result.push(i - m - 1); } } return result;}The virtual accessor approach has another benefit: it avoids creating a new string of length m+1+n. For very large inputs, this can save significant memory. The cost is slightly more complex per-character access logic.
Let's verify that Z-based pattern matching achieves optimal O(n + m) complexity.
| Operation | Time | Space |
|---|---|---|
| Construct combined string | O(n + m) | O(n + m) |
| Compute Z-array | O(n + m) | O(n + m) |
| Scan for occurrences | O(n) | O(k) where k = number of occurrences |
| Total | O(n + m) | O(n + m) |
Time Complexity: O(n + m)
Space Complexity: O(n + m)
Space Optimization:
Using the virtual accessor technique, we can reduce space to O(m + n) for the Z-array alone, without creating the combined string. This is optimal since we must at least store the Z-array (or compute it on-the-fly).
O(n + m) is the best possible for this problem. We must at least read the entire pattern (m) and text (n) to guarantee correctness, establishing Ω(n + m) as a lower bound. The Z-algorithm matches this lower bound.
Let's work through several examples to build intuition and verify our understanding.
Example 1: Overlapping Matches
Pattern P = "aa"
Text T = "aaaa"
Combined S = "aa$aaaa"
Index: 0 1 2 3 4 5 6
Z-array: [0, 1, 0, 2, 1, 2, 1]
Positions where Z[i] = 2 (= m):
- i = 3: text position 3 - 2 - 1 = 0
- i = 5: text position 5 - 2 - 1 = 2
Wait, that gives only positions 0 and 2. But "aa" occurs at 0, 1, and 2!
Let me re-check:
Z[3] = length of match between "aaaa" and "aa" = 2 ("aa" matches) → occurrence at 0 ✓
Z[4] = length of match between "aaa" and "aa" = 2? Let's see: S[4]='a'=S[0], S[5]='a'=S[1], S[6]='a'≠S[2]='$'... wait, S[2]='$', so Z[4]=2 → occurrence at 1 ✓
Z[5] = length of match between "aa" and "aa" = 2 → occurrence at 2 ✓
Z[6] = length of match between "a" and "aa" = 1 (only 1 char left)
Correct Z-array: [0, 1, 0, 2, 2, 2, 1]
Occurrences at positions: 0, 1, 2 ✓
Example 2: No Matches
Pattern P = "xyz"
Text T = "abcdefg"
Combined S = "xyz$abcdefg"
Index: 0 1 2 3 4 5 6 7 8 9 10
Z-array: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
No position has Z[i] = 3 (= m)
Occurrences: [] (empty list)
Example 3: Pattern Equals Text
Pattern P = "abcd"
Text T = "abcd"
Combined S = "abcd$abcd"
Index: 0 1 2 3 4 5 6 7 8
Z-array: [0, 0, 0, 0, 0, 4, 0, 0, 0]
Z[5] = 4 = m → occurrence at position 5 - 4 - 1 = 0
Occurrences: [0] ✓
Try computing the Z-array and finding occurrences for: Pattern = "aba", Text = "abababa". The answer should be [0, 2, 4] corresponding to the three overlapping occurrences of "aba".
The Z-algorithm's power extends beyond simple pattern matching. Here are some advanced applications that use the same concatenation technique.
123456789101112131415161718192021222324252627
/** * Finds all lengths l such that S[0..l-1] equals S[n-l..n-1] * i.e., all prefixes that are also suffixes. */function prefixEqualsSuffix(s: string): number[] { const n = s.length; if (n === 0) return []; const Z = computeZArray(s); const result: number[] = []; for (let i = 1; i < n; i++) { // If the Z-box [i, i + Z[i] - 1] reaches the end of the string if (i + Z[i] === n) { result.push(Z[i]); } } // The entire string is trivially a prefix and suffix of itself result.push(n); return result.sort((a, b) => a - b);} // Exampleconsole.log(prefixEqualsSuffix("abacaba")); // Output: [1, 3, 7] → "a", "aba", "abacaba"We've seen how the Z-algorithm solves the fundamental pattern matching problem with elegant efficiency.
What's Next:
We've now seen the Z-algorithm in action. The final page compares it with KMP—another O(n + m) pattern matching algorithm—exploring their similarities, differences, and when to prefer one over the other.
You can now use the Z-algorithm for pattern matching. This is a practical skill—pattern matching appears in countless real-world scenarios from text editors to bioinformatics to log analysis. The next page contextualizes Z against its famous sibling, KMP.