Loading learning content...
The naive approach to computing the Z-array—comparing characters one by one at each position—requires O(n²) time in the worst case. For a string of length 100,000, that's 10 billion comparisons. For competitive programming constraints where n can reach 10⁶ or 10⁷, this is utterly impractical.
The Z-algorithm achieves something remarkable: it computes all n Z-values in exactly O(n) time using at most 2n character comparisons. This linear-time achievement isn't a minor optimization—it's a fundamental algorithmic breakthrough that makes the technique practically useful.
The key insight? Previously computed Z-values contain information about future positions. By carefully tracking what we already know, we avoid redundant work.
By the end of this page, you will understand the Z-algorithm in complete detail: how it maintains the rightmost Z-box, when it can reuse previously computed values, and why its worst-case complexity is truly O(n). You'll be able to implement it from first principles and trace through its execution on any input.
The Z-algorithm's power comes from a single, profound observation:
When position i falls inside a known Z-box [l, r], the characters S[i..r] are identical to S[i-l..r-l].
This is not approximate—it's an exact match. Why? Because the Z-box exists precisely because S[l..r] = S[0..r-l]. Therefore, any position i inside this box satisfies:
S[i] = S[i-l]
S[i+1] = S[i-l+1]
...
S[r] = S[r-l]
This means we already know something about the Z-value at position i! The value Z[i-l] was computed earlier and tells us how many characters match the prefix starting at position i-l in the string. Since S[i..r] mirrors S[(i-l)..(r-l)], we can potentially reuse Z[i-l] to jumpstart our computation of Z[i].
Think of the Z-box as a "mirror" of the prefix. When we're inside the mirror, we're looking at a copy of the string's beginning. Any structural information (like Z-values) from the original prefix applies to the mirrored region.
Critical Cases:
When computing Z[i] and i is inside the current Z-box [l, r], we compute k = i - l (the "mirrored position" in the prefix). Then:
Case 1: Z[k] < r - i + 1 (the mirrored match fits entirely within the Z-box)
In this case, Z[i] = Z[k]. We're done—no character comparisons needed!
Why? The match at position k extends Z[k] characters. Since Z[k] < r-i+1, this entire match falls within our known mirror region. The match at position i will be identical.
Case 2: Z[k] ≥ r - i + 1 (the mirrored match reaches or exceeds the Z-box boundary)
In this case, we know Z[i] ≥ r - i + 1, but it might be larger. We must compare characters beyond r to find the true extent.
Why? Beyond position r, we have no mirrored information. The match might continue, or it might not—only direct comparison can tell us.
Let's now formalize the Z-algorithm as executable pseudocode, with every decision point explained.
1234567891011121314151617181920212223242526272829303132333435
function computeZArray(S): n = length(S) Z = array of size n, initialized to 0 // l and r track the rightmost Z-box [l, r] // Initially, no Z-box exists l = 0 r = 0 for i = 1 to n - 1: if i > r: // Case A: i is outside the current Z-box // Must compare from scratch l = r = i while r < n and S[r - l] == S[r]: r = r + 1 Z[i] = r - l r = r - 1 // r points to last matching index else: // Case B: i is inside the Z-box [l, r] k = i - l // mirrored position if Z[k] < r - i + 1: // Case B1: mirrored Z-value fits entirely inside Z-box Z[i] = Z[k] else: // Case B2: mirrored Z-value reaches Z-box boundary // Must extend beyond r l = i while r < n and S[r - l] == S[r]: r = r + 1 Z[i] = r - l r = r - 1 return ZUnderstanding Each Case:
Case A: i > r (outside Z-box)
We've moved beyond our rightmost known information. We must compare S[0], S[1], ... with S[i], S[i+1], ... character by character until we find a mismatch. This establishes a new Z-box starting at i.
Case B1: i ≤ r and Z[k] < r - i + 1 (fits inside)
The mirrored Z-value tells us everything. The match at position i will be exactly Z[k] characters because:
Case B2: i ≤ r and Z[k] ≥ r - i + 1 (reaches boundary)
The mirrored Z-value extends to or beyond the Z-box boundary. We know at least r-i+1 characters match, but there might be more. We must compare characters beyond r to find the true Z[i].
Let's translate the algorithm into production-ready code with comprehensive comments.
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667
/** * Computes the Z-array for a given string in O(n) time. * * Z[i] = length of the longest substring starting at i that * matches a prefix of the string. * * @param s - The input string * @returns The Z-array where Z[0] is set to 0 by convention */function computeZArray(s: string): number[] { const n = s.length; const Z: number[] = new Array(n).fill(0); if (n === 0) return Z; // l and r define the rightmost Z-box [l, r] // This is the interval where S[l..r] = S[0..r-l] let l = 0; let r = 0; for (let i = 1; i < n; i++) { if (i > r) { // Case A: i is outside the current Z-box // Start fresh comparison from position i l = r = i; // Extend r as long as characters match while (r < n && s[r - l] === s[r]) { r++; } Z[i] = r - l; r--; // r should point to the last matched index } else { // Case B: i is inside the Z-box [l, r] const k = i - l; // Mirrored position in prefix if (Z[k] < r - i + 1) { // Case B1: Z[k] fits entirely inside [l, r] // We can directly use the mirrored value Z[i] = Z[k]; } else { // Case B2: Z[k] extends to or beyond the boundary // We know at least (r - i + 1) characters match // Must check beyond r for more matches l = i; while (r < n && s[r - l] === s[r]) { r++; } Z[i] = r - l; r--; } } } return Z;} // Example usageconst s = "aabxaabxcaab";console.log(`String: ${s}`);console.log(`Z-array: [${computeZArray(s).join(', ')}]`);// Output: Z-array: [0, 1, 0, 0, 4, 1, 0, 0, 0, 3, 1, 0]Let's trace through the algorithm on S = "aabcaab" to see exactly how it works.
Initial State:
S = "aabcaab"
0123456
Z = [0, 0, 0, 0, 0, 0, 0]
l = 0, r = 0
i = 1: i > r, so we're in Case A
i = 2: i > r, so we're in Case A
i = 3: i > r, so we're in Case A
i = 4: i > r, so we're in Case A
i = 5: i ≤ r (5 ≤ 6), so we're in Case B
Notice at i = 5, we determined Z[5] = 1 without any character comparisons! This is the magic of the Z-algorithm: by leveraging the Z-box, we can often skip entire computations.
i = 6: i ≤ r (6 ≤ 6), so we're in Case B
Final Z-array: [0, 1, 0, 0, 3, 1, 0]
The claim that the Z-algorithm runs in O(n) time might seem surprising given the nested while loops. Let's prove this rigorously.
Proof of O(n) Time Complexity:
Key Observation: The variable r only increases, never decreases (except by 1 after each loop, which is still net non-decreasing across iterations).
Counting Character Comparisons:
Each character comparison in the inner while loops either:
r by 1, ORSince r starts at 0 and can be at most n, and r never decreases after successful comparisons, there can be at most n successful character comparisons total.
For failed comparisons: each iteration of the outer loop (i = 1 to n-1) can have at most one failed comparison. That's at most n-1 failed comparisons.
Total character comparisons ≤ n (successes) + n-1 (failures) = 2n - 1 = O(n)
This is an amortized analysis. While a single iteration might do many comparisons (in Case A or Case B2), those comparisons advance r, which means future iterations will benefit. The total work is distributed across all iterations, giving O(1) amortized cost per position.
| Metric | Complexity | Explanation |
|---|---|---|
| Time Complexity | O(n) | At most 2n character comparisons total (amortized O(1) per position) |
| Space Complexity | O(n) | The Z-array requires n integers |
| Auxiliary Space | O(1) | Only l, r, i, k variables needed beyond the output |
Comparison with Naive Approach:
| Approach | Time Complexity | Character Comparisons (worst case) |
|---|---|---|
| Naive | O(n²) | n(n-1)/2 ≈ 500 million for n=10⁶ |
| Z-algorithm | O(n) | 2n - 1 ≈ 2 million for n=10⁶ |
The difference is dramatic: 250× fewer comparisons for n = 10⁶.
A robust implementation must handle all edge cases correctly. Let's examine the critical ones.
1234567891011121314151617181920
// Edge case verificationfunction testZAlgorithm() { const testCases = [ { input: "", expected: [] }, { input: "a", expected: [0] }, { input: "aaaa", expected: [0, 3, 2, 1] }, { input: "abcd", expected: [0, 0, 0, 0] }, { input: "ababab", expected: [0, 0, 4, 0, 2, 0] }, { input: "aabcaab", expected: [0, 1, 0, 0, 3, 1, 0] }, { input: "abcabc", expected: [0, 0, 0, 3, 0, 0] }, ]; for (const { input, expected } of testCases) { const result = computeZArray(input); const passed = JSON.stringify(result) === JSON.stringify(expected); console.log(`Input: "${input}" | Expected: [${expected}] | Got: [${result}] | ${passed ? '✓' : '✗'}`); }} testZAlgorithm();Watch for: (1) Off-by-one errors in the r update (should be r-1 after the while loop). (2) Forgetting to reset l when entering Case A. (3) Incorrect handling of the Z[k] >= r - i + 1 condition. Testing against these edge cases catches most bugs.
While the standard Z-algorithm is already optimal, there are practical considerations and variations worth knowing.
Variation: Right-to-Left Z-Array
Sometimes you need a "reverse Z-array" where Z[i] measures how many characters ending at i match a suffix of the string. This is computed by:
This is useful for problems involving suffix matching or when you need to know "how much to the left matches the end of the string."
12345678910111213141516
/** * Computes the reverse Z-array (suffix matching). * reverseZ[i] = length of longest substring ending at i * that matches a suffix of the string. */function computeReverseZArray(s: string): number[] { const reversed = s.split('').reverse().join(''); const Z = computeZArray(reversed); return Z.reverse();} // Exampleconst s = "abxyab";console.log(`String: ${s}`);console.log(`Z-array: ${computeZArray(s)}`);console.log(`Reverse Z-array: ${computeReverseZArray(s)}`);We've now mastered the core technical content of the Z-algorithm. Let's consolidate what we've learned.
What's Next:
With the Z-array computation mastered, we're ready for the exciting part: applications. The next page shows how to use the Z-algorithm for pattern matching—finding all occurrences of a pattern in a text in O(n + m) time.
You now understand how to compute the Z-array in linear time. This is one of the most elegant algorithms in string processing—simple to state, non-obvious to derive, and powerful in application. The next page puts this tool to work.