Z-Algorithm: Linear-Time Pattern Matching

Level: Intermediate

Duration: 50 mins

Topic: Z-Algorithm

Computing Z-Values in O(n)

The Elegance of Linear Time

The naive approach to computing the Z-array, comparing characters one by one at each position, requires O(n²) time in the worst case. For a string of length 100,000, that is up to n(n-1)/2 ≈ 5 billion comparisons. For competitive programming constraints where n can reach 10⁶ or 10⁷, this is impractical.
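For reference, the naive approach described above can be sketched as follows. This is a minimal illustration rather than code from the original page; the function name `naiveZArray` is an assumption chosen for this example.

```typescript
// Naive Z-array: for every start position i, compare against the prefix
// one character at a time. Worst case is O(n^2), e.g. for "aaaa...a".
function naiveZArray(s: string): number[] {
    const n = s.length;
    const Z: number[] = new Array(n).fill(0);
    for (let i = 1; i < n; i++) {
        let len = 0;
        // Extend the match while we stay in bounds and characters agree.
        while (i + len < n && s[len] === s[i + len]) {
            len++;
        }
        Z[i] = len;
    }
    return Z;
}

console.log(naiveZArray("aabcaab").join(", ")); // 0, 1, 0, 0, 3, 1, 0
```

A version like this is too slow for large inputs, but it is useful as a trusted reference when testing a linear-time implementation.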

The Z-algorithm achieves something remarkable: it computes all n Z-values in O(n) time using roughly 2n character comparisons at most. This is not a constant-factor optimization. It is the step from a quadratic algorithm to a linear one, and it is what makes the technique practical at these input sizes.

The key insight? Previously computed Z-values contain information about future positions. By carefully tracking what we already know, we avoid redundant work.

What You Will Learn

By the end of this page, you will understand the Z-algorithm in complete detail: how it maintains the rightmost Z-box, when it can reuse previously computed values, and why its worst-case complexity is truly O(n). You'll be able to implement it from first principles and trace through its execution on any input.

The Core Insight: Information Reuse

The Z-algorithm's power comes from a single, profound observation:

When position i falls inside a known Z-box [l, r], the characters S[i..r] are identical to S[i-l..r-l].

This is not approximate—it's an exact match. Why? Because the Z-box exists precisely because S[l..r] = S[0..r-l]. Therefore, any position i inside this box satisfies:

S[i] = S[i-l]
S[i+1] = S[i-l+1]
...
S[r] = S[r-l]

This means we already know something about the Z-value at position i! The value Z[i-l] was computed earlier and tells us how many characters match the prefix starting at position i-l in the string. Since S[i..r] mirrors S[(i-l)..(r-l)], we can potentially reuse Z[i-l] to jumpstart our computation of Z[i].
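The mirror property can be checked directly on a small string. The sketch below is only an illustration: `mirrorHolds` is a hypothetical helper, not part of the algorithm itself, and it assumes `[l, r]` is given with inclusive bounds.

```typescript
// Hypothetical helper: checks that S[i..r] equals S[i-l..r-l]
// for every position i inside the candidate box [l, r] (inclusive).
function mirrorHolds(s: string, l: number, r: number): boolean {
    for (let i = l; i <= r; i++) {
        if (s.slice(i, r + 1) !== s.slice(i - l, r - l + 1)) {
            return false;
        }
    }
    return true;
}

// In "aabcaab", Z[4] = 3, so [4, 6] is a Z-box: S[4..6] = "aab" = S[0..2].
console.log(mirrorHolds("aabcaab", 4, 6)); // true
// [3, 5] is not a Z-box ("caa" is not a prefix), so the property fails.
console.log(mirrorHolds("aabcaab", 3, 5)); // false
```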

The Mirror Principle

Think of the Z-box as a "mirror" of the prefix. When we're inside the mirror, we're looking at a copy of the string's beginning. Any structural information (like Z-values) from the original prefix applies to the mirrored region.

Critical Cases:

When computing Z[i] and i is inside the current Z-box [l, r], we compute k = i - l (the "mirrored position" in the prefix). Then:

Case 1: Z[k] < r - i + 1 (the mirrored match fits entirely within the Z-box)

In this case, Z[i] = Z[k]. We're done—no character comparisons needed!

Why? The match at position k extends Z[k] characters. Since Z[k] < r-i+1, this entire match falls within our known mirror region. The match at position i will be identical.

Case 2: Z[k] ≥ r - i + 1 (the mirrored match reaches or exceeds the Z-box boundary)

In this case, we know Z[i] ≥ r - i + 1, but it might be larger. We must compare characters beyond r to find the true extent.

Why? Beyond position r, we have no mirrored information. The match might continue, or it might not—only direct comparison can tell us.

The Algorithm in Detail

Let's now formalize the Z-algorithm as pseudocode, with every decision point explained.

z_algorithm.pseudo

Pseudocode

function computeZArray(S):
    n = length(S)
    Z = array of size n, initialized to 0
    
    // l and r track the rightmost Z-box [l, r]
    // Initially, no Z-box exists
    l = 0
    r = 0
    
    for i = 1 to n - 1:
        if i > r:
            // Case A: i is outside the current Z-box
            // Must compare from scratch
            l = r = i
            while r < n and S[r - l] == S[r]:
                r = r + 1
            Z[i] = r - l
            r = r - 1  // r points to last matching index
        else:
            // Case B: i is inside the Z-box [l, r]
            k = i - l  // mirrored position
            
            if Z[k] < r - i + 1:
                // Case B1: mirrored Z-value fits entirely inside Z-box
                Z[i] = Z[k]
            else:
                // Case B2: mirrored Z-value reaches Z-box boundary
                // Must extend beyond r
                l = i
                while r < n and S[r - l] == S[r]:
                    r = r + 1
                Z[i] = r - l
                r = r - 1
    
    return Z

Understanding Each Case:

Case A: i > r (outside Z-box)

We've moved beyond our rightmost known information. We must compare S[0], S[1], ... with S[i], S[i+1], ... character by character until we find a mismatch. This establishes a new Z-box starting at i.

Case B1: i ≤ r and Z[k] < r - i + 1 (fits inside)

The mirrored Z-value tells us everything. The match at position i will be exactly Z[k] characters because:

The first Z[k] characters match (by the mirror property)
The (Z[k]+1)th character doesn't match (because it didn't at position k, and we're still in the mirrored region)

Case B2: i ≤ r and Z[k] ≥ r - i + 1 (reaches boundary)

The mirrored Z-value extends to or beyond the Z-box boundary. We know at least r-i+1 characters match, but there might be more. We must compare characters beyond r to find the true Z[i].
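The three cases are often folded into a single code path: start from the guaranteed lower bound min(Z[k], remaining box length) and then try to extend. The sketch below shows that common alternative formulation. It is not the listing used on this page, and it assumes a half-open box `[l, r)` in which `r` is one past the last matched index. The name `computeZCompact` is chosen for this example.

```typescript
// Compact variant (an alternative formulation, not this page's listing).
// Here r is EXCLUSIVE: the box is [l, r), one past the last matched index.
function computeZCompact(s: string): number[] {
    const n = s.length;
    const Z: number[] = new Array(n).fill(0);
    let l = 0;
    let r = 0;
    for (let i = 1; i < n; i++) {
        if (i < r) {
            // Inside the box: start from the guaranteed lower bound.
            Z[i] = Math.min(Z[i - l], r - i);
        }
        // Extend by direct comparison. In Case B1 the loop body never runs,
        // although the condition is still evaluated once.
        while (i + Z[i] < n && s[Z[i]] === s[i + Z[i]]) {
            Z[i]++;
        }
        // Keep the box that reaches furthest to the right.
        if (i + Z[i] > r) {
            l = i;
            r = i + Z[i];
        }
    }
    return Z;
}

console.log(computeZCompact("aabxaabxcaab").join(", "));
// 0, 1, 0, 0, 4, 1, 0, 0, 0, 3, 1, 0
```

The trade-off is one extra failed comparison in Case B1 in exchange for a shorter loop body with no duplicated extension code.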

Complete Implementation

Let's translate the pseudocode into TypeScript with comprehensive comments.

z_algorithm
/**
 * Computes the Z-array for a given string in O(n) time.
 * 
 * Z[i] = length of the longest substring starting at i that
 * matches a prefix of the string.
 * 
 * @param s - The input string
 * @returns The Z-array where Z[0] is set to 0 by convention
 */
function computeZArray(s: string): number[] {
    const n = s.length;
    const Z: number[] = new Array(n).fill(0);
    
    if (n === 0) return Z;
    
    // l and r define the rightmost Z-box [l, r]
    // This is the interval where S[l..r] = S[0..r-l]
    let l = 0;
    let r = 0;
    
    for (let i = 1; i < n; i++) {
        if (i > r) {
            // Case A: i is outside the current Z-box
            // Start fresh comparison from position i
            l = r = i;
            
            // Extend r as long as characters match
            while (r < n && s[r - l] === s[r]) {
                r++;
            }
            
            Z[i] = r - l;
            r--;  // r should point to the last matched index
            
        } else {
            // Case B: i is inside the Z-box [l, r]
            const k = i - l;  // Mirrored position in prefix
            
            if (Z[k] < r - i + 1) {
                // Case B1: Z[k] fits entirely inside [l, r]
                // We can directly use the mirrored value
                Z[i] = Z[k];
                
            } else {
                // Case B2: Z[k] extends to or beyond the boundary
                // We know at least (r - i + 1) characters match
                // Must check beyond r for more matches
                l = i;
                
                while (r < n && s[r - l] === s[r]) {
                    r++;
                }
                
                Z[i] = r - l;
                r--;
            }
        }
    }
    
    return Z;
}
 
// Example usage
const s = "aabxaabxcaab";
console.log(`String: ${s}`);
console.log(`Z-array: [${computeZArray(s).join(', ')}]`);
// Output: Z-array: [0, 1, 0, 0, 4, 1, 0, 0, 0, 3, 1, 0]

Step-by-Step Trace

Let's trace through the algorithm on S = "aabcaab" to see exactly how it works.

Initial State:

S = "aabcaab"
     0123456
Z = [0, 0, 0, 0, 0, 0, 0]
l = 0, r = 0

i = 1: i > r, so we're in Case A

Set l = r = 1
Compare S[0]='a' with S[1]='a' → match, r = 2
Compare S[1]='a' with S[2]='b' → mismatch, stop
Z[1] = 2 - 1 = 1, r = 1
Z = [0, 1, 0, 0, 0, 0, 0], Z-box = [1, 1]

i = 2: i > r, so we're in Case A

Set l = r = 2
Compare S[0]='a' with S[2]='b' → mismatch, stop immediately
Z[2] = 2 - 2 = 0, then r = 2 - 1 = 1 (now l = 2 > r = 1, so the Z-box is empty)
Z = [0, 1, 0, 0, 0, 0, 0]

i = 3: i > r, so we're in Case A

Set l = r = 3
Compare S[0]='a' with S[3]='c' → mismatch, stop immediately
Z[3] = 3 - 3 = 0, then r = 3 - 1 = 2 (the Z-box is empty again)
Z = [0, 1, 0, 0, 0, 0, 0]

i = 4: i > r, so we're in Case A

Set l = r = 4
Compare S[0]='a' with S[4]='a' → match, r = 5
Compare S[1]='a' with S[5]='a' → match, r = 6
Compare S[2]='b' with S[6]='b' → match, r = 7 (but 7 ≥ n, stop)
Z[4] = 7 - 4 = 3, r = 6
Z = [0, 1, 0, 0, 3, 0, 0], Z-box = [4, 6]

i = 5: i ≤ r (5 ≤ 6), so we're in Case B

k = 5 - 4 = 1, Z[k] = Z[1] = 1
Check: Z[1] = 1 < r - i + 1 = 6 - 5 + 1 = 2
Case B1: Z[5] = Z[1] = 1 (no comparisons needed!)
Z = [0, 1, 0, 0, 3, 1, 0]

Power of Information Reuse

Notice at i = 5, we determined Z[5] = 1 without any character comparisons! This is the magic of the Z-algorithm: by leveraging the Z-box, we can often skip entire computations.

i = 6: i ≤ r (6 ≤ 6), so we're in Case B

k = 6 - 4 = 2, Z[k] = Z[2] = 0
Check: Z[2] = 0 < r - i + 1 = 6 - 6 + 1 = 1
Case B1: Z[6] = Z[2] = 0
Z = [0, 1, 0, 0, 3, 1, 0]

Final Z-array: [0, 1, 0, 0, 3, 1, 0]
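A trace like the one above can be reproduced mechanically. The sketch below follows the page's three-case algorithm and records which branch handled each position. The `ZStep` type and the `traceZ` name are assumptions introduced for this example.

```typescript
// One record per outer-loop iteration; l and r are the values AFTER the step.
type ZStep = { i: number; branch: "A" | "B1" | "B2"; z: number; l: number; r: number };

function traceZ(s: string): ZStep[] {
    const n = s.length;
    const Z: number[] = new Array(n).fill(0);
    const steps: ZStep[] = [];
    let l = 0;
    let r = 0; // last matched index of the rightmost Z-box
    for (let i = 1; i < n; i++) {
        let branch: ZStep["branch"];
        if (i > r) {
            branch = "A"; // outside the box: compare from scratch
            l = r = i;
            while (r < n && s[r - l] === s[r]) r++;
            Z[i] = r - l;
            r--;
        } else if (Z[i - l] < r - i + 1) {
            branch = "B1"; // mirrored value fits inside the box
            Z[i] = Z[i - l];
        } else {
            branch = "B2"; // mirrored value reaches the boundary: extend
            l = i;
            while (r < n && s[r - l] === s[r]) r++;
            Z[i] = r - l;
            r--;
        }
        steps.push({ i, branch, z: Z[i], l, r });
    }
    return steps;
}

for (const step of traceZ("aabcaab")) {
    console.log(`i=${step.i} case=${step.branch} Z[i]=${step.z} l=${step.l} r=${step.r}`);
}
```

On "aabcaab" the branches come out as A, A, A, A, B1, B1, matching the hand trace. A string such as "aaaa" exercises Case B2.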

Complexity Analysis

The claim that the Z-algorithm runs in O(n) time might seem surprising given the nested while loops. Let's prove this rigorously.

Proof of O(n) Time Complexity:

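Key Observation: Measured at the end of each outer-loop iteration, r never decreases. The r = r - 1 after each while loop only converts r from "first unmatched index" back to "last matched index"; it never leaves r below the value it had when the iteration began.

Counting Character Comparisons:

Each character comparison in the inner while loops either:
- Succeeds and increments r by 1, OR
- Fails and exits the loop
Successful comparisons: r is bounded by n and never moves backward across iterations, so the comparisons that actually advance the Z-box total at most n. One caveat for the code as listed: Case B2 starts its loop at index r, which is already known to match, so it spends one extra successful comparison per Case B2 iteration. Starting that loop at r + 1 removes the extra check.
Failed comparisons: each iteration of the outer loop (i = 1 to n-1) ends its while loop with at most one failed comparison. That's at most n-1 failed comparisons.

Total character comparisons ≤ n (successes) + n-1 (failures) = 2n - 1 = O(n) when the Case B2 loop starts at r + 1. The listing above adds at most n-1 redundant checks on top of that, which is still O(n).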

The Amortization Insight

This is an amortized analysis. While a single iteration might do many comparisons (in Case A or Case B2), those comparisons advance r, which means future iterations will benefit. The total work is distributed across all iterations, giving O(1) amortized cost per position.
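The bound can also be observed empirically by counting comparisons. The sketch below is an instrumented variant written for this purpose, not the page's listing. It assumes Case B2 resumes at r + 1 rather than at r, so no character already known to match is compared again. The names `countComparisons` and `same` are hypothetical.

```typescript
// Instrumented sketch. Assumption: Case B2 resumes at r + 1 rather than at r,
// so no character already known to match is compared again.
function countComparisons(s: string): { Z: number[]; comparisons: number } {
    const n = s.length;
    const Z: number[] = new Array(n).fill(0);
    let comparisons = 0;
    let l = 0;
    let r = 0; // last matched index of the rightmost Z-box

    // Compares s[a] with s[b] and records that a comparison happened.
    const same = (a: number, b: number): boolean => {
        comparisons++;
        return s[a] === s[b];
    };

    for (let i = 1; i < n; i++) {
        if (i <= r && Z[i - l] < r - i + 1) {
            Z[i] = Z[i - l]; // Case B1: no comparisons at all
            continue;
        }
        // Case A starts at i; Case B2 starts just past the box.
        let next = i > r ? i : r + 1;
        l = i;
        while (next < n && same(next - l, next)) next++;
        Z[i] = next - l;
        r = next - 1;
    }
    return { Z, comparisons };
}

for (const s of ["aaaa", "aabcaab", "ababab", "aabxaabxcaab"]) {
    const { comparisons } = countComparisons(s);
    console.log(`${s}: n=${s.length}, comparisons=${comparisons}, bound=${2 * s.length}`);
}
```

In this variant every successful comparison lands on an index strictly to the right of any earlier box, which is exactly the argument used in the proof.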

Complexity Summary
Metric	Complexity	Explanation
Time Complexity	O(n)	At most 2n character comparisons total (amortized O(1) per position)
Space Complexity	O(n)	The Z-array requires n integers
Auxiliary Space	O(1)	Only l, r, i, k variables needed beyond the output

Comparison with Naive Approach:

Approach	Time Complexity	Character Comparisons (worst case)
Naive	O(n²)	n(n-1)/2 ≈ 5 × 10¹¹ (500 billion) for n=10⁶
Z-algorithm	O(n)	2n - 1 ≈ 2 million for n=10⁶

The difference is dramatic: roughly 250,000× fewer comparisons for n = 10⁶.

Edge Cases and Robustness

A robust implementation must handle all edge cases correctly. Let's examine the critical ones.

Edge Cases to Handle

•Empty string (n = 0): Return empty Z-array. Our implementation handles this with the early return check.
•Single character (n = 1): Return [0]. The loop from i = 1 to n-1 never executes, leaving Z = [0].
•All identical characters (e.g., "aaaa"): Each position i > 0 matches all the way to the end of the string, so Z[i] = n - i. Z = [0, 3, 2, 1].
•All distinct characters (e.g., "abcd"): No position matches even the first character. Z = [0, 0, 0, 0].
•Alternating patterns (e.g., "ababab"): Tests proper Case B handling with overlapping Z-boxes.
•Single repeated prefix (e.g., "abcabc"): Tests proper Z-box extension at the repeat position.

edge_case_tests
TypeScript
// Edge case verification
function testZAlgorithm() {
    const testCases = [
        { input: "", expected: [] },
        { input: "a", expected: [0] },
        { input: "aaaa", expected: [0, 3, 2, 1] },
        { input: "abcd", expected: [0, 0, 0, 0] },
        { input: "ababab", expected: [0, 0, 4, 0, 2, 0] },
        { input: "aabcaab", expected: [0, 1, 0, 0, 3, 1, 0] },
        { input: "abcabc", expected: [0, 0, 0, 3, 0, 0] },
    ];
    
    for (const { input, expected } of testCases) {
        const result = computeZArray(input);
        const passed = JSON.stringify(result) === JSON.stringify(expected);
        console.log(`Input: "${input}" | Expected: [${expected}] | Got: [${result}] | ${passed ? '✓' : '✗'}`);
    }
}
 
testZAlgorithm();

Common Implementation Bugs

Watch for: (1) Off-by-one errors in the r update (should be r-1 after the while loop). (2) Forgetting to reset l when entering Case A. (3) Incorrect handling of the Z[k] >= r - i + 1 condition. Testing against these edge cases catches most bugs.
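Beyond hand-picked edge cases, one way to catch these bugs is to compare the linear algorithm against a brute-force version on many small random strings. The sketch below is self-contained and makes a few assumptions: `zLinear` is a condensed restatement of this page's algorithm, `zBrute` is a plain quadratic reference, and `crossCheck` uses a simple deterministic generator so runs are repeatable. All three names are chosen for this example.

```typescript
// The page's three-case algorithm, condensed (r is the last matched index).
function zLinear(s: string): number[] {
    const n = s.length;
    const Z: number[] = new Array(n).fill(0);
    let l = 0;
    let r = 0;
    for (let i = 1; i < n; i++) {
        if (i <= r && Z[i - l] < r - i + 1) {
            Z[i] = Z[i - l]; // Case B1
        } else {
            // Case A restarts at i; Case B2 keeps r and extends from there.
            if (i > r) r = i;
            l = i;
            while (r < n && s[r - l] === s[r]) r++;
            Z[i] = r - l;
            r--;
        }
    }
    return Z;
}

// Quadratic reference implementation.
function zBrute(s: string): number[] {
    const n = s.length;
    const Z: number[] = new Array(n).fill(0);
    for (let i = 1; i < n; i++) {
        while (i + Z[i] < n && s[Z[i]] === s[i + Z[i]]) Z[i]++;
    }
    return Z;
}

// Returns the first string on which the two disagree, or null if none does.
function crossCheck(trials: number, maxLen: number): string | null {
    let seed = 1;
    // Small multiplicative generator; products stay below 2^53, so it is exact.
    const nextInt = (): number => {
        seed = (seed * 48271) % 2147483647;
        return seed;
    };
    for (let t = 0; t < trials; t++) {
        const len = nextInt() % (maxLen + 1);
        let s = "";
        for (let j = 0; j < len; j++) {
            s += nextInt() % 2 === 0 ? "a" : "b";
        }
        if (zLinear(s).join(",") !== zBrute(s).join(",")) return s;
    }
    return null;
}

const failing = crossCheck(1000, 12);
console.log(failing === null ? "all trials agree" : `mismatch on "${failing}"`);
```

A two-letter alphabet is a deliberate choice here: it produces many repeated prefixes, which is where Case B1 and Case B2 mistakes tend to surface.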

Optimizations and Variations

While the standard Z-algorithm is already optimal, there are practical considerations and variations worth knowing.

Practical Optimizations

•Cache-Friendly Access: The outer loop visits i sequentially, and the inner loops read s[r-l] and s[r] as two forward-moving streams, so memory access is mostly sequential and cache-efficient.
•Avoid Dynamic Arrays: Pre-allocate the Z-array to avoid resize overhead. In typed languages, prefer primitive arrays over objects.
•Early Termination: For some applications (e.g., finding first occurrence), you can stop once you find what you need, rather than computing the full Z-array.
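•Incremental Computation: If the string is extended character by character, the Z-array can be updated incrementally, but not trivially: appending a character can increase Z[i] for every position whose match currently reaches the end of the string (appending "a" to "aa" changes Z[1] from 1 to 2), so earlier entries may need revisiting.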
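As an illustration of the pre-allocation point, the algorithm can be written against typed arrays. This is a sketch under stated assumptions, not a measured optimization: it assumes comparison by UTF-16 code unit is acceptable, and whether it is actually faster depends on the runtime and should be benchmarked. The name `computeZTyped` is chosen for this example.

```typescript
// Typed-array sketch of the same algorithm. Assumption: comparing UTF-16
// code units is acceptable for the input at hand.
function computeZTyped(s: string): Int32Array {
    const n = s.length;
    const Z = new Int32Array(n);      // zero-filled on construction
    const codes = new Uint16Array(n); // UTF-16 code units of s
    for (let i = 0; i < n; i++) codes[i] = s.charCodeAt(i);

    let l = 0;
    let r = 0; // last matched index of the rightmost Z-box
    for (let i = 1; i < n; i++) {
        if (i <= r && Z[i - l] < r - i + 1) {
            Z[i] = Z[i - l]; // Case B1: copy the mirrored value
            continue;
        }
        if (i > r) r = i; // Case A restarts at i; Case B2 keeps r
        l = i;
        while (r < n && codes[r - l] === codes[r]) r++;
        Z[i] = r - l;
        r--;
    }
    return Z;
}

console.log(Array.from(computeZTyped("aabxaabxcaab")).join(", "));
// 0, 1, 0, 0, 4, 1, 0, 0, 0, 3, 1, 0
```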

Variation: Right-to-Left Z-Array

Sometimes you need a "reverse Z-array" where Z[i] measures how many characters ending at i match a suffix of the string. This is computed by:

1. Reverse the string
2. Compute the standard Z-array
3. Reverse the resulting Z-array

This is useful for problems involving suffix matching or when you need to know "how much to the left matches the end of the string."

reverse_z_array
TypeScript
/**
 * Computes the reverse Z-array (suffix matching).
 * reverseZ[i] = length of longest substring ending at i
 * that matches a suffix of the string.
 */
function computeReverseZArray(s: string): number[] {
    const reversed = s.split('').reverse().join('');
    const Z = computeZArray(reversed);
    return Z.reverse();
}
 
// Example
const s = "abxyab";
console.log(`String: ${s}`);
console.log(`Z-array: ${computeZArray(s)}`);
console.log(`Reverse Z-array: ${computeReverseZArray(s)}`);

Summary

We've now mastered the core technical content of the Z-algorithm. Let's consolidate what we've learned.

Key Takeaways

•Information Reuse: The Z-algorithm achieves linear time by reusing previously computed Z-values via the Z-box mechanism.
•Three Cases: The algorithm handles three scenarios—outside Z-box (compute fresh), inside with small Z[k] (copy directly), inside with large Z[k] (extend beyond boundary).
•Amortized Analysis: Despite nested loops, the algorithm does at most 2n character comparisons because each success advances r, which never decreases.
•O(n) Time and Space: Both time and space are optimal for this problem—you must at least read the input and store the output.
•Robustness: Edge cases (empty, single character, all same, all distinct) are handled naturally by the algorithm structure.

What's Next:

With the Z-array computation mastered, we're ready for the exciting part: applications. The next page shows how to use the Z-algorithm for pattern matching—finding all occurrences of a pattern in a text in O(n + m) time.

Page Complete

You now understand how to compute the Z-array in linear time. This is one of the most elegant algorithms in string processing—simple to state, non-obvious to derive, and powerful in application. The next page puts this tool to work.
