Union Find - Learning Module

Loading content...

0/279

Path Compression Optimization

The Insight That Changes Everything

In the previous page, we identified a critical weakness in naive Union-Find: tree depth can grow linearly with the number of elements, making Find operations painfully slow. A chain of n elements requires n-1 pointer traversals.

But here's a profound insight: every time we traverse a path to find a root, we're gathering valuable information that we can exploit to speed up future queries.

When we find that elements 5, 4, 3, 2, 1, and 0 all belong to the same set with root 0, why should subsequent queries on element 5 traverse the entire path again? We already know the answer—element 5's root is 0.

Path compression captures this insight. During Find, we restructure the tree to make future Finds faster. It's a form of lazy optimization: we do a little extra work now to save a lot of work later.

What You Will Learn

This page covers: (1) The intuition behind path compression, (2) Full path compression implementation, (3) Path splitting and path halving variants, (4) Correctness proof—why compression doesn't break Union-Find, (5) Complexity improvements, and (6) Practical trade-offs between variants.

The Intuition Behind Path Compression

Consider a chain structure where finding element 0 requires traversing through elements 1, 2, 3, 4, and 5 to reach the root:

When we call Find(0), we traverse: 0 → 1 → 2 → 3 → 4 → 5. At the end, we know that the root is 5. All elements along this path belong to the same set.

The key insight: Since all these elements share the same root, why not update their parent pointers to point directly to the root?

After path compression during Find(0):

           [5] ← root
     ┌──┬──┼──┬──┬──┐
    [0][1][2][3][4]

Now every element points directly to the root! The next Find on any of these elements will take O(1) time.

The Self-Adjusting Property

Path compression makes Union-Find a self-adjusting data structure. Each query improves the structure for future queries. Trees naturally flatten over time, making subsequent operations faster. This is reminiscent of splay trees and other self-adjusting structures.

Why does this work?

Path compression exploits a fundamental property of Union-Find: the tree structure is not semantically meaningful. All that matters is which elements share the same root. The specific parent-child relationships don't affect the logical partition—they only affect query performance.

Therefore, we're free to restructure trees however we want, as long as:

Every element in a set can still reach the same root
The root remains unchanged
Elements don't move between sets

Path compression satisfies all three requirements. It only flattens paths within a single tree—it never crosses set boundaries or changes which elements are connected.

Full Path Compression Implementation

The most aggressive form of path compression makes every node on the find path point directly to the root. This can be implemented either recursively or iteratively.

Recursive Implementation:

The recursive version is elegant and naturally embodies path compression through the return value:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
class UnionFindPathCompression {
    private parent: number[];
    
    constructor(n: number) {
        this.parent = Array.from({ length: n }, (_, i) => i);
    }
    
    /**
     * Find with full path compression (recursive).
     * 
     * Every node on the path from x to root gets its parent
     * updated to point directly to the root.
     * 
     * Time Complexity: O(depth) for first call
     * Space Complexity: O(depth) due to recursion stack
     */
    find(x: number): number {
        // Base case: x is the root
        if (this.parent[x] === x) {
            return x;
        }
        
        // Recursive case:
        // 1. Find the root of the tree
        // 2. Set x's parent directly to the root
        // 3. Return the root
        const root = this.find(this.parent[x]);
        this.parent[x] = root;  // Path compression!
        
        return root;
    }
    
    // Union remains unchanged
    union(x: number, y: number): boolean {
        const rootX = this.find(x);
        const rootY = this.find(y);
        
        if (rootX === rootY) return false;
        
        this.parent[rootY] = rootX;
        return true;
    }
}

Iterative Implementation (Two-Pass):

For environments where deep recursion is problematic, we can implement path compression iteratively with two passes:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
class UnionFindIterative {
    private parent: number[];
    
    constructor(n: number) {
        this.parent = Array.from({ length: n }, (_, i) => i);
    }
    
    /**
     * Find with full path compression (iterative two-pass).
     * 
     * Pass 1: Follow path to find the root
     * Pass 2: Update all nodes on the path to point to root
     * 
     * Time Complexity: O(depth)
     * Space Complexity: O(depth) for storing the path
     */
    find(x: number): number {
        // Pass 1: Collect all nodes on the path to root
        const path: number[] = [];
        let current = x;
        
        while (this.parent[current] !== current) {
            path.push(current);
            current = this.parent[current];
        }
        const root = current;
        
        // Pass 2: Update all collected nodes to point to root
        for (const node of path) {
            this.parent[node] = root;
        }
        
        return root;
    }
    
    /**
     * Alternative: Single-pass with deferred updates.
     * Uses O(1) extra space by traversing twice without storing.
     */
    findSpaceOptimized(x: number): number {
        // First pass: find root
        let root = x;
        while (this.parent[root] !== root) {
            root = this.parent[root];
        }
        
        // Second pass: update parents along the path
        while (this.parent[x] !== root) {
            const next = this.parent[x];
            this.parent[x] = root;
            x = next;
        }
        
        return root;
    }
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
Initial tree (a chain):
 
        [5] ← root
         |
        [4]
         |
        [3]
         |
        [2]
         |
        [1]
         |
        [0]
 
parent: [1, 2, 3, 4, 5, 5]
 
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 
Call: find(0) with path compression
 
Step 1 (Recursion): find(0)
  parent[0] = 1 ≠ 0, so recurse on find(1)
  
Step 2 (Recursion): find(1)
  parent[1] = 2 ≠ 1, so recurse on find(2)
  
Step 3 (Recursion): find(2)
  parent[2] = 3 ≠ 2, so recurse on find(3)
  
Step 4 (Recursion): find(3)
  parent[3] = 4 ≠ 3, so recurse on find(4)
  
Step 5 (Recursion): find(4)
  parent[4] = 5 ≠ 4, so recurse on find(5)
  
Step 6 (Base case): find(5)
  parent[5] = 5 → Root found! Return 5
 
Unwinding and compressing:
 
Step 5 returns: root = 5, set parent[4] = 5
Step 4 returns: root = 5, set parent[3] = 5
Step 3 returns: root = 5, set parent[2] = 5
Step 2 returns: root = 5, set parent[1] = 5
Step 1 returns: root = 5, set parent[0] = 5
 
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 
After path compression:
 
           [5] ← root
     ┌──┬──┼──┬──┐
    [0][1][2][3][4]
 
parent: [5, 5, 5, 5, 5, 5]
 
Future find(0), find(1), find(2), find(3), find(4):
All take O(1)! Just one parent lookup to reach root.

The Dramatic Improvement

One Find operation compressed a chain of depth 5 into a star of depth 1. The first Find took 5 steps, but now ALL subsequent Finds take just 1 step. This is the magic of path compression—it amortizes the cost of traversal over future operations.

Path Splitting and Path Halving

Full path compression is maximally aggressive—every node ends up pointing to the root. But there are lighter-weight alternatives that achieve nearly the same asymptotic performance with simpler implementations and no recursion.

Path Splitting:

Make every node on the path point to its grandparent (except the root and its child):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
/**
 * Path Splitting: During find, make every node point to its grandparent.
 * 
 * This is a one-pass algorithm that roughly halves the path length
 * each time find is called.
 * 
 * Advantages:
 * - No recursion (bounded stack)
 * - Simple single-pass loop
 * - Same asymptotic performance as full compression
 */
class UnionFindPathSplitting {
    private parent: number[];
    
    constructor(n: number) {
        this.parent = Array.from({ length: n }, (_, i) => i);
    }
    
    find(x: number): number {
        while (this.parent[x] !== x) {
            // Save current parent before modifying
            const next = this.parent[x];
            
            // Point current node to its grandparent
            this.parent[x] = this.parent[this.parent[x]];
            
            // Move to the original parent (now sibling)
            x = next;
        }
        return x;
    }
    
    union(x: number, y: number): boolean {
        const rootX = this.find(x);
        const rootY = this.find(y);
        if (rootX === rootY) return false;
        this.parent[rootY] = rootX;
        return true;
    }
}

Path Halving:

Make every other node on the path point to its grandparent:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
/**
 * Path Halving: During find, make every OTHER node point to its grandparent.
 * 
 * Even simpler than path splitting - only half as many updates.
 * Same asymptotic complexity as full compression.
 * 
 * This is often the preferred implementation due to its simplicity.
 */
class UnionFindPathHalving {
    private parent: number[];
    
    constructor(n: number) {
        this.parent = Array.from({ length: n }, (_, i) => i);
    }
    
    find(x: number): number {
        while (this.parent[x] !== x) {
            // Point to grandparent
            this.parent[x] = this.parent[this.parent[x]];
            
            // Skip to grandparent (not just parent)
            x = this.parent[x];
        }
        return x;
    }
    
    union(x: number, y: number): boolean {
        const rootX = this.find(x);
        const rootY = this.find(y);
        if (rootX === rootY) return false;
        this.parent[rootY] = rootX;
        return true;
    }
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
Initial chain:
  [5] → [4] → [3] → [2] → [1] → [0]
  (all arrows point toward root)
 
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 
After find(0) with FULL PATH COMPRESSION:
 
           [5] ← root
     ┌──┬──┼──┬──┐
    [0][1][2][3][4]
 
All nodes point directly to root.
Maximum flattening in one operation.
 
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 
After find(0) with PATH SPLITTING:
 
        [5]
       / \
     [4] [3]
          \
          [2]
            \
            [1]
              \
              [0]
 
Wait, that's not right. Let me trace this carefully:
 
Starting: parent = [1, 2, 3, 4, 5, 5]
x = 0: parent[0] = parent[parent[0]] = parent[1] = 2
       x = 1 (original parent[0])
x = 1: parent[1] = parent[parent[1]] = parent[2] = 3
       x = 2 (original parent[1])
x = 2: parent[2] = parent[parent[2]] = parent[3] = 4
       x = 3
x = 3: parent[3] = parent[parent[3]] = parent[4] = 5
       x = 4
x = 4: parent[4] = parent[parent[4]] = parent[5] = 5
       x = 5
x = 5: parent[5] = 5 → done!
 
Resulting: parent = [2, 3, 4, 5, 5, 5]
 
        [5]
       / |
     [4][3]
        |
       [2]
        |
       [1]
        |
       [0]
 
Path roughly halved at each step.
 
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 
After find(0) with PATH HALVING:
 
Similar effect but with half as many updates.
Every other node updated.

Comparison of Path Compression Variants
Variant	Updates per Find	Recursion Needed?	Passes	Complexity
Full Compression (Recursive)	All nodes on path	Yes	1 (recursive)	O(α(n)) amortized
Full Compression (Iterative)	All nodes on path	No	2	O(α(n)) amortized
Path Splitting	All nodes on path	No	1	O(α(n)) amortized
Path Halving	Half the nodes	No	1	O(α(n)) amortized

Which Variant to Use?

In practice, path halving is often preferred: it's the simplest to implement (just 3 lines in the loop!), uses no extra space, needs no recursion, and achieves the same asymptotic complexity. Unless you have specific requirements, path halving is an excellent default choice.

Correctness Proof — Why Compression Is Safe

Path compression modifies the tree structure during Find operations. This might seem dangerous—are we breaking the Union-Find invariants? Let's prove that compression is safe.

Claim: Path compression maintains all Union-Find correctness properties:

Elements in the same set continue to share the same root
Elements in different sets continue to have different roots
The partition structure remains valid

Proof:

Correctness Arguments

•Same-set elements keep the same root: Path compression only changes parent pointers of nodes along the path from x to its root. All these nodes were already in the same set (reachable to the same root). After compression, they still reach the same root—now directly. The root itself never changes (it remains its own parent).
•Different-set elements stay separated: Path compression only modifies nodes within a single tree. It never creates edges between different trees. Therefore, elements that were in different sets (different trees) remain in different sets.
•Partition validity preserved: No element becomes orphaned (every compressed node points to the valid root). No element joins a different set. The partition remains a valid partition—just represented differently.
•Reachability invariant maintained: Before compression, x could reach root via parent chain. After compression, x points directly to root. The reachability relationship is preserved, just shortened.

Formal Argument:

Let T be the tree containing element x with root r. During Find(x) with path compression:

We identify r as the root by parent[r] = r
For each node v on the path from x to r, we set parent[v] = r
After this, all nodes on the path point directly to r

Before: For any node v on path, Find(v) follows chain v → ... → r = r After: For any node v on path, Find(v) = parent[v] = r

The result is the same! The answer to "what set is v in?" hasn't changed—only the path to compute it has shortened.

Invariants after compression:

parent[r] = r (root unchanged) ✓
For all v in T, following parent pointers reaches r ✓
For all v not in T, paths are unmodified ✓
No cycles created (all paths still tree-structured with unique root) ✓

The Key Insight

The tree structure is just an implementation detail—what matters semantically is which elements share a root. Path compression changes the tree but preserves the root relationships. It's a structure-preserving transformation that optimizes performance without affecting correctness.

Complexity Analysis

Path compression alone already provides significant improvements over the naive implementation. Let's analyze its complexity.

Single Operation Analysis:

First Find on a deep node: O(depth) — we must traverse the full path
Subsequent Finds on compressed nodes: O(1) — direct access to root

Amortized Analysis (Informal):

With path compression alone (no union by rank), we can show that a sequence of m operations on n elements takes O(m log n) time. Here's the intuition:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
Consider what happens as we perform operations:
 
1. Every Find operation shortens paths for future operations
2. The "work" done in traversing a path is "paid for" by compression
3. A node that gets compressed to depth 1 never needs to be 
   traversed again (for that portion of the path)
 
Accounting argument:
━━━━━━━━━━━━━━━━━━━
 
We can charge each pointer traversal to either:
- The Find operation itself (constant charge)
- The node being moved closer to root (compression credit)
 
Each node can only be "compressed" O(log n) times before it 
reaches the root, because:
- Compression at least halves the distance to root
- Starting distance is at most n
- log(n) halvings reach distance 1
 
Therefore:
- n nodes × O(log n) compressions each = O(n log n) total compression work
- m Find operations × O(1) base cost = O(m)
- Total: O(n log n + m) = O(m log n) for m ≥ n
 
This is already much better than O(mn) for naive implementation!

With Union by Rank:

When we combine path compression with union by rank (covered in the next page), something magical happens. The amortized complexity drops to O(α(n)) per operation, where α is the inverse Ackermann function.

The inverse Ackermann function grows extraordinarily slowly:

n	α(n)
1	0
2	1
4	2
16	3
65,536	4
2^65536	5

For any practical value of n (up to the number of atoms in the universe), α(n) ≤ 4. This means Union-Find with both optimizations has essentially O(1) amortized time per operation.

The Bottom Line

Path compression alone: O(log n) amortized per operation. Path compression + union by rank: O(α(n)) ≈ O(1) amortized per operation. Either way, we've transformed a potentially O(n) operation into something essentially constant. This is one of the most remarkable complexity results in data structure theory.

Implementation Best Practices

Let's consolidate what we've learned into production-quality implementations with best practices.

Complete Implementation with Path Compression:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
/**
 * Union-Find with Path Compression
 * 
 * This implementation uses path halving for simplicity and efficiency.
 * Path halving achieves the same asymptotic complexity as full compression
 * with simpler code and no recursion.
 * 
 * Time Complexity: O(log n) amortized per operation (with compression only)
 * Space Complexity: O(n)
 */
class UnionFind {
    private parent: number[];
    private count: number;
    private size: number[];
    
    constructor(n: number) {
        if (n <= 0) {
            throw new Error("Size must be positive");
        }
        
        this.parent = Array.from({ length: n }, (_, i) => i);
        this.size = new Array(n).fill(1);
        this.count = n;
    }
    
    /**
     * Find root of x with path halving compression.
     * Each node on the path points to its grandparent after this call.
     */
    find(x: number): number {
        this.validateIndex(x);
        
        while (this.parent[x] !== x) {
            // Path halving: point to grandparent
            this.parent[x] = this.parent[this.parent[x]];
            x = this.parent[x];
        }
        return x;
    }
    
    /**
     * Unite the sets containing x and y.
     * @returns true if union occurred, false if already connected
     */
    union(x: number, y: number): boolean {
        const rootX = this.find(x);
        const rootY = this.find(y);
        
        if (rootX === rootY) {
            return false;
        }
        
        // Attach smaller tree under larger (simple heuristic)
        // Full union by rank will be covered next page
        if (this.size[rootX] < this.size[rootY]) {
            this.parent[rootX] = rootY;
            this.size[rootY] += this.size[rootX];
        } else {
            this.parent[rootY] = rootX;
            this.size[rootX] += this.size[rootY];
        }
        
        this.count--;
        return true;
    }
    
    /**
     * Check if x and y are in the same set.
     */
    connected(x: number, y: number): boolean {
        return this.find(x) === this.find(y);
    }
    
    /**
     * Get the number of disjoint sets.
     */
    getCount(): number {
        return this.count;
    }
    
    /**
     * Get the size of the set containing x.
     */
    getSize(x: number): number {
        return this.size[this.find(x)];
    }
    
    private validateIndex(x: number): void {
        if (x < 0 || x >= this.parent.length) {
            throw new RangeError(`Index ${x} out of bounds [0, ${this.parent.length})`);
        }
    }
}

Implementation Guidelines

•Always use some form of path compression — The constant-factor overhead is negligible, and the benefits are enormous. There's no good reason to use naive Find in production.
•Prefer path halving for simplicity — It's 3 lines of code, no recursion, no extra space, and matches the asymptotic complexity of full compression.
•Use iterative implementation in performance-critical code — Avoids function call overhead and potential stack overflow on very deep trees.
•Validate inputs — Throw clear errors for out-of-bounds indices rather than corrupting state or returning undefined behavior.
•Consider combining with union by rank — Path compression is powerful alone, but combining it with union by rank (next page) achieves optimal O(α(n)) complexity.

Common Pitfall: Forgetting Parent Update Order

In path halving, the order of operations matters. Always read parent[parent[x]] BEFORE updating parent[x], or you'll get incorrect results. The update parent[x] = parent[parent[x]] followed by x = parent[x] is correct because we jump to the (new) parent after the update.

Summary and Next Steps

Path compression is a beautiful optimization that transforms Union-Find from a potentially slow O(n) per operation to a much faster O(log n) amortized per operation. Let's consolidate the key insights:

Key Takeaways

•Path compression flattens trees during Find — Every traversal is an opportunity to shorten paths for future queries.
•Three variants exist — Full compression (all nodes to root), path splitting (nodes to grandparent), and path halving (every other node). All achieve the same asymptotic complexity.
•Compression is safe — It only restructures within a single tree, preserving which elements share roots. Correctness is maintained.
•O(log n) amortized with compression alone — Each node can only be compressed O(log n) times before reaching depth 1.
•O(α(n)) with both optimizations — Combined with union by rank (next page), we achieve essentially O(1) per operation.
•Path halving is the practical choice — Simple, efficient, no recursion, same complexity. Use it unless you have specific reasons not to.

What's next:

Path compression optimizes the Find operation. But we still have an issue: the naive Union always attaches one root to another arbitrarily. This can create unbalanced trees where one large tree dominates.

The next page introduces Union by Rank — an optimization for the Union operation that ensures trees stay balanced. When combined with path compression, this yields the remarkable O(α(n)) amortized complexity.

Page Complete

You now deeply understand path compression—the optimization that makes Find fast. You know three variants, can prove correctness, and understand the complexity improvements. Next, we'll learn Union by Rank: the optimization that makes Union smart about which tree goes under which.