In Kruskal's algorithm, we repeatedly ask a simple question: "Would adding this edge create a cycle?" This is equivalent to asking: "Are the two endpoints of this edge already in the same connected component?"

If we had to traverse the entire graph to answer this question each time, Kruskal's algorithm would be hopelessly slow: O(E × V) at best. But there's a data structure that answers this question in nearly constant time: the Union-Find (also known as Disjoint Set Union or DSU).

Union-Find is the unsung hero of Kruskal's algorithm. It transforms what could be an O(E × V) algorithm into an elegant O(E log E) solution, where sorting, not cycle detection, becomes the bottleneck.
By the end of this page, you will understand why cycle detection is equivalent to component membership testing, master the Union-Find data structure with its critical optimizations, implement efficient Find and Union operations, and see how Union-Find enables near-constant-time cycle detection in Kruskal's algorithm.
Before diving into Union-Find, let's formalize the problem it solves and understand why naive approaches fail.

The Core Question:

Given an edge (u, v) and a forest F of trees we've built so far, determine:

- Are u and v in the same tree? If yes, adding (u, v) would create a cycle.
- Are u and v in different trees? If yes, adding (u, v) is safe and merges the trees.

Naive Approach 1: Graph Traversal

For each edge (u, v), run BFS/DFS from u to see if we can reach v using edges already in F.

- Time per query: O(V) in the worst case (traverse the entire component)
- Total time: O(E × V)
- Space: O(V) for the visited array

Naive Approach 2: Component Labeling

Maintain an array component[v] indicating which component each vertex belongs to. When we add edge (u, v), merge all vertices in v's component into u's component.

- Time per query: O(1) to check whether two vertices share a component
- Time per merge: O(V) to update all labels in the smaller component
- Total time: O(V) per merge × (V-1) merges = O(V²)
- Slightly better, but still too slow for large graphs
```python
from collections import deque
from typing import Dict, Set

# ═══════════════════════════════════════════════════════════════════
# NAIVE APPROACH 1: BFS for each edge query
# Time: O(V) per query → O(E × V) total
# ═══════════════════════════════════════════════════════════════════

def would_create_cycle_bfs(
    u: int,
    v: int,
    adjacency: Dict[int, Set[int]],
    n: int
) -> bool:
    """
    Check if u and v are already connected using BFS.
    If they are connected, adding edge (u, v) would create a cycle.

    Time: O(V + edges_in_component) = O(V) worst case
    """
    if u == v:
        return True

    visited = [False] * n
    queue = deque([u])
    visited[u] = True

    while queue:
        current = queue.popleft()
        for neighbor in adjacency.get(current, []):
            if neighbor == v:
                return True  # u and v are connected → cycle!
            if not visited[neighbor]:
                visited[neighbor] = True
                queue.append(neighbor)

    return False  # u and v are not connected → safe to add

# ═══════════════════════════════════════════════════════════════════
# NAIVE APPROACH 2: Component array with linear-time merges
# Time: O(1) per query, O(V) per merge → O(V²) total for V-1 merges
# ═══════════════════════════════════════════════════════════════════

class NaiveComponentTracker:
    """
    Track components using an array.
    Queries are O(1), but merges are O(V).
    """

    def __init__(self, n: int):
        # component[i] = which component vertex i belongs to.
        # Initially, each vertex is its own component.
        self.component = list(range(n))
        self.n = n

    def same_component(self, u: int, v: int) -> bool:
        """O(1) query."""
        return self.component[u] == self.component[v]

    def merge(self, u: int, v: int) -> None:
        """
        Merge u's component into v's component.
        O(V): must update all vertices in u's component.
        """
        old_comp = self.component[u]
        new_comp = self.component[v]
        if old_comp == new_comp:
            return
        # Update all vertices in the old component
        for i in range(self.n):
            if self.component[i] == old_comp:
                self.component[i] = new_comp

# ═══════════════════════════════════════════════════════════════════
# WHY THESE ARE TOO SLOW
# ═══════════════════════════════════════════════════════════════════

def analyze_naive_complexity():
    """
    For a graph with V = 10,000 vertices and E = 100,000 edges:

    Approach 1 (BFS per query):
        E queries × O(V) per query = 100,000 × 10,000
        = 1 billion operations!

    Approach 2 (Component array):
        V-1 merges × O(V) per merge = 9,999 × 10,000 ≈ 100 million operations
        Plus E queries × O(1) per query = 100,000 operations
        Total: ~100 million operations

    Union-Find with optimizations:
        E operations × O(α(V)) ≈ E × 4 = 400,000 operations!

    Union-Find is 250× faster than the "improved" naive approach!
    """
    pass
```

For a graph with 100,000 edges and 10,000 vertices, naive approaches require 100 million to 1 billion operations. Union-Find with optimizations requires only about 400,000 operations, a difference of 250× to 2500×. This is why Union-Find is essential, not optional.
Union-Find is a data structure that maintains a collection of disjoint sets and supports two primary operations:

1. Find(x): Determine which set element x belongs to (return a "representative" element)
2. Union(x, y): Merge the sets containing x and y into a single set

The Key Insight:

Two elements are in the same set if and only if Find(x) == Find(y). This directly answers our cycle detection question: if Find(u) == Find(v), then u and v are already connected, and adding edge (u, v) would create a cycle.

Tree-Based Representation:

Union-Find represents each set as a tree, where:

- Each element points to a parent element
- The root of the tree is the "representative" of the set
- Find(x) traverses up the tree to find the root
- Union(x, y) makes one root point to the other

This tree structure is the foundation of Union-Find's efficiency.
```python
class BasicUnionFind:
    """
    Basic Union-Find without optimizations.
    Demonstrates the core concept but is NOT efficient.

    Time Complexity:
    - Find: O(n) worst case (degenerate tree becomes a linked list)
    - Union: O(n) worst case (dominated by Find)
    """

    def __init__(self, n: int):
        """
        Initialize n singleton sets.
        parent[i] = i means i is the root of its tree.
        """
        self.parent = list(range(n))  # Each element is its own parent

    def find(self, x: int) -> int:
        """
        Find the representative (root) of x's set.
        Traverse up the parent chain until we reach the root.
        """
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, x: int, y: int) -> None:
        """
        Merge the sets containing x and y.
        Make the root of x's tree point to the root of y's tree.
        """
        root_x = self.find(x)
        root_y = self.find(y)
        if root_x != root_y:
            self.parent[root_x] = root_y  # Arbitrary: x's root points to y's root

    def same_set(self, x: int, y: int) -> bool:
        """Check if x and y are in the same set."""
        return self.find(x) == self.find(y)

# ═══════════════════════════════════════════════════════════════════
# VISUALIZATION OF TREE EVOLUTION
# ═══════════════════════════════════════════════════════════════════

def visualize_basic_union_find():
    """Trace through Union-Find operations to see the tree structure."""
    print("Initial state: 6 singleton sets")
    print("parent = [0, 1, 2, 3, 4, 5]")
    print()
    print("  (0)  (1)  (2)  (3)  (4)  (5)")
    print("   0    1    2    3    4    5")
    print()

    uf = BasicUnionFind(6)

    uf.union(0, 1)
    print("After Union(0, 1):")
    print("parent = [1, 1, 2, 3, 4, 5]")
    print()
    print("  (1)  (2)  (3)  (4)  (5)")
    print("  /")
    print("(0)")
    print()

    uf.union(2, 3)
    print("After Union(2, 3):")
    print("parent = [1, 1, 3, 3, 4, 5]")
    print()
    print("  (1)  (3)  (4)  (5)")
    print("  /    /")
    print("(0)  (2)")
    print()

    uf.union(0, 2)  # Find(0)=1, Find(2)=3, so parent[1]=3
    print("After Union(0, 2):")
    print("parent = [1, 3, 3, 3, 4, 5]")
    print()
    print("     (3)  (4)  (5)")
    print("    /  |")
    print("  (1) (2)")
    print("  /")
    print("(0)")
    print()

    # Now Find(0) requires traversing: 0 → 1 → 3
    print("Find(0): traverse 0 → 1 → 3, returns 3")
    print("Find(2): traverse 2 → 3, returns 3")
    print("Find(0) == Find(2)? True → same component!")
```

Without optimizations, Union-Find trees can become tall and skinny (like linked lists), causing Find to take O(n) time. If we perform unions in an unlucky order (always attaching to the same root), we get a chain of length n. The optimizations in the next sections address this.
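To see that degenerate case concretely, here's a minimal sketch (the `chain_depth` helper is hypothetical, written for illustration, and assumes the `BasicUnionFind` class above is in scope) that builds the worst-case chain and counts the hops `find(0)` needs:

```python
# A minimal sketch, assuming BasicUnionFind from above: unions applied in an
# unlucky order build a chain, so find(0) must walk n-1 parent pointers.

def chain_depth(n: int) -> int:
    uf = BasicUnionFind(n)
    for i in range(n - 1):
        uf.union(i, i + 1)  # Each union hangs the previous root under the next vertex
    depth, x = 0, 0
    while uf.parent[x] != x:  # Count hops from vertex 0 up to the root
        x = uf.parent[x]
        depth += 1
    return depth

print(chain_depth(1000))  # Prints 999: find(0) really is O(n) here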
Path Compression is the first critical optimization for Union-Find. The idea is simple but powerful: whenever we traverse a path to find the root, we make all nodes along that path point directly to the root.

The Insight:

After calling Find(x), we know the root of x's tree. Every node we visited during the traversal should also point directly to that root: they're all in the same set, so why not give them the shortest possible path?

How It Works:

1. Find the root by traversing parent pointers
2. Traverse the path again, updating each node's parent to point directly to the root

Alternative: Path Splitting / Path Halving

Even simpler variants achieve similar results:

- Path Splitting: each node on the path is re-pointed to its grandparent
- Path Halving: every other node on the path is re-pointed to its grandparent

These are slightly faster in practice (single pass) but achieve the same amortized complexity.
```python
class UnionFindPathCompression:
    """
    Union-Find with Path Compression.

    After find(x), all nodes on the path from x to the root
    point directly to the root.

    Time Complexity:
    - Find: O(log n) amortized (even better with union by rank)
    - Union: O(log n) amortized
    """

    def __init__(self, n: int):
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        """
        Find with path compression (two-pass).

        Pass 1: Find the root.
        Pass 2: Update all nodes on the path to point to the root.
        """
        # Pass 1: find the root
        root = x
        while self.parent[root] != root:
            root = self.parent[root]

        # Pass 2: path compression - update all nodes to point to root
        while self.parent[x] != root:
            next_x = self.parent[x]
            self.parent[x] = root
            x = next_x

        return root

    def find_recursive(self, x: int) -> int:
        """
        Find with path compression (recursive version).
        More elegant, but may overflow the stack for deep trees.
        """
        if self.parent[x] != x:
            self.parent[x] = self.find_recursive(self.parent[x])  # Compress path
        return self.parent[x]

    def union(self, x: int, y: int) -> bool:
        """Union two sets. Returns True if they were different sets."""
        root_x = self.find(x)
        root_y = self.find(y)
        if root_x == root_y:
            return False
        self.parent[root_x] = root_y
        return True

# ═══════════════════════════════════════════════════════════════════
# PATH COMPRESSION VISUALIZATION
# ═══════════════════════════════════════════════════════════════════

def visualize_path_compression():
    """Show how path compression flattens trees."""
    print("═" * 60)
    print("PATH COMPRESSION IN ACTION")
    print("═" * 60)
    print()
    print("BEFORE Find(0):")
    print()
    print("    (5)   ← root")
    print("     |")
    print("    (4)")
    print("     |")
    print("    (3)")
    print("     |")
    print("    (2)")
    print("     |")
    print("    (1)")
    print("     |")
    print("    (0)   ← querying this node")
    print()
    print("parent = [1, 2, 3, 4, 5, 5]")
    print("Find(0) must traverse: 0→1→2→3→4→5 (5 hops)")
    print()
    print("AFTER Find(0) with path compression:")
    print()
    print("       (5)   ← root")
    print("     /|||||\\")
    print(" (0)(1)(2)(3)(4)")
    print()
    print("parent = [5, 5, 5, 5, 5, 5]")
    print("Now Find(0) takes 1 hop, Find(1) takes 1 hop, etc.")
    print()
    print("═" * 60)
    print("The tree went from height 5 to height 1!")
    print("Future finds for ANY of these nodes will be O(1).")
    print("═" * 60)

# ═══════════════════════════════════════════════════════════════════
# PATH SPLITTING AND PATH HALVING VARIANTS
# ═══════════════════════════════════════════════════════════════════

class UnionFindPathSplitting:
    """
    Path Splitting: every node on the path points to its grandparent.
    Single pass, slightly faster in practice.
    Same amortized complexity as full path compression.
    """

    def __init__(self, n: int):
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            next_x = self.parent[x]
            self.parent[x] = self.parent[next_x]  # Re-point x to its grandparent
            x = next_x                            # Move to the old parent
        return x

class UnionFindPathHalving:
    """
    Path Halving: every other node on the path points to its grandparent.
    Single pass with half the updates.
    Same amortized complexity as full path compression.
    """

    def __init__(self, n: int):
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # Re-point x to its grandparent
            x = self.parent[x]                            # Move there (skipping the old parent)
        return x
```

For most use cases, full path compression (two-pass or recursive) is the best choice. It's simple to implement correctly and provides excellent amortized performance. Path splitting/halving offer minor practical speedups but add implementation complexity.
The second critical optimization is Union by Rank (or Union by Size). The key insight: when merging two trees, attach the smaller tree under the larger tree's root. This prevents trees from becoming too tall.

Union by Size:

Track the size (number of elements) of each tree. Always attach the smaller tree to the larger tree.

- When merging trees of sizes a and b, the new tree has size a + b
- The root of the larger tree becomes the new root
- Maximum tree height: O(log n), because each time a node's root changes, the tree containing it at least doubles in size

Union by Rank:

Track the "rank" (an upper bound on height) of each tree. Always attach the lower-rank tree to the higher-rank tree.

- More commonly used because it requires updating less information
- Rank only increases when merging two trees of equal rank
- Maximum rank: O(log n)

Why This Guarantees Logarithmic Height:

Consider any node x. Each time x's tree is merged as the smaller tree (so x gets a new root), the tree containing x at least doubles in size. A tree can double at most log₂(n) times before containing all n elements. Therefore, the height is at most O(log n).
```python
class UnionFindByRank:
    """
    Union-Find with Union by Rank.

    Rank is an upper bound on tree height.
    Always attach the lower-rank tree under the higher-rank tree.

    Time Complexity (with path compression):
    - Find: O(α(n)) amortized ≈ O(1) for all practical n
    - Union: O(α(n)) amortized
    """

    def __init__(self, n: int):
        self.parent = list(range(n))
        self.rank = [0] * n  # Rank starts at 0 (a single-node tree has rank 0)

    def find(self, x: int) -> int:
        """Find with path compression."""
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x: int, y: int) -> bool:
        """
        Union by rank.
        Attach the lower-rank tree under the higher-rank tree.
        """
        root_x = self.find(x)
        root_y = self.find(y)

        if root_x == root_y:
            return False  # Already in the same set

        # Union by rank: attach smaller rank to larger rank
        if self.rank[root_x] < self.rank[root_y]:
            self.parent[root_x] = root_y
        elif self.rank[root_x] > self.rank[root_y]:
            self.parent[root_y] = root_x
        else:
            # Equal ranks: arbitrarily make root_y the new root, increment its rank
            self.parent[root_x] = root_y
            self.rank[root_y] += 1
        return True

class UnionFindBySize:
    """
    Union-Find with Union by Size.

    Always attach the smaller tree under the larger tree.
    Tracks actual tree sizes rather than rank upper bounds.
    Useful when you need to know component sizes.
    """

    def __init__(self, n: int):
        self.parent = list(range(n))
        self.size = [1] * n  # Each tree initially has size 1

    def find(self, x: int) -> int:
        """Find with path compression."""
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x: int, y: int) -> bool:
        """
        Union by size.
        Attach the smaller tree under the larger tree.
        """
        root_x = self.find(x)
        root_y = self.find(y)

        if root_x == root_y:
            return False

        # Union by size: smaller tree goes under larger tree
        if self.size[root_x] < self.size[root_y]:
            self.parent[root_x] = root_y
            self.size[root_y] += self.size[root_x]
        else:
            self.parent[root_y] = root_x
            self.size[root_x] += self.size[root_y]
        return True

    def get_size(self, x: int) -> int:
        """Get the size of the component containing x."""
        return self.size[self.find(x)]

# ═══════════════════════════════════════════════════════════════════
# WHY UNION BY RANK KEEPS TREES SHORT
# ═══════════════════════════════════════════════════════════════════

def demonstrate_height_bound():
    """Demonstrate why union by rank ensures O(log n) height."""
    print("═" * 60)
    print("UNION BY RANK HEIGHT ANALYSIS")
    print("═" * 60)
    print()
    print("Key insight: A tree of rank r has at least 2^r nodes.")
    print()
    print("Proof by induction:")
    print("  Base: A rank-0 tree has 1 = 2^0 node ✓")
    print("  Step: Rank increases only when merging two rank-r trees;")
    print("        the new tree has at least 2^r + 2^r = 2^(r+1) nodes ✓")
    print()
    print("Consequence for n nodes:")
    print("  The maximum rank r satisfies 2^r ≤ n, so r ≤ log₂(n).")
    print()
    print("With n = 1,000,000 nodes:")
    print("  Maximum rank = log₂(1,000,000) ≈ 20")
    print()
    print("Even before path compression, Find is at most O(20) = O(log n).")
    print("═" * 60)

# ═══════════════════════════════════════════════════════════════════
# VISUALIZATION OF UNION BY RANK
# ═══════════════════════════════════════════════════════════════════

def visualize_union_by_rank():
    """Show how union by rank keeps trees balanced."""
    print("═" * 60)
    print("UNION BY RANK EXAMPLE")
    print("═" * 60)
    print()
    print("Initial: 8 singleton trees, all rank 0")
    print("(0) (1) (2) (3) (4) (5) (6) (7)")
    print("r=0 r=0 r=0 r=0 r=0 r=0 r=0 r=0")
    print()
    print("After Union(0,1), Union(2,3), Union(4,5), Union(6,7):")
    print("  (1)   (3)   (5)   (7)")
    print("   |     |     |     |")
    print("  (0)   (2)   (4)   (6)")
    print("  r=1   r=1   r=1   r=1")
    print()
    print("Equal ranks → rank increased to 1")
    print()
    print("After Union(0,2), Union(4,6):")
    print("     (3)        (7)")
    print("    /  |       /  |")
    print("  (1) (2)    (5) (6)")
    print("   |          |")
    print("  (0)        (4)")
    print("   r=2        r=2")
    print()
    print("After Union(0,4): the finds compress both paths, then the")
    print("two rank-2 roots merge and the winner's rank increases:")
    print()
    print("          (7)")
    print("        / | | \\")
    print("     (3)(4)(5)(6)")
    print("     /|\\")
    print("  (0)(1)(2)")
    print()
    print("  r=3 (equal ranks merged, so rank[7] was incremented)")
    print()
    print("Final tree: 8 nodes, height 2, rank 3 = log₂(8).")
    print("The bound r ≤ log₂(n) holds exactly here.")
    print("═" * 60)
```

Here's how the strategies compare:

| Strategy | What to Track | Find Complexity | Advantage |
|---|---|---|---|
| No optimization | Just parent | O(n) worst | Simplest implementation |
| Path compression only | Just parent | O(log n) amortized | Simple, fast in practice |
| Union by rank only | Parent + rank | O(log n) worst | Guaranteed log height |
| Union by size only | Parent + size | O(log n) worst | Know component sizes |
| Both optimizations | Parent + rank | O(α(n)) ≈ O(1) | Optimal (nearly constant) |
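To make the table concrete, here's a rough micro-benchmark sketch. It assumes the `BasicUnionFind`, `UnionFindPathCompression`, and `UnionFindByRank` classes defined above are in scope, and the `benchmark` helper is hypothetical; absolute timings will vary by machine:

```python
import random
import time

def benchmark(uf_class, n: int, ops: int) -> float:
    """Time `ops` random union+find pairs on a fresh structure of n elements."""
    rng = random.Random(42)  # Fixed seed so every class sees the same workload
    uf = uf_class(n)
    start = time.perf_counter()
    for _ in range(ops):
        x, y = rng.randrange(n), rng.randrange(n)
        uf.union(x, y)
        uf.find(x)
    return time.perf_counter() - start

if __name__ == "__main__":
    n, ops = 50_000, 200_000
    for cls in (BasicUnionFind, UnionFindPathCompression, UnionFindByRank):
        print(f"{cls.__name__:30s} {benchmark(cls, n, ops):.3f}s")
```

Note that purely random unions are a gentle workload; adversarial union orders (like the chain built earlier) make the gap between the unoptimized and optimized variants far starker.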
When combining path compression with union by rank, the amortized time complexity per operation becomes O(α(n)), where α is the inverse Ackermann function.

What is the Inverse Ackermann Function?

The Ackermann function A(m, n) is a recursively defined function that grows extremely fast, faster than any polynomial, exponential, or even tower of exponentials. Its inverse, α(n), grows correspondingly slowly.

How Slowly?

- α(1) = 0
- α(2) = α(3) = 1
- α(4) = 2
- α(2047) = 3
- α(2048) = 4, and α stays at 4 far beyond that: even A(4, 2) = 2^65536 − 3, a number with nearly 20,000 digits, has α equal to 4

For all practical purposes, α(n) ≤ 4 for any n representable in a computer. Even the number of atoms in the observable universe (~10^80) satisfies α(10^80) = 4.

The Practical Implication:

α(n) is effectively constant. We can treat Union-Find operations as O(1) for any realistic input size. This is what makes Kruskal's algorithm O(E log E), where sorting, not Union-Find, is the bottleneck.
```
═══════════════════════════════════════════════════════════════════
THE ACKERMANN FUNCTION: A(m, n)
═══════════════════════════════════════════════════════════════════

Definition:
    A(0, n) = n + 1
    A(m, 0) = A(m-1, 1)
    A(m, n) = A(m-1, A(m, n-1))

First few values:
    A(0, n) = n + 1             (successor)
    A(1, n) = n + 2             (like addition)
    A(2, n) = 2n + 3            (linear)
    A(3, n) = 2^(n+3) - 3       (exponential)
    A(4, n) = 2^2^...^2 - 3     (tower of n+3 twos)
    A(5, n) = ...too large to describe...

How fast does it grow?
    A(4, 2) = 2^65536 - 3 ≈ 10^19728
              (a number with nearly 20,000 digits!)
    A(5, 0) = A(4, 1) = 65533, but
    A(5, 1) = A(4, A(5, 0)) = A(4, 65533)
              ...already incomprehensibly large

═══════════════════════════════════════════════════════════════════
THE INVERSE ACKERMANN FUNCTION: α(n)
═══════════════════════════════════════════════════════════════════

Definition (simplified; exact boundary values vary by convention):
    α(n) = min { k : A(k, k) ≥ n }

Typical values:

          n │ α(n)
────────────┼──────
          1 │  0
          2 │  1
          3 │  1
          4 │  2
          7 │  2
          8 │  3
       2047 │  3
       2048 │  4
      10^80 │  4    (≈ atoms in the observable universe)

For ANY number that can be stored in a computer: α(n) ≤ 4

═══════════════════════════════════════════════════════════════════
PRACTICAL IMPLICATION
═══════════════════════════════════════════════════════════════════

When we say Union-Find is O(α(n)) per operation, we mean:

    For n = 10^9  (1 billion):      α(n) = 4
    For n = 10^18 (quintillion):    α(n) = 4
    For n = 10^100:                 α(n) = 4

It's not O(1), but it's "O(1) for any input you'll ever see."

This is why we can treat Union-Find as effectively constant-time
and say that sorting dominates Kruskal's algorithm.
```

The O(α(n)) bound for Union-Find with both optimizations was proven by Tarjan in 1975. It's one of the most elegant results in algorithm analysis, showing that a seemingly simple data structure achieves nearly optimal efficiency for the problem it solves.
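For the curious, here's a short sketch that computes A(m, m) for the tiny arguments where that's feasible and brute-forces α(n) from the simplified definition above. The `ackermann` and `inverse_ackermann` helpers are hypothetical, for illustration only; A(4, 4) is far too large to ever compute this way:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def ackermann(m: int, n: int) -> int:
    """Two-argument Ackermann function A(m, n); only feasible for m <= 3."""
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    return ackermann(m - 1, ackermann(m, n - 1))

def inverse_ackermann(n: int) -> int:
    """alpha(n) = min { k : A(k, k) >= n }; safe only for n <= A(3, 3) = 61."""
    k = 0
    while ackermann(k, k) < n:
        k += 1
    return k

for m in range(4):
    print(f"A({m}, {m}) = {ackermann(m, m)}")       # 1, 3, 7, 61
for n in (1, 2, 3, 4, 7, 8, 61):
    print(f"alpha({n}) = {inverse_ackermann(n)}")   # 0, 1, 1, 2, 2, 3, 3
```

The diagonal A(k, k) jumps from 61 at k = 3 to an astronomically large value at k = 4, which is exactly why α plateaus at 4 for every input you'll ever see.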
Let's put together a complete, production-ready Union-Find implementation optimized for Kruskal's algorithm, along with the full Kruskal's algorithm using it.
```python
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (u, v, weight)

class UnionFind:
    """
    Optimized Union-Find (Disjoint Set Union) for Kruskal's algorithm.

    Features:
    - Path compression in find()
    - Union by rank in union()
    - O(α(n)) amortized time per operation

    For Kruskal's algorithm with E edges and V vertices:
    - Up to 2E find operations (two per edge check)
    - At most V-1 union operations
    - Total: O(E × α(V)) ≈ O(E) since α(V) ≤ 4
    """

    __slots__ = ['parent', 'rank', 'num_components']

    def __init__(self, n: int):
        """
        Initialize n elements in n singleton sets.

        Args:
            n: Number of elements (vertices in the graph context)
        """
        self.parent = list(range(n))  # parent[i] = i means i is a root
        self.rank = [0] * n           # Upper bound on tree height
        self.num_components = n       # Track component count

    def find(self, x: int) -> int:
        """
        Find the representative (root) of x's component.
        Uses path compression: all nodes visited point directly to the root.

        Time: O(α(n)) amortized

        Args:
            x: Element to find the representative of

        Returns:
            The root element representing x's component
        """
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # Path compression
        return self.parent[x]

    def union(self, x: int, y: int) -> bool:
        """
        Merge the components containing x and y.
        Uses union by rank: the shorter tree goes under the taller tree.

        Time: O(α(n)) amortized

        Args:
            x: First element
            y: Second element

        Returns:
            True if x and y were in different components (merge happened),
            False if x and y were already in the same component.
        """
        root_x = self.find(x)
        root_y = self.find(y)

        if root_x == root_y:
            return False  # Already in same component - would create a cycle

        # Union by rank
        if self.rank[root_x] < self.rank[root_y]:
            self.parent[root_x] = root_y
        elif self.rank[root_x] > self.rank[root_y]:
            self.parent[root_y] = root_x
        else:
            self.parent[root_x] = root_y
            self.rank[root_y] += 1

        self.num_components -= 1
        return True

    def connected(self, x: int, y: int) -> bool:
        """Check if x and y are in the same component."""
        return self.find(x) == self.find(y)

def kruskal_mst(
    num_vertices: int,
    edges: List[Edge]
) -> Tuple[List[Edge], float, bool]:
    """
    Compute a Minimum Spanning Tree using Kruskal's algorithm.

    Args:
        num_vertices: Number of vertices (labeled 0 to n-1)
        edges: List of edges as (u, v, weight) tuples

    Returns:
        Tuple of:
        - List of MST edges
        - Total MST weight
        - Boolean: True if an MST was found (graph connected), False otherwise

    Time Complexity: O(E log E + E × α(V)) = O(E log E)
    Space Complexity: O(V) for Union-Find + O(E) for sorted edges
    """
    # Handle trivial cases (empty graph or a single vertex)
    if num_vertices <= 1:
        return [], 0.0, True

    # Sort edges by weight - O(E log E)
    sorted_edges = sorted(edges, key=lambda e: e[2])

    # Initialize Union-Find - O(V)
    uf = UnionFind(num_vertices)

    # Greedily select edges
    mst_edges: List[Edge] = []
    mst_weight = 0.0
    edges_needed = num_vertices - 1

    for u, v, weight in sorted_edges:
        # union() returns False exactly when the edge would create a cycle
        if uf.union(u, v):  # O(α(V)) per operation
            mst_edges.append((u, v, weight))
            mst_weight += weight

            # Early termination: MST complete
            if len(mst_edges) == edges_needed:
                break

    # Check if an MST was found (graph was connected)
    is_connected = len(mst_edges) == edges_needed
    return mst_edges, mst_weight, is_connected

# ═══════════════════════════════════════════════════════════════════
# EXAMPLE USAGE
# ═══════════════════════════════════════════════════════════════════

def example():
    """Example: find the MST of a sample graph."""
    #        1       4
    #    A───────B───────C
    #    │\     /│      /│
    #   4│ 3   2 │5   1/ │2
    #    │  \ /  │    /  │
    #    │   X   │   /   │
    #    │  / \  │  /    │
    #    │ /   \ │ /     │
    #    │/     \│/      │
    #    D───────E───────F
    #        2       3
    # Vertices: 0=A, 1=B, 2=C, 3=D, 4=E, 5=F
    edges = [
        (0, 1, 1),  # A-B
        (0, 3, 4),  # A-D
        (0, 4, 3),  # A-E
        (1, 2, 4),  # B-C
        (1, 3, 2),  # B-D
        (1, 4, 5),  # B-E
        (2, 4, 1),  # C-E
        (2, 5, 2),  # C-F
        (3, 4, 2),  # D-E
        (4, 5, 3),  # E-F
    ]

    mst_edges, total_weight, is_connected = kruskal_mst(6, edges)

    print("MST Edges:")
    for u, v, w in mst_edges:
        print(f"  {chr(65+u)}-{chr(65+v)}: weight {w}")
    print(f"Total Weight: {total_weight}")
    print(f"Graph Connected: {is_connected}")

    # Expected (in order of acceptance):
    #   A-B: 1, C-E: 1, B-D: 2, C-F: 2, D-E: 2
    #   Total: 8

if __name__ == "__main__":
    example()
```

This implementation is suitable for production use. It handles edge cases, provides clear documentation, and uses both critical optimizations. The `__slots__` declaration in UnionFind reduces memory overhead, and early termination prevents unnecessary work.
Let's trace through a complete example showing exactly how Union-Find detects cycles and enables efficient MST construction.
```
═══════════════════════════════════════════════════════════════════
UNION-FIND IN KRUSKAL'S ALGORITHM: COMPLETE TRACE
═══════════════════════════════════════════════════════════════════

Graph:
        2       6
    A───────B───────C
    │       │       │
   4│       │3      │5
    │       │       │
    D───────E───────F
        1       2

Vertices: A=0, B=1, C=2, D=3, E=4, F=5
Edges sorted by weight: DE(1), AB(2), EF(2), BE(3), AD(4), CF(5), BC(6)

═══════════════════════════════════════════════════════════════════
INITIAL STATE
═══════════════════════════════════════════════════════════════════

parent = [0, 1, 2, 3, 4, 5]   (each vertex is its own parent)
rank   = [0, 0, 0, 0, 0, 0]

Trees:  (A) (B) (C) (D) (E) (F)   [6 singletons]

═══════════════════════════════════════════════════════════════════
EDGE 1: D─E (weight 1)
═══════════════════════════════════════════════════════════════════

Question: Would adding D─E create a cycle?

  find(3) = 3   (D is its own root)
  find(4) = 4   (E is its own root)
  3 ≠ 4 → Different components → NO CYCLE

Action: union(3, 4)
  rank[3] = 0, rank[4] = 0 (equal)
  Make parent[3] = 4, increment rank[4]

After:
  parent = [0, 1, 2, 4, 4, 5]
  rank   = [0, 0, 0, 0, 1, 0]

Trees:  (A) (B) (C) (E) (F)   [5 components]
                     |
                    (D)

MST edges: [D─E]

═══════════════════════════════════════════════════════════════════
EDGE 2: A─B (weight 2)
═══════════════════════════════════════════════════════════════════

Question: Would adding A─B create a cycle?

  find(0) = 0   (A is its own root)
  find(1) = 1   (B is its own root)
  0 ≠ 1 → Different components → NO CYCLE

Action: union(0, 1)
  rank[0] = 0, rank[1] = 0 (equal)
  Make parent[0] = 1, increment rank[1]

After:
  parent = [1, 1, 2, 4, 4, 5]
  rank   = [0, 1, 0, 0, 1, 0]

Trees:  (B) (C) (E) (F)   [4 components]
         |       |
        (A)     (D)

MST edges: [D─E, A─B]

═══════════════════════════════════════════════════════════════════
EDGE 3: E─F (weight 2)
═══════════════════════════════════════════════════════════════════

Question: Would adding E─F create a cycle?

  find(4) = 4   (E is root of its tree)
  find(5) = 5   (F is its own root)
  4 ≠ 5 → Different components → NO CYCLE

Action: union(4, 5)
  rank[4] = 1, rank[5] = 0
  Higher rank wins: make parent[5] = 4

After:
  parent = [1, 1, 2, 4, 4, 4]
  rank   = [0, 1, 0, 0, 1, 0]

Trees:  (B) (C)    (E)     [3 components]
         |        /   \
        (A)    (D)   (F)

MST edges: [D─E, A─B, E─F]

═══════════════════════════════════════════════════════════════════
EDGE 4: B─E (weight 3)
═══════════════════════════════════════════════════════════════════

Question: Would adding B─E create a cycle?

  find(1) = 1   (B is root)
  find(4) = 4   (E is root)
  1 ≠ 4 → Different components → NO CYCLE

Action: union(1, 4)
  rank[1] = 1, rank[4] = 1 (equal)
  Make parent[1] = 4, increment rank[4]

After:
  parent = [1, 4, 2, 4, 4, 4]
  rank   = [0, 1, 0, 0, 2, 0]

Trees:  (C)      (E)       [2 components]
               /  |  \
            (B) (D) (F)
             |
            (A)

MST edges: [D─E, A─B, E─F, B─E]

═══════════════════════════════════════════════════════════════════
EDGE 5: A─D (weight 4)
═══════════════════════════════════════════════════════════════════

Question: Would adding A─D create a cycle?

  find(0) = ?
    parent[0] = 1
    parent[1] = 4
    parent[4] = 4   (root!)
    Path compression: parent[0] = 4, parent[1] = 4
    find(0) = 4

  find(3) = ?
    parent[3] = 4   (root!)
    find(3) = 4

  4 = 4 → SAME COMPONENT → WOULD CREATE CYCLE!

Action: REJECT edge A─D
  Trees unchanged (but with path compression applied)

After:
  parent = [4, 4, 2, 4, 4, 4]   ← A and B now point directly to E
  rank   = [0, 1, 0, 0, 2, 0]

MST edges unchanged: [D─E, A─B, E─F, B─E]

═══════════════════════════════════════════════════════════════════
EDGE 6: C─F (weight 5)
═══════════════════════════════════════════════════════════════════

Question: Would adding C─F create a cycle?

  find(2) = 2   (C is its own root)
  find(5) = ?
    parent[5] = 4   (root!)
    find(5) = 4
  2 ≠ 4 → Different components → NO CYCLE

Action: union(2, 5)   (roots: 2 and 4)
  rank[2] = 0, rank[4] = 2
  Higher rank wins: make parent[2] = 4

After:
  parent = [4, 4, 4, 4, 4, 4]   ← All vertices now point to E!
  rank   = [0, 1, 0, 0, 2, 0]

Single tree:      (E)
               / | | | \
            (A)(B)(C)(D)(F)

MST edges: [D─E, A─B, E─F, B─E, C─F]

═══════════════════════════════════════════════════════════════════
TERMINATION
═══════════════════════════════════════════════════════════════════

We have 5 edges = V - 1 = 6 - 1 ✓
MST is complete!

Final MST:
  D─E: 1
  A─B: 2
  E─F: 2
  B─E: 3
  C─F: 5
  ───────
  Total: 13

Remaining edge BC(6) was never examined due to early termination.
```

Notice how in Edge 5, when we rejected edge A─D, path compression flattened the tree. After find(0), both A and B pointed directly to E. This is why subsequent find operations become faster over time: the trees naturally flatten as we query them.
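As a sanity check, this short sketch reproduces the trace's result. It assumes the `kruskal_mst` function from the complete implementation above is in scope; `trace_edges` is just an illustrative name:

```python
# Vertices: A=0, B=1, C=2, D=3, E=4, F=5 (same graph as the trace)
trace_edges = [
    (0, 1, 2.0),  # A-B
    (1, 2, 6.0),  # B-C
    (0, 3, 4.0),  # A-D
    (1, 4, 3.0),  # B-E
    (2, 5, 5.0),  # C-F
    (3, 4, 1.0),  # D-E
    (4, 5, 2.0),  # E-F
]

mst, total, connected = kruskal_mst(6, trace_edges)
for u, v, w in mst:  # Edges in the order Kruskal accepted them
    print(f"{chr(65 + u)}-{chr(65 + v)}: {w}")
print(f"Total: {total}, connected: {connected}")
# Expected: D-E, A-B, E-F, B-E, C-F in that order; Total: 13.0, connected: True
```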
We've explored Union-Find in depth: the data structure that makes Kruskal's algorithm practical. Let's consolidate the key insights:

- Cycle detection reduces to component membership: edge (u, v) creates a cycle exactly when Find(u) == Find(v).
- Without optimizations, Union-Find trees can degenerate into chains, making Find O(n).
- Path compression flattens trees during Find; union by rank (or size) keeps tree height O(log n).
- Combined, the two optimizations give O(α(n)) amortized time per operation, effectively constant for any practical input.
- In Kruskal's algorithm, cycle detection therefore costs O(E × α(V)) total, leaving sorting as the O(E log E) bottleneck.
What's Next:

With both sorting and cycle detection understood, the final page will bring everything together with a rigorous complexity analysis of Kruskal's algorithm, comparing it to Prim's and examining when each approach is preferable.
You now understand how Union-Find enables efficient cycle detection in Kruskal's algorithm. With path compression and union by rank, each operation takes nearly constant time, making Union-Find the perfect complement to Kruskal's greedy edge selection strategy.