Loading content...
Imagine you're building a social network. Users arrive continuously and form friendships. At any moment, someone might ask: "Are Alice and David in the same friend network?" Or consider a computer network where cables connect machines—you need to quickly answer: "Can server A communicate with server B?"
These questions share a common structure. They're not asking about shortest paths, distances, or optimal routes. They're asking something simpler yet profound: Are these two elements connected?
This is the dynamic connectivity problem—one of the most fundamental questions in computer science. And while it sounds simple, solving it efficiently as connections form and dissolve requires a data structure of remarkable elegance: the Union-Find data structure, also known as Disjoint Set Union (DSU).
By the end of this page, you will understand what Union-Find is, why it exists, what problems it solves, and how it represents disjoint sets internally. You'll see why this data structure is considered one of the most beautiful inventions in computer science—combining simplicity with extraordinary efficiency.
Before we dive into the solution, let's precisely define the problem that Union-Find so elegantly addresses.
The Dynamic Connectivity Problem:
You have n elements (numbered 0 to n-1). Initially, each element is in its own isolated set. You must support two operations:
a and b into a single seta belongs to (typically by returning a representative element of that set)With these two operations, you can answer connectivity queries: "Are a and b in the same set?" by checking if Find(a) == Find(b).
The sets in Union-Find are disjoint—no element can belong to more than one set at any time. When we union two sets, we merge them completely. This property is crucial: at any moment, the collection of sets forms a partition of all elements, where every element belongs to exactly one set.
Why is this problem interesting?
At first glance, you might think: "Just use a graph and run BFS or DFS for each connectivity query." But consider the scale:
We need something dramatically faster—operations that approach constant time even as the network grows to billions of elements. This is exactly what Union-Find delivers.
| Approach | Union Time | Find/Query Time | Practicality at Scale |
|---|---|---|---|
| Store list of connections, run BFS | O(1) | O(V + E) | Impractical for frequent queries |
| Maintain full adjacency matrix | O(1) | O(V²) for transitive closure | Memory prohibitive |
| Rebuild connected components | O(V + E) | O(1) after rebuild | Impractical for frequent unions |
| Union-Find with optimizations | α(n) ≈ O(1)* | α(n) ≈ O(1)* | Excellent at any scale |
The α(n) in the table is the inverse Ackermann function—a function that grows so slowly that for all practical purposes (any conceivable number of elements in the universe), α(n) ≤ 4. We'll explore this remarkable efficiency in later pages. For now, understand that Union-Find achieves essentially constant time per operation.
Union-Find (also called Disjoint Set Union or DSU) is a data structure that maintains a collection of disjoint (non-overlapping) sets. It provides near-constant-time operations to:
The brilliance of Union-Find lies in its representation. Rather than explicitly storing set membership lists, it represents each set as a tree. Elements are nodes, and each node points to its parent. The root of each tree serves as the representative (or canonical element) of the set.
Key insight: To find which set an element belongs to, follow parent pointers until you reach the root. Two elements are in the same set if and only if they have the same root.
12345678910111213141516171819202122232425262728293031323334353637
Initially, 5 elements (0-4), each in its own set: [0] [1] [2] [3] [4] ↑ ↑ ↑ ↑ ↑ root root root root root After Union(0, 1) - element 1 now points to 0: [0] [2] [3] [4] ↑ ↑ ↑ ↑ 1 Element 0 is root (representative) of set {0, 1}Elements 2, 3, 4 are each their own root After Union(2, 3) and Union(3, 4): [0] [2] ↑ ↑ 1 3 ↑ 4 Now we have two sets: {0, 1} and {2, 3, 4} After Union(0, 2) - now all elements in one set: [0] ↑ / \ [1] [2] ↑ 3 ↑ 4 All elements connected through root 0The representation is minimal and elegant:
All we need is a parent array where parent[i] stores the parent of element i. If an element is a root, it is its own parent: parent[i] = i.
This single array encodes the entire partition structure:
1234567891011121314151617181920212223242526272829303132333435363738
// The most minimal Union-Find representationclass UnionFindBasic { private parent: number[]; constructor(n: number) { // Initially, each element is its own parent (its own set) this.parent = Array.from({ length: n }, (_, i) => i); // parent = [0, 1, 2, 3, 4] means 5 separate sets } // Find the root (representative) of element x find(x: number): number { // Keep following parent pointers until we reach the root // Root is identified by parent[x] === x while (this.parent[x] !== x) { x = this.parent[x]; } return x; } // Union the sets containing x and y union(x: number, y: number): void { const rootX = this.find(x); const rootY = this.find(y); // If already in the same set, nothing to do if (rootX !== rootY) { // Make rootY's parent be rootX // (arbitrary choice - we'll optimize this later) this.parent[rootY] = rootX; } } // Check if x and y are in the same set connected(x: number, y: number): boolean { return this.find(x) === this.find(y); }}The naive implementation shown above can degenerate into a linked list in the worst case, making Find operations O(n). Imagine unioning elements in sequence: Union(0,1), Union(1,2), Union(2,3), ... This creates a chain where finding element 0 requires traversing n-1 pointers. We'll fix this with optimizations in later pages.
Let's formalize what Union-Find represents mathematically. This abstraction helps us reason about correctness and understand why certain operations make sense.
Set Partition:
A partition of a set S is a collection of non-empty, pairwise disjoint subsets of S whose union equals S. In mathematical notation:
Union-Find maintains exactly such a partition. Every element belongs to exactly one set, and the collection of all sets covers all elements.
Why Trees?
You might wonder why we use trees rather than other representations. Consider the alternatives:
Alternative 1: Explicit set membership lists
setId[i] for each elementAlternative 2: Linked lists with head pointers
The tree representation:
When we add path compression and union by rank (covered in later pages), tree heights become nearly constant, giving us the best of all worlds: near-O(1) for both operations with O(n) space.
Union-Find is a masterclass in implicit data structure design. Instead of maintaining complex explicit relationships, we let the tree structure emerge naturally. The parent array is all we need—everything else (set membership, connectivity, partitioning) is derived from following pointers. This simplicity is what enables the remarkable optimizations.
Understanding Union-Find's invariants is essential for both implementing it correctly and reasoning about its behavior. These invariants hold after every operation completes.
Derived Properties:
From these invariants, we can derive several useful properties:
Property 1: Connectivity is an equivalence relation
Property 2: The partition only becomes coarser
Property 3: The number of unions is bounded
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465
// Helper methods to verify Union-Find invariants// Useful for debugging and testing class UnionFindWithVerification { private parent: number[]; private n: number; constructor(n: number) { this.n = n; this.parent = Array.from({ length: n }, (_, i) => i); } // Verify all invariants hold (expensive, for debugging only) verifyInvariants(): boolean { // 1. Check root self-reference for (let i = 0; i < this.n; i++) { const root = this.findWithoutCompression(i); if (this.parent[root] !== root) { console.error(`Invariant violated: root ${root} not self-referential`); return false; } } // 2. Check reachability (no cycles other than root self-loops) for (let i = 0; i < this.n; i++) { const visited = new Set<number>(); let current = i; while (!visited.has(current)) { visited.add(current); if (this.parent[current] === current) break; // Reached root current = this.parent[current]; } // Should have reached a root, not formed a cycle if (this.parent[current] !== current) { console.error(`Invariant violated: cycle detected from ${i}`); return false; } } // 3. Check unique root per set (implicitly guaranteed by structure) // Every element reaches exactly one root by following parents console.log("All invariants satisfied!"); return true; } private findWithoutCompression(x: number): number { while (this.parent[x] !== x) { x = this.parent[x]; } return x; } find(x: number): number { return this.findWithoutCompression(x); } union(x: number, y: number): void { const rootX = this.find(x); const rootY = this.find(y); if (rootX !== rootY) { this.parent[rootY] = rootX; } }}Union-Find isn't just a theoretical construct—it's a workhorse data structure that appears in countless real-world applications. Understanding these applications helps you recognize when Union-Find is the right tool for a problem.
Why Union-Find appears everywhere:
Any problem that involves grouping elements, detecting when elements become connected, or maintaining equivalence classes is a potential Union-Find application. The structure's efficiency makes it practical even for massive datasets.
| Domain | Application | How Union-Find Helps |
|---|---|---|
| Graph Algorithms | Kruskal's MST Algorithm | Detect if adding an edge would create a cycle |
| Graph Algorithms | Connected Components | Track and query component membership dynamically |
| Network Design | Network Connectivity | Determine if two nodes can communicate |
| Image Processing | Percolation Simulation | Model fluid flow through porous materials |
| Social Networks | Friend Circles | Find and count distinct friend networks |
| Compilers | Type Unification | Merge type variables during type inference |
| Gaming | Dynamic Terrain | Track connected regions as terrain changes |
| Clustering | Single-Linkage Clustering | Build hierarchical clusters bottom-up |
Deep Dive: Kruskal's Algorithm
Perhaps the most famous application of Union-Find is in Kruskal's algorithm for finding Minimum Spanning Trees. The algorithm:
Union-Find answers the critical question "Are u and v already connected?" in near-constant time, making Kruskal's algorithm efficient even for dense graphs.
12345678910111213141516171819202122232425262728293031
interface Edge { u: number; v: number; weight: number;} function kruskalMST(n: number, edges: Edge[]): Edge[] { // Sort edges by weight edges.sort((a, b) => a.weight - b.weight); const uf = new UnionFind(n); // Assume optimized Union-Find const mst: Edge[] = []; for (const edge of edges) { // Key question: Are u and v already connected? // Union-Find answers this in near O(1) time if (!uf.connected(edge.u, edge.v)) { uf.union(edge.u, edge.v); mst.push(edge); // MST complete when we have n-1 edges if (mst.length === n - 1) break; } } return mst;} // Without Union-Find, we'd need O(V + E) per connectivity check// With Union-Find, the entire algorithm runs in O(E log E) time// (dominated by sorting, not connectivity checks)Deep Dive: Percolation
Percolation is a fascinating application in computational physics. Imagine a porous material represented as an n×n grid. Each cell can be open (permeable) or closed (blocked). We want to know: Can fluid flow from the top to the bottom?
This happens if and only if there's a path of open cells connecting any top-row cell to any bottom-row cell. As cells randomly open, we use Union-Find to efficiently track when percolation occurs.
The trick: Create two virtual nodes—one connected to all top-row cells, one connected to all bottom-row cells. Percolation occurs when these virtual nodes become connected.
Adding virtual nodes that connect to multiple real nodes is a powerful Union-Find pattern. Instead of checking connectivity between many pairs, you check connectivity between two virtual nodes. This reduces multiple queries to a single query—a technique applicable to many problems.
A well-designed Union-Find interface is clean, intuitive, and expressive. Let's examine the standard API and some useful extensions.
Core Operations:
| Operation | Purpose | Typical Signature |
|---|---|---|
find(x) | Find representative of x's set | find(x: number): number |
union(x, y) | Merge sets containing x and y | union(x: number, y: number): boolean |
connected(x, y) | Check if x and y are in same set | connected(x: number, y: number): boolean |
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162
interface IUnionFind { // Core operations /** * Find the representative (root) of the set containing x. * Elements in the same set have the same representative. */ find(x: number): number; /** * Merge the sets containing x and y. * Returns true if a merge occurred, false if x and y were already connected. */ union(x: number, y: number): boolean; /** * Check if x and y are in the same set. * Equivalent to: find(x) === find(y) */ connected(x: number, y: number): boolean; // Extended operations (optional but useful) /** * Return the size of the set containing x. * Requires tracking sizes during unions. */ getSize(x: number): number; /** * Return the total number of disjoint sets. * Decreases with each successful union. */ getCount(): number; /** * Get all elements in the same set as x. * Note: This is O(n) as we must scan all elements. */ getSetMembers(x: number): number[];} // Example usagefunction demonstrateUnionFind() { const uf = new UnionFind(10); // 10 elements: 0-9 console.log(uf.getCount()); // 10 (each element is its own set) uf.union(0, 1); uf.union(2, 3); uf.union(4, 5); console.log(uf.getCount()); // 7 (three pairs, four singles) console.log(uf.connected(0, 1)); // true console.log(uf.connected(0, 2)); // false uf.union(1, 3); // Merges {0,1} with {2,3} console.log(uf.connected(0, 2)); // true (now in same set) console.log(uf.getSize(0)); // 4 (set is {0,1,2,3})}Design Considerations:
Return value of union():
Some implementations return void, others return boolean (whether a merge occurred). The boolean version is convenient—it tells you if this was a "new" connection or a redundant one.
Zero-indexed vs One-indexed:
Most implementations use 0-indexed elements (0 to n-1). Be consistent and match your problem's indexing.
Dynamic resizing:
The basic Union-Find has a fixed size set at construction. For dynamic applications, you might need a map-based variant that can grow arbitrarily.
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061
// When elements aren't contiguous integers from 0 to n-1// Use a Map-based approach for arbitrary keys class DynamicUnionFind<T> { private parent: Map<T, T> = new Map(); private rank: Map<T, number> = new Map(); private count: number = 0; // Lazily initialize elements when first accessed private ensure(x: T): void { if (!this.parent.has(x)) { this.parent.set(x, x); // Self-parent this.rank.set(x, 0); this.count++; } } find(x: T): T { this.ensure(x); // Path compression if (this.parent.get(x) !== x) { this.parent.set(x, this.find(this.parent.get(x)!)); } return this.parent.get(x)!; } union(x: T, y: T): boolean { const rootX = this.find(x); const rootY = this.find(y); if (rootX === rootY) return false; // Union by rank const rankX = this.rank.get(rootX)!; const rankY = this.rank.get(rootY)!; if (rankX < rankY) { this.parent.set(rootX, rootY); } else if (rankX > rankY) { this.parent.set(rootY, rootX); } else { this.parent.set(rootY, rootX); this.rank.set(rootX, rankX + 1); } this.count--; return true; } connected(x: T, y: T): boolean { return this.find(x) === this.find(y); }} // Usage with strings, objects, any hashable typeconst uf = new DynamicUnionFind<string>();uf.union("Alice", "Bob");uf.union("Charlie", "Diana");console.log(uf.connected("Alice", "Bob")); // trueconsole.log(uf.connected("Alice", "Charlie")); // falseUse array-based Union-Find when elements are integers in a known range—it's faster and uses less memory. Use Map-based Union-Find when elements are sparse, non-integer, or the range is unknown. The algorithmic complexity is the same, but array operations have lower constants.
We've established the foundation for understanding Union-Find. Let's consolidate what we've learned:
What's next:
We've defined what Union-Find is. Now we need to understand how it works efficiently. In the next page, we'll dive deep into the Union and Find operations—examining exactly how they work, tracing through examples, and understanding why the naive approach can degrade to linear time per operation.
You now understand what Union-Find is, why it exists, and how it represents disjoint sets. You've seen the basic implementation and its applications. Next, we'll explore the Union and Find operations in detail, setting the stage for the powerful optimizations that make Union-Find truly remarkable.