Connected Components - Learning Module

Loading content...

0/276

Counting and Labeling Components

Beyond Finding: Organizing Component Information

Finding connected components is just the first step. In practice, we often need to answer sophisticated queries about components:

"How many components exist?"
"Which component does vertex v belong to?"
"Are vertices u and v in the same component?"
"What's the size of the largest component?"
"Give me all vertices in vertex v's component."

The naive approach—run a full traversal for each query—is expensive. With proper preprocessing, we can answer these queries in O(1) time.

This page covers techniques for efficiently counting components, labeling vertices with component identifiers, tracking component sizes, and building data structures that support fast queries.

What You Will Master

By the end of this page, you'll be able to preprocess a graph to support constant-time component queries, track component sizes efficiently, find the largest component, and understand when to use component labeling versus Union-Find for dynamic scenarios.

Counting Components Efficiently

The simplest component operation is counting: "How many connected components does this graph have?"

The Algorithm:

Counting components is a natural byproduct of our traversal approach. Each time we start a new traversal from an unvisited vertex, we've found a new component. Just count how many times this happens.

Implementation:

count_components
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
/**
 * Count the number of connected components in an undirected graph
 * Time: O(V + E)  Space: O(V)
 */
function countComponents(graph: Map<number, number[]>): number {
    const visited = new Set<number>();
    let componentCount = 0;
    
    function dfs(vertex: number): void {
        visited.add(vertex);
        for (const neighbor of graph.get(vertex) || []) {
            if (!visited.has(neighbor)) {
                dfs(neighbor);
            }
        }
    }
    
    for (const vertex of graph.keys()) {
        if (!visited.has(vertex)) {
            dfs(vertex);
            componentCount++;  // Each new DFS = one new component
        }
    }
    
    return componentCount;
}
 
// Example usage
const graph = new Map([
    [0, [1, 2]], [1, [0, 2]], [2, [0, 1]],  // Component 1
    [3, [4]], [4, [3]],                      // Component 2
    [5, []]                                  // Component 3 (isolated)
]);
 
console.log(countComponents(graph));  // Output: 3

Why This Is Optimal:

Counting requires O(V + E) time—the same as a single full traversal. You cannot count faster without knowing the answer ahead of time, because:

You must check every vertex (to find isolated components)
You must examine edges (to determine connectivity)

Checking Connectivity:

A graph is connected if and only if it has exactly one component. This gives us an efficient connectivity check:

is_connected
TypeScript
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
/**
 * Check if a graph is connected (all vertices mutually reachable)
 * Time: O(V + E)  Space: O(V)
 */
function isConnected(graph: Map<number, number[]>): boolean {
    const vertices = Array.from(graph.keys());
    
    // Empty graph or single vertex is connected by convention
    if (vertices.length <= 1) return true;
    
    // Run one DFS from any vertex
    const visited = new Set<number>();
    
    function dfs(v: number): void {
        visited.add(v);
        for (const neighbor of graph.get(v) || []) {
            if (!visited.has(neighbor)) {
                dfs(neighbor);
            }
        }
    }
    
    dfs(vertices[0]);
    
    // Connected iff all vertices were reached
    return visited.size === vertices.length;
}

Early Termination

For connectivity checking, we can stop as soon as we've visited all vertices. In practice, if the graph is connected, DFS will visit all vertices naturally. If it's disconnected, we discover this when DFS terminates but visited.size < vertices.length.

Labeling Vertices with Component IDs

Counting tells us how many components exist, but often we need to know which component each vertex belongs to. By assigning each vertex a component ID during traversal, we enable O(1) queries.

The Labeling Process:

Initialize a component counter to 0
Initialize a mapping: vertex → component ID
For each unvisited vertex:
- Traverse its component, assigning the current component ID to all vertices
- Increment the component counter
Return the mapping

Benefits of Component IDs:

Same component check: Compare IDs (O(1))
Component lookup: Get vertices with same ID
Component size: Count vertices with each ID
Graph partitioning: Process each component separately

label_components
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
interface LabelingResult {
    componentId: Map<number, number>;  // vertex -> component ID
    componentCount: number;
    componentSizes: number[];          // componentSizes[id] = size
}
 
/**
 * Label each vertex with its component ID and compute sizes
 * Time: O(V + E)  Space: O(V)
 */
function labelComponents(graph: Map<number, number[]>): LabelingResult {
    const componentId = new Map<number, number>();
    const componentSizes: number[] = [];
    let currentComponent = 0;
    
    function dfs(vertex: number, componentIdx: number): number {
        componentId.set(vertex, componentIdx);
        let size = 1;  // Count this vertex
        
        for (const neighbor of graph.get(vertex) || []) {
            if (!componentId.has(neighbor)) {
                size += dfs(neighbor, componentIdx);
            }
        }
        
        return size;
    }
    
    for (const vertex of graph.keys()) {
        if (!componentId.has(vertex)) {
            const size = dfs(vertex, currentComponent);
            componentSizes.push(size);
            currentComponent++;
        }
    }
    
    return {
        componentId,
        componentCount: currentComponent,
        componentSizes
    };
}
 
// Usage example
const graph = new Map([
    [0, [1, 2]], [1, [0, 2]], [2, [0, 1]],  // Component 0 (size 3)
    [3, [4]], [4, [3]],                      // Component 1 (size 2)
    [5, []]                                  // Component 2 (size 1)
]);
 
const result = labelComponents(graph);
console.log("Component of vertex 0:", result.componentId.get(0));  // 0
console.log("Component of vertex 4:", result.componentId.get(4));  // 1
console.log("Component sizes:", result.componentSizes);  // [3, 2, 1]
console.log("Same component (0, 2)?", 
    result.componentId.get(0) === result.componentId.get(2));  // true

Preprocessing Cost vs Query Cost

The O(V + E) preprocessing time is paid once. After that, every same-component query is O(1). If you have Q queries on a static graph, total time is O(V + E + Q) instead of O(Q × (V + E)) without preprocessing.

Finding the Largest Component

A common query in network analysis is finding the largest connected component—the biggest cluster of interconnected vertices. In social networks, this is often called the "giant component."

Approach 1: Track Sizes During Traversal

The most efficient approach computes component sizes during the initial traversal, then finds the maximum:

largest_component
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
interface LargestComponentResult {
    vertices: number[];
    size: number;
    componentId: number;
}
 
/**
 * Find the largest connected component
 * Time: O(V + E)  Space: O(V)
 */
function findLargestComponent(
    graph: Map<number, number[]>
): LargestComponentResult | null {
    if (graph.size === 0) return null;
    
    const visited = new Set<number>();
    let largestVertices: number[] = [];
    let largestSize = 0;
    let largestId = -1;
    let currentId = 0;
    
    function dfs(vertex: number, component: number[]): void {
        visited.add(vertex);
        component.push(vertex);
        
        for (const neighbor of graph.get(vertex) || []) {
            if (!visited.has(neighbor)) {
                dfs(neighbor, component);
            }
        }
    }
    
    for (const vertex of graph.keys()) {
        if (!visited.has(vertex)) {
            const component: number[] = [];
            dfs(vertex, component);
            
            if (component.length > largestSize) {
                largestVertices = component;
                largestSize = component.length;
                largestId = currentId;
            }
            
            currentId++;
        }
    }
    
    return {
        vertices: largestVertices,
        size: largestSize,
        componentId: largestId
    };
}

Approach 2: Use Pre-computed Component Sizes

If you've already computed component sizes using the labeling function, finding the largest is trivial:

largest_from_sizes
TypeScript
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
function getLargestComponentFromLabeling(
    result: LabelingResult
): { componentId: number; size: number } {
    let maxSize = 0;
    let maxId = 0;
    
    for (let id = 0; id < result.componentSizes.length; id++) {
        if (result.componentSizes[id] > maxSize) {
            maxSize = result.componentSizes[id];
            maxId = id;
        }
    }
    
    return { componentId: maxId, size: maxSize };
}
 
// Get vertices in a specific component
function getComponentVertices(
    componentId: number,
    labelResult: LabelingResult
): number[] {
    const vertices: number[] = [];
    
    for (const [vertex, id] of labelResult.componentId.entries()) {
        if (id === componentId) {
            vertices.push(vertex);
        }
    }
    
    return vertices;
}

The Giant Component Phenomenon

In many real-world networks, one component contains a significant fraction of all vertices (often 50-90%). This 'giant component' emerges through a phase transition as network density increases. Understanding this helps predict network behavior and guides algorithm design.

Computing Component Statistics

For comprehensive graph analysis, we often need various statistics about components. Let's build a complete component analyzer.

component_analyzer
TypeScript
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
interface ComponentStats {
    totalComponents: number;
    totalVertices: number;
    totalEdges: number;
    componentSizes: number[];
    largestComponentSize: number;
    smallestComponentSize: number;
    averageComponentSize: number;
    isolatedVertices: number;          // Components of size 1
    giantComponentRatio: number;       // Fraction in largest component
}
 
class ComponentAnalyzer {
    private graph: Map<number, number[]>;
    private componentId: Map<number, number>;
    private componentSizes: number[];
    private componentCount: number;
    
    constructor(graph: Map<number, number[]>) {
        this.graph = graph;
        this.componentId = new Map();
        this.componentSizes = [];
        this.componentCount = 0;
        this.analyze();
    }
    
    private analyze(): void {
        let currentComponent = 0;
        
        const dfs = (vertex: number, compId: number): number => {
            this.componentId.set(vertex, compId);
            let size = 1;
            
            for (const neighbor of this.graph.get(vertex) || []) {
                if (!this.componentId.has(neighbor)) {
                    size += dfs(neighbor, compId);
                }
            }
            
            return size;
        };
        
        for (const vertex of this.graph.keys()) {
            if (!this.componentId.has(vertex)) {
                const size = dfs(vertex, currentComponent);
                this.componentSizes.push(size);
                currentComponent++;
            }
        }
        
        this.componentCount = currentComponent;
    }
    
    getStats(): ComponentStats {
        const totalVertices = this.graph.size;
        
        // Count edges (each edge counted once)
        let edgeCount = 0;
        for (const neighbors of this.graph.values()) {
            edgeCount += neighbors.length;
        }
        edgeCount = edgeCount / 2;  // Undirected: divide by 2
        
        const largest = Math.max(...this.componentSizes, 0);
        const smallest = Math.min(...this.componentSizes, 0);
        const average = totalVertices / Math.max(this.componentCount, 1);
        const isolated = this.componentSizes.filter(s => s === 1).length;
        const giantRatio = totalVertices > 0 ? largest / totalVertices : 0;
        
        return {
            totalComponents: this.componentCount,
            totalVertices,
            totalEdges: edgeCount,
            componentSizes: [...this.componentSizes],
            largestComponentSize: largest,
            smallestComponentSize: smallest,
            averageComponentSize: average,
            isolatedVertices: isolated,
            giantComponentRatio: giantRatio
        };
    }
    
    // O(1) queries after preprocessing
    sameComponent(u: number, v: number): boolean {
        return this.componentId.get(u) === this.componentId.get(v);
    }
    
    getComponentOf(vertex: number): number | undefined {
        return this.componentId.get(vertex);
    }
    
    getComponentSize(vertex: number): number {
        const compId = this.componentId.get(vertex);
        if (compId === undefined) return 0;
        return this.componentSizes[compId];
    }
    
    isInLargestComponent(vertex: number): boolean {
        const compId = this.componentId.get(vertex);
        if (compId === undefined) return false;
        const largest = Math.max(...this.componentSizes);
        return this.componentSizes[compId] === largest;
    }
}

Usage Example:

analyzer_usage
TypeScript
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
// Build a sample graph
const graph = new Map<number, number[]>([
    [0, [1, 2, 3]],
    [1, [0, 2]],
    [2, [0, 1]],
    [3, [0]],
    [4, [5]],
    [5, [4]],
    [6, []],
    [7, []],
]);
 
const analyzer = new ComponentAnalyzer(graph);
const stats = analyzer.getStats();
 
console.log(`Total components: ${stats.totalComponents}`);
// Output: Total components: 4
 
console.log(`Component sizes: ${stats.componentSizes}`);
// Output: Component sizes: [4, 2, 1, 1]
 
console.log(`Giant component ratio: ${(stats.giantComponentRatio * 100).toFixed(1)}%`);
// Output: Giant component ratio: 50.0%
 
console.log(`Isolated vertices: ${stats.isolatedVertices}`);
// Output: Isolated vertices: 2
 
console.log(`Are 0 and 3 connected? ${analyzer.sameComponent(0, 3)}`);
// Output: Are 0 and 3 connected? true
 
console.log(`Are 0 and 4 connected? ${analyzer.sameComponent(0, 4)}`);
// Output: Are 0 and 4 connected? false

When to Compute Statistics

Computing all statistics takes O(V + E) time—the same as basic labeling. If you need multiple statistics, compute them together. If you only need one specific piece of information, you might optimize to compute just that.

Building a Component Inverted Index

Sometimes you need to efficiently retrieve all vertices in a given component. While you can scan the component ID map (O(V)), an inverted index provides O(1) access to any component's vertex list.

Structure:

Forward index: vertex → component ID (what we've built so far)
Inverted index: component ID → list of vertices

Building Both Indices:

inverted_index
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
interface ComponentIndices {
    vertexToComponent: Map<number, number>;  // Forward: vertex -> component
    componentToVertices: number[][];         // Inverted: component -> vertices
    componentCount: number;
}
 
/**
 * Build both forward and inverted component indices
 * Time: O(V + E)  Space: O(V)
 */
function buildComponentIndices(
    graph: Map<number, number[]>
): ComponentIndices {
    const vertexToComponent = new Map<number, number>();
    const componentToVertices: number[][] = [];
    let currentComponent = 0;
    
    function dfs(vertex: number, componentVertices: number[]): void {
        vertexToComponent.set(vertex, currentComponent);
        componentVertices.push(vertex);
        
        for (const neighbor of graph.get(vertex) || []) {
            if (!vertexToComponent.has(neighbor)) {
                dfs(neighbor, componentVertices);
            }
        }
    }
    
    for (const vertex of graph.keys()) {
        if (!vertexToComponent.has(vertex)) {
            const vertices: number[] = [];
            dfs(vertex, vertices);
            componentToVertices.push(vertices);
            currentComponent++;
        }
    }
    
    return {
        vertexToComponent,
        componentToVertices,
        componentCount: currentComponent
    };
}
 
// Usage
const indices = buildComponentIndices(graph);
 
// O(1): Get component ID of a vertex
const compId = indices.vertexToComponent.get(0);  // 0
 
// O(1): Get all vertices in a component
const verticesInComp0 = indices.componentToVertices[0];  // [0, 1, 2, 3]
 
// O(1): Check if two vertices are in same component
const same = indices.vertexToComponent.get(0) === 
             indices.vertexToComponent.get(1);  // true
 
// O(1): Get size of vertex's component
const size = indices.componentToVertices[
    indices.vertexToComponent.get(0)!
].length;  // 4

Trade-off: Memory vs Access Pattern

The inverted index uses O(V) additional space but enables O(1) access to component vertex lists. Only build it if you need to frequently enumerate component members. For just same-component queries or size queries, the forward index suffices.

Handling Edge Cases in Labeling

Robust component labeling must handle various edge cases correctly. Let's examine them systematically.

Edge Cases in Component Labeling
Case	Expected Result	Common Bug
Empty graph (no vertices)	0 components, empty indices	Null pointer / empty iteration failure
Single vertex, no edges	1 component of size 1	Missing isolated vertex handling
All isolated vertices	n components of size 1	Not iterating over all vertices
Fully connected graph	1 component of size n	Inefficient for dense graphs
Non-contiguous vertex IDs	Correct components	Array index out of bounds
Self-loops	No effect on connectivity	Infinite loop if not handled
Parallel edges	No effect on connectivity	Should just work

edge_case_tests
TypeScript
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
// Edge case test suite
function testEdgeCases(): void {
    // Test 1: Empty graph
    const empty = new Map<number, number[]>();
    const emptyResult = labelComponents(empty);
    console.assert(emptyResult.componentCount === 0);
    console.assert(emptyResult.componentSizes.length === 0);
    
    // Test 2: Single isolated vertex
    const single = new Map([[0, []]]);
    const singleResult = labelComponents(single);
    console.assert(singleResult.componentCount === 1);
    console.assert(singleResult.componentSizes[0] === 1);
    
    // Test 3: All isolated vertices
    const allIsolated = new Map([
        [0, []], [1, []], [2, []], [3, []]
    ]);
    const allIsoResult = labelComponents(allIsolated);
    console.assert(allIsoResult.componentCount === 4);
    console.assert(allIsoResult.componentSizes.every(s => s === 1));
    
    // Test 4: Non-contiguous IDs
    const nonContig = new Map([
        [100, [200, 500]],
        [200, [100]],
        [500, [100]],
        [999, []]
    ]);
    const ncResult = labelComponents(nonContig);
    console.assert(ncResult.componentCount === 2);
    console.assert(ncResult.componentId.get(100) === ncResult.componentId.get(200));
    console.assert(ncResult.componentId.get(100) !== ncResult.componentId.get(999));
    
    // Test 5: Self-loop (vertex with edge to itself)
    const selfLoop = new Map([
        [0, [0, 1]],  // 0 has edge to itself
        [1, [0]]
    ]);
    const slResult = labelComponents(selfLoop);
    console.assert(slResult.componentCount === 1);
    console.assert(slResult.componentSizes[0] === 2);  // Self-loop doesn't add size
    
    console.log("All edge case tests passed!");
}

Self-Loop Handling

Self-loops can cause issues if your DFS doesn't properly check visited status. When vertex 0 has edge to itself, exploring 0's neighbors includes 0—but since we mark 0 visited before exploring neighbors, it's correctly skipped. Always mark visited BEFORE exploring neighbors.

When Labeling Isn't Enough: Union-Find Preview

The labeling approach works beautifully for static graphs—graphs that don't change after the initial preprocessing. But what about dynamic graphs where edges are added or removed?

The Problem with Relabeling:

If edges change, our component labels become invalid. Options:

Recompute from scratch: O(V + E) per update—expensive
Incremental update: Complex and error-prone with DFS labels

Enter Union-Find (Disjoint Set Union):

Union-Find is a data structure specifically designed for dynamic connectivity:

Union: Merge two components when an edge is added → O(α(n)) ≈ O(1)
Find: Determine which component a vertex belongs to → O(α(n)) ≈ O(1)

Here, α(n) is the inverse Ackermann function—grows so slowly it's effectively constant for any practical input size.

Use DFS Labeling When

•Graph is static (no edge changes)
•You need all components at once
•You want to iterate over components
•You need component statistics
•Memory is constrained (slightly more efficient)

Use Union-Find When

•Edges are added dynamically
•You only need same-component queries
•Building graph from edge stream
•Kruskal's MST algorithm
•Real-time connectivity updates

Quick Union-Find Sketch:

union_find_sketch
TypeScript
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
class UnionFind {
    private parent: Map<number, number>;
    private rank: Map<number, number>;
    private componentCount: number;
    
    constructor(vertices: number[]) {
        this.parent = new Map();
        this.rank = new Map();
        this.componentCount = vertices.length;
        
        // Initially, each vertex is its own component
        for (const v of vertices) {
            this.parent.set(v, v);
            this.rank.set(v, 0);
        }
    }
    
    // Find with path compression - O(α(n))
    find(x: number): number {
        if (this.parent.get(x) !== x) {
            this.parent.set(x, this.find(this.parent.get(x)!));
        }
        return this.parent.get(x)!;
    }
    
    // Union by rank - O(α(n))  
    union(x: number, y: number): boolean {
        const rootX = this.find(x);
        const rootY = this.find(y);
        
        if (rootX === rootY) return false;  // Already same component
        
        // Attach smaller tree under larger tree
        const rankX = this.rank.get(rootX)!;
        const rankY = this.rank.get(rootY)!;
        
        if (rankX < rankY) {
            this.parent.set(rootX, rootY);
        } else if (rankX > rankY) {
            this.parent.set(rootY, rootX);
        } else {
            this.parent.set(rootY, rootX);
            this.rank.set(rootX, rankX + 1);
        }
        
        this.componentCount--;
        return true;
    }
    
    connected(x: number, y: number): boolean {
        return this.find(x) === this.find(y);
    }
    
    getComponentCount(): number {
        return this.componentCount;
    }
}
 
// Usage: Build components from edge stream
const uf = new UnionFind([0, 1, 2, 3, 4, 5]);
console.log(uf.getComponentCount());  // 6 (all isolated)
 
uf.union(0, 1);
console.log(uf.getComponentCount());  // 5
 
uf.union(1, 2);
console.log(uf.getComponentCount());  // 4
 
console.log(uf.connected(0, 2));  // true (0-1-2 connected)
console.log(uf.connected(0, 3));  // false

Full Coverage in Chapter 24

Union-Find is covered comprehensively in Chapter 24 (Advanced Graph Algorithms). This preview shows why it exists and when to consider it. For static component finding, the DFS/BFS approach from this module is simpler and sufficient.

Summary: Counting and Labeling Mastery

We've covered comprehensive techniques for counting, labeling, and querying connected components. Let's consolidate the key points.

Key Takeaways

•Counting is a traversal byproduct: Increment a counter each time you start a new DFS/BFS—this counts components in O(V + E).
•Labeling enables O(1) queries: Assign each vertex a component ID during traversal; same-component checks become simple equality tests.
•Track sizes during traversal: Return the size from each DFS call and aggregate; no extra passes needed.
•Inverted indices for enumeration: Map component ID → vertex list for O(1) access to any component's members.
•Handle edge cases carefully: Empty graphs, isolated vertices, self-loops, and non-contiguous IDs all require attention.
•Union-Find for dynamic graphs: When edges change, Union-Find provides efficient incremental updates; DFS labeling requires full recomputation.

What's Next:

The final page of this module explores real-world applications of connected components. We'll see how these algorithms power social network analysis, image processing, network reliability testing, and more. The abstract concepts become concrete through practical examples.

Query Infrastructure Complete

You now have the tools to preprocess any undirected graph and answer component queries in constant time. This is the standard approach in production systems—pay O(V + E) once, then answer unlimited queries in O(1). This pattern of 'expensive preprocessing, cheap queries' appears throughout algorithm design.