Connected Components - Learning Module

Loading content...

0/276

DFS/BFS-Based Approach for Finding Connected Components

From Intuition to Implementation

In the previous page, we established that finding connected components fundamentally requires graph traversal—systematically exploring all vertices reachable from a starting point. We developed the intuition that one traversal finds one component, and iterating over unvisited vertices finds them all.

Now we transform that intuition into rigorous, efficient algorithms. We have two primary traversal strategies at our disposal:

Depth-First Search (DFS): Explore as deep as possible before backtracking
Breadth-First Search (BFS): Explore all neighbors at distance 1, then 2, then 3...

Remarkably, both achieve the same goal of finding connected components with the same time complexity. The choice between them comes down to implementation preferences, memory patterns, and secondary information you might want to extract.

What You Will Master

By the end of this page, you will be able to implement connected component finding using both DFS and BFS, understand the trade-offs between approaches, analyze their time and space complexity, and handle the practical considerations that arise in real implementations.

DFS-Based Approach

Depth-First Search explores graphs by going as deep as possible along each branch before backtracking. When applied to connected components, DFS exhibits a characteristic pattern: it dives deep into the graph, marking vertices as visited, then backtracks only when it reaches dead ends (vertices with no unvisited neighbors).

The DFS Strategy for Components:

Maintain a visited array/set to track explored vertices
For each unvisited vertex v:
- Start a new component
- Run DFS from v, marking all reachable vertices as visited
- All vertices visited in this DFS belong to the same component
Repeat until all vertices are visited

Recursive DFS Implementation:

The most natural implementation of DFS uses recursion, where the call stack implicitly manages the "go deep, then backtrack" behavior.

dfs_connected_components
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
/**
 * Find all connected components using DFS (Recursive)
 * Time: O(V + E)  Space: O(V) for visited + O(V) for recursion stack
 */
function findConnectedComponentsDFS(
    graph: Map<number, number[]>
): number[][] {
    const visited = new Set<number>();
    const components: number[][] = [];
    
    // DFS helper function
    function dfs(vertex: number, currentComponent: number[]): void {
        visited.add(vertex);
        currentComponent.push(vertex);
        
        // Explore all unvisited neighbors
        for (const neighbor of graph.get(vertex) || []) {
            if (!visited.has(neighbor)) {
                dfs(neighbor, currentComponent);
            }
        }
    }
    
    // Iterate over all vertices
    for (const vertex of graph.keys()) {
        if (!visited.has(vertex)) {
            // Found a new component - start DFS
            const component: number[] = [];
            dfs(vertex, component);
            components.push(component);
        }
    }
    
    return components;
}
 
// Example usage
const graph = new Map<number, number[]>([
    [0, [1, 2]],
    [1, [0, 2, 3]],
    [2, [0, 1]],
    [3, [1]],
    [4, [5, 6]],
    [5, [4, 6]],
    [6, [4, 5]],
    [7, [8]],
    [8, [7]],
    [9, []], // isolated vertex
]);
 
const components = findConnectedComponentsDFS(graph);
console.log("Components:", components);
// Output: [[0, 1, 2, 3], [4, 5, 6], [7, 8], [9]]

Understanding the Execution:

Let's trace through the algorithm on our example graph:

Start with vertex 0 (first unvisited): DFS explores 0 → 1 → 2 (0 already visited) → 3 → backtrack → backtrack → backtrack. Component 1: {0, 1, 2, 3}
Next unvisited is 4: DFS explores 4 → 5 → 6 → backtrack → backtrack → backtrack. Component 2: {4, 5, 6}
Next unvisited is 7: DFS explores 7 → 8 → backtrack → backtrack. Component 3: {7, 8}
Next unvisited is 9: DFS explores 9 (no neighbors). Component 4: {9}
All vertices visited: Algorithm terminates with 4 components.

Stack Overflow Risk

Recursive DFS uses the call stack proportional to the longest path explored (up to V in worst case). For very large or deep graphs, this can cause stack overflow. In production code with large graphs, use iterative DFS with an explicit stack.

Iterative DFS: Explicit Stack Control

To avoid stack overflow risks and gain more control over the traversal, we can implement DFS iteratively using an explicit stack. The logic remains the same—we're just managing the "backtracking" ourselves rather than relying on recursion.

The Iterative Pattern:

Push the starting vertex onto the stack
While the stack is not empty:
- Pop a vertex v
- If v is not visited: mark it visited, process it, push all unvisited neighbors
When the stack empties, we've explored the entire component

iterative_dfs_components
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
/**
 * Find all connected components using Iterative DFS
 * Time: O(V + E)  Space: O(V) for visited + O(V) for explicit stack
 */
function findConnectedComponentsIterativeDFS(
    graph: Map<number, number[]>
): number[][] {
    const visited = new Set<number>();
    const components: number[][] = [];
    
    for (const startVertex of graph.keys()) {
        if (visited.has(startVertex)) continue;
        
        // Start a new component
        const component: number[] = [];
        const stack: number[] = [startVertex];
        
        while (stack.length > 0) {
            const vertex = stack.pop()!;
            
            // Skip if already visited (may have been added multiple times)
            if (visited.has(vertex)) continue;
            
            visited.add(vertex);
            component.push(vertex);
            
            // Push all unvisited neighbors onto stack
            for (const neighbor of graph.get(vertex) || []) {
                if (!visited.has(neighbor)) {
                    stack.push(neighbor);
                }
            }
        }
        
        components.push(component);
    }
    
    return components;
}
 
// Note: Traversal order differs from recursive DFS due to stack LIFO,
// but component membership is identical.

Observation: Duplicate Stack Entries

In the iterative version, a vertex might be pushed to the stack multiple times (if multiple neighbors point to it) before it's processed. We handle this by checking visited when we pop, not when we push. This is simpler than tracking "in-stack" vertices separately.

Alternative: Check Before Push

You could alternatively check before pushing and avoid duplicates:

optimized_iterative_dfs
TypeScript
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
// Alternative: Mark visited when pushing (prevents duplicates)
function dfsComponentOptimized(
    startVertex: number,
    graph: Map<number, number[]>,
    visited: Set<number>
): number[] {
    const component: number[] = [];
    const stack: number[] = [startVertex];
    
    // Mark as visited immediately when adding to stack
    visited.add(startVertex);
    
    while (stack.length > 0) {
        const vertex = stack.pop()!;
        component.push(vertex);
        
        for (const neighbor of graph.get(vertex) || []) {
            if (!visited.has(neighbor)) {
                visited.add(neighbor);  // Mark before push
                stack.push(neighbor);
            }
        }
    }
    
    return component;
}

Choose Based on Context

Both approaches work correctly. 'Check on pop' is simpler but may push more items to the stack. 'Mark on push' is slightly more efficient but requires tracking visited status at a different point. For connected components, the difference is negligible.

BFS-Based Approach

Breadth-First Search explores the graph level by level—first all vertices at distance 1 from the start, then distance 2, then distance 3, and so on. For finding connected components, BFS is equally valid as DFS and produces the same components (though vertices may be discovered in different order).

BFS uses a queue instead of a stack, which naturally gives us level-order exploration. This has a bonus: we automatically get shortest path distances from the starting vertex to all other vertices in the component.

The BFS Pattern:

Enqueue the starting vertex and mark it visited
While the queue is not empty:
- Dequeue a vertex v
- For each unvisited neighbor: mark visited and enqueue
When the queue empties, the component is complete

bfs_connected_components
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
/**
 * Find all connected components using BFS
 * Time: O(V + E)  Space: O(V) for visited + O(V) for queue
 */
function findConnectedComponentsBFS(
    graph: Map<number, number[]>
): number[][] {
    const visited = new Set<number>();
    const components: number[][] = [];
    
    for (const startVertex of graph.keys()) {
        if (visited.has(startVertex)) continue;
        
        // Start a new component
        const component: number[] = [];
        const queue: number[] = [startVertex];
        visited.add(startVertex);  // Mark visited when enqueuing
        
        while (queue.length > 0) {
            const vertex = queue.shift()!;  // Dequeue from front
            component.push(vertex);
            
            for (const neighbor of graph.get(vertex) || []) {
                if (!visited.has(neighbor)) {
                    visited.add(neighbor);  // Mark before enqueue
                    queue.push(neighbor);   // Enqueue at back
                }
            }
        }
        
        components.push(component);
    }
    
    return components;
}
 
// BFS visits vertices in level order from start:
// Level 0: start vertex
// Level 1: immediate neighbors
// Level 2: neighbors of neighbors
// etc.

BFS with Distance Tracking:

A powerful advantage of BFS: we can track the shortest path distance from the starting vertex to every other vertex in the component "for free".

bfs_with_distances
TypeScript
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
interface ComponentWithDistances {
    vertices: number[];
    distances: Map<number, number>;  // vertex -> distance from start
}
 
function bfsWithDistances(
    startVertex: number,
    graph: Map<number, number[]>,
    visited: Set<number>
): ComponentWithDistances {
    const vertices: number[] = [];
    const distances = new Map<number, number>();
    const queue: number[] = [startVertex];
    
    visited.add(startVertex);
    distances.set(startVertex, 0);
    
    while (queue.length > 0) {
        const vertex = queue.shift()!;
        vertices.push(vertex);
        const currentDist = distances.get(vertex)!;
        
        for (const neighbor of graph.get(vertex) || []) {
            if (!visited.has(neighbor)) {
                visited.add(neighbor);
                distances.set(neighbor, currentDist + 1);
                queue.push(neighbor);
            }
        }
    }
    
    return { vertices, distances };
}

Queue Implementation Matters

In JavaScript/TypeScript, array.shift() is O(n). For large graphs, use a proper deque or queue implementation. In Python, collections.deque has O(1) popleft(). This doesn't change asymptotic complexity but can significantly impact practical performance.

DFS vs BFS: A Detailed Comparison

Both DFS and BFS find connected components correctly with the same time complexity. However, they differ in important ways that might influence your choice.

DFS vs BFS for Connected Components
Aspect	DFS	BFS
Time Complexity	O(V + E)	O(V + E)
Space Complexity	O(V) worst case	O(V) worst case
Data Structure	Stack (implicit/explicit)	Queue
Traversal Order	Deep first, then backtrack	Level by level
Memory Pattern	Stack grows/shrinks	Queue maxes at widest level
Recursion Support	Natural recursive form	Iterative only (typically)
Shortest Paths	Does not provide	Provides as byproduct
Implementation	Often simpler (recursive)	Slightly more code

Choose DFS When

•You prefer recursive solutions
•Stack depth is not a concern (small graphs)
•You need implicit backtracking behavior
•Memory width is a concern (wide graphs)
•You're also detecting cycles or doing topological sort

Choose BFS When

•You need shortest path distances
•You want level-order information
•Graph is very deep (avoid stack overflow)
•Memory depth is a concern (deep graphs)
•You're also finding multi-source shortest paths

Memory Behavior Analysis:

The memory characteristics differ based on graph shape:

Long, thin graph (path-like): DFS stack grows to O(V), BFS queue stays small O(1)
Wide, shallow graph (star-like): DFS stack stays small O(1), BFS queue grows O(V)
Tree graph: Both use O(height) for DFS stack, O(max level width) for BFS queue
Complete graph: Both could need O(V) in worst case

In practice, for "typical" graphs (sparse, irregular), both behave similarly.

Converting Mermaid diagram...

Practical Advice

For connected components specifically, DFS is often preferred due to simpler recursive implementation. The component finding doesn't benefit from BFS's level-order property. However, if you expect very deep graphs or need distances, BFS is better. When in doubt, iterative DFS offers the best balance.

Rigorous Complexity Analysis

Let's formally analyze why both approaches achieve O(V + E) time complexity.

The Key Invariant:

Every vertex is visited exactly once, and every edge is examined exactly twice (once from each endpoint).

Time Complexity Breakdown:

Let V = number of vertices, E = number of edges.

Outer loop: Iterates over all V vertices → O(V)
Checking visited status: O(1) per vertex with hash set
Processing each vertex once: Each vertex enters visited exactly once
Edge examination: When processing vertex v, we examine all edges incident to v

The total edge examinations: sum over all vertices of degree(v) = 2E (each edge counted from both endpoints).

Total: O(V + 2E) = O(V + E)

complexity_proof

Pseudocode

// Why O(V + E)?
 
// Outer loop: executes exactly V times (once per vertex)
for each vertex v in graph:           // O(V) iterations
    if v not visited:                 // O(1) check
        // Inner exploration: each vertex entered exactly once
        explore(v):
            mark v visited            // O(1) per vertex, V total
            for each neighbor u of v: // degree(v) iterations
                if u not visited:     // O(1) check
                    explore(u)        // Only called once per vertex
 
// summing neighbor iterations across all vertices:
// Σ degree(v) for all v = 2E (handshaking lemma)
 
// Total: O(V) for outer loop
//      + O(V) for visiting all vertices
//      + O(2E) for examining all edges
//      = O(V + E)

Space Complexity:

Visited set: O(V) — one entry per vertex
Component storage: O(V) — storing all vertices across all components
Traversal data structure:
- DFS stack: O(V) worst case (for a path graph)
- BFS queue: O(V) worst case (for a star graph)
Recursion stack (recursive DFS only): O(V) worst case

Total Space: O(V)

Why O(V + E) is Optimal:

Any algorithm that finds all connected components must:

Touch every vertex at least once (to assign it to a component)
Examine every edge at least once (to determine connectivity)

Thus, Ω(V + E) is a lower bound, making O(V + E) optimal.

Representation Assumption

The O(V + E) analysis assumes adjacency list representation. With an adjacency matrix, finding neighbors takes O(V) per vertex, giving O(V²) total—demonstrating why representation matters.

Production-Ready Implementation Patterns

Real-world implementations need to handle edge cases, provide useful return formats, and integrate cleanly with larger systems. Let's examine production-quality patterns.

Pattern 1: Return Component IDs Instead of Lists

Often, you don't need the actual vertex lists—you just need to know which component each vertex belongs to. This enables O(1) "same component?" queries.

component_ids
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
interface ComponentResult {
    componentIds: Map<number, number>;  // vertex -> component ID
    componentCount: number;
}
 
function findComponentIds(
    graph: Map<number, number[]>
): ComponentResult {
    const componentIds = new Map<number, number>();
    let componentCount = 0;
    
    function dfs(vertex: number, componentId: number): void {
        componentIds.set(vertex, componentId);
        
        for (const neighbor of graph.get(vertex) || []) {
            if (!componentIds.has(neighbor)) {
                dfs(neighbor, componentId);
            }
        }
    }
    
    for (const vertex of graph.keys()) {
        if (!componentIds.has(vertex)) {
            dfs(vertex, componentCount);
            componentCount++;
        }
    }
    
    return { componentIds, componentCount };
}
 
// Usage: Check if two vertices are in same component
function sameComponent(
    u: number, 
    v: number, 
    componentIds: Map<number, number>
): boolean {
    return componentIds.get(u) === componentIds.get(v);
}

Pattern 2: Class-Based Graph with Component Methods

For larger applications, encapsulating graph operations in a class provides cleaner interfaces and state management.

graph_class
TypeScript
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
class UndirectedGraph {
    private adjacencyList: Map<number, Set<number>> = new Map();
    
    addVertex(v: number): void {
        if (!this.adjacencyList.has(v)) {
            this.adjacencyList.set(v, new Set());
        }
    }
    
    addEdge(u: number, v: number): void {
        this.addVertex(u);
        this.addVertex(v);
        this.adjacencyList.get(u)!.add(v);
        this.adjacencyList.get(v)!.add(u);
    }
    
    getVertices(): number[] {
        return Array.from(this.adjacencyList.keys());
    }
    
    getNeighbors(v: number): number[] {
        return Array.from(this.adjacencyList.get(v) || []);
    }
    
    isConnected(): boolean {
        const vertices = this.getVertices();
        if (vertices.length === 0) return true;
        
        const visited = new Set<number>();
        this.dfsVisit(vertices[0], visited);
        return visited.size === vertices.length;
    }
    
    getComponentCount(): number {
        const visited = new Set<number>();
        let count = 0;
        
        for (const vertex of this.adjacencyList.keys()) {
            if (!visited.has(vertex)) {
                this.dfsVisit(vertex, visited);
                count++;
            }
        }
        
        return count;
    }
    
    getComponents(): number[][] {
        const visited = new Set<number>();
        const components: number[][] = [];
        
        for (const vertex of this.adjacencyList.keys()) {
            if (!visited.has(vertex)) {
                const component: number[] = [];
                this.dfsCollect(vertex, visited, component);
                components.push(component);
            }
        }
        
        return components;
    }
    
    private dfsVisit(v: number, visited: Set<number>): void {
        visited.add(v);
        for (const neighbor of this.adjacencyList.get(v) || []) {
            if (!visited.has(neighbor)) {
                this.dfsVisit(neighbor, visited);
            }
        }
    }
    
    private dfsCollect(
        v: number, 
        visited: Set<number>, 
        component: number[]
    ): void {
        visited.add(v);
        component.push(v);
        for (const neighbor of this.adjacencyList.get(v) || []) {
            if (!visited.has(neighbor)) {
                this.dfsCollect(neighbor, visited, component);
            }
        }
    }
}

API Design Matters

Design your graph API around the operations you need. If you frequently check component membership, cache component IDs. If you need component lists, store them. If the graph changes dynamically, consider Union-Find (covered in Chapter 24) for efficient updates.

Common Pitfalls and How to Avoid Them

Even with a solid understanding of the algorithms, implementation mistakes are common. Let's examine typical pitfalls and their solutions.

Common Mistakes

•Forgetting isolated vertices (no neighbors)
•Not handling disconnected input graphs
•Stack overflow on large recursive DFS
•Using array index as vertex ID (misses non-contiguous IDs)
•Not marking visited before push/enqueue
•Using O(V) shift() in JavaScript arrays
•Mutating graph during traversal

Correct Approaches

•Loop over all vertices in graph.keys()
•Outer loop handles multiple components
•Use iterative DFS for large graphs
•Use Map/dict keyed by actual vertex IDs
•Mark visited at correct point (see earlier)
•Use deque in Python, custom queue in JS
•Work on immutable view or copy if needed

Pitfall Deep Dive: Isolated Vertices

A common bug is to only iterate over vertices that appear in edges. Isolated vertices (degree 0) might not appear in a naively constructed adjacency list.

isolated_vertex_bug
TypeScript
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
// BUG: Misses isolated vertices!
function buildGraphBuggy(edges: [number, number][]): Map<number, number[]> {
    const graph = new Map<number, number[]>();
    
    for (const [u, v] of edges) {
        if (!graph.has(u)) graph.set(u, []);
        if (!graph.has(v)) graph.set(v, []);
        graph.get(u)!.push(v);
        graph.get(v)!.push(u);
    }
    
    return graph;
    // Problem: If vertex 5 exists but has no edges, it never appears!
}
 
// CORRECT: Include all vertices explicitly
function buildGraphCorrect(
    vertices: number[],
    edges: [number, number][]
): Map<number, number[]> {
    const graph = new Map<number, number[]>();
    
    // First, add all vertices (including isolated ones)
    for (const v of vertices) {
        graph.set(v, []);
    }
    
    // Then add edges
    for (const [u, v] of edges) {
        graph.get(u)!.push(v);
        graph.get(v)!.push(u);
    }
    
    return graph;
}

Test Your Edge Cases

Always test with: empty graph, single vertex, two vertices (with and without edge), star graph, path graph, complete graph, and graph with isolated vertices. Each reveals different potential bugs.

Summary: Traversal-Based Component Finding

We've covered the fundamental algorithms for finding connected components using graph traversal. Let's consolidate the key points.

Key Takeaways

•DFS and BFS both work: Either traversal strategy finds connected components correctly with time O(V + E).
•One traversal = One component: Starting from any unvisited vertex and exploring all reachable vertices finds exactly one complete component.
•Iterate for all components: Loop over all vertices; start a new traversal from each unvisited vertex to find all components.
•Use iterative DFS for large graphs: Recursive DFS can overflow the stack on deep graphs.
•BFS gives shortest paths free: If you need distances from a source within each component, BFS provides them.
•Component IDs enable O(1) queries: Preprocessing to label vertices with component IDs allows instant same-component checks.
•Representation matters: Adjacency lists give O(V + E); matrices give O(V²). Always prefer lists for sparse graphs.

What's Next:

The next page focuses on counting and labeling components efficiently—building data structures that support fast queries about component membership and sizes. We'll also explore variations like finding the largest component and handling dynamic graphs.

Core Algorithms Mastered

You now have complete implementations of both DFS and BFS approaches to connected component finding. These are foundational algorithms that appear repeatedly in graph problems. Master them thoroughly—they're building blocks for more advanced techniques.