Given a directed graph with V vertices and E edges, can we find all strongly connected components in linear time O(V + E)? The answer is yes, and one of the most elegant solutions is Kosaraju's algorithm.
Named after S. Rao Kosaraju who discovered it in 1978 (though it was published by Micha Sharir in 1981), this algorithm exploits a beautiful property: a graph and its transpose (with all edges reversed) have exactly the same strongly connected components. By running DFS twice—once on the original graph and once on the transpose—we can identify all SCCs with remarkable efficiency.
By the end of this page, you will understand the key insight behind Kosaraju's algorithm, the role of finish times in DFS, why the transpose graph preserves SCCs, the complete step-by-step algorithm with proof of correctness, implementation in code with detailed walkthrough, and the precise O(V + E) complexity analysis.
To understand Kosaraju's algorithm, we first need to understand DFS finish times and their remarkable properties.
DFS Finish Times:
When running DFS, each vertex has a finish time—the moment when DFS has fully explored the vertex and all its descendants and is about to backtrack. In a timestamp-based formulation, a single global clock ticks at every event: vertex v records a discovery time d[v] when DFS first reaches it and a finish time f[v] when DFS backtracks out of it, so d[v] < f[v] for every vertex.
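As a minimal sketch of this formulation (the helper name `dfsTimestamps` and the array names are ours, not part of the implementation later on this page):

```typescript
// Compute discovery and finish times for every vertex of a directed
// graph given as an adjacency list. A single clock increments on
// every discovery and every finish.
function dfsTimestamps(graph: number[][]): { discover: number[]; finish: number[] } {
  const n = graph.length;
  const discover: number[] = new Array(n).fill(-1);
  const finish: number[] = new Array(n).fill(-1);
  let time = 0;

  function dfs(v: number): void {
    discover[v] = ++time; // v is first reached
    for (const w of graph[v]) {
      if (discover[w] === -1) dfs(w);
    }
    finish[v] = ++time; // all descendants of v are fully explored
  }

  for (let v = 0; v < n; v++) {
    if (discover[v] === -1) dfs(v);
  }
  return { discover, finish };
}

// For the chain 0 → 1 → 2 (plus edge 0 → 2), vertex 2 finishes first
// and vertex 0 finishes last—finish times decrease along DFS-tree depth.
```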
The Critical Property:
Consider two SCCs, A and B, where there's an edge from some vertex in A to some vertex in B (so A appears before B in a topological sort of the component graph). When we run DFS:
If the DFS starts from (or reaches) component A before reaching component B: the DFS crosses the A→B edge and fully explores B before backtracking into A, so every vertex of B finishes before the A-vertex it entered from, and A's maximum finish time exceeds B's.
If the DFS starts from component B first: no path leads from B back to A (otherwise A and B would be one SCC), so all of B finishes before any vertex of A is even discovered, and every vertex of A finishes later than every vertex of B.
The Universal Rule: If there's an edge from SCC A to SCC B in the component graph, then the vertex with the maximum finish time in A has a higher finish time than any vertex in B.
```
Graph with two SCCs:

┌───────────┐             ┌───────────┐
│   SCC_A   │             │   SCC_B   │
│   A → B   │             │   D → E   │
│   ↑   ↓   │  ──────→    │   ↑   ↓   │
│   └── C   │   (C→D)     │   └── F   │
└───────────┘             └───────────┘

Edge C → D connects SCC_A to SCC_B.

DFS from A (starting in SCC_A):

  Visit A (discovery time 1)
  Visit B (discovery time 2)
  Visit C (discovery time 3)
  From C, can go to A (already visited) or D (unvisited)
  Jump to D (discovery time 4)
  Visit E (discovery time 5)
  Visit F (discovery time 6)
  Backtrack: F finishes (finish time 7)
  Backtrack: E finishes (finish time 8)
  Backtrack: D finishes (finish time 9)
  Back to C: C finishes (finish time 10)
  Backtrack: B finishes (finish time 11)
  Backtrack: A finishes (finish time 12)

Finish times:
  SCC_A: A=12, B=11, C=10   (max = 12)
  SCC_B: D=9,  E=8,  F=7    (max = 9)

The maximum finish time in SCC_A (12) > maximum in SCC_B (9).
This confirms: SCC_A should be processed first when finding SCCs.
```

If we process vertices in decreasing order of finish time, we're guaranteed to visit 'source' SCCs (in the component graph) before we can accidentally wander into 'later' SCCs. This is the foundation of Kosaraju's algorithm.
The second key insight of Kosaraju's algorithm involves the transpose graph.
Definition: Transpose Graph
Given G = (V, E), the transpose G^T = (V, E^T) has the same vertices but all edges reversed: E^T = {(v, u) | (u, v) ∈ E}
Critical Theorem: SCCs Are Preserved
G and G^T have exactly the same strongly connected components.
Proof: Let u and v be in the same SCC in G. This means there is a path u ⇝ v and a path v ⇝ u in G.
Reversing all edges turns the path u ⇝ v into a path v ⇝ u, and the path v ⇝ u into a path u ⇝ v.
So in G^T, u and v are still mutually reachable.
Thus u and v are in the same SCC in G^T.
The converse follows by the same argument (transpose is its own inverse: (G^T)^T = G).
```
Original Graph G:

  A → B → C
  ↑       ↓
  └───────┘

Edges: A→B, B→C, C→A   (one SCC: {A, B, C})

Transpose Graph G^T:

  A ← B ← C
  ↓       ↑
  └───────┘

Edges: B→A, C→B, A→C   (same SCC: {A, B, C})

Paths in G:   A → B → C → A  (cycle)
Paths in G^T: A → C → B → A  (cycle, reversed direction)

Same vertices are mutually reachable in both graphs!

Another example with multiple SCCs:

Graph G:
  SCC₁:  A ⇄ B
         │
         ↓
  SCC₂:  C ⇄ D

Transpose G^T:
  SCC₁:  A ⇄ B
         ↑
         │
  SCC₂:  C ⇄ D

Same SCCs, but inter-SCC edge is reversed!
```

This is crucial: SCCs are preserved, but their "dependency order" flips. The component graph (condensation) of G^T is the transpose of the component graph of G: source SCCs in G become sink SCCs in G^T, and vice versa. This reversal is key to the algorithm's correctness.
Now we can present the complete algorithm. Kosaraju's algorithm consists of three phases:
Phase 1: First DFS on Original Graph G
Run DFS on G, visiting all vertices. Track the finish times of vertices. Store vertices in a stack or list in the order they finish (last to finish goes on top of stack).
Phase 2: Compute Transpose Graph G^T
Build G^T by reversing all edges in G. This takes O(V + E) time.
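Phase 2 is a single pass over the adjacency lists; a small standalone sketch (the function name `transpose` is ours):

```typescript
// Reverse every edge of a directed graph given as an adjacency list.
// One pass over all adjacency entries: O(V + E) time and space.
function transpose(graph: number[][]): number[][] {
  const gt: number[][] = Array.from({ length: graph.length }, () => []);
  for (let u = 0; u < graph.length; u++) {
    for (const v of graph[u]) {
      gt[v].push(u); // edge u→v becomes v→u
    }
  }
  return gt;
}

// For the 3-cycle 0→1→2→0, the transpose is the reversed cycle 0→2→1→0.
```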
Phase 3: Second DFS on G^T in Finish-Time Order
Pop vertices from the stack (starting with the highest finish time). For each unvisited vertex, run DFS on G^T and mark all reachable vertices as belonging to the same SCC.
```typescript
function kosarajuSCC(
  n: number,
  edges: [number, number][]
): number[][] {
  // Build adjacency list for G
  const graph: number[][] = Array.from({ length: n }, () => []);
  for (const [u, v] of edges) {
    graph[u].push(v);
  }

  // PHASE 1: DFS on G to compute finish order
  const visited = new Array(n).fill(false);
  const finishStack: number[] = [];

  function dfs1(v: number): void {
    visited[v] = true;
    for (const neighbor of graph[v]) {
      if (!visited[neighbor]) {
        dfs1(neighbor);
      }
    }
    // Key: push when FINISHED (after exploring all descendants)
    finishStack.push(v);
  }

  // Run DFS from all vertices to handle disconnected components
  for (let v = 0; v < n; v++) {
    if (!visited[v]) {
      dfs1(v);
    }
  }

  // PHASE 2: Build transpose graph G^T
  const transposeGraph: number[][] = Array.from({ length: n }, () => []);
  for (const [u, v] of edges) {
    transposeGraph[v].push(u); // Reverse the edge
  }

  // PHASE 3: DFS on G^T in reverse finish order
  const sccVisited = new Array(n).fill(false);
  const sccs: number[][] = [];

  function dfs2(v: number, currentSCC: number[]): void {
    sccVisited[v] = true;
    currentSCC.push(v);
    for (const neighbor of transposeGraph[v]) {
      if (!sccVisited[neighbor]) {
        dfs2(neighbor, currentSCC);
      }
    }
  }

  // Process vertices in decreasing finish time (pop from stack)
  while (finishStack.length > 0) {
    const v = finishStack.pop()!;
    if (!sccVisited[v]) {
      const currentSCC: number[] = [];
      dfs2(v, currentSCC);
      sccs.push(currentSCC);
    }
  }

  return sccs;
}

// Example usage:
// const sccs = kosarajuSCC(6, [[0,1], [1,2], [2,0], [2,3], [3,4], [4,5], [5,3]]);
// Result: [[0, 2, 1], [3, 5, 4]] - two SCCs
```

The finish stack from Phase 1 approximates a reverse topological order of the component graph. Vertices from 'sink' SCCs finish first (pushed early); vertices from 'source' SCCs finish last (pushed late, on top). Popping from the top processes source SCCs first.
Let's rigorously prove that Kosaraju's algorithm correctly identifies all SCCs.
Lemma 1: Finish Time Ordering
If there is an edge from SCC C₁ to SCC C₂ in the component graph (C₁ → C₂), then: max{finish(v) : v ∈ C₁} > max{finish(v) : v ∈ C₂}
Proof: During DFS on G, there are two cases. If DFS discovers a vertex of C₁ before any vertex of C₂, it can reach all of C₂ via the C₁ → C₂ edge, so every vertex of C₂ finishes before that first C₁ vertex does. If DFS discovers a vertex of C₂ first, it cannot reach C₁ (there is no path from C₂ back to C₁, or they would be one SCC), so all of C₂ finishes before any vertex of C₁ is even discovered.
In both cases, max finish time in C₁ > max finish time in C₂. ∎
Corollary: Stack Order Reflects Component Dependencies
When we pop from the finish stack, we process vertices from "source" SCCs in the component graph before vertices from "sink" SCCs.
Lemma 2: DFS on Transpose Respects SCC Boundaries
When we run DFS on G^T starting from a vertex v in an SCC C, we will visit exactly the vertices in C (assuming vertices from other SCCs reachable in G^T from C are already marked visited).
Proof: The vertices reachable from v in G^T are exactly the vertices that can reach v in G: all of C itself (an SCC is strongly connected in G^T too), plus every vertex in an upstream SCC C' with a path C' → C in the component graph of G.
However, because we process vertices in decreasing finish-time order, Lemma 1 guarantees that every such upstream SCC C' contains a vertex with a higher finish time than anything in C, so C' was entered earlier in Phase 3 and all of its vertices are already marked visited.
Thus, DFS from v visits exactly the unvisited vertices in C. ∎
Theorem: Kosaraju's Algorithm Is Correct
For each SCC, exactly its vertices are grouped together.
Proof: Combining Lemmas 1 and 2: when a vertex v is popped from the finish stack and found unvisited, every SCC upstream of v's SCC C has already been fully explored and marked (Lemma 1 orders those SCCs earlier in the stack), so the DFS on G^T from v visits exactly the vertices of C (Lemma 2) and groups them as one component.
This holds for every SCC, proving correctness. ∎
The algorithm's brilliance lies in how the first DFS creates an ordering where 'important' (upstream) SCCs are processed first, and the second DFS on the transpose 'blocks' expansion into earlier SCCs because they're already visited. Each SCC is discovered in isolation.
Let's trace through a complete example to see Kosaraju's algorithm in action.
```
Graph G with 8 vertices (0-7):

  0 → 1     4 → 5
  ↑   ↓     ↑   ↓
  3 ← 2     7 ← 6

  (plus cross edge 2→4)

Edges: 0→1, 1→2, 2→3, 3→0, 2→4, 4→5, 5→6, 6→7, 7→4

Expected SCCs:
  SCC_A: {0, 1, 2, 3} - cycle 0→1→2→3→0
  SCC_B: {4, 5, 6, 7} - cycle 4→5→6→7→4

═══════════════════════════════════════════════════
PHASE 1: DFS on G (compute finish order)
═══════════════════════════════════════════════════

Start DFS from vertex 0:
  Visit 0 → Visit 1 → Visit 2 → Visit 3
    → Back edge to 0 (already visited)
    FINISH 3 → Stack: [3]
  → Visit 4 → Visit 5 → Visit 6 → Visit 7
    → Back edge to 4 (already visited)
    FINISH 7 → Stack: [3, 7]
    FINISH 6 → Stack: [3, 7, 6]
    FINISH 5 → Stack: [3, 7, 6, 5]
    FINISH 4 → Stack: [3, 7, 6, 5, 4]
  FINISH 2 → Stack: [3, 7, 6, 5, 4, 2]
  FINISH 1 → Stack: [3, 7, 6, 5, 4, 2, 1]
  FINISH 0 → Stack: [3, 7, 6, 5, 4, 2, 1, 0]

All vertices visited.
Final finish stack (bottom to top): [3, 7, 6, 5, 4, 2, 1, 0]
Pop order: 0, 1, 2, 4, 5, 6, 7, 3

═══════════════════════════════════════════════════
PHASE 2: Build transpose graph G^T
═══════════════════════════════════════════════════

Original edges reversed:
  0→1 becomes 1→0
  1→2 becomes 2→1
  2→3 becomes 3→2
  3→0 becomes 0→3
  2→4 becomes 4→2
  4→5 becomes 5→4
  5→6 becomes 6→5
  6→7 becomes 7→6
  7→4 becomes 4→7

G^T:
  0 ← 1     4 ← 5
  ↓   ↑     ↓   ↑
  3 → 2     7 → 6

  (plus cross edge 4→2)

═══════════════════════════════════════════════════
PHASE 3: DFS on G^T in finish order
═══════════════════════════════════════════════════

Pop 0: Visit 0 → DFS on G^T
  Visit 0, neighbors in G^T: [3]
    → Visit 3, neighbors: [2]
      → Visit 2, neighbors: [1]
        → Visit 1, neighbors: [0] → 0 already visited
        Back from 1
      Back from 2
    Back from 3
  SCC₁ complete: {0, 3, 2, 1}

Pop 1: Already visited
Pop 2: Already visited

Pop 4: Visit 4 → DFS on G^T
  Visit 4, neighbors in G^T: [2, 7]
    → 2 already visited (different SCC, blocked!)
    → Visit 7, neighbors: [6]
      → Visit 6, neighbors: [5]
        → Visit 5, neighbors: [4] → 4 already visited
        Back from 5
      Back from 6
    Back from 7
  SCC₂ complete: {4, 7, 6, 5}

Pop 5: Already visited
Pop 6: Already visited
Pop 7: Already visited
Pop 3: Already visited

═══════════════════════════════════════════════════
RESULT
═══════════════════════════════════════════════════

SCC₁: {0, 1, 2, 3}
SCC₂: {4, 5, 6, 7}

Verified correct! ✓
```

Notice how in Phase 3, when we explore from vertex 4, the edge 4→2 in G^T exists, but vertex 2 is already visited. This 'blocks' the DFS from including SCC_A's vertices in SCC_B. The finish-time ordering ensures we visit SCC_A first, preventing cross-contamination.
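The trace can also be checked mechanically. Here is a compact recursive sketch of the same three phases (the function name `sccsOf` is ours), run on the identical 8-vertex graph:

```typescript
// Compact recursive Kosaraju: Phase 1 computes finish order on G,
// Phase 2 builds the transpose inline, Phase 3 peels off SCCs on G^T.
function sccsOf(n: number, edges: [number, number][]): number[][] {
  const g: number[][] = Array.from({ length: n }, () => []);
  const gt: number[][] = Array.from({ length: n }, () => []);
  for (const [u, v] of edges) { g[u].push(v); gt[v].push(u); }

  const seen = new Array(n).fill(false);
  const order: number[] = [];
  const dfs1 = (v: number): void => {
    seen[v] = true;
    for (const w of g[v]) if (!seen[w]) dfs1(w);
    order.push(v); // push on finish
  };
  for (let v = 0; v < n; v++) if (!seen[v]) dfs1(v);

  seen.fill(false);
  const sccs: number[][] = [];
  const dfs2 = (v: number, comp: number[]): void => {
    seen[v] = true;
    comp.push(v);
    for (const w of gt[v]) if (!seen[w]) dfs2(w, comp);
  };
  for (const v of order.reverse()) { // decreasing finish time
    if (!seen[v]) {
      const comp: number[] = [];
      dfs2(v, comp);
      sccs.push(comp);
    }
  }
  return sccs;
}

// The walkthrough graph:
const edges: [number, number][] = [
  [0, 1], [1, 2], [2, 3], [3, 0], [2, 4], [4, 5], [5, 6], [6, 7], [7, 4],
];
const result = sccsOf(8, edges);
// result is [[0, 3, 2, 1], [4, 7, 6, 5]] — the same two components,
// in the same discovery order as the hand trace above.
```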
Let's analyze the time and space complexity of Kosaraju's algorithm.
Time Complexity: O(V + E)
Phase 1 — DFS on G: each vertex is visited exactly once and each edge examined exactly once, so this phase is O(V + E).
Phase 2 — Build Transpose: one pass over all E edges into V adjacency lists, so O(V + E).
Phase 3 — DFS on G^T: again each vertex is visited once and each (reversed) edge examined once, so O(V + E).
Overall Time: O(V + E) + O(V + E) + O(V + E) = O(V + E)
The algorithm is linear in the size of the graph—optimal for this problem!
Space Complexity: O(V + E)
Graph Storage: adjacency lists for both G and G^T take O(V + E) each.
DFS Auxiliary: the visited arrays, the finish stack, and the recursion (or explicit) stack each take O(V).
Total Space: O(V + E)
Space Optimization:
We can reduce space by reusing the visited array and not storing G^T explicitly if we traverse edges in reverse during Phase 3. However, this complicates the implementation and the constant factors in time may increase. The trade-off is rarely worthwhile for practical implementations.
| Phase | Time | Space | Notes |
|---|---|---|---|
| Phase 1: DFS on G | O(V + E) | O(V) | Builds finish order stack |
| Phase 2: Build G^T | O(V + E) | O(V + E) | Can be done during Phase 3 |
| Phase 3: DFS on G^T | O(V + E) | O(V) | Identifies SCCs |
| Total | O(V + E) | O(V + E) | Optimal for SCC detection |
Any algorithm that identifies all SCCs must examine every vertex (to assign it to an SCC) and potentially every edge (to determine connectivity). Thus, Ω(V + E) is a lower bound, and Kosaraju's O(V + E) is optimal.
There are several considerations and variations when implementing Kosaraju's algorithm in practice.
Recursive vs. Iterative DFS:
The recursive implementation is cleaner but may cause stack overflow for very large graphs (deep recursion). An iterative version using an explicit stack handles any graph size:
Handling Large Graphs:
For graphs with millions of vertices: use the iterative DFS to avoid recursion limits, store adjacency in compact flat arrays (CSR-style) rather than arrays of arrays, and avoid per-vertex object allocation in the inner loops.
Edge Cases: an empty graph yields no SCCs; an isolated vertex forms its own singleton SCC; self-loops do not change SCC membership; and disconnected graphs are handled by the outer loop that starts a DFS from every unvisited vertex.
```typescript
function kosarajuIterative(
  n: number,
  edges: [number, number][]
): number[][] {
  // Build adjacency lists
  const graph: number[][] = Array.from({ length: n }, () => []);
  const transposeGraph: number[][] = Array.from({ length: n }, () => []);
  for (const [u, v] of edges) {
    graph[u].push(v);
    transposeGraph[v].push(u);
  }

  // Phase 1: Iterative DFS to compute finish order
  const visited = new Array(n).fill(false);
  const finishStack: number[] = [];

  for (let start = 0; start < n; start++) {
    if (visited[start]) continue;
    const stack: [number, number][] = [[start, 0]]; // [vertex, neighborIndex]
    while (stack.length > 0) {
      const [v, idx] = stack[stack.length - 1];
      if (!visited[v]) {
        visited[v] = true;
      }
      // Find next unvisited neighbor
      let foundNext = false;
      for (let i = idx; i < graph[v].length; i++) {
        const neighbor = graph[v][i];
        if (!visited[neighbor]) {
          stack[stack.length - 1] = [v, i + 1]; // Update current
          stack.push([neighbor, 0]); // Push neighbor
          foundNext = true;
          break;
        }
      }
      if (!foundNext) {
        finishStack.push(v);
        stack.pop();
      }
    }
  }

  // Phases 2 & 3: DFS on transpose in reverse finish order
  const sccVisited = new Array(n).fill(false);
  const sccs: number[][] = [];

  while (finishStack.length > 0) {
    const start = finishStack.pop()!;
    if (sccVisited[start]) continue;
    const currentSCC: number[] = [];
    const stack: number[] = [start];
    while (stack.length > 0) {
      const v = stack.pop()!;
      if (sccVisited[v]) continue;
      sccVisited[v] = true;
      currentSCC.push(v);
      for (const neighbor of transposeGraph[v]) {
        if (!sccVisited[neighbor]) {
          stack.push(neighbor);
        }
      }
    }
    sccs.push(currentSCC);
  }

  return sccs;
}
```

In production code, consider using libraries that provide optimized graph algorithms. If implementing from scratch, the iterative version is preferred for robustness. Also consider returning additional information, such as a vertex-to-SCC mapping, for downstream processing.
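One common piece of downstream information is a vertex-to-SCC lookup table. A small helper sketch (the name `sccIndexOf` is ours) that converts the `number[][]` result into a component index per vertex:

```typescript
// Convert a list of SCCs into a lookup table: the entry for vertex v
// is the index of the component containing v.
function sccIndexOf(n: number, sccs: number[][]): number[] {
  const index: number[] = new Array(n).fill(-1);
  for (let i = 0; i < sccs.length; i++) {
    for (const v of sccs[i]) {
      index[v] = i;
    }
  }
  return index;
}

// Two vertices are in the same SCC iff their indices match. For the
// example result [[0, 2, 1], [3, 5, 4]], vertices 0-2 map to component
// 0 and vertices 3-5 map to component 1.
```

This O(V) table makes "same component?" queries O(1), which is handy when building the condensation graph.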
Kosaraju's algorithm is one of two famous O(V + E) algorithms for finding SCCs. The other is Tarjan's algorithm (covered in the next page). Let's compare them and simpler approaches.
Brute Force: O(V² × (V + E))
For each pair of vertices, run DFS to check mutual reachability. Then cluster mutually reachable vertices. This is extremely slow but trivially correct.
Transitive Closure: O(V³) or O(V × (V + E))
Compute the transitive closure (all pairs reachability). Vertices with identical reachability sets (both reach each other) form SCCs. Better than brute force but still too slow for large graphs.
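A sketch of this approach, assuming a Floyd–Warshall-style boolean closure (the function name `sccsByClosure` is ours):

```typescript
// Find SCCs via all-pairs reachability: compute a boolean reachability
// matrix with a Floyd–Warshall-style closure, then group vertices that
// reach each other. O(V^3) time, O(V^2) space — small graphs only.
function sccsByClosure(n: number, edges: [number, number][]): number[][] {
  const reach: boolean[][] = Array.from({ length: n }, () => new Array(n).fill(false));
  for (let v = 0; v < n; v++) reach[v][v] = true;
  for (const [u, v] of edges) reach[u][v] = true;

  // Closure: i reaches j if i reaches some k and k reaches j
  for (let k = 0; k < n; k++)
    for (let i = 0; i < n; i++)
      for (let j = 0; j < n; j++)
        if (reach[i][k] && reach[k][j]) reach[i][j] = true;

  // Group mutually reachable vertices
  const compOf: number[] = new Array(n).fill(-1);
  const sccs: number[][] = [];
  for (let v = 0; v < n; v++) {
    if (compOf[v] !== -1) continue;
    const comp: number[] = [];
    for (let w = v; w < n; w++) {
      if (compOf[w] === -1 && reach[v][w] && reach[w][v]) {
        compOf[w] = sccs.length;
        comp.push(w);
      }
    }
    sccs.push(comp);
  }
  return sccs;
}
```

The cubic closure is what makes this approach impractical at scale, which motivates the linear-time algorithms below.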
Kosaraju's Algorithm: O(V + E)
Two DFS passes with transpose. Simple to understand and implement. Requires building the transpose graph (extra space and time, though both are O(V + E)).
Tarjan's Algorithm: O(V + E)
Single DFS pass using low-link values. More complex to understand but more space-efficient (no explicit transpose). We'll cover this next.
| Algorithm | Time | Space | Passes | Difficulty |
|---|---|---|---|---|
| Brute Force | O(V² × (V + E)) | O(V) | O(V²) | Easy |
| Transitive Closure | O(V³) | O(V²) | 1 | Medium |
| Kosaraju | O(V + E) | O(V + E) | 2 | Medium |
| Tarjan | O(V + E) | O(V) | 1 | Hard |
Use Kosaraju when code clarity is important and memory is not extremely constrained. Use Tarjan when you need single-pass efficiency or must minimize auxiliary space. Both are O(V + E), so the choice often comes down to implementation preference.
Kosaraju's algorithm is an elegant solution to the SCC problem, leveraging deep insights about DFS finish times and the transpose graph. The key points: an edge between SCCs forces the source SCC to hold the larger maximum finish time; G and G^T share exactly the same SCCs; two DFS passes—one on G to order vertices by finish time, one on G^T in that order—isolate each SCC; and the whole algorithm runs in O(V + E) time and space.
What's Next:
The next page introduces Tarjan's algorithm—an alternative approach that finds all SCCs in a single DFS pass using the concept of low-link values. Tarjan's algorithm is more space-efficient and is the basis for many other graph algorithms.
You now understand Kosaraju's algorithm in depth: the key insights about finish times and transpose graphs, the three-phase algorithm structure, the correctness proof, complete implementation, and complexity analysis. You can implement this algorithm and explain why it works. Next, we explore Tarjan's alternative approach.