Given a directed graph with V vertices and E edges, can we find all strongly connected components in linear time O(V + E)? The answer is yes, and one of the most elegant solutions is Kosaraju's algorithm.
Named after S. Rao Kosaraju who discovered it in 1978 (though it was published by Micha Sharir in 1981), this algorithm exploits a beautiful property: a graph and its transpose (with all edges reversed) have exactly the same strongly connected components. By running DFS twice—once on the original graph and once on the transpose—we can identify all SCCs with remarkable efficiency.
By the end of this page, you will understand the key insight behind Kosaraju's algorithm, the role of finish times in DFS, why the transpose graph preserves SCCs, the complete step-by-step algorithm with proof of correctness, implementation in code with detailed walkthrough, and the precise O(V + E) complexity analysis.
To understand Kosaraju's algorithm, we first need to understand DFS finish times and their remarkable properties.
DFS Finish Times:
When running DFS, each vertex has a finish time—the moment when DFS has fully explored the vertex and all its descendants and is about to backtrack. In a timestamp-based formulation, a single global clock ticks at every event: vertex v records a discovery time d[v] when DFS first reaches it and a finish time f[v] when DFS backtracks out of it, so d[v] < f[v] for every vertex.
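As a minimal sketch of this formulation (the helper name `dfsTimestamps` and the array names are ours, not part of the implementation later on this page):

```typescript
// Compute discovery and finish times for every vertex of a directed
// graph given as an adjacency list. A single clock increments on
// every discovery and every finish.
function dfsTimestamps(graph: number[][]): { discover: number[]; finish: number[] } {
  const n = graph.length;
  const discover: number[] = new Array(n).fill(-1);
  const finish: number[] = new Array(n).fill(-1);
  let time = 0;

  function dfs(v: number): void {
    discover[v] = ++time; // v is first reached
    for (const w of graph[v]) {
      if (discover[w] === -1) dfs(w);
    }
    finish[v] = ++time; // all descendants of v are fully explored
  }

  for (let v = 0; v < n; v++) {
    if (discover[v] === -1) dfs(v);
  }
  return { discover, finish };
}

// For the chain 0 → 1 → 2 (plus edge 0 → 2), vertex 2 finishes first
// and vertex 0 finishes last—finish times decrease along DFS-tree depth.
```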
The Critical Property:
Consider two SCCs, A and B, where there's an edge from some vertex in A to some vertex in B (so A appears before B in a topological sort of the component graph). When we run DFS:
If the DFS starts from (or reaches) component A before reaching component B: the DFS crosses the A→B edge and fully explores B before backtracking into A, so every vertex of B finishes before the A-vertex it entered from, and A's maximum finish time exceeds B's.
If the DFS starts from component B first: no path leads from B back to A (otherwise A and B would be one SCC), so all of B finishes before any vertex of A is even discovered, and every vertex of A finishes later than every vertex of B.
The Universal Rule: If there's an edge from SCC A to SCC B in the component graph, then the vertex with the maximum finish time in A has a higher finish time than any vertex in B.
```
Graph with two SCCs:

┌───────────┐             ┌───────────┐
│   SCC_A   │             │   SCC_B   │
│   A → B   │             │   D → E   │
│   ↑   ↓   │  ──────→    │   ↑   ↓   │
│   └── C   │   (C→D)     │   └── F   │
└───────────┘             └───────────┘

Edge C → D connects SCC_A to SCC_B.

DFS from A (starting in SCC_A):

  Visit A (discovery time 1)
  Visit B (discovery time 2)
  Visit C (discovery time 3)
  From C, can go to A (already visited) or D (unvisited)
  Jump to D (discovery time 4)
  Visit E (discovery time 5)
  Visit F (discovery time 6)
  Backtrack: F finishes (finish time 7)
  Backtrack: E finishes (finish time 8)
  Backtrack: D finishes (finish time 9)
  Back to C: C finishes (finish time 10)
  Backtrack: B finishes (finish time 11)
  Backtrack: A finishes (finish time 12)

Finish times:
  SCC_A: A=12, B=11, C=10   (max = 12)
  SCC_B: D=9,  E=8,  F=7    (max = 9)

The maximum finish time in SCC_A (12) > maximum in SCC_B (9).
This confirms: SCC_A should be processed first when finding SCCs.
```

If we process vertices in decreasing order of finish time, we're guaranteed to visit 'source' SCCs (in the component graph) before we can accidentally wander into 'later' SCCs. This is the foundation of Kosaraju's algorithm.
The second key insight of Kosaraju's algorithm involves the transpose graph.
Definition: Transpose Graph
Given G = (V, E), the transpose G^T = (V, E^T) has the same vertices but all edges reversed: E^T = {(v, u) | (u, v) ∈ E}
Critical Theorem: SCCs Are Preserved
G and G^T have exactly the same strongly connected components.
Proof: Let u and v be in the same SCC in G. This means there is a path u ⇝ v and a path v ⇝ u in G.
Reversing all edges turns the path u ⇝ v into a path v ⇝ u, and the path v ⇝ u into a path u ⇝ v.
So in G^T, u and v are still mutually reachable.
Thus u and v are in the same SCC in G^T.
The converse follows by the same argument (transpose is its own inverse: (G^T)^T = G).
```
Original Graph G:

  A → B → C
  ↑       ↓
  └───────┘

Edges: A→B, B→C, C→A   (one SCC: {A, B, C})

Transpose Graph G^T:

  A ← B ← C
  ↓       ↑
  └───────┘

Edges: B→A, C→B, A→C   (same SCC: {A, B, C})

Paths in G:   A → B → C → A  (cycle)
Paths in G^T: A → C → B → A  (cycle, reversed direction)

Same vertices are mutually reachable in both graphs!

Another example with multiple SCCs:

Graph G:
  SCC₁:  A ⇄ B
         │
         ↓
  SCC₂:  C ⇄ D

Transpose G^T:
  SCC₁:  A ⇄ B
         ↑
         │
  SCC₂:  C ⇄ D

Same SCCs, but inter-SCC edge is reversed!
```

This is crucial: SCCs are preserved, but their "dependency order" flips. The component graph (condensation) of G^T is the transpose of the component graph of G: source SCCs in G become sink SCCs in G^T, and vice versa. This reversal is key to the algorithm's correctness.
Now we can present the complete algorithm. Kosaraju's algorithm consists of three phases:
Phase 1: First DFS on Original Graph G
Run DFS on G, visiting all vertices. Track the finish times of vertices. Store vertices in a stack or list in the order they finish (last to finish goes on top of stack).
Phase 2: Compute Transpose Graph G^T
Build G^T by reversing all edges in G. This takes O(V + E) time.
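Phase 2 is a single pass over the adjacency lists; a small standalone sketch (the function name `transpose` is ours):

```typescript
// Reverse every edge of a directed graph given as an adjacency list.
// One pass over all adjacency entries: O(V + E) time and space.
function transpose(graph: number[][]): number[][] {
  const gt: number[][] = Array.from({ length: graph.length }, () => []);
  for (let u = 0; u < graph.length; u++) {
    for (const v of graph[u]) {
      gt[v].push(u); // edge u→v becomes v→u
    }
  }
  return gt;
}

// For the 3-cycle 0→1→2→0, the transpose is the reversed cycle 0→2→1→0.
```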
Phase 3: Second DFS on G^T in Finish-Time Order
Pop vertices from the stack (starting with the highest finish time). For each unvisited vertex, run DFS on G^T and mark all reachable vertices as belonging to the same SCC.
```typescript
function kosarajuSCC(
  n: number,
  edges: [number, number][]
): number[][] {
  // Build adjacency list for G
  const graph: number[][] = Array.from({ length: n }, () => []);
  for (const [u, v] of edges) {
    graph[u].push(v);
  }

  // PHASE 1: DFS on G to compute finish order
  const visited = new Array(n).fill(false);
  const finishStack: number[] = [];

  function dfs1(v: number): void {
    visited[v] = true;
    for (const neighbor of graph[v]) {
      if (!visited[neighbor]) {
        dfs1(neighbor);
      }
    }
    // Key: push when FINISHED (after exploring all descendants)
    finishStack.push(v);
  }

  // Run DFS from all vertices to handle disconnected components
  for (let v = 0; v < n; v++) {
    if (!visited[v]) {
      dfs1(v);
    }
  }

  // PHASE 2: Build transpose graph G^T
  const transposeGraph: number[][] = Array.from({ length: n }, () => []);
  for (const [u, v] of edges) {
    transposeGraph[v].push(u); // Reverse the edge
  }

  // PHASE 3: DFS on G^T in reverse finish order
  const sccVisited = new Array(n).fill(false);
  const sccs: number[][] = [];

  function dfs2(v: number, currentSCC: number[]): void {
    sccVisited[v] = true;
    currentSCC.push(v);
    for (const neighbor of transposeGraph[v]) {
      if (!sccVisited[neighbor]) {
        dfs2(neighbor, currentSCC);
      }
    }
  }

  // Process vertices in decreasing finish time (pop from stack)
  while (finishStack.length > 0) {
    const v = finishStack.pop()!;
    if (!sccVisited[v]) {
      const currentSCC: number[] = [];
      dfs2(v, currentSCC);
      sccs.push(currentSCC);
    }
  }

  return sccs;
}

// Example usage:
// const sccs = kosarajuSCC(6, [[0,1], [1,2], [2,0], [2,3], [3,4], [4,5], [5,3]]);
// Result: [[0, 2, 1], [3, 5, 4]] - two SCCs
```

The finish stack from Phase 1 approximates a reverse topological order of the component graph. Vertices from 'sink' SCCs finish first (pushed early); vertices from 'source' SCCs finish last (pushed late, on top). Popping from the top processes source SCCs first.
Let's rigorously prove that Kosaraju's algorithm correctly identifies all SCCs.
Lemma 1: Finish Time Ordering
If there is an edge from SCC C₁ to SCC C₂ in the component graph (C₁ → C₂), then: max{finish(v) : v ∈ C₁} > max{finish(v) : v ∈ C₂}
Proof: During DFS on G, there are two cases. If DFS discovers a vertex of C₁ before any vertex of C₂, it can reach all of C₂ via the C₁ → C₂ edge, so every vertex of C₂ finishes before that first C₁ vertex does. If DFS discovers a vertex of C₂ first, it cannot reach C₁ (there is no path from C₂ back to C₁, or they would be one SCC), so all of C₂ finishes before any vertex of C₁ is even discovered.
In both cases, max finish time in C₁ > max finish time in C₂. ∎
Corollary: Stack Order Reflects Component Dependencies
When we pop from the finish stack, we process vertices from "source" SCCs in the component graph before vertices from "sink" SCCs.
Lemma 2: DFS on Transpose Respects SCC Boundaries
When we run DFS on G^T starting from a vertex v in an SCC C, we will visit exactly the vertices in C (assuming vertices from other SCCs reachable in G^T from C are already marked visited).
Proof: The vertices reachable from v in G^T are exactly the vertices that can reach v in G: all of C itself (an SCC is strongly connected in G^T too), plus every vertex in an upstream SCC C' with a path C' → C in the component graph of G.
However, because we process vertices in decreasing finish-time order, Lemma 1 guarantees that every such upstream SCC C' contains a vertex with a higher finish time than anything in C, so C' was entered earlier in Phase 3 and all of its vertices are already marked visited.
Thus, DFS from v visits exactly the unvisited vertices in C. ∎
Theorem: Kosaraju's Algorithm Is Correct
For each SCC, exactly its vertices are grouped together.
Proof: Combining Lemmas 1 and 2: when a vertex v is popped from the finish stack and found unvisited, every SCC upstream of v's SCC C has already been fully explored and marked (Lemma 1 orders those SCCs earlier in the stack), so the DFS on G^T from v visits exactly the vertices of C (Lemma 2) and groups them as one component.
This holds for every SCC, proving correctness. ∎
The algorithm's brilliance lies in how the first DFS creates an ordering where 'important' (upstream) SCCs are processed first, and the second DFS on the transpose 'blocks' expansion into earlier SCCs because they're already visited. Each SCC is discovered in isolation.
Let's trace through a complete example to see Kosaraju's algorithm in action.
```
Graph G with 8 vertices (0-7):

  0 → 1     4 → 5
  ↑   ↓     ↑   ↓
  3 ← 2     7 ← 6

  (plus cross edge 2→4)

Edges: 0→1, 1→2, 2→3, 3→0, 2→4, 4→5, 5→6, 6→7, 7→4

Expected SCCs:
  SCC_A: {0, 1, 2, 3} - cycle 0→1→2→3→0
  SCC_B: {4, 5, 6, 7} - cycle 4→5→6→7→4

═══════════════════════════════════════════════════
PHASE 1: DFS on G (compute finish order)
═══════════════════════════════════════════════════

Start DFS from vertex 0:
  Visit 0 → Visit 1 → Visit 2 → Visit 3
    → Back edge to 0 (already visited)
    FINISH 3 → Stack: [3]
  → Visit 4 → Visit 5 → Visit 6 → Visit 7
    → Back edge to 4 (already visited)
    FINISH 7 → Stack: [3, 7]
    FINISH 6 → Stack: [3, 7, 6]
    FINISH 5 → Stack: [3, 7, 6, 5]
    FINISH 4 → Stack: [3, 7, 6, 5, 4]
  FINISH 2 → Stack: [3, 7, 6, 5, 4, 2]
  FINISH 1 → Stack: [3, 7, 6, 5, 4, 2, 1]
  FINISH 0 → Stack: [3, 7, 6, 5, 4, 2, 1, 0]

All vertices visited.
Final finish stack (bottom to top): [3, 7, 6, 5, 4, 2, 1, 0]
Pop order: 0, 1, 2, 4, 5, 6, 7, 3

═══════════════════════════════════════════════════
PHASE 2: Build transpose graph G^T
═══════════════════════════════════════════════════

Original edges reversed:
  0→1 becomes 1→0
  1→2 becomes 2→1
  2→3 becomes 3→2
  3→0 becomes 0→3
  2→4 becomes 4→2
  4→5 becomes 5→4
  5→6 becomes 6→5
  6→7 becomes 7→6
  7→4 becomes 4→7

G^T:
  0 ← 1     4 ← 5
  ↓   ↑     ↓   ↑
  3 → 2     7 → 6

  (plus cross edge 4→2)

═══════════════════════════════════════════════════
PHASE 3: DFS on G^T in finish order
═══════════════════════════════════════════════════

Pop 0: Visit 0 → DFS on G^T
  Visit 0, neighbors in G^T: [3]
    → Visit 3, neighbors: [2]
      → Visit 2, neighbors: [1]
        → Visit 1, neighbors: [0] → 0 already visited
        Back from 1
      Back from 2
    Back from 3
  SCC₁ complete: {0, 3, 2, 1}

Pop 1: Already visited
Pop 2: Already visited

Pop 4: Visit 4 → DFS on G^T
  Visit 4, neighbors in G^T: [2, 7]
    → 2 already visited (different SCC, blocked!)
    → Visit 7, neighbors: [6]
      → Visit 6, neighbors: [5]
        → Visit 5, neighbors: [4] → 4 already visited
        Back from 5
      Back from 6
    Back from 7
  SCC₂ complete: {4, 7, 6, 5}

Pop 5: Already visited
Pop 6: Already visited
Pop 7: Already visited
Pop 3: Already visited

═══════════════════════════════════════════════════
RESULT
═══════════════════════════════════════════════════

SCC₁: {0, 1, 2, 3}
SCC₂: {4, 5, 6, 7}

Verified correct! ✓
```

Notice how in Phase 3, when we explore from vertex 4, the edge 4→2 in G^T exists, but vertex 2 is already visited. This 'blocks' the DFS from including SCC_A's vertices in SCC_B. The finish-time ordering ensures we visit SCC_A first, preventing cross-contamination.
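The trace can also be checked mechanically. Here is a compact recursive sketch of the same three phases (the function name `sccsOf` is ours), run on the identical 8-vertex graph:

```typescript
// Compact recursive Kosaraju: Phase 1 computes finish order on G,
// Phase 2 builds the transpose inline, Phase 3 peels off SCCs on G^T.
function sccsOf(n: number, edges: [number, number][]): number[][] {
  const g: number[][] = Array.from({ length: n }, () => []);
  const gt: number[][] = Array.from({ length: n }, () => []);
  for (const [u, v] of edges) { g[u].push(v); gt[v].push(u); }

  const seen = new Array(n).fill(false);
  const order: number[] = [];
  const dfs1 = (v: number): void => {
    seen[v] = true;
    for (const w of g[v]) if (!seen[w]) dfs1(w);
    order.push(v); // push on finish
  };
  for (let v = 0; v < n; v++) if (!seen[v]) dfs1(v);

  seen.fill(false);
  const sccs: number[][] = [];
  const dfs2 = (v: number, comp: number[]): void => {
    seen[v] = true;
    comp.push(v);
    for (const w of gt[v]) if (!seen[w]) dfs2(w, comp);
  };
  for (const v of order.reverse()) { // decreasing finish time
    if (!seen[v]) {
      const comp: number[] = [];
      dfs2(v, comp);
      sccs.push(comp);
    }
  }
  return sccs;
}

// The walkthrough graph:
const edges: [number, number][] = [
  [0, 1], [1, 2], [2, 3], [3, 0], [2, 4], [4, 5], [5, 6], [6, 7], [7, 4],
];
const result = sccsOf(8, edges);
// result is [[0, 3, 2, 1], [4, 7, 6, 5]] — the same two components,
// in the same discovery order as the hand trace above.
```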
Let's analyze the time and space complexity of Kosaraju's algorithm.
Time Complexity: O(V + E)
Phase 1 — DFS on G: each vertex is visited exactly once and each edge examined exactly once, so this phase is O(V + E).
Phase 2 — Build Transpose: one pass over all E edges into V adjacency lists, so O(V + E).
Phase 3 — DFS on G^T: again each vertex is visited once and each (reversed) edge examined once, so O(V + E).
Overall Time: O(V + E) + O(V + E) + O(V + E) = O(V + E)
The algorithm is linear in the size of the graph—optimal for this problem!
Space Complexity: O(V + E)
Graph Storage: adjacency lists for both G and G^T take O(V + E) each.
DFS Auxiliary: the visited arrays, the finish stack, and the recursion (or explicit) stack each take O(V).
Total Space: O(V + E)
Space Optimization:
We can reduce space by reusing the visited array and not storing G^T explicitly if we traverse edges in reverse during Phase 3. However, this complicates the implementation and the constant factors in time may increase. The trade-off is rarely worthwhile for practical implementations.
| Phase | Time | Space | Notes |
|---|---|---|---|
| Phase 1: DFS on G | O(V + E) | O(V) | Builds finish order stack |
| Phase 2: Build G^T | O(V + E) | O(V + E) | Can be done during Phase 3 |
| Phase 3: DFS on G^T | O(V + E) | O(V) | Identifies SCCs |
| Total | O(V + E) | O(V + E) | Optimal for SCC detection |
Any algorithm that identifies all SCCs must examine every vertex (to assign it to an SCC) and potentially every edge (to determine connectivity). Thus, Ω(V + E) is a lower bound, and Kosaraju's O(V + E) is optimal.
There are several considerations and variations when implementing Kosaraju's algorithm in practice.
Recursive vs. Iterative DFS:
The recursive implementation is cleaner but may cause stack overflow for very large graphs (deep recursion). An iterative version using an explicit stack handles any graph size:
Handling Large Graphs:
For graphs with millions of vertices: use the iterative DFS to avoid recursion limits, store adjacency in compact flat arrays (CSR-style) rather than arrays of arrays, and avoid per-vertex object allocation in the inner loops.
Edge Cases: an empty graph yields no SCCs; an isolated vertex forms its own singleton SCC; self-loops do not change SCC membership; and disconnected graphs are handled by the outer loop that starts a DFS from every unvisited vertex.
```typescript
function kosarajuIterative(
  n: number,
  edges: [number, number][]
): number[][] {
  // Build adjacency lists
  const graph: number[][] = Array.from({ length: n }, () => []);
  const transposeGraph: number[][] = Array.from({ length: n }, () => []);
  for (const [u, v] of edges) {
    graph[u].push(v);
    transposeGraph[v].push(u);
  }

  // Phase 1: Iterative DFS to compute finish order
  const visited = new Array(n).fill(false);
  const finishStack: number[] = [];

  for (let start = 0; start < n; start++) {
    if (visited[start]) continue;
    const stack: [number, number][] = [[start, 0]]; // [vertex, neighborIndex]
    while (stack.length > 0) {
      const [v, idx] = stack[stack.length - 1];
      if (!visited[v]) {
        visited[v] = true;
      }
      // Find next unvisited neighbor
      let foundNext = false;
      for (let i = idx; i < graph[v].length; i++) {
        const neighbor = graph[v][i];
        if (!visited[neighbor]) {
          stack[stack.length - 1] = [v, i + 1]; // Update current
          stack.push([neighbor, 0]); // Push neighbor
          foundNext = true;
          break;
        }
      }
      if (!foundNext) {
        finishStack.push(v);
        stack.pop();
      }
    }
  }

  // Phases 2 & 3: DFS on transpose in reverse finish order
  const sccVisited = new Array(n).fill(false);
  const sccs: number[][] = [];

  while (finishStack.length > 0) {
    const start = finishStack.pop()!;
    if (sccVisited[start]) continue;
    const currentSCC: number[] = [];
    const stack: number[] = [start];
    while (stack.length > 0) {
      const v = stack.pop()!;
      if (sccVisited[v]) continue;
      sccVisited[v] = true;
      currentSCC.push(v);
      for (const neighbor of transposeGraph[v]) {
        if (!sccVisited[neighbor]) {
          stack.push(neighbor);
        }
      }
    }
    sccs.push(currentSCC);
  }

  return sccs;
}
```

In production code, consider using libraries that provide optimized graph algorithms. If implementing from scratch, the iterative version is preferred for robustness. Also consider returning additional information, such as a vertex-to-SCC mapping, for downstream processing.
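One common piece of downstream information is a vertex-to-SCC lookup table. A small helper sketch (the name `sccIndexOf` is ours) that converts the `number[][]` result into a component index per vertex:

```typescript
// Convert a list of SCCs into a lookup table: the entry for vertex v
// is the index of the component containing v.
function sccIndexOf(n: number, sccs: number[][]): number[] {
  const index: number[] = new Array(n).fill(-1);
  for (let i = 0; i < sccs.length; i++) {
    for (const v of sccs[i]) {
      index[v] = i;
    }
  }
  return index;
}

// Two vertices are in the same SCC iff their indices match. For the
// example result [[0, 2, 1], [3, 5, 4]], vertices 0-2 map to component
// 0 and vertices 3-5 map to component 1.
```

This O(V) table makes "same component?" queries O(1), which is handy when building the condensation graph.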
Kosaraju's algorithm is one of two famous O(V + E) algorithms for finding SCCs. The other is Tarjan's algorithm (covered in the next page). Let's compare them and simpler approaches.
Brute Force: O(V² × (V + E))
For each pair of vertices, run DFS to check mutual reachability. Then cluster mutually reachable vertices. This is extremely slow but trivially correct.
Transitive Closure: O(V³) or O(V × (V + E))
Compute the transitive closure (all pairs reachability). Vertices with identical reachability sets (both reach each other) form SCCs. Better than brute force but still too slow for large graphs.
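A sketch of this approach, assuming a Floyd–Warshall-style boolean closure (the function name `sccsByClosure` is ours):

```typescript
// Find SCCs via all-pairs reachability: compute a boolean reachability
// matrix with a Floyd–Warshall-style closure, then group vertices that
// reach each other. O(V^3) time, O(V^2) space — small graphs only.
function sccsByClosure(n: number, edges: [number, number][]): number[][] {
  const reach: boolean[][] = Array.from({ length: n }, () => new Array(n).fill(false));
  for (let v = 0; v < n; v++) reach[v][v] = true;
  for (const [u, v] of edges) reach[u][v] = true;

  // Closure: i reaches j if i reaches some k and k reaches j
  for (let k = 0; k < n; k++)
    for (let i = 0; i < n; i++)
      for (let j = 0; j < n; j++)
        if (reach[i][k] && reach[k][j]) reach[i][j] = true;

  // Group mutually reachable vertices
  const compOf: number[] = new Array(n).fill(-1);
  const sccs: number[][] = [];
  for (let v = 0; v < n; v++) {
    if (compOf[v] !== -1) continue;
    const comp: number[] = [];
    for (let w = v; w < n; w++) {
      if (compOf[w] === -1 && reach[v][w] && reach[w][v]) {
        compOf[w] = sccs.length;
        comp.push(w);
      }
    }
    sccs.push(comp);
  }
  return sccs;
}
```

The cubic closure is what makes this approach impractical at scale, which motivates the linear-time algorithms below.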
Kosaraju's Algorithm: O(V + E)
Two DFS passes with transpose. Simple to understand and implement. Requires building the transpose graph (extra space and time, though both are O(V + E)).
Tarjan's Algorithm: O(V + E)
Single DFS pass using low-link values. More complex to understand but more space-efficient (no explicit transpose). We'll cover this next.
| Algorithm | Time | Space | Passes | Difficulty |
|---|---|---|---|---|
| Brute Force | O(V² × (V + E)) | O(V) | O(V²) | Easy |
| Transitive Closure | O(V³) | O(V²) | 1 | Medium |
| Kosaraju | O(V + E) | O(V + E) | 2 | Medium |
| Tarjan | O(V + E) | O(V) | 1 | Hard |
Use Kosaraju when code clarity is important and memory is not extremely constrained. Use Tarjan when you need single-pass efficiency or must minimize auxiliary space. Both are O(V + E), so the choice often comes down to implementation preference.
Kosaraju's algorithm is an elegant solution to the SCC problem, leveraging deep insights about DFS finish times and the transpose graph. The key points: an edge between SCCs forces the source SCC to hold the larger maximum finish time; G and G^T share exactly the same SCCs; two DFS passes—one on G to order vertices by finish time, one on G^T in that order—isolate each SCC; and the whole algorithm runs in O(V + E) time and space.
What's Next:
The next page introduces Tarjan's algorithm—an alternative approach that finds all SCCs in a single DFS pass using the concept of low-link values. Tarjan's algorithm is more space-efficient and is the basis for many other graph algorithms.
You now understand Kosaraju's algorithm in depth: the key insights about finish times and transpose graphs, the three-phase algorithm structure, the correctness proof, complete implementation, and complexity analysis. You can implement this algorithm and explain why it works. Next, we explore Tarjan's alternative approach.