Loading content...
DFS isn't just a traversal technique—it's a problem-solving framework. The depth-first structure naturally reveals graph properties that have profound real-world applications:
These two applications appear constantly in production systems:
In this page, we'll implement both applications from scratch with production-quality code, understanding exactly how DFS enables each solution.
This page provides mastery of two essential DFS applications. You'll implement algorithms for finding connected components (undirected) and strongly connected components (directed), detecting cycles in both undirected and directed graphs, and understand how these applications are used in real-world systems.
A connected component is a maximal set of vertices where any two vertices are connected by a path. In other words, it's a group of vertices that can all reach each other, with no paths to vertices outside the group.
The Problem: Given an undirected graph, find all connected components.
Why DFS Works:
Starting from any vertex, DFS visits all vertices reachable from it. By definition, this is exactly the connected component containing that vertex. Running DFS from unvisited vertices until all are visited identifies all components.
The Algorithm:
for each vertex v in V:
if v not visited:
new_component = []
DFS(v, new_component) # Adds all reachable vertices
components.append(new_component)
Correctness Argument:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165
from collections import defaultdictfrom typing import List, Set, Dict, Any class UndirectedGraph: """ Undirected graph with connected components functionality. """ def __init__(self): self.adj = defaultdict(list) self.vertices = set() def add_edge(self, u, v): """Add undirected edge between u and v.""" self.adj[u].append(v) self.adj[v].append(u) self.vertices.add(u) self.vertices.add(v) def add_vertex(self, v): """Add isolated vertex.""" self.vertices.add(v) def find_connected_components(self) -> List[Set[Any]]: """ Find all connected components using DFS. Returns: List of sets, each set contains vertices in one component. Time: O(V + E) Space: O(V) for visited set """ visited = set() components = [] def dfs(v: Any, component: Set[Any]): """DFS to explore entire component from v.""" visited.add(v) component.add(v) for neighbor in self.adj[v]: if neighbor not in visited: dfs(neighbor, component) # Start DFS from each unvisited vertex for vertex in self.vertices: if vertex not in visited: component = set() dfs(vertex, component) components.append(component) return components def count_components(self) -> int: """Return the number of connected components.""" return len(self.find_connected_components()) def are_connected(self, u, v) -> bool: """Check if two vertices are in the same component.""" if u not in self.vertices or v not in self.vertices: return False visited = set() def dfs(current): if current == v: return True visited.add(current) for neighbor in self.adj[current]: if neighbor not in visited: if dfs(neighbor): return True return False return dfs(u) def get_component_of(self, v) -> Set[Any]: """Get all vertices in the same component as v.""" if v not in self.vertices: return set() component = set() visited = set() def dfs(current): visited.add(current) component.add(current) for neighbor in self.adj[current]: if neighbor not in visited: dfs(neighbor) dfs(v) return component def demonstrate_connected_components(): """Demonstrate connected components with a real example.""" # Create a social network with multiple friend groups network = UndirectedGraph() # Group 1: Work colleagues work_connections = [ ('Alice', 'Bob'), ('Bob', 'Charlie'), ('Alice', 'Charlie'), ('Charlie', 'Diana') ] # Group 2: High school friends school_connections = [ ('Eve', 'Frank'), ('Frank', 'Grace') ] # Group 3: Family (one connection) family_connections = [ ('Henry', 'Ivy') ] # Isolated person network.add_vertex('Jack') # Add all edges for u, v in work_connections + school_connections + family_connections: network.add_edge(u, v) # Find components components = network.find_connected_components() print("Social Network Connected Components") print("=" * 50) print(f"Total vertices: {len(network.vertices)}") print(f"Number of components: {len(components)}") print() for i, component in enumerate(components, 1): print(f"Component {i}: {sorted(component)}") print() print("Connectivity Queries:") print(f" Alice connected to Diana? {network.are_connected('Alice', 'Diana')}") print(f" Alice connected to Eve? {network.are_connected('Alice', 'Eve')}") print(f" Jack connected to anyone? {len(network.get_component_of('Jack')) > 1}") if __name__ == "__main__": demonstrate_connected_components() # Output:# Social Network Connected Components# ==================================================# Total vertices: 10# Number of components: 4## Component 1: ['Alice', 'Bob', 'Charlie', 'Diana']# Component 2: ['Eve', 'Frank', 'Grace']# Component 3: ['Henry', 'Ivy']# Component 4: ['Jack']## Connectivity Queries:# Alice connected to Diana? True# Alice connected to Eve? False# Jack connected to anyone? FalseSocial Networks: Find friend clusters, suggest connections. Image Processing: Identify connected regions (flood fill). Network Analysis: Find isolated subnets, assess network resilience. Puzzles: Count islands, connected cells.
For very large graphs, the recursive approach can cause stack overflow. Here's a robust iterative implementation.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112
def find_components_iterative(graph: Dict, all_vertices: Set) -> List[List]: """ Find connected components using iterative DFS. Handles graphs with arbitrarily deep structure without risking stack overflow. Args: graph: Adjacency list representation all_vertices: Set of all vertices (includes isolated ones) Returns: List of components, each component is a list of vertices """ visited = set() components = [] for start_vertex in all_vertices: if start_vertex in visited: continue # Start new component component = [] stack = [start_vertex] while stack: vertex = stack.pop() if vertex in visited: continue visited.add(vertex) component.append(vertex) # Add unvisited neighbors to stack for neighbor in graph.get(vertex, []): if neighbor not in visited: stack.append(neighbor) components.append(component) return components def label_components(graph: Dict, all_vertices: Set) -> Dict: """ Assign a component label to each vertex. Useful when you need to quickly check which component a vertex belongs to. Returns: Dictionary mapping each vertex to its component label (0, 1, 2, ...) """ visited = set() labels = {} current_label = 0 for start_vertex in all_vertices: if start_vertex in visited: continue stack = [start_vertex] while stack: vertex = stack.pop() if vertex in visited: continue visited.add(vertex) labels[vertex] = current_label for neighbor in graph.get(vertex, []): if neighbor not in visited: stack.append(neighbor) current_label += 1 return labels # Example: Large sparse graphimport random def generate_sparse_graph(n: int, edges_per_vertex: int = 3) -> Dict: """Generate a sparse random graph that may have multiple components.""" graph = {i: [] for i in range(n)} for i in range(n): for _ in range(edges_per_vertex): j = random.randint(0, n - 1) if i != j and j not in graph[i]: graph[i].append(j) graph[j].append(i) # Undirected return graph # Performance testn = 10000graph = generate_sparse_graph(n, edges_per_vertex=2)all_vertices = set(range(n)) import timestart = time.perf_counter()components = find_components_iterative(graph, all_vertices)elapsed = time.perf_counter() - start print(f"Graph with {n} vertices")print(f"Found {len(components)} components in {elapsed:.3f}s")print(f"Largest component: {max(len(c) for c in components)} vertices")print(f"Smallest component: {min(len(c) for c in components)} vertices")A cycle in an undirected graph is a path that starts and ends at the same vertex without repeating edges. Detecting cycles is crucial for:
The Challenge in Undirected Graphs:
In an undirected graph, every edge appears twice in the adjacency list: once for each endpoint. When we traverse from A to B, we must not consider the edge back to A as a "cycle."
Solution: Track the Parent
When exploring from vertex v via its parent p, ignore the edge back to p. Any other visited vertex we encounter indicates a cycle.
Algorithm:
has_cycle = False
def dfs(v, parent):
visited[v] = True
for neighbor in adj[v]:
if not visited[neighbor]:
dfs(neighbor, v)
elif neighbor != parent:
has_cycle = True # Found back edge to non-parent ancestor
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176
from typing import Dict, List, Set, Optional, Tuple def has_cycle_undirected(graph: Dict, all_vertices: Set) -> bool: """ Detect if an undirected graph contains a cycle. Uses DFS with parent tracking to distinguish back edges from the edge we came from. Time: O(V + E) Space: O(V) """ visited = set() def dfs(vertex, parent) -> bool: visited.add(vertex) for neighbor in graph.get(vertex, []): if neighbor not in visited: if dfs(neighbor, vertex): return True elif neighbor != parent: # Found edge to visited vertex that isn't our parent # This is a back edge -> cycle exists return True return False # Check all components for start in all_vertices: if start not in visited: if dfs(start, None): return True return False def find_cycle_undirected(graph: Dict, all_vertices: Set) -> Optional[List]: """ Find and return a cycle in an undirected graph. Returns None if no cycle exists. Returns list of vertices forming the cycle if one exists. """ visited = set() parent = {} def dfs(vertex, par) -> Optional[Tuple]: """Returns (cycle_start, cycle_end) if cycle found.""" visited.add(vertex) parent[vertex] = par for neighbor in graph.get(vertex, []): if neighbor not in visited: result = dfs(neighbor, vertex) if result: return result elif neighbor != par: # Found cycle: path from neighbor to vertex, then edge back return (neighbor, vertex) return None # Find a cycle cycle_endpoints = None for start in all_vertices: if start not in visited: cycle_endpoints = dfs(start, None) if cycle_endpoints: break if not cycle_endpoints: return None # Reconstruct cycle from parent pointers cycle_start, cycle_end = cycle_endpoints cycle = [] # Build path from cycle_end back to cycle_start current = cycle_end while current != cycle_start: cycle.append(current) current = parent[current] cycle.append(cycle_start) return cycle def count_edges(graph: Dict) -> int: """Count edges in undirected graph (each edge counted once).""" return sum(len(neighbors) for neighbors in graph.values()) // 2 def is_tree(graph: Dict, all_vertices: Set) -> bool: """ Check if the graph is a tree. A tree is a connected, acyclic graph. Properties: V vertices, V-1 edges, connected, no cycles. """ V = len(all_vertices) E = count_edges(graph) # Quick check: tree must have exactly V-1 edges if E != V - 1: return False # Check connectivity and acyclicity visited = set() start = next(iter(all_vertices)) def dfs(v, parent): visited.add(v) for neighbor in graph.get(v, []): if neighbor not in visited: dfs(neighbor, v) elif neighbor != parent: return False # Cycle found return True is_acyclic = dfs(start, None) is_connected = len(visited) == V return is_acyclic and is_connected # Demonstrationdef demonstrate_cycle_detection(): print("Cycle Detection in Undirected Graphs") print("=" * 50) # Graph without cycle (tree) tree_graph = { 'A': ['B', 'C'], 'B': ['A', 'D', 'E'], 'C': ['A'], 'D': ['B'], 'E': ['B'] } vertices_tree = {'A', 'B', 'C', 'D', 'E'} print("Tree graph (no cycle):") print(f" Has cycle: {has_cycle_undirected(tree_graph, vertices_tree)}") print(f" Is tree: {is_tree(tree_graph, vertices_tree)}") print() # Graph with cycle cycle_graph = { 'A': ['B', 'C'], 'B': ['A', 'C'], # B-C edge creates cycle A-B-C-A 'C': ['A', 'B'] } vertices_cycle = {'A', 'B', 'C'} print("Graph with cycle:") print(f" Has cycle: {has_cycle_undirected(cycle_graph, vertices_cycle)}") cycle = find_cycle_undirected(cycle_graph, vertices_cycle) print(f" Cycle found: {cycle}") print() # Disconnected graph with cycle in one component disconnected = { 'A': ['B'], 'B': ['A'], 'C': ['D', 'E'], 'D': ['C', 'E'], # Cycle in this component 'E': ['C', 'D'] } vertices_disc = {'A', 'B', 'C', 'D', 'E'} print("Disconnected graph (cycle in one component):") print(f" Has cycle: {has_cycle_undirected(disconnected, vertices_disc)}") if __name__ == "__main__": demonstrate_cycle_detection()If your graph can have multiple edges between the same vertices, the parent check isn't sufficient. You'd need to track the specific edge used, not just the parent vertex. For simple graphs (at most one edge per vertex pair), the parent check works perfectly.
Cycle detection in directed graphs is fundamentally different from undirected graphs. In a directed graph:
Solution: Three-Color Marking
We track vertices in three states:
The Key Insight:
A cycle exists if and only if we encounter a GRAY vertex during DFS.
Edge Classification by Color:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199
from typing import Dict, List, Set, Optionalfrom enum import IntEnum class Color(IntEnum): WHITE = 0 # Unvisited GRAY = 1 # In progress (on DFS stack) BLACK = 2 # Finished def has_cycle_directed(graph: Dict, all_vertices: Set) -> bool: """ Detect if a directed graph contains a cycle. Uses three-color marking to identify back edges. A back edge (edge to GRAY vertex) indicates a cycle. Time: O(V + E) Space: O(V) """ color = {v: Color.WHITE for v in all_vertices} def dfs(vertex) -> bool: color[vertex] = Color.GRAY for neighbor in graph.get(vertex, []): if color[neighbor] == Color.GRAY: # Edge to in-progress vertex = back edge = cycle return True if color[neighbor] == Color.WHITE: if dfs(neighbor): return True color[vertex] = Color.BLACK return False for vertex in all_vertices: if color[vertex] == Color.WHITE: if dfs(vertex): return True return False def find_cycle_directed(graph: Dict, all_vertices: Set) -> Optional[List]: """ Find and return a cycle in a directed graph. Returns None if no cycle exists. Returns list of vertices forming the cycle. """ color = {v: Color.WHITE for v in all_vertices} parent = {} def dfs(vertex) -> Optional[str]: """Returns the vertex where cycle starts, or None.""" color[vertex] = Color.GRAY for neighbor in graph.get(vertex, []): if color[neighbor] == Color.GRAY: # Found cycle! neighbor is where cycle starts parent[neighbor] = vertex # Close the loop return neighbor if color[neighbor] == Color.WHITE: parent[neighbor] = vertex result = dfs(neighbor) if result is not None: return result color[vertex] = Color.BLACK return None for vertex in all_vertices: if color[vertex] == Color.WHITE: cycle_start = dfs(vertex) if cycle_start is not None: # Reconstruct cycle cycle = [cycle_start] current = parent[cycle_start] while current != cycle_start: cycle.append(current) current = parent[current] cycle.append(cycle_start) # Complete the cycle return cycle[::-1] # Return in forward direction return None def find_all_cycles(graph: Dict, all_vertices: Set) -> List[List]: """ Find all simple cycles in a directed graph. Warning: This can be exponential in the number of cycles. Use with caution on large graphs. """ cycles = [] blocked = set() blocked_map = {v: set() for v in all_vertices} path = [] def unblock(u): blocked.discard(u) while blocked_map[u]: w = blocked_map[u].pop() if w in blocked: unblock(w) def circuit(v, start): found = False path.append(v) blocked.add(v) for w in graph.get(v, []): if w == start: cycles.append(path + [start]) found = True elif w not in blocked: if circuit(w, start): found = True if found: unblock(v) else: for w in graph.get(v, []): blocked_map[w].add(v) path.pop() return found # Johnson's algorithm: find cycles starting from each vertex for start in sorted(all_vertices, key=str): blocked.clear() for v in all_vertices: blocked_map[v].clear() circuit(start, start) return cycles def demonstrate_directed_cycles(): print("Cycle Detection in Directed Graphs") print("=" * 50) # DAG (Directed Acyclic Graph) dag = { 'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': ['E'], 'E': [] } dag_vertices = set(dag.keys()) print("DAG (no cycles):") print(f" Has cycle: {has_cycle_directed(dag, dag_vertices)}") print() # Graph with cycle cyclic = { 'A': ['B'], 'B': ['C'], 'C': ['D'], 'D': ['B'] # Back edge D -> B creates cycle B -> C -> D -> B } cyclic_vertices = set(cyclic.keys()) print("Cyclic graph:") print(f" Has cycle: {has_cycle_directed(cyclic, cyclic_vertices)}") print(f" Cycle: {find_cycle_directed(cyclic, cyclic_vertices)}") print() # Self-loop self_loop = { 'A': ['A'] # A -> A } print("Self-loop:") print(f" Has cycle: {has_cycle_directed(self_loop, {'A'})}") # Multiple cycles multi_cycle = { 'A': ['B'], 'B': ['C', 'D'], 'C': ['A'], # Cycle 1: A -> B -> C -> A 'D': ['E'], 'E': ['B'] # Cycle 2: B -> D -> E -> B } print() print("Graph with multiple cycles:") all_cycles = find_all_cycles(multi_cycle, set(multi_cycle.keys())) for i, cycle in enumerate(all_cycles, 1): print(f" Cycle {i}: {' -> '.join(cycle)}") if __name__ == "__main__": demonstrate_directed_cycles()| Aspect | Undirected Graph | Directed Graph |
|---|---|---|
| Cycle definition | Path back to start (any direction) | Path following edge directions |
| Parent tracking | Required (ignore edge we came from) | Not applicable |
| Color scheme | Two colors (visited/unvisited) | Three colors (WHITE/GRAY/BLACK) |
| Cycle indicator | Edge to visited non-parent | Edge to GRAY vertex |
| Edge type that indicates cycle | Back edge | Back edge |
One of the most important real-world applications of cycle detection is validating dependencies. Circular dependencies cause:
Let's build a practical dependency validator:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178
from typing import Dict, List, Set, Tuple, Optionalfrom dataclasses import dataclassfrom collections import defaultdict @dataclassclass DependencyError: """Represents a circular dependency error.""" cycle: List[str] message: str class DependencyValidator: """ Validates dependency graphs for circular dependencies. Use cases: - Package managers (npm, pip, cargo) - Build systems (make, gradle) - Module imports - Task scheduling """ def __init__(self): self.dependencies: Dict[str, List[str]] = defaultdict(list) self.items: Set[str] = set() def add_dependency(self, item: str, depends_on: str): """ Add a dependency: 'item' depends on 'depends_on'. This means 'depends_on' must be processed before 'item'. """ self.dependencies[item].append(depends_on) self.items.add(item) self.items.add(depends_on) def add_item(self, item: str): """Add an item with no dependencies.""" self.items.add(item) def validate(self) -> Tuple[bool, Optional[DependencyError]]: """ Validate the dependency graph has no cycles. Returns: (is_valid, error) where is_valid is True if no cycles, and error contains cycle details if cycles exist. """ WHITE, GRAY, BLACK = 0, 1, 2 color = {item: WHITE for item in self.items} parent = {} def dfs(item) -> Optional[str]: color[item] = GRAY for dep in self.dependencies[item]: if color[dep] == GRAY: parent[dep] = item return dep if color[dep] == WHITE: parent[dep] = item result = dfs(dep) if result: return result color[item] = BLACK return None for item in self.items: if color[item] == WHITE: cycle_start = dfs(item) if cycle_start: # Reconstruct cycle cycle = [cycle_start] current = parent[cycle_start] while current != cycle_start: cycle.append(current) current = parent[current] cycle.append(cycle_start) cycle = cycle[::-1] message = f"Circular dependency detected: {' -> '.join(cycle)}" return False, DependencyError(cycle, message) return True, None def get_build_order(self) -> Optional[List[str]]: """ Get a valid build order (topological sort). Returns None if circular dependency exists. """ WHITE, GRAY, BLACK = 0, 1, 2 color = {item: WHITE for item in self.items} order = [] def dfs(item) -> bool: color[item] = GRAY for dep in self.dependencies[item]: if color[dep] == GRAY: return False # Cycle if color[dep] == WHITE: if not dfs(dep): return False color[item] = BLACK order.append(item) return True for item in self.items: if color[item] == WHITE: if not dfs(item): return None return order # Note: this is reverse topological order def demonstrate_dependency_validation(): print("Dependency Validation Examples") print("=" * 60) # Valid dependencies (can be resolved) valid = DependencyValidator() valid.add_dependency("app", "framework") valid.add_dependency("app", "utils") valid.add_dependency("framework", "core") valid.add_dependency("utils", "core") valid.add_item("core") # No dependencies is_valid, error = valid.validate() print("Valid package dependencies:") print(f" Valid: {is_valid}") print(f" Build order: {valid.get_build_order()}") print() # Circular dependency circular = DependencyValidator() circular.add_dependency("module_a", "module_b") circular.add_dependency("module_b", "module_c") circular.add_dependency("module_c", "module_a") # Creates cycle! is_valid, error = circular.validate() print("Circular module imports:") print(f" Valid: {is_valid}") if error: print(f" Error: {error.message}") print() # Self-dependency self_dep = DependencyValidator() self_dep.add_dependency("recursive", "recursive") is_valid, error = self_dep.validate() print("Self-dependency:") print(f" Valid: {is_valid}") if error: print(f" Error: {error.message}") if __name__ == "__main__": demonstrate_dependency_validation() # Output:# Dependency Validation Examples# ============================================================# Valid package dependencies:# Valid: True# Build order: ['core', 'framework', 'utils', 'app']## Circular module imports:# Valid: False# Error: Circular dependency detected: module_a -> module_b -> module_c -> module_a## Self-dependency:# Valid: False# Error: Circular dependency detected: recursive -> recursiveWhen implementing these algorithms in production, consider these performance factors:
For dynamic connectivity (edges added incrementally, with connectivity queries), Union-Find with path compression and union by rank achieves near-O(1) per operation. DFS-based component finding is O(V + E) per query, making Union-Find far superior for repeated queries in dynamic graphs.
Connected components and cycle detection are fundamental applications of DFS with wide-ranging real-world uses.
Module Complete:
You have now mastered Depth-First Search (DFS) from the ground up. You understand the algorithm's mechanics in both recursive and iterative forms, the stack behavior that drives depth-first exploration, pre-order and post-order traversals, time complexity analysis, and key applications in connectivity and cycle detection.
This foundation prepares you for the next module: Breadth-First Search (BFS), which explores graphs layer by layer and is the tool of choice for shortest paths in unweighted graphs.
Congratulations! You've completed Module 2: Depth-First Search (DFS). You now have the skills to implement DFS, analyze its complexity, and apply it to solve real problems. The next module covers Breadth-First Search (BFS), the other fundamental graph traversal algorithm.