Imagine navigating a city where every street is one-way. You can travel from point A to point B, but returning might require an entirely different route—or might be impossible altogether. This asymmetry, where connections have a specific direction, is the defining characteristic of directed graphs.
Directed graphs, also known as digraphs, are among the most powerful modeling tools in computer science. They capture relationships where order matters: following someone on Twitter doesn't mean they follow you back; a function calling another function doesn't imply a reverse call; a web page linking to another doesn't guarantee a reciprocal link. This inherent asymmetry makes directed graphs essential for modeling countless real-world systems.
By the end of this page, you will understand the formal definition and notation of directed graphs, recognize their applications in software systems and algorithms, master essential digraph-specific concepts like in-degree and out-degree, and appreciate why direction fundamentally changes how we reason about graph problems.
A directed graph (or digraph) is a mathematical structure that formalizes the concept of one-way relationships between entities. Unlike undirected graphs where edges represent symmetric connections, directed graphs capture asymmetric relationships where the direction of the connection carries meaning.
Formal Definition:
A directed graph G is defined as an ordered pair G = (V, E), where:
V is a finite set of vertices (also called nodes), and
E ⊆ V × V is a set of directed edges, each an ordered pair (u, v) of vertices.
The crucial distinction from undirected graphs lies in the word ordered. In a directed edge (u, v), the vertex u is the source (or tail), and v is the target (or head). This edge represents a connection from u to v, but explicitly not from v to u.
In directed graphs, (u, v) ≠ (v, u). The edge (A, B) means 'A connects to B,' while (B, A) means 'B connects to A.' These are distinct edges. In contrast, undirected graphs treat {A, B} and {B, A} as the same edge.
Example:
Consider a directed graph with V = {A, B, C, D} and E = {(A, B), (A, C), (B, C), (C, D), (D, B)}.
In this graph:
A points to B and C but has no incoming edges, making it a source.
B points to C and is pointed to by A and D.
C points to D and is pointed to by A and B.
D points to B and is pointed to by C, so B→C→D→B forms a directed cycle.
This creates interesting reachability patterns. You can get from A to D (via A→C→D), but you cannot get from D to A through any path—the graph has no directed path from D to A.
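To make the asymmetry tangible, here is a small sketch of this example graph in Python. Representing E as a set of ordered tuples is just for illustration; the rest of this page uses adjacency lists, so the sketch also shows the equivalent adjacency-list form.

```python
# The example digraph: V = {A, B, C, D}, E = {(A,B), (A,C), (B,C), (C,D), (D,B)}
V = {"A", "B", "C", "D"}
E = {("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "B")}

# Direction matters: (A, B) is an edge, but (B, A) is not.
print(("A", "B") in E)   # True
print(("B", "A") in E)   # False

# Equivalent adjacency-list view: each vertex maps to its out-neighbors.
adj = {v: [w for (u, w) in E if u == v] for v in V}
print(adj["A"])          # ['B', 'C'] (order may vary)
print(adj["D"])          # ['B']
```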
| Term | Definition | Example |
|---|---|---|
| Directed Edge (Arc) | An ordered pair (u, v) representing a connection from u to v | (A, B) means A points to B |
| Source (Tail) | The starting vertex of a directed edge | In (A, B), A is the source |
| Target (Head) | The ending vertex of a directed edge | In (A, B), B is the target |
| Out-neighbor | A vertex v such that (u, v) ∈ E | B is an out-neighbor of A if (A, B) exists |
| In-neighbor | A vertex u such that (u, v) ∈ E | A is an in-neighbor of B if (A, B) exists |
| Directed Path | A sequence of vertices where each consecutive pair forms a directed edge | A→B→C is a path if (A,B) and (B,C) exist |
| Directed Cycle | A directed path that starts and ends at the same vertex | B→C→D→B if (B,C), (C,D), (D,B) exist |
In undirected graphs, each vertex has a single degree—the number of edges incident to it. Directed graphs require a more nuanced concept because edges have direction. We distinguish between edges entering a vertex and edges leaving a vertex.
Out-Degree (deg⁺(v)): The out-degree of a vertex v is the number of edges that have v as their source. Formally:
deg⁺(v) = |{(v, u) ∈ E : u ∈ V}|
This counts how many vertices you can reach directly from v in one step.
In-Degree (deg⁻(v)): The in-degree of a vertex v is the number of edges that have v as their target. Formally:
deg⁻(v) = |{(u, v) ∈ E : u ∈ V}|
This counts how many vertices can reach v directly in one step.
In any directed graph, the sum of all in-degrees equals the sum of all out-degrees, and both equal the total number of edges: Σdeg⁺(v) = Σdeg⁻(v) = |E|. Every edge contributes exactly 1 to some vertex's out-degree and exactly 1 to some vertex's in-degree.
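As a quick sanity check of this identity, here is a short sketch (using the adjacency-list convention adopted throughout this page) that sums both kinds of degree and compares them with the edge count:

```python
# A quick check of the degree-sum identity on a small digraph.
# Adjacency-list convention: each vertex maps to its list of out-neighbors.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["D"], "D": ["B"]}

num_edges = sum(len(neighbors) for neighbors in graph.values())          # |E|
sum_out = sum(len(graph[v]) for v in graph)                              # Σ deg⁺(v)
sum_in = sum(sum(ns.count(v) for ns in graph.values()) for v in graph)   # Σ deg⁻(v)

assert sum_out == sum_in == num_edges == 5
print(sum_out, sum_in, num_edges)  # 5 5 5
```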
Special Vertices Based on Degree:
The in-degree and out-degree of vertices reveal their structural role in the graph:
Source vertex: A vertex with in-degree 0 (no incoming edges). No other vertex points to it. In a dependency graph, this represents a task with no prerequisites.
Sink vertex: A vertex with out-degree 0 (no outgoing edges). It doesn't point to any other vertex. In a web graph, this might be a page with no outbound links.
Isolated vertex: A vertex with both in-degree and out-degree equal to 0. It has no connections whatsoever.
Hub vertex: Informally, a vertex with high out-degree, pointing to many other vertices.
Authority vertex: Informally, a vertex with high in-degree, pointed to by many other vertices.
```python
def calculate_degrees(adjacency_list):
    """
    Calculate in-degree and out-degree for all vertices.

    Args:
        adjacency_list: Dict mapping each vertex to its list of out-neighbors

    Returns:
        Tuple of (in_degrees, out_degrees) dictionaries
    """
    vertices = set(adjacency_list.keys())

    # Initialize degrees
    in_degrees = {v: 0 for v in vertices}
    out_degrees = {v: 0 for v in vertices}

    # Count degrees
    for source, neighbors in adjacency_list.items():
        out_degrees[source] = len(neighbors)
        for target in neighbors:
            if target in in_degrees:
                in_degrees[target] += 1

    return in_degrees, out_degrees


def find_sources_and_sinks(adjacency_list):
    """
    Find all source vertices (in-degree 0) and sink vertices (out-degree 0)
    in a directed graph.
    """
    in_deg, out_deg = calculate_degrees(adjacency_list)

    sources = [v for v, deg in in_deg.items() if deg == 0]
    sinks = [v for v, deg in out_deg.items() if deg == 0]

    return sources, sinks


# Example usage
graph = {
    'A': ['B', 'C'],
    'B': ['C'],
    'C': ['D'],
    'D': ['B']
}

in_deg, out_deg = calculate_degrees(graph)
print("In-degrees:", in_deg)    # {'A': 0, 'B': 2, 'C': 2, 'D': 1}
print("Out-degrees:", out_deg)  # {'A': 2, 'B': 1, 'C': 1, 'D': 1}

sources, sinks = find_sources_and_sinks(graph)
print("Sources:", sources)  # ['A']
print("Sinks:", sinks)      # []
```

Directed graphs are ubiquitous in computing and beyond. The asymmetric nature of their edges makes them the natural choice for modeling relationships where direction carries semantic meaning. Let's explore the most significant applications across different domains.
Call graphs: Compilers and static-analysis tools model code as a digraph in which each function is a vertex; when foo() calls bar(), there's an edge from foo to bar. This enables dead code detection, inlining decisions, and cycle detection (which may indicate recursion or mutual recursion).

| Domain | Vertices | Edges | Key Algorithms |
|---|---|---|---|
| Web Graph | Web pages | Hyperlinks | PageRank, HITS, Crawling |
| Dependencies | Packages/tasks | Dependencies | Topological sort, Cycle detection |
| Social (asymmetric) | Users | Follows | Influence propagation, Community detection |
| Call Graphs | Functions | Calls | Reachability, Dead code elimination |
| State Machines | States | Transitions | DFS/BFS traversal, Minimization |
| Citation Networks | Papers | Citations | Impact analysis, Trend detection |
When analyzing a real-world problem, ask: 'If entity A relates to entity B, does B necessarily relate to A in the same way?' If not—if the relationship is inherently one-directional—you're dealing with a directed graph problem.
The concept of reachability in directed graphs is fundamentally different from undirected graphs. In an undirected connected graph, you can reach any vertex from any other vertex. In directed graphs, connectivity is far more nuanced—you might reach vertex B from A but not reach A from B.
Directed Path:
A directed path from vertex u to vertex v is a sequence of vertices (u = v₀, v₁, v₂, ..., vₖ = v) such that for each consecutive pair (vᵢ, vᵢ₊₁), the directed edge (vᵢ, vᵢ₊₁) exists in E. The direction of edges must be respected.
Reachability:
Vertex v is reachable from vertex u if there exists a directed path from u to v. We denote this as u ⟿ v. Note that u ⟿ v does NOT imply v ⟿ u in general.
Transitive Closure:
The transitive closure of a directed graph G = (V, E) is a graph G* = (V, E*) where (u, v) ∈ E* if and only if there exists a directed path from u to v in G. The transitive closure makes reachability queries O(1) after O(V³) preprocessing—useful when you need many reachability queries.
Practical Implications:
Consider a dependency graph where edges represent 'depends on.' If package A depends on B, and B depends on C, then A transitively depends on C. The transitive closure reveals all transitive dependencies, critical for determining what needs to be installed or rebuilt.
```python
from collections import deque


def find_reachable_vertices(graph, start):
    """
    Find all vertices reachable from a starting vertex using BFS.
    Respects edge direction.

    Args:
        graph: Adjacency list representation
        start: Starting vertex

    Returns:
        Set of all reachable vertices (including start)
    """
    reachable = set()
    queue = deque([start])
    reachable.add(start)

    while queue:
        current = queue.popleft()
        for neighbor in graph.get(current, []):
            if neighbor not in reachable:
                reachable.add(neighbor)
                queue.append(neighbor)

    return reachable


def is_reachable(graph, source, target):
    """
    Check if target is reachable from source.
    """
    return target in find_reachable_vertices(graph, source)


def compute_transitive_closure(graph):
    """
    Compute the transitive closure using Floyd-Warshall.
    Returns a set of all (u, v) pairs where u can reach v.

    Time Complexity: O(V³)
    Space Complexity: O(V²)
    """
    vertices = list(graph.keys())
    n = len(vertices)
    vertex_to_idx = {v: i for i, v in enumerate(vertices)}

    # Initialize reachability matrix
    reach = [[False] * n for _ in range(n)]

    # Direct edges
    for u, neighbors in graph.items():
        i = vertex_to_idx[u]
        reach[i][i] = True  # Self-reachability
        for v in neighbors:
            j = vertex_to_idx.get(v, -1)
            if j != -1:
                reach[i][j] = True

    # Floyd-Warshall for transitive closure
    for k in range(n):
        for i in range(n):
            for j in range(n):
                reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])

    # Convert back to vertex pairs
    closure = set()
    for i in range(n):
        for j in range(n):
            if reach[i][j]:
                closure.add((vertices[i], vertices[j]))

    return closure


# Example
graph = {'A': ['B', 'C'], 'B': ['C'], 'C': ['D'], 'D': ['B']}

print("Reachable from A:", find_reachable_vertices(graph, 'A'))
# Output: {'A', 'B', 'C', 'D'}

print("Reachable from D:", find_reachable_vertices(graph, 'D'))
# Output: {'D', 'B', 'C'} - Note: A is not reachable from D!
```

Directed Cycles:
A directed cycle is a directed path that starts and ends at the same vertex, with at least one edge. Formally, it's a sequence (v₀, v₁, ..., vₖ) where k ≥ 1, edges (vᵢ, vᵢ₊₁) exist for all i, and v₀ = vₖ.
Cycles in directed graphs have profound implications:
In dependency graphs, a cycle means a circular dependency—there is no valid order in which to build or install everything.
In call graphs, a cycle indicates recursion or mutual recursion.
In resource-allocation graphs, a cycle can signal deadlock.
A directed graph admits a topological ordering if and only if it contains no directed cycle.
Directed Acyclic Graphs (DAGs):
A DAG is a directed graph with no directed cycles. DAGs are extraordinarily important because:
Every DAG admits at least one topological ordering—a linear ordering of the vertices in which every edge points "forward."
Every non-empty DAG has at least one source and at least one sink.
Problems that are hard on general digraphs, such as longest path, become tractable on DAGs via dynamic programming over a topological order.
They are the natural model for dependency and scheduling problems, where a cycle would make the problem unsolvable.
DAGs hit a sweet spot: they're more expressive than trees (a node can have multiple parents), but more structured than arbitrary digraphs (no cycles). This makes them ideal for version control (Git's commit graph), expression evaluation, data pipelines, and scheduling problems.
Cycle Detection in Directed Graphs:
Detecting cycles in directed graphs is subtly different from undirected graphs. In undirected graphs, any back edge during DFS indicates a cycle. In directed graphs, we must be more careful—we need to track whether a vertex is currently in our DFS recursion stack.
We classify edges during DFS into:
Tree edges: edges to a vertex visited for the first time; these form the DFS tree.
Back edges: edges to a vertex still on the current recursion stack (an ancestor in the DFS tree).
Forward edges: edges to an already-finished descendant in the DFS tree.
Cross edges: edges to an already-finished vertex that is neither an ancestor nor a descendant.
A directed graph has a cycle if and only if DFS encounters a back edge.
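Here is an optional sketch (separate from the cycle detector below) that labels every edge with its DFS classification; the "gray" state plays the role of the recursion stack:

```python
def classify_edges(graph):
    """
    Label each directed edge as a tree, back, forward, or cross edge during DFS.
    WHITE = unvisited, GRAY = on the recursion stack, BLACK = finished.
    """
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in graph}
    disc = {}       # discovery time of each vertex
    time = [0]
    labels = {}     # (u, v) -> edge type

    def dfs(u):
        color[u] = GRAY
        disc[u] = time[0]
        time[0] += 1
        for v in graph.get(u, []):
            if v not in color:               # vertex appearing only as a target
                color[v] = WHITE
            if color[v] == WHITE:
                labels[(u, v)] = "tree"
                dfs(v)
            elif color[v] == GRAY:
                labels[(u, v)] = "back"      # cycle detected
            elif disc[u] < disc[v]:
                labels[(u, v)] = "forward"   # finished descendant
            else:
                labels[(u, v)] = "cross"     # finished non-descendant
        color[u] = BLACK

    for v in list(graph):
        if color[v] == WHITE:
            dfs(v)
    return labels


# On the running example (which contains the cycle B -> C -> D -> B),
# the edge D -> B is labeled a back edge.
example = {'A': ['B', 'C'], 'B': ['C'], 'C': ['D'], 'D': ['B']}
print(classify_edges(example))
```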
```python
def has_cycle_directed(graph):
    """
    Detect if a directed graph contains a cycle.
    Uses DFS with recursion-stack tracking.

    Args:
        graph: Adjacency list (dict of vertex -> list of neighbors)

    Returns:
        True if cycle exists, False otherwise
    """
    # States: 0 = unvisited, 1 = in current path, 2 = completed
    state = {v: 0 for v in graph}

    def dfs(vertex):
        if state[vertex] == 1:
            # Back edge: cycle found
            return True
        if state[vertex] == 2:
            # Already processed: no cycle here
            return False

        state[vertex] = 1  # Mark as being processed

        for neighbor in graph.get(vertex, []):
            if neighbor in state and dfs(neighbor):
                return True

        state[vertex] = 2  # Mark as completed
        return False

    # Check from all vertices (graph may be disconnected)
    for vertex in graph:
        if state[vertex] == 0:
            if dfs(vertex):
                return True

    return False


def topological_sort(graph):
    """
    Returns a topological ordering of a DAG.
    Raises ValueError if the graph contains a cycle.
    Uses Kahn's algorithm (BFS-based).
    """
    # Calculate in-degrees
    in_degree = {v: 0 for v in graph}
    for v, neighbors in graph.items():
        for neighbor in neighbors:
            if neighbor in in_degree:
                in_degree[neighbor] += 1

    # Start with all sources (in-degree 0)
    queue = [v for v, deg in in_degree.items() if deg == 0]
    result = []

    while queue:
        vertex = queue.pop(0)
        result.append(vertex)

        for neighbor in graph.get(vertex, []):
            if neighbor in in_degree:
                in_degree[neighbor] -= 1
                if in_degree[neighbor] == 0:
                    queue.append(neighbor)

    if len(result) != len(graph):
        raise ValueError("Graph contains a cycle - no topological order exists")

    return result


# Example: DAG
dag = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []}
print("Has cycle:", has_cycle_directed(dag))        # False
print("Topological order:", topological_sort(dag))  # ['A', 'B', 'C', 'D'] or similar

# Example: Graph with cycle
cyclic = {'A': ['B'], 'B': ['C'], 'C': ['A']}
print("Has cycle:", has_cycle_directed(cyclic))     # True
```

In undirected graphs, we talk about connected components—maximal sets of vertices where every pair is connected by a path. Directed graphs require a more nuanced concept: strongly connected components (SCCs).
Definition:
A strongly connected component of a directed graph is a maximal set of vertices such that for every pair of vertices u and v in the set, both u ⟿ v (u can reach v) AND v ⟿ u (v can reach u) hold.
In other words, within an SCC, you can travel between any two vertices in both directions. Every vertex can reach every other vertex, and vice versa.
Key Properties:
Every vertex belongs to exactly one SCC; the SCCs partition the vertex set.
A vertex that lies on no directed cycle forms an SCC by itself.
Contracting each SCC to a single vertex produces the condensation of the graph, which is always a DAG.
Once you decompose a graph into SCCs and form the condensation DAG, you've reduced an arbitrary digraph problem to a DAG problem. This is a powerful technique: analyze the macro-level structure between components (DAG structure), then handle intra-component details (each SCC is strongly connected).
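To illustrate that technique, here is a small helper sketch (a hypothetical build_condensation function, not part of the SCC code below) that collapses a graph into its condensation DAG, given a precomputed list of SCCs:

```python
def build_condensation(graph, sccs):
    """
    Build the condensation DAG of a directed graph.

    Args:
        graph: Adjacency list (dict of vertex -> list of neighbors)
        sccs:  List of SCCs, each a list of vertices (e.g. from Kosaraju's algorithm)

    Returns:
        Adjacency list of the condensation: one node per SCC index, with an
        edge i -> j whenever some original edge goes from SCC i to a different SCC j.
    """
    # Map each original vertex to the index of its SCC
    scc_of = {}
    for idx, component in enumerate(sccs):
        for v in component:
            scc_of[v] = idx

    condensation = {idx: set() for idx in range(len(sccs))}
    for u, neighbors in graph.items():
        for v in neighbors:
            cu, cv = scc_of[u], scc_of[v]
            if cu != cv:                  # ignore edges inside an SCC
                condensation[cu].add(cv)

    # Convert neighbor sets to sorted lists for a stable result
    return {idx: sorted(targets) for idx, targets in condensation.items()}


# Tiny example: two 2-vertex cycles joined by a single edge B -> C
g = {'A': ['B'], 'B': ['A', 'C'], 'C': ['D'], 'D': ['C']}
sccs = [['A', 'B'], ['C', 'D']]           # assumed precomputed
print(build_condensation(g, sccs))        # {0: [1], 1: []}
```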
Finding SCCs:
Two classic linear-time algorithms find all SCCs:
Kosaraju's Algorithm: Run two DFS passes—one on the original graph, one on the transpose (reversed edges)—using finish times from the first pass to order the second pass.
Tarjan's Algorithm: Single DFS with low-link values to identify SCC roots during the traversal.
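The implementation below uses Kosaraju's algorithm; for comparison, here is a compact sketch of Tarjan's algorithm (recursive, so it assumes the graph is small enough to stay within Python's recursion limit):

```python
def find_sccs_tarjan(graph):
    """
    Find all strongly connected components using Tarjan's algorithm.
    Single DFS; a vertex whose low-link equals its own index is the root of an SCC.

    Time Complexity: O(V + E)
    """
    index = {}        # discovery index of each vertex
    low = {}          # low-link value of each vertex
    on_stack = set()
    stack = []
    counter = [0]
    sccs = []

    def strongconnect(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)

        for w in graph.get(v, []):
            if w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])

        if low[v] == index[v]:            # v is the root of an SCC
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            sccs.append(component)

    for v in graph:
        if v not in index:
            strongconnect(v)
    return sccs


# Same example graph as the Kosaraju code below
graph = {
    'A': ['B'], 'B': ['C', 'E', 'F'], 'C': ['D', 'G'], 'D': ['C', 'H'],
    'E': ['A', 'F'], 'F': ['G'], 'G': ['F', 'H'], 'H': ['H']
}
print(find_sccs_tarjan(graph))
# e.g. [['H'], ['F', 'G'], ['D', 'C'], ['E', 'B', 'A']] (order within SCCs may vary)
```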
Applications of SCCs:
Solving 2-SAT in linear time by analyzing the implication graph.
Detecting deadlocks and circular dependencies in systems and build tools.
Analyzing the large-scale structure of the web graph, including its giant strongly connected core.
Grouping mutually recursive functions in compilers so they can be analyzed or optimized together.
```python
def find_sccs_kosaraju(graph):
    """
    Find all strongly connected components using Kosaraju's algorithm.

    Args:
        graph: Adjacency list (dict of vertex -> list of neighbors)

    Returns:
        List of SCCs, where each SCC is a list of vertices

    Time Complexity: O(V + E)
    Space Complexity: O(V + E)
    """
    # Step 1: Build transpose graph
    vertices = set(graph.keys())
    for neighbors in graph.values():
        vertices.update(neighbors)

    transpose = {v: [] for v in vertices}
    for u, neighbors in graph.items():
        for v in neighbors:
            transpose[v].append(u)

    # Step 2: First DFS to get finish order
    visited = set()
    finish_order = []

    def dfs1(vertex):
        visited.add(vertex)
        for neighbor in graph.get(vertex, []):
            if neighbor not in visited:
                dfs1(neighbor)
        finish_order.append(vertex)

    for vertex in vertices:
        if vertex not in visited:
            dfs1(vertex)

    # Step 3: Second DFS on transpose in reverse finish order
    visited.clear()
    sccs = []

    def dfs2(vertex, component):
        visited.add(vertex)
        component.append(vertex)
        for neighbor in transpose[vertex]:
            if neighbor not in visited:
                dfs2(neighbor, component)

    for vertex in reversed(finish_order):
        if vertex not in visited:
            component = []
            dfs2(vertex, component)
            sccs.append(component)

    return sccs


# Example
graph = {
    'A': ['B'],
    'B': ['C', 'E', 'F'],
    'C': ['D', 'G'],
    'D': ['C', 'H'],
    'E': ['A', 'F'],
    'F': ['G'],
    'G': ['F', 'H'],
    'H': ['H']
}

sccs = find_sccs_kosaraju(graph)
print("Strongly Connected Components:")
for i, scc in enumerate(sccs, 1):
    print(f"  SCC {i}: {scc}")

# Output (order may vary):
# SCC 1: ['H']
# SCC 2: ['G', 'F']
# SCC 3: ['D', 'C']
# SCC 4: ['E', 'B', 'A']
```

We've explored directed graphs comprehensively—from their formal mathematical foundation to practical algorithms and real-world applications. Let's consolidate the key concepts:
Directed edges are ordered pairs: (u, v) connects u to v, not the reverse.
Each vertex has both an in-degree and an out-degree, and their sums each equal |E|.
Reachability is asymmetric: u ⟿ v does not imply v ⟿ u.
Cycle-free digraphs (DAGs) admit topological orderings, making them ideal for dependencies and scheduling.
Strongly connected components partition any digraph, and their condensation is always a DAG.
What's Next:
We've mastered directed graphs with their one-way edges. Next, we'll explore undirected graphs, where every edge is bidirectional. While simpler in some ways (no in/out-degree distinction), undirected graphs have their own rich properties and application domains. Understanding both types—and when to use each—is essential for effective graph modeling.
You now have a thorough understanding of directed graphs: their formal definition, degree concepts, reachability, cycles, DAGs, and strongly connected components. You can recognize when a problem requires directed graph modeling and apply appropriate algorithms to analyze digraph structure.