Floyd-Warshall is a masterclass in dynamic programming. While many DP algorithms feel like clever tricks applied to specific problems, Floyd-Warshall demonstrates the pure, distilled essence of the paradigm: identify a structured way to build from smaller solutions to larger ones, cache intermediate results, and systematically explore the entire solution space.
What makes Floyd-Warshall particularly instructive is its simplicity. The entire algorithm is three nested loops with a one-line update. Yet this simple structure solves a problem that seems to require considering an exponential number of possible paths. Understanding why this works reveals deep insights about dynamic programming itself.
By the end of this page, you will understand how Floyd-Warshall embodies the two pillars of dynamic programming—optimal substructure and overlapping subproblems. You'll see the formal derivation of the recurrence, understand why in-place updates work, trace through the complete algorithm, and internalize the state transition that powers the computation.
For dynamic programming to apply, a problem must exhibit two key properties:

- Optimal substructure: an optimal solution contains optimal solutions to its subproblems.
- Overlapping subproblems: the same subproblems recur many times, so caching their answers pays off.
Let's verify that All-Pairs Shortest Path satisfies both.
Optimal Substructure:
A subpath of a shortest path is itself a shortest path. If the shortest path from A to C goes through B, then the A→B portion is the shortest path from A to B, and the B→C portion is the shortest path from B to C. If either weren't shortest, we could substitute a shorter one, contradicting the original path being shortest.
Formal Statement of Optimal Substructure:
Let p = ⟨v₁, v₂, ..., vₖ⟩ be a shortest path from v₁ to vₖ. For any indices 1 ≤ i ≤ j ≤ k, the subpath pᵢⱼ = ⟨vᵢ, vᵢ₊₁, ..., vⱼ⟩ is a shortest path from vᵢ to vⱼ.
Proof by contradiction: Suppose pᵢⱼ is not a shortest path from vᵢ to vⱼ. Then there exists a shorter path p'ᵢⱼ. Replacing pᵢⱼ with p'ᵢⱼ in the original path p gives a path from v₁ to vₖ shorter than p. This contradicts p being a shortest path. ∎
Overlapping Subproblems:
Consider computing shortest paths recursively. To find the shortest path from A to D, you might consider paths through intermediate vertices B and C: you'd compute A→B and B→D for the 'through B' option, and A→C and C→D for the 'through C' option. But if you also need to find the shortest path from A to E through B, you'll recompute A→B.
In a naive recursive approach, these shared subpaths get recomputed exponentially many times. Dynamic programming caches them—compute once, reuse everywhere.
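To see the caching effect concretely, here is a minimal top-down sketch (not the standard presentation of the algorithm) that writes the recurrence we will derive below as a recursive function, with `functools.lru_cache` as the memoization layer:

```python
from functools import lru_cache

def shortest_paths_memoized(graph):
    """Top-down form of the Floyd-Warshall recurrence (derived below).
    d(i, j, k) = shortest i -> j distance using only vertices 0..k-1
    as intermediates.  Without @lru_cache, the same (i, j, k) states
    are recomputed exponentially many times; with it, each of the
    O(V^3) states is solved exactly once."""
    V = len(graph)

    @lru_cache(maxsize=None)
    def d(i, j, k):
        if k == 0:
            return graph[i][j]           # base case: no intermediates allowed
        # Either avoid vertex k-1 entirely, or route through it
        return min(d(i, j, k - 1),
                   d(i, k - 1, k - 1) + d(k - 1, j, k - 1))

    return [[d(i, j, V) for j in range(V)] for i in range(V)]
```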
```
DEMONSTRATING OVERLAPPING SUBPROBLEMS:

Consider shortest paths in a 5-vertex graph:

  Path A→D needs: A→B, A→C, B→D, C→D, ...
  Path A→E needs: A→B, A→C, B→E, C→E, ...
  Path B→E needs: B→C, B→D, C→E, D→E, ...

OVERLAP:
- A→B appears in: A→D, A→E, A→C (if through B), ...
- B→C appears in: B→D (if through C), B→E, A→D (if through B→C), ...
- etc.

Without memoization: exponential recomputation
With memoization:    O(V³) total work

Floyd-Warshall structures the computation so that each subproblem is
computed exactly once, organized by intermediate vertex sets.
```

The art of dynamic programming lies in choosing the right state representation—the parameters that uniquely define each subproblem. Floyd-Warshall's genius is its choice of state.
The Intermediate Vertex Parameterization:
Instead of thinking about paths by their length or number of edges, Floyd-Warshall parameterizes paths by which vertices are allowed as intermediates.
Let the vertices be numbered 1, 2, 3, ..., V.
Define: d^(k)[i][j] = The length of the shortest path from vertex i to vertex j, using only vertices from the set {1, 2, ..., k} as intermediate vertices.
| k | Allowed Intermediates | What d^(k)[i][j] Represents |
|---|---|---|
| 0 | None | Direct edge from i to j (or ∞ if no edge) |
| 1 | {1} | Shortest path using only vertex 1 as intermediate (if helpful) |
| 2 | {1, 2} | Shortest path using vertices 1 and/or 2 as intermediates |
| k | {1, 2, ..., k} | Shortest path using any subset of vertices 1 through k |
| V | {1, 2, ..., V} | Shortest path with all vertices available (FINAL ANSWER) |
Why This State Works:
This parameterization is powerful because:
Clear Starting Point (k=0): With no intermediates allowed, d^(0)[i][j] is simply the edge weight from i to j (or ∞).
Incremental Expansion: Adding one more vertex to the allowed set changes the problem in a controlled, analyzable way.
Natural Termination (k=V): When all vertices are allowed as intermediates, we have the complete shortest paths—no restriction remains.
Polynomial State Space: There are O(V²) pairs (i, j) and O(V) values of k, giving O(V³) total states. Each state is computed exactly once.
Many DP problems parameterize by 'number of steps taken' or 'prefix processed.' Floyd-Warshall parameterizes by 'which resources are available.' This creates a lattice of subproblems where each level expands the tool set (available intermediate vertices) rather than extending a path. It's a fundamentally different—and elegant—perspective.
With our state defined, we now derive the recurrence that relates d^(k) to d^(k-1). This is the heart of the algorithm.
The Key Question: How do we compute d^(k)[i][j] given we already know all values of d^(k-1)?
The Critical Observation:
Consider the shortest path from i to j using vertices {1, 2, ..., k} as intermediates. This path either:
Case 1: Does NOT use vertex k at all
Case 2: DOES use vertex k (at least once)
```
THE FLOYD-WARSHALL RECURRENCE DERIVATION:

GIVEN: d^(k-1)[i][j] for all i, j  (shortest paths with intermediates ⊆ {1,...,k-1})
GOAL:  d^(k)[i][j]   for all i, j  (shortest paths with intermediates ⊆ {1,...,k})

ANALYSIS OF SHORTEST PATH i → j WITH INTERMEDIATES ⊆ {1,...,k}:

  CASE 1: Path avoids k             CASE 2: Path goes through k
  ─────────────────────             ───────────────────────────
  i ───────→ j                      i ─────→ k ─────→ j
     path P                            P1        P2
  P uses only {1,...,k-1}           P1 uses {1,...,k-1}
  Length: d^(k-1)[i][j]             P2 uses {1,...,k-1}
                                    Total: d^(k-1)[i][k] + d^(k-1)[k][j]

OPTIMAL CHOICE: Take the minimum of both cases

  ┌──────────────────────────────────────────────────┐
  │                                                  │
  │  d^(k)[i][j] = min( d^(k-1)[i][j],               │
  │                     d^(k-1)[i][k] + d^(k-1)[k][j] )
  │                                                  │
  └──────────────────────────────────────────────────┘

This is the COMPLETE recurrence for Floyd-Warshall.
```

Base Case: k = 0
With no intermediate vertices allowed, paths must be direct edges:
```
d^(0)[i][j] =  0        if i = j
               w(i,j)   if there is an edge from i to j with weight w(i,j)
               ∞        otherwise
```
This is simply the adjacency matrix of the graph, with zeros on the diagonal.
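As a concrete sketch of this base case, assuming the graph arrives as a list of (i, j, weight) triples (a hypothetical input format, not something fixed by this page):

```python
INF = float('inf')

def base_case(V, edges):
    """Build d^(0): zeros on the diagonal, w(i,j) where an edge exists,
    infinity everywhere else."""
    d0 = [[0 if i == j else INF for j in range(V)] for i in range(V)]
    for i, j, w in edges:
        d0[i][j] = w
    return d0

# The 4-vertex graph traced later on this page:
print(base_case(4, [(0, 1, 3), (0, 3, 5), (1, 3, 4), (2, 1, 1), (3, 2, 2)]))
```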
Computing the Final Answer:
After computing d^(V), the matrix contains shortest paths using all possible intermediates—which is the unrestricted shortest path. So d^(V)[i][j] is the final answer for each pair (i, j).
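A direct, unoptimized rendering of the recurrence makes the d^(0) → d^(V) progression explicit. This sketch keeps a separate matrix per level (the in-place O(V²) version appears below); vertices are 0-indexed, so level k admits vertex k as a new intermediate:

```python
def floyd_warshall_explicit(graph):
    """Literal translation of the recurrence: build d^(k) purely from
    d^(k-1), keeping one matrix per level instead of updating in place."""
    V = len(graph)
    prev = [row[:] for row in graph]          # d^(0): the adjacency matrix
    for k in range(V):                         # allow vertex k as intermediate
        curr = [[min(prev[i][j], prev[i][k] + prev[k][j])
                 for j in range(V)]
                for i in range(V)]             # reads only level k-1 values
        prev = curr
    return prev                                # d^(V): unrestricted shortest paths
```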
The Case 2 decomposition implicitly assumes that vertex k appears exactly once on the path. This is safe: if a shortest path visited k twice, the portion between the two visits would form a cycle. In a graph without negative cycles, every cycle has non-negative weight, so removing it yields a path that is no longer than the original and visits k only once. We can therefore always choose a shortest path on which k appears at most once.
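That no-negative-cycle assumption can be checked after the fact: if Floyd-Warshall drives any diagonal entry below zero, some vertex lies on a negative cycle. A minimal sketch:

```python
def has_negative_cycle(dist):
    """After Floyd-Warshall, dist[i][i] < 0 means vertex i lies on a
    negative-weight cycle (a round trip cheaper than staying put)."""
    return any(dist[i][i] < 0 for i in range(len(dist)))
```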
A naive implementation would maintain V+1 separate matrices: d^(0), d^(1), ..., d^(V). This requires O(V³) space: V+1 matrices, each of size V×V.
However, Floyd-Warshall can be implemented with just one V×V matrix, updated in place. This reduces space to O(V²). This isn't just an optimization—it's standard practice.
Why In-Place Updates Work:
The recurrence d^(k)[i][j] = min(d^(k-1)[i][j], d^(k-1)[i][k] + d^(k-1)[k][j]) only reads from d^(k-1). When we update the matrix in place, we're modifying d[i][j] potentially before we've finished using all d^(k-1) values.
But here's the critical insight: the values d[i][k] and d[k][j] don't change during the k-th iteration.
During iteration k, we compute d[i][k] = min(d[i][k], d[i][k] + d[k][k]). Since d[k][k] = 0 (a valid shortest path from a vertex to itself), this simplifies to d[i][k] = min(d[i][k], d[i][k]) = d[i][k]. The value doesn't change! Similarly, d[k][j] doesn't change. So we can safely read 'old' values from row k and column k while updating other entries.
Formal Proof of In-Place Correctness:
Consider the update: d[i][j] = min(d[i][j], d[i][k] + d[k][j])
At the start of iteration k, every entry d[i][j] holds the value d^(k-1)[i][j]: the shortest path distance using only intermediates from {1, ..., k-1}.
We need to show d[i][k] and d[k][j] remain unchanged throughout iteration k.
For d[i][k]: When we compute d[i][k], we use: d[i][k] = min(d[i][k], d[i][k] + d[k][k]) = min(d[i][k], d[i][k] + 0) = d[i][k] ✓
For d[k][j]: Similarly: d[k][j] = min(d[k][j], d[k][k] + d[k][j]) = min(d[k][j], 0 + d[k][j]) = d[k][j] ✓
Therefore, when we update any cell (i, j), the values d[i][k] and d[k][j] are guaranteed to still be the d^(k-1) values we need. QED.
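The proof can also be sanity-checked empirically: the in-place update and a two-matrix update that only ever reads level k-1 values produce identical results. A test sketch on random digraphs with non-negative weights:

```python
import random

def check_in_place_equivalence(trials=100, V=6, seed=0):
    """Empirical check that in-place updates match the per-level version."""
    INF = float('inf')
    rng = random.Random(seed)
    for _ in range(trials):
        g = [[0 if i == j else rng.choice([INF, rng.randint(1, 9)])
              for j in range(V)] for i in range(V)]

        a = [row[:] for row in g]              # in-place version
        for k in range(V):
            for i in range(V):
                for j in range(V):
                    a[i][j] = min(a[i][j], a[i][k] + a[k][j])

        b = [row[:] for row in g]              # per-level (two-matrix) version
        for k in range(V):
            b = [[min(b[i][j], b[i][k] + b[k][j]) for j in range(V)]
                 for i in range(V)]
        assert a == b
    return True

print(check_in_place_equivalence())            # True
```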
```
IN-PLACE UPDATE VISUALIZATION:

Matrix during iteration k=3:

         │  1    2    3    4    5  │
     ────┼─────────────────────────┤
      1  │ ...  ...  [*]  ...  ... │
      2  │ ...  ...  [*]  ...  ... │  ← These values DON'T change
      3  │ [*]  [*]   0   [*]  [*] │  ← Row 3 and Col 3 are STABLE
      4  │ ...  ...  [*]  ...  ... │
      5  │ ...  ...  [*]  ...  ... │

Row 3 reads come from d^(k-1)[3][j]; column 3 reads come from d^(k-1)[i][3].

For each cell (i,j) with i≠3 and j≠3:

    d[i][j] = min(d[i][j], d[i][3] + d[3][3 → j])
                             ↑          ↑
                           STABLE     STABLE

The [*] cells (row 3 and column 3) are read but never modified
to a different value, so they provide correct d^(k-1) values.
```

Now we assemble everything into the complete Floyd-Warshall algorithm. The implementation is remarkably concise—three nested loops and a one-line update.
```python
def floyd_warshall(graph):
    """
    Compute all-pairs shortest paths using Floyd-Warshall algorithm.

    Args:
        graph: V×V matrix where graph[i][j] is the weight of edge i→j
               Use float('inf') for no edge, 0 on diagonal

    Returns:
        V×V matrix of shortest path distances
        V×V matrix of predecessors for path reconstruction
    """
    V = len(graph)

    # Initialize distance matrix with direct edge weights
    dist = [[graph[i][j] for j in range(V)] for i in range(V)]

    # Initialize predecessor matrix for path reconstruction
    # pred[i][j] = the vertex before j on the shortest path from i to j
    pred = [[None if i == j or graph[i][j] == float('inf') else i
             for j in range(V)] for i in range(V)]

    # Main Floyd-Warshall: consider each vertex as intermediate
    for k in range(V):
        # For each source vertex
        for i in range(V):
            # For each destination vertex
            for j in range(V):
                # Check if path through k is shorter
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    # Predecessor on path i→j is same as on path k→j
                    pred[i][j] = pred[k][j]

    return dist, pred


def reconstruct_path(pred, i, j):
    """
    Reconstruct the shortest path from i to j using predecessor matrix.

    Returns:
        List of vertices from i to j, or None if no path exists
    """
    if pred[i][j] is None and i != j:
        return None  # No path exists

    path = []
    current = j
    while current is not None:
        path.append(current)
        if current == i:
            break
        current = pred[i][current]

    if not path or path[-1] != i:
        return None  # Path doesn't reach source

    return path[::-1]  # Reverse to get i → j order


# Example usage
if __name__ == "__main__":
    INF = float('inf')

    # Example graph with 4 vertices
    graph = [
        [0,   3,   INF, 5  ],  # Edges from vertex 0
        [2,   0,   INF, 4  ],  # Edges from vertex 1
        [INF, 1,   0,   INF],  # Edges from vertex 2
        [INF, INF, 2,   0  ],  # Edges from vertex 3
    ]

    dist, pred = floyd_warshall(graph)

    print("Shortest distances:")
    for row in dist:
        print([x if x != INF else "∞" for x in row])

    print("\nPath from 0 to 2:", reconstruct_path(pred, 0, 2))
```

Understanding the Loop Order:
The loop order k → i → j is crucial. The outer loop over k ensures that when we compute updates for intermediate vertex k, all updates for vertices 1 through k-1 are complete. This respects the dependency in our recurrence.
Common mistake: reordering the loops for cache efficiency. The i and j loops (source and destination) can be swapped freely, but k MUST remain the outermost loop. Because d^(k) depends on d^(k-1), level k-1 must be fully complete before level k begins; moving k inward breaks correctness, as the sketch below demonstrates.
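A small sketch makes the failure concrete. On the chain 0 → 3 → 2 → 1 (a hypothetical graph chosen to expose the bug), the k-innermost order asks about paths through k before those paths have been built:

```python
INF = float('inf')

def relax_all(g, k_outer):
    """Triple loop with k outermost (correct) or k innermost (broken)."""
    V = len(g)
    d = [row[:] for row in g]
    if k_outer:
        for k in range(V):
            for i in range(V):
                for j in range(V):
                    d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    else:
        for i in range(V):
            for j in range(V):
                for k in range(V):
                    d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d

# Chain 0 → 3 → 2 → 1; the true distance d[0][1] is 1 + 1 + 1 = 3.
g = [
    [0,   INF, INF, 1  ],  # edge 0 → 3
    [INF, 0,   INF, INF],
    [INF, 1,   0,   INF],  # edge 2 → 1
    [INF, INF, 1,   0  ],  # edge 3 → 2
]

print(relax_all(g, k_outer=True)[0][1])   # 3: correct
print(relax_all(g, k_outer=False)[0][1])  # inf: the 3-edge path is missed
```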
Let's trace Floyd-Warshall on a concrete example to internalize the mechanics. Consider this 4-vertex graph:
```
Graph (4 vertices, 0-indexed):

    0 ──(3)──→ 1
    │          │
   (5)        (4)
    │          │
    ↓          ↓
    3 ──(2)──→ 2 ──(1)──→ 1

Edge list:
    0 → 1  (weight 3)
    0 → 3  (weight 5)
    1 → 3  (weight 4)
    2 → 1  (weight 1)
    3 → 2  (weight 2)
    (1 → 2: no edge, so ∞)

Initial adjacency matrix:

         0    1    2    3
    0 [  0    3    ∞    5 ]
    1 [  ∞    0    ∞    4 ]
    2 [  ∞    1    0    ∞ ]
    3 [  ∞    ∞    2    0 ]
```

Iteration k = 0 (using vertex 0 as intermediate):
For each pair (i, j), check if i → 0 → j is shorter than i → j directly.
Vertex 0 has no incoming edges (except from itself), so nothing improves.
Matrix after k = 0: (unchanged)
|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 0 | 3 | ∞ | 5 |
| 1 | ∞ | 0 | ∞ | 4 |
| 2 | ∞ | 1 | 0 | ∞ |
| 3 | ∞ | ∞ | 2 | 0 |
Iteration k = 1 (using vertex 1 as intermediate):
Now vertex 1 can be an intermediate. The only entry that improves is d[2][3]: routing 2 → 1 → 3 gives d[2][1] + d[1][3] = 1 + 4 = 5, beating the previous ∞.
Matrix after k = 1:
|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 0 | 3 | ∞ | 5 |
| 1 | ∞ | 0 | ∞ | 4 |
| 2 | ∞ | 1 | 0 | 5 |
| 3 | ∞ | ∞ | 2 | 0 |
Iteration k = 2 (using vertex 2 as intermediate):

Routing through vertex 2 improves one entry: d[3][1] = d[3][2] + d[2][1] = 2 + 1 = 3, beating the previous ∞.
Matrix after k = 2:
|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 0 | 3 | ∞ | 5 |
| 1 | ∞ | 0 | ∞ | 4 |
| 2 | ∞ | 1 | 0 | 5 |
| 3 | ∞ | 3 | 2 | 0 |
Iteration k = 3 (using vertex 3 as intermediate):

Vertex 3 finally gives the left side of the graph a route to vertex 2: d[0][2] = d[0][3] + d[3][2] = 5 + 2 = 7 and d[1][2] = d[1][3] + d[3][2] = 4 + 2 = 6, both previously ∞.
Final matrix after k = 3:
|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 0 | 3 | 7 | 5 |
| 1 | ∞ | 0 | 6 | 4 |
| 2 | ∞ | 1 | 0 | 5 |
| 3 | ∞ | 3 | 2 | 0 |
The final matrix shows all-pairs shortest paths. For example, the shortest path from 0 to 2 has distance 7 and goes 0 → 3 → 2. From 1 to 2, distance 6 via 1 → 3 → 2. Every ∞ entry means no path exists (e.g., no path from 1 to 0).
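The hand trace above can be reproduced mechanically. This sketch runs the in-place triple loop on the same graph and prints the matrix after each phase, matching the tables shown:

```python
INF = float('inf')

graph = [
    [0,   3,   INF, 5  ],
    [INF, 0,   INF, 4  ],
    [INF, 1,   0,   INF],
    [INF, INF, 2,   0  ],
]

dist = [row[:] for row in graph]
V = len(dist)
for k in range(V):
    for i in range(V):
        for j in range(V):
            if dist[i][k] + dist[k][j] < dist[i][j]:
                dist[i][j] = dist[i][k] + dist[k][j]
    # After each phase, dist is exactly the d^(k) matrix traced above
    print(f"After k = {k}:")
    for row in dist:
        print(["∞" if x == INF else x for x in row])
```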
Let's visualize how the algorithm explores the solution space. The state transition can be viewed as expanding the set of allowed intermediates one vertex at a time.
```
STATE TRANSITION DIAGRAM:

Level 0:  ∅          (No intermediates - direct edges only)
              │ Add vertex 1
              ▼
Level 1:  {1}        (Can route through vertex 1)
              │ Add vertex 2
              ▼
Level 2:  {1,2}      (Can route through vertices 1 or 2)
              │ Add vertex 3
              ▼
Level 3:  {1,2,3}    (Can route through vertices 1, 2, or 3)
              │ Add vertex 4
              ▼
Level 4:  {1,2,3,4}  (All intermediates available - FINAL)

TRANSITION FROM LEVEL k-1 TO LEVEL k:

For every pair (i,j):

    Current best:  d^(k-1)[i][j] ──────────────┐
                                               ├──→ min ──→ d^(k)[i][j]
    Path via k:    d^(k-1)[i][k]               │
                 + d^(k-1)[k][j] ──────────────┘

Each transition considers V² pairs, making 1 comparison each.
Total transitions: V levels × V² comparisons = V³ operations.
```

The Power of Incremental Expansion:
By the time we consider vertex k as an intermediate, we've already optimized all paths using vertices 1 through k-1. This means:

- d[i][k] is the true shortest distance from i to k using intermediates from {1, ..., k-1}
- d[k][j] is the true shortest distance from k to j using intermediates from {1, ..., k-1}
So when we check d[i][k] + d[k][j], we're checking the best possible path through k, not just one arbitrary path. This is why the algorithm is correct—each phase builds on fully optimized previous phases.
Why Order Doesn't Matter Within a Phase:
Within a single iteration k, the order in which we process (i, j) pairs doesn't matter. Each update uses values from row k and column k, which (as we proved) don't change during iteration k. So whether we process (1, 2) before (3, 4) or vice versa, we get the same result.
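This claim is also easy to test: shuffling the (i, j) pairs inside each k-phase leaves the result unchanged. A sketch:

```python
import random

INF = float('inf')

def fw_standard(graph):
    """Reference: the usual k → i → j order."""
    V = len(graph)
    d = [row[:] for row in graph]
    for k in range(V):
        for i in range(V):
            for j in range(V):
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d

def fw_shuffled(graph, rng):
    """Same algorithm, but the (i, j) pairs inside each k-phase are
    visited in a random order.  Row k and column k are stable during
    the phase, so the final matrix is identical."""
    V = len(graph)
    d = [row[:] for row in graph]
    for k in range(V):
        pairs = [(i, j) for i in range(V) for j in range(V)]
        rng.shuffle(pairs)
        for i, j in pairs:
            d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d

rng = random.Random(42)
for _ in range(50):
    V = 5
    g = [[0 if i == j else rng.choice([INF, rng.randint(1, 9)])
          for j in range(V)] for i in range(V)]
    assert fw_shuffled(g, rng) == fw_standard(g)
print("shuffled order matches standard order")
```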
We've thoroughly explored the dynamic programming mechanics that power Floyd-Warshall. Let's consolidate the key insights:

- State: d^(k)[i][j] is the shortest i → j distance using only vertices {1, ..., k} as intermediates.
- Recurrence: d^(k)[i][j] = min(d^(k-1)[i][j], d^(k-1)[i][k] + d^(k-1)[k][j]).
- Base case: d^(0) is the adjacency matrix, with zeros on the diagonal.
- In-place updates are safe because row k and column k do not change during iteration k, reducing space from O(V³) to O(V²).
- Loop order: k must be outermost; within a phase, the (i, j) order is irrelevant.
What's Next:
With the dynamic programming foundation solid, we'll analyze the algorithm's complexity in detail. The next page examines the O(V³) time complexity—why it's optimal for this approach, how constants matter in practice, and the O(V²) space requirement. We'll also discuss how this compares to alternative approaches for different graph densities.
You now understand the dynamic programming paradigm as applied to Floyd-Warshall—the state definition, recurrence derivation, in-place optimization, and complete algorithm. Next, we'll analyze exactly why this runs in O(V³) time.