Floyd-Warshall is a masterclass in dynamic programming. While many DP algorithms feel like clever tricks applied to specific problems, Floyd-Warshall demonstrates the pure, distilled essence of the paradigm: identify a structured way to build from smaller solutions to larger ones, cache intermediate results, and systematically explore the entire solution space.
What makes Floyd-Warshall particularly instructive is its simplicity. The entire algorithm is three nested loops with a one-line update. Yet this simple structure solves a problem that seems to require considering an exponential number of possible paths. Understanding why this works reveals deep insights about dynamic programming itself.
By the end of this page, you will understand how Floyd-Warshall embodies the two pillars of dynamic programming—optimal substructure and overlapping subproblems. You'll see the formal derivation of the recurrence, understand why in-place updates work, trace through the complete algorithm, and internalize the state transition that powers the computation.
For dynamic programming to apply, a problem must exhibit two key properties:

- Optimal substructure: an optimal solution contains optimal solutions to its subproblems.
- Overlapping subproblems: the same subproblems recur many times, so caching their answers pays off.
Let's verify that All-Pairs Shortest Path satisfies both.
Optimal Substructure:
A subpath of a shortest path is itself a shortest path. If the shortest path from A to C goes through B, then the A→B portion is the shortest path from A to B, and the B→C portion is the shortest path from B to C. If either weren't shortest, we could substitute a shorter one, contradicting the original path being shortest.
Formal Statement of Optimal Substructure:
Let p = ⟨v₁, v₂, ..., vₖ⟩ be a shortest path from v₁ to vₖ. For any indices 1 ≤ i ≤ j ≤ k, the subpath pᵢⱼ = ⟨vᵢ, vᵢ₊₁, ..., vⱼ⟩ is a shortest path from vᵢ to vⱼ.
Proof by contradiction: Suppose pᵢⱼ is not a shortest path from vᵢ to vⱼ. Then there exists a shorter path p'ᵢⱼ. Replacing pᵢⱼ with p'ᵢⱼ in the original path p gives a path from v₁ to vₖ shorter than p. This contradicts p being a shortest path. ∎
Overlapping Subproblems:
Consider computing shortest paths recursively. To find the shortest path from A to D, you might consider paths through intermediate vertices B and C: you'd compute A→B and B→D for the 'through B' option, and A→C and C→D for the 'through C' option. But if you also need to find the shortest path from A to E through B, you'll recompute A→B.
In a naive recursive approach, these shared subpaths get recomputed exponentially many times. Dynamic programming caches them—compute once, reuse everywhere.
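To see the caching effect concretely, here is a minimal top-down sketch (not the standard presentation of the algorithm) that writes the recurrence we will derive below as a recursive function, with `functools.lru_cache` as the memoization layer:

```python
from functools import lru_cache

def shortest_paths_memoized(graph):
    """Top-down form of the Floyd-Warshall recurrence (derived below).
    d(i, j, k) = shortest i -> j distance using only vertices 0..k-1
    as intermediates.  Without @lru_cache, the same (i, j, k) states
    are recomputed exponentially many times; with it, each of the
    O(V^3) states is solved exactly once."""
    V = len(graph)

    @lru_cache(maxsize=None)
    def d(i, j, k):
        if k == 0:
            return graph[i][j]           # base case: no intermediates allowed
        # Either avoid vertex k-1 entirely, or route through it
        return min(d(i, j, k - 1),
                   d(i, k - 1, k - 1) + d(k - 1, j, k - 1))

    return [[d(i, j, V) for j in range(V)] for i in range(V)]
```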
```
DEMONSTRATING OVERLAPPING SUBPROBLEMS:

Consider shortest paths in a 5-vertex graph:

  Path A→D needs: A→B, A→C, B→D, C→D, ...
  Path A→E needs: A→B, A→C, B→E, C→E, ...
  Path B→E needs: B→C, B→D, C→E, D→E, ...

OVERLAP:
- A→B appears in: A→D, A→E, A→C (if through B), ...
- B→C appears in: B→D (if through C), B→E, A→D (if through B→C), ...
- etc.

Without memoization: exponential recomputation
With memoization:    O(V³) total work

Floyd-Warshall structures the computation so that each subproblem is
computed exactly once, organized by intermediate vertex sets.
```

The art of dynamic programming lies in choosing the right state representation—the parameters that uniquely define each subproblem. Floyd-Warshall's genius is its choice of state.
The Intermediate Vertex Parameterization:
Instead of thinking about paths by their length or number of edges, Floyd-Warshall parameterizes paths by which vertices are allowed as intermediates.
Let the vertices be numbered 1, 2, 3, ..., V.
Define: d^(k)[i][j] = The length of the shortest path from vertex i to vertex j, using only vertices from the set {1, 2, ..., k} as intermediate vertices.
| k | Allowed Intermediates | What d^(k)[i][j] Represents |
|---|---|---|
| 0 | None | Direct edge from i to j (or ∞ if no edge) |
| 1 | {1} | Shortest path using only vertex 1 as intermediate (if helpful) |
| 2 | {1, 2} | Shortest path using vertices 1 and/or 2 as intermediates |
| k | {1, 2, ..., k} | Shortest path using any subset of vertices 1 through k |
| V | {1, 2, ..., V} | Shortest path with all vertices available (FINAL ANSWER) |
Why This State Works:
This parameterization is powerful because:
Clear Starting Point (k=0): With no intermediates allowed, d^(0)[i][j] is simply the edge weight from i to j (or ∞).
Incremental Expansion: Adding one more vertex to the allowed set changes the problem in a controlled, analyzable way.
Natural Termination (k=V): When all vertices are allowed as intermediates, we have the complete shortest paths—no restriction remains.
Polynomial State Space: There are O(V²) pairs (i, j) and O(V) values of k, giving O(V³) total states. Each state is computed exactly once.
Many DP problems parameterize by 'number of steps taken' or 'prefix processed.' Floyd-Warshall parameterizes by 'which resources are available.' This creates a lattice of subproblems where each level expands the tool set (available intermediate vertices) rather than extending a path. It's a fundamentally different—and elegant—perspective.
With our state defined, we now derive the recurrence that relates d^(k) to d^(k-1). This is the heart of the algorithm.
The Key Question: How do we compute d^(k)[i][j] given we already know all values of d^(k-1)?
The Critical Observation:
Consider the shortest path from i to j using vertices {1, 2, ..., k} as intermediates. This path either:
Case 1: Does NOT use vertex k at all
Case 2: DOES use vertex k (at least once)
```
THE FLOYD-WARSHALL RECURRENCE DERIVATION:

GIVEN: d^(k-1)[i][j] for all i, j  (shortest paths with intermediates ⊆ {1,...,k-1})
GOAL:  d^(k)[i][j]   for all i, j  (shortest paths with intermediates ⊆ {1,...,k})

ANALYSIS OF SHORTEST PATH i → j WITH INTERMEDIATES ⊆ {1,...,k}:

  CASE 1: Path avoids k             CASE 2: Path goes through k
  ─────────────────────             ───────────────────────────
  i ───────→ j                      i ─────→ k ─────→ j
     path P                            P1        P2
  P uses only {1,...,k-1}           P1 uses {1,...,k-1}
  Length: d^(k-1)[i][j]             P2 uses {1,...,k-1}
                                    Total: d^(k-1)[i][k] + d^(k-1)[k][j]

OPTIMAL CHOICE: Take the minimum of both cases

  ┌──────────────────────────────────────────────────┐
  │                                                  │
  │  d^(k)[i][j] = min( d^(k-1)[i][j],               │
  │                     d^(k-1)[i][k] + d^(k-1)[k][j] )
  │                                                  │
  └──────────────────────────────────────────────────┘

This is the COMPLETE recurrence for Floyd-Warshall.
```

Base Case: k = 0
With no intermediate vertices allowed, paths must be direct edges:
```
d^(0)[i][j] =  0        if i = j
               w(i,j)   if there is an edge from i to j with weight w(i,j)
               ∞        otherwise
```
This is simply the adjacency matrix of the graph, with zeros on the diagonal.
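As a concrete sketch of this base case, assuming the graph arrives as a list of (i, j, weight) triples (a hypothetical input format, not something fixed by this page):

```python
INF = float('inf')

def base_case(V, edges):
    """Build d^(0): zeros on the diagonal, w(i,j) where an edge exists,
    infinity everywhere else."""
    d0 = [[0 if i == j else INF for j in range(V)] for i in range(V)]
    for i, j, w in edges:
        d0[i][j] = w
    return d0

# The 4-vertex graph traced later on this page:
print(base_case(4, [(0, 1, 3), (0, 3, 5), (1, 3, 4), (2, 1, 1), (3, 2, 2)]))
```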
Computing the Final Answer:
After computing d^(V), the matrix contains shortest paths using all possible intermediates—which is the unrestricted shortest path. So d^(V)[i][j] is the final answer for each pair (i, j).
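A direct, unoptimized rendering of the recurrence makes the d^(0) → d^(V) progression explicit. This sketch keeps a separate matrix per level (the in-place O(V²) version appears below); vertices are 0-indexed, so level k admits vertex k as a new intermediate:

```python
def floyd_warshall_explicit(graph):
    """Literal translation of the recurrence: build d^(k) purely from
    d^(k-1), keeping one matrix per level instead of updating in place."""
    V = len(graph)
    prev = [row[:] for row in graph]          # d^(0): the adjacency matrix
    for k in range(V):                         # allow vertex k as intermediate
        curr = [[min(prev[i][j], prev[i][k] + prev[k][j])
                 for j in range(V)]
                for i in range(V)]             # reads only level k-1 values
        prev = curr
    return prev                                # d^(V): unrestricted shortest paths
```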
The Case 2 decomposition implicitly assumes that vertex k appears exactly once on the path. This is safe: if a shortest path visited k twice, the portion between the two visits would form a cycle. In a graph without negative cycles, every cycle has non-negative weight, so removing it yields a path that is no longer than the original and visits k only once. We can therefore always choose a shortest path on which k appears at most once.
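That no-negative-cycle assumption can be checked after the fact: if Floyd-Warshall drives any diagonal entry below zero, some vertex lies on a negative cycle. A minimal sketch:

```python
def has_negative_cycle(dist):
    """After Floyd-Warshall, dist[i][i] < 0 means vertex i lies on a
    negative-weight cycle (a round trip cheaper than staying put)."""
    return any(dist[i][i] < 0 for i in range(len(dist)))
```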
A naive implementation would maintain V+1 separate matrices: d^(0), d^(1), ..., d^(V). This requires O(V³) space: V+1 matrices, each of size V×V.
However, Floyd-Warshall can be implemented with just one V×V matrix, updated in place. This reduces space to O(V²). This isn't just an optimization—it's standard practice.
Why In-Place Updates Work:
The recurrence d^(k)[i][j] = min(d^(k-1)[i][j], d^(k-1)[i][k] + d^(k-1)[k][j]) only reads from d^(k-1). When we update the matrix in place, we're modifying d[i][j] potentially before we've finished using all d^(k-1) values.
But here's the critical insight: the values d[i][k] and d[k][j] don't change during the k-th iteration.
During iteration k, we compute d[i][k] = min(d[i][k], d[i][k] + d[k][k]). Since d[k][k] = 0 (a valid shortest path from a vertex to itself), this simplifies to d[i][k] = min(d[i][k], d[i][k]) = d[i][k]. The value doesn't change! Similarly, d[k][j] doesn't change. So we can safely read 'old' values from row k and column k while updating other entries.
Formal Proof of In-Place Correctness:
Consider the update: d[i][j] = min(d[i][j], d[i][k] + d[k][j])
At the start of iteration k, every entry d[i][j] holds the value d^(k-1)[i][j]: the shortest path distance using only intermediates from {1, ..., k-1}.
We need to show d[i][k] and d[k][j] remain unchanged throughout iteration k.
For d[i][k]: When we compute d[i][k], we use: d[i][k] = min(d[i][k], d[i][k] + d[k][k]) = min(d[i][k], d[i][k] + 0) = d[i][k] ✓
For d[k][j]: Similarly: d[k][j] = min(d[k][j], d[k][k] + d[k][j]) = min(d[k][j], 0 + d[k][j]) = d[k][j] ✓
Therefore, when we update any cell (i, j), the values d[i][k] and d[k][j] are guaranteed to still be the d^(k-1) values we need. QED.
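The proof can also be sanity-checked empirically: the in-place update and a two-matrix update that only ever reads level k-1 values produce identical results. A test sketch on random digraphs with non-negative weights:

```python
import random

def check_in_place_equivalence(trials=100, V=6, seed=0):
    """Empirical check that in-place updates match the per-level version."""
    INF = float('inf')
    rng = random.Random(seed)
    for _ in range(trials):
        g = [[0 if i == j else rng.choice([INF, rng.randint(1, 9)])
              for j in range(V)] for i in range(V)]

        a = [row[:] for row in g]              # in-place version
        for k in range(V):
            for i in range(V):
                for j in range(V):
                    a[i][j] = min(a[i][j], a[i][k] + a[k][j])

        b = [row[:] for row in g]              # per-level (two-matrix) version
        for k in range(V):
            b = [[min(b[i][j], b[i][k] + b[k][j]) for j in range(V)]
                 for i in range(V)]
        assert a == b
    return True

print(check_in_place_equivalence())            # True
```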
```
IN-PLACE UPDATE VISUALIZATION:

Matrix during iteration k=3:

         │  1    2    3    4    5  │
     ────┼─────────────────────────┤
      1  │ ...  ...  [*]  ...  ... │
      2  │ ...  ...  [*]  ...  ... │  ← These values DON'T change
      3  │ [*]  [*]   0   [*]  [*] │  ← Row 3 and Col 3 are STABLE
      4  │ ...  ...  [*]  ...  ... │
      5  │ ...  ...  [*]  ...  ... │

Row 3 reads come from d^(k-1)[3][j]; column 3 reads come from d^(k-1)[i][3].

For each cell (i,j) with i≠3 and j≠3:

    d[i][j] = min(d[i][j], d[i][3] + d[3][3 → j])
                             ↑          ↑
                           STABLE     STABLE

The [*] cells (row 3 and column 3) are read but never modified
to a different value, so they provide correct d^(k-1) values.
```

Now we assemble everything into the complete Floyd-Warshall algorithm. The implementation is remarkably concise—three nested loops and a one-line update.
```python
def floyd_warshall(graph):
    """
    Compute all-pairs shortest paths using Floyd-Warshall algorithm.

    Args:
        graph: V×V matrix where graph[i][j] is the weight of edge i→j
               Use float('inf') for no edge, 0 on diagonal

    Returns:
        V×V matrix of shortest path distances
        V×V matrix of predecessors for path reconstruction
    """
    V = len(graph)

    # Initialize distance matrix with direct edge weights
    dist = [[graph[i][j] for j in range(V)] for i in range(V)]

    # Initialize predecessor matrix for path reconstruction
    # pred[i][j] = the vertex before j on the shortest path from i to j
    pred = [[None if i == j or graph[i][j] == float('inf') else i
             for j in range(V)] for i in range(V)]

    # Main Floyd-Warshall: consider each vertex as intermediate
    for k in range(V):
        # For each source vertex
        for i in range(V):
            # For each destination vertex
            for j in range(V):
                # Check if path through k is shorter
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    # Predecessor on path i→j is same as on path k→j
                    pred[i][j] = pred[k][j]

    return dist, pred


def reconstruct_path(pred, i, j):
    """
    Reconstruct the shortest path from i to j using predecessor matrix.

    Returns:
        List of vertices from i to j, or None if no path exists
    """
    if pred[i][j] is None and i != j:
        return None  # No path exists

    path = []
    current = j
    while current is not None:
        path.append(current)
        if current == i:
            break
        current = pred[i][current]

    if not path or path[-1] != i:
        return None  # Path doesn't reach source

    return path[::-1]  # Reverse to get i → j order


# Example usage
if __name__ == "__main__":
    INF = float('inf')

    # Example graph with 4 vertices
    graph = [
        [0,   3,   INF, 5  ],  # Edges from vertex 0
        [2,   0,   INF, 4  ],  # Edges from vertex 1
        [INF, 1,   0,   INF],  # Edges from vertex 2
        [INF, INF, 2,   0  ],  # Edges from vertex 3
    ]

    dist, pred = floyd_warshall(graph)

    print("Shortest distances:")
    for row in dist:
        print([x if x != INF else "∞" for x in row])

    print("\nPath from 0 to 2:", reconstruct_path(pred, 0, 2))
```

Understanding the Loop Order:
The loop order k → i → j is crucial. The outer loop over k ensures that when we compute updates for intermediate vertex k, all updates for vertices 1 through k-1 are complete. This respects the dependency in our recurrence.
Common mistake: reordering the loops for cache efficiency. The i and j loops (source and destination) can be swapped freely, but k MUST remain the outermost loop. Because d^(k) depends on d^(k-1), level k-1 must be fully complete before level k begins; moving k inward breaks correctness, as the sketch below demonstrates.
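A small sketch makes the failure concrete. On the chain 0 → 3 → 2 → 1 (a hypothetical graph chosen to expose the bug), the k-innermost order asks about paths through k before those paths have been built:

```python
INF = float('inf')

def relax_all(g, k_outer):
    """Triple loop with k outermost (correct) or k innermost (broken)."""
    V = len(g)
    d = [row[:] for row in g]
    if k_outer:
        for k in range(V):
            for i in range(V):
                for j in range(V):
                    d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    else:
        for i in range(V):
            for j in range(V):
                for k in range(V):
                    d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d

# Chain 0 → 3 → 2 → 1; the true distance d[0][1] is 1 + 1 + 1 = 3.
g = [
    [0,   INF, INF, 1  ],  # edge 0 → 3
    [INF, 0,   INF, INF],
    [INF, 1,   0,   INF],  # edge 2 → 1
    [INF, INF, 1,   0  ],  # edge 3 → 2
]

print(relax_all(g, k_outer=True)[0][1])   # 3: correct
print(relax_all(g, k_outer=False)[0][1])  # inf: the 3-edge path is missed
```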
Let's trace Floyd-Warshall on a concrete example to internalize the mechanics. Consider this 4-vertex graph:
```
Graph (4 vertices, 0-indexed):

    0 ──(3)──→ 1
    │          │
   (5)        (4)
    │          │
    ↓          ↓
    3 ──(2)──→ 2 ──(1)──→ 1

Edge list:
    0 → 1  (weight 3)
    0 → 3  (weight 5)
    1 → 3  (weight 4)
    2 → 1  (weight 1)
    3 → 2  (weight 2)
    (1 → 2: no edge, so ∞)

Initial adjacency matrix:

         0    1    2    3
    0 [  0    3    ∞    5 ]
    1 [  ∞    0    ∞    4 ]
    2 [  ∞    1    0    ∞ ]
    3 [  ∞    ∞    2    0 ]
```

Iteration k = 0 (using vertex 0 as intermediate):
For each pair (i, j), check if i → 0 → j is shorter than i → j directly.
Vertex 0 has no incoming edges (except from itself), so nothing improves.
Matrix after k = 0: (unchanged)
|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 0 | 3 | ∞ | 5 |
| 1 | ∞ | 0 | ∞ | 4 |
| 2 | ∞ | 1 | 0 | ∞ |
| 3 | ∞ | ∞ | 2 | 0 |
Iteration k = 1 (using vertex 1 as intermediate):
Now vertex 1 can be an intermediate. The only entry that improves is d[2][3]: routing 2 → 1 → 3 gives d[2][1] + d[1][3] = 1 + 4 = 5, beating the previous ∞.
Matrix after k = 1:
|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 0 | 3 | ∞ | 5 |
| 1 | ∞ | 0 | ∞ | 4 |
| 2 | ∞ | 1 | 0 | 5 |
| 3 | ∞ | ∞ | 2 | 0 |
Iteration k = 2 (using vertex 2 as intermediate):

Routing through vertex 2 improves one entry: d[3][1] = d[3][2] + d[2][1] = 2 + 1 = 3, beating the previous ∞.
Matrix after k = 2:
|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 0 | 3 | ∞ | 5 |
| 1 | ∞ | 0 | ∞ | 4 |
| 2 | ∞ | 1 | 0 | 5 |
| 3 | ∞ | 3 | 2 | 0 |
Iteration k = 3 (using vertex 3 as intermediate):

Vertex 3 finally gives the left side of the graph a route to vertex 2: d[0][2] = d[0][3] + d[3][2] = 5 + 2 = 7 and d[1][2] = d[1][3] + d[3][2] = 4 + 2 = 6, both previously ∞.
Final matrix after k = 3:
|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 0 | 3 | 7 | 5 |
| 1 | ∞ | 0 | 6 | 4 |
| 2 | ∞ | 1 | 0 | 5 |
| 3 | ∞ | 3 | 2 | 0 |
The final matrix shows all-pairs shortest paths. For example, the shortest path from 0 to 2 has distance 7 and goes 0 → 3 → 2. From 1 to 2, distance 6 via 1 → 3 → 2. Every ∞ entry means no path exists (e.g., no path from 1 to 0).
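The hand trace above can be reproduced mechanically. This sketch runs the in-place triple loop on the same graph and prints the matrix after each phase, matching the tables shown:

```python
INF = float('inf')

graph = [
    [0,   3,   INF, 5  ],
    [INF, 0,   INF, 4  ],
    [INF, 1,   0,   INF],
    [INF, INF, 2,   0  ],
]

dist = [row[:] for row in graph]
V = len(dist)
for k in range(V):
    for i in range(V):
        for j in range(V):
            if dist[i][k] + dist[k][j] < dist[i][j]:
                dist[i][j] = dist[i][k] + dist[k][j]
    # After each phase, dist is exactly the d^(k) matrix traced above
    print(f"After k = {k}:")
    for row in dist:
        print(["∞" if x == INF else x for x in row])
```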
Let's visualize how the algorithm explores the solution space. The state transition can be viewed as expanding the set of allowed intermediates one vertex at a time.
```
STATE TRANSITION DIAGRAM:

Level 0:  ∅          (No intermediates - direct edges only)
              │ Add vertex 1
              ▼
Level 1:  {1}        (Can route through vertex 1)
              │ Add vertex 2
              ▼
Level 2:  {1,2}      (Can route through vertices 1 or 2)
              │ Add vertex 3
              ▼
Level 3:  {1,2,3}    (Can route through vertices 1, 2, or 3)
              │ Add vertex 4
              ▼
Level 4:  {1,2,3,4}  (All intermediates available - FINAL)

TRANSITION FROM LEVEL k-1 TO LEVEL k:

For every pair (i,j):

    Current best:  d^(k-1)[i][j] ──────────────┐
                                               ├──→ min ──→ d^(k)[i][j]
    Path via k:    d^(k-1)[i][k]               │
                 + d^(k-1)[k][j] ──────────────┘

Each transition considers V² pairs, making 1 comparison each.
Total transitions: V levels × V² comparisons = V³ operations.
```

The Power of Incremental Expansion:
By the time we consider vertex k as an intermediate, we've already optimized all paths using vertices 1 through k-1. This means:

- d[i][k] is the true shortest distance from i to k using intermediates from {1, ..., k-1}
- d[k][j] is the true shortest distance from k to j using intermediates from {1, ..., k-1}
So when we check d[i][k] + d[k][j], we're checking the best possible path through k, not just one arbitrary path. This is why the algorithm is correct—each phase builds on fully optimized previous phases.
Why Order Doesn't Matter Within a Phase:
Within a single iteration k, the order in which we process (i, j) pairs doesn't matter. Each update uses values from row k and column k, which (as we proved) don't change during iteration k. So whether we process (1, 2) before (3, 4) or vice versa, we get the same result.
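This claim is also easy to test: shuffling the (i, j) pairs inside each k-phase leaves the result unchanged. A sketch:

```python
import random

INF = float('inf')

def fw_standard(graph):
    """Reference: the usual k → i → j order."""
    V = len(graph)
    d = [row[:] for row in graph]
    for k in range(V):
        for i in range(V):
            for j in range(V):
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d

def fw_shuffled(graph, rng):
    """Same algorithm, but the (i, j) pairs inside each k-phase are
    visited in a random order.  Row k and column k are stable during
    the phase, so the final matrix is identical."""
    V = len(graph)
    d = [row[:] for row in graph]
    for k in range(V):
        pairs = [(i, j) for i in range(V) for j in range(V)]
        rng.shuffle(pairs)
        for i, j in pairs:
            d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d

rng = random.Random(42)
for _ in range(50):
    V = 5
    g = [[0 if i == j else rng.choice([INF, rng.randint(1, 9)])
          for j in range(V)] for i in range(V)]
    assert fw_shuffled(g, rng) == fw_standard(g)
print("shuffled order matches standard order")
```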
We've thoroughly explored the dynamic programming mechanics that power Floyd-Warshall. Let's consolidate the key insights:

- State: d^(k)[i][j] is the shortest i → j distance using only vertices {1, ..., k} as intermediates.
- Recurrence: d^(k)[i][j] = min(d^(k-1)[i][j], d^(k-1)[i][k] + d^(k-1)[k][j]).
- Base case: d^(0) is the adjacency matrix, with zeros on the diagonal.
- In-place updates are safe because row k and column k do not change during iteration k, reducing space from O(V³) to O(V²).
- Loop order: k must be outermost; within a phase, the (i, j) order is irrelevant.
What's Next:
With the dynamic programming foundation solid, we'll analyze the algorithm's complexity in detail. The next page examines the O(V³) time complexity—why it's optimal for this approach, how constants matter in practice, and the O(V²) space requirement. We'll also discuss how this compares to alternative approaches for different graph densities.
You now understand the dynamic programming paradigm as applied to Floyd-Warshall—the state definition, recurrence derivation, in-place optimization, and complete algorithm. Next, we'll analyze exactly why this runs in O(V³) time.