Given the astronomical number of possible join orderings, how do real database systems find optimal or near-optimal plans? The answer lies in a beautiful application of dynamic programming—a technique that transforms an apparently intractable problem into one solvable in polynomial time (within restricted search spaces).
The core insight dates back to IBM's System R project in the 1970s, and the algorithm remains foundational to virtually every commercial and open-source database system today. Understanding this algorithm reveals not just how optimizers work, but why they exhibit particular behaviors—why some query forms optimize well while others struggle.
By the end of this page, you will understand how dynamic programming transforms join ordering from exponential enumeration to efficient computation, how the principle of optimality enables this transformation, and how production systems implement these ideas. You'll be able to trace through the algorithm and understand its practical implications.
Dynamic programming works when a problem exhibits optimal substructure—when the optimal solution to the whole problem contains optimal solutions to its subproblems. For join ordering, this means:
If the optimal ordering for joining {A, B, C, D} pairs (AB, CD) and then joins these results, then AB must be the optimal ordering for {A, B} and CD must be the optimal ordering for {C, D}.
This principle is profound. It means we don't need to re-evaluate how to join A and B for every possible ordering of the larger query. We solve the subproblem once, record the optimal plan for {A, B}, and reuse it whenever we consider plans involving this subset.
The number of subsets of n tables is 2^n, while the number of orderings is super-exponential. By reasoning about subsets rather than orderings, dynamic programming reduces a super-exponential search to a merely exponential (and, for moderate n, manageable) one—a dramatic improvement.
Why does optimal substructure hold for join ordering?
The total cost of a join plan is the sum of the costs of its individual join operations, and each join's cost is determined by the two inputs it combines and the algorithm used to combine them.
Critically, the cost of joining AB with CD depends only on the cardinality and properties of AB and CD—not on how they were internally constructed. This independence makes optimal substructure valid.
The recurrence relation:
Let OPT(S) be the optimal cost for joining all tables in set S. Then:
OPT(S) = min over all partitions (S1, S2) of S:
OPT(S1) + OPT(S2) + cost(join(S1, S2))
This recurrence captures the essence of the algorithm: try all ways to split S into two parts, recursively optimize each part, add the cost of the final join, and keep the minimum.
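To make the recurrence concrete, here is a minimal top-down, memoized sketch in Python. The `tables`, `scan_cost`, and `join_cost` arguments are placeholders for whatever statistics and cost model the optimizer supplies; a real implementation would also record which split produced the minimum so the plan itself can be reconstructed.

```python
from functools import lru_cache
from itertools import combinations

def optimal_join_cost(tables, scan_cost, join_cost):
    """OPT(S): cheapest way to join every table in S, per the recurrence above."""

    @lru_cache(maxsize=None)
    def opt(subset):                       # subset: frozenset of table names
        if len(subset) == 1:
            return scan_cost[next(iter(subset))]
        best = float("inf")
        members = sorted(subset)
        # Try every split of `subset` into two nonempty halves (S1, S2).
        for k in range(1, len(members)):
            for left in combinations(members, k):
                s1 = frozenset(left)
                s2 = subset - s1
                best = min(best, opt(s1) + opt(s2) + join_cost(s1, s2))
        return best

    return opt(frozenset(tables))
```

Because each subset's answer is cached, the recursion solves every subproblem exactly once, which is the same work the bottom-up System R formulation below performs explicitly.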
IBM's System R introduced the foundational dynamic programming approach for join ordering in the late 1970s. The algorithm proceeds in phases, building larger optimal plans from smaller ones:
```
function SystemRJoinOptimization(tables: Set<Table>):
    n = |tables|

    // Phase 1: Single-table access plans
    for each table T in tables:
        optPlan[{T}] = best access plan for T
        // Consider: table scan, index scan, index-only scan
        // Record: cost, cardinality, interesting orderings

    // Phase 2: Build optimal plans for increasingly larger subsets
    for size = 2 to n:
        for each subset S of tables where |S| = size:
            optPlan[S] = infinity

            // Try all ways to partition S into (S1, join partner)
            for each proper subset S1 of S:
                S2 = S - S1

                // Skip if no join predicate connects S1 and S2
                if not connectedByJoinPredicate(S1, S2):
                    continue

                // Try joining S1 with S2
                for each join algorithm in {NestedLoop, HashJoin, MergeJoin}:
                    cost = optPlan[S1].cost + optPlan[S2].cost
                         + joinCost(optPlan[S1], optPlan[S2], algorithm)

                    if cost < optPlan[S].cost:
                        optPlan[S] = makePlan(optPlan[S1], optPlan[S2], algorithm)

    // Phase 3: Return optimal plan for full table set
    return optPlan[tables]
```

Key observations about the algorithm:
The algorithm evaluates O(3^n) subset pairs (each subset is partitioned in every possible way) and computes a join cost for each pair and each join algorithm, yielding O(3^n × |algorithms|) total evaluations. While still exponential, this is far better than evaluating all (2n-2)!/(n-1)! orderings directly. For 10 tables: 3^10 ≈ 59,000 versus roughly 17 billion orderings.
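As a quick sanity check on those numbers (not part of the original System R presentation), the two counts for n = 10 can be computed directly:

```python
from math import factorial

n = 10
dp_evaluations = 3 ** n                                    # subset/partition pairs examined by DP
join_orderings = factorial(2 * n - 2) // factorial(n - 1)  # (2n-2)!/(n-1)! distinct join trees

print(dp_evaluations)    # 59049
print(join_orderings)    # 17643225600, roughly 17.6 billion
```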
Let's trace through the algorithm for a 4-table query joining tables A, B, C, and D. Assume the following statistics and join conditions:
| Table | Rows | Best Access Cost | Output Cardinality |
|---|---|---|---|
| A | 10,000 | 100 | 10,000 |
| B | 50,000 | 500 | 50,000 |
| C | 5,000 | 50 | 5,000 |
| D | 1,000 | 10 | 1,000 |
Phase 1: Single-table plans
optPlan[{A}] = cost: 100, cardinality: 10,000
optPlan[{B}] = cost: 500, cardinality: 50,000
optPlan[{C}] = cost: 50, cardinality: 5,000
optPlan[{D}] = cost: 10, cardinality: 1,000
Phase 2a: Two-table plans
We only consider pairs with join predicates:
{A,B}: A.id = B.a_id
A ⋈ B: 100 + 500 + join(10K, 50K) → cost: 2,600, card: 10,000
B ⋈ A: 500 + 100 + join(50K, 10K) → cost: 2,100, card: 10,000
optPlan[{A,B}] = B ⋈ A, cost: 2,100
{B,C}: B.id = C.b_id
B ⋈ C: 500 + 50 + join(50K, 5K) → cost: 1,550, card: 5,000
C ⋈ B: 50 + 500 + join(5K, 50K) → cost: 800, card: 5,000
optPlan[{B,C}] = C ⋈ B, cost: 800
{C,D}: C.id = D.c_id
C ⋈ D: 50 + 10 + join(5K, 1K) → cost: 160, card: 1,000
D ⋈ C: 10 + 50 + join(1K, 5K) → cost: 110, card: 1,000
optPlan[{C,D}] = D ⋈ C, cost: 110
Phase 2b: Three-table plans
{A,B,C}: Must connect A-B and B-C
{A,B} ⋈ C: 2,100 + 50 + join(10K, 5K) → cost: 2,650, card: 5,000
{B,C} ⋈ A: 800 + 100 + join(5K, 10K) → cost: 1,400, card: 5,000
optPlan[{A,B,C}] = {B,C} ⋈ A, cost: 1,400
{B,C,D}: Must connect B-C and C-D
{B,C} ⋈ D: 800 + 10 + join(5K, 1K) → cost: 860, card: 1,000
{C,D} ⋈ B: 110 + 500 + join(1K, 50K) → cost: 660, card: 1,000
optPlan[{B,C,D}] = {C,D} ⋈ B, cost: 660
Phase 2c: Four-table plans (final)
{A,B,C,D}:
{A,B,C} ⋈ D: 1,400 + 10 + join(5K, 1K) → cost: 1,510
{B,C,D} ⋈ A: 660 + 100 + join(1K, 10K) → cost: 860
{A,B} ⋈ {C,D}: 2,100 + 110 + join(10K, 1K) → cost: 2,310
optPlan[{A,B,C,D}] = {B,C,D} ⋈ A, cost: 860
The optimal order is: ((D ⋈ C) ⋈ B) ⋈ A. We start with the smallest tables (D and C), progressively adding larger tables. This minimizes intermediate result sizes at each step—exactly the pattern we identified as crucial in the previous pages.
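The whole trace can also be checked mechanically. The sketch below is a compact bottom-up implementation of the Phase 1/2/3 structure; the `JOIN_COST` table simply hard-codes the per-direction join costs used in the worked example, so those numbers are assumptions carried over from the trace rather than output of a real cost model.

```python
from itertools import combinations

ACCESS = {"A": 100, "B": 500, "C": 50, "D": 10}
PREDICATES = {frozenset("AB"), frozenset("BC"), frozenset("CD")}   # join graph: A-B, B-C, C-D

# Cost of the final join for (left subset, right subset), taken from the worked example.
JOIN_COST = {
    ("A", "B"): 2000, ("B", "A"): 1500,
    ("B", "C"): 1000, ("C", "B"): 250,
    ("C", "D"): 100,  ("D", "C"): 50,
    ("AB", "C"): 500, ("BC", "A"): 500,
    ("BC", "D"): 50,  ("CD", "B"): 50,
    ("ABC", "D"): 100, ("BCD", "A"): 100, ("AB", "CD"): 100,
}

def key(s):                       # canonical string key for a subset, e.g. frozenset("CB") -> "BC"
    return "".join(sorted(s))

def connected(s1, s2):            # at least one join predicate links the two subsets
    return any(frozenset((a, b)) in PREDICATES for a in s1 for b in s2)

# Phase 1: single-table plans.
opt = {frozenset(t): (ACCESS[t], t) for t in ACCESS}       # subset -> (cost, plan string)

# Phase 2: larger subsets, built from all connected partitions that appear in the trace.
tables = sorted(ACCESS)
for size in range(2, len(tables) + 1):
    for combo in combinations(tables, size):
        s = frozenset(combo)
        best = None
        for k in range(1, size):
            for left in combinations(sorted(s), k):
                s1, s2 = frozenset(left), s - frozenset(left)
                if s1 not in opt or s2 not in opt or not connected(s1, s2):
                    continue
                pair = (key(s1), key(s2))
                if pair not in JOIN_COST:
                    continue
                cost = opt[s1][0] + opt[s2][0] + JOIN_COST[pair]
                if best is None or cost < best[0]:
                    best = (cost, f"({opt[s1][1]} ⋈ {opt[s2][1]})")
        if best is not None:
            opt[s] = best

# Phase 3: the full plan.
print(opt[frozenset("ABCD")])     # (860, '(((D ⋈ C) ⋈ B) ⋈ A)')
```

Running it reproduces the same subset-by-subset winners as the trace, ending with the ((D ⋈ C) ⋈ B) ⋈ A plan at cost 860.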
The basic algorithm tracks only the cheapest plan for each subset. But sometimes a slightly more expensive plan produces sorted output that eliminates a later sort operation—making the overall query cheaper.
System R introduced the concept of interesting orderings to handle this: an ordering is 'interesting' if a later operation could exploit it, for example an ORDER BY or GROUP BY clause, or a merge join on the same columns.
With interesting orderings, the DP state becomes (subset, ordering) pairs rather than just subsets. For each subset, we track not just the cheapest plan, but the cheapest plan for each interesting ordering. This multiplies state space by the number of orderings—typically manageable since interesting orderings are few.
Example: When interesting orderings matter
Consider:
SELECT * FROM A JOIN B ON A.x = B.x ORDER BY A.x
Two plans might exist for the join: a hash join costing 1,000 whose output is unsorted, and a merge join costing 1,100 whose output is already sorted by A.x.
Without tracking interesting orderings, the optimizer chooses the hash join (cost 1000 < 1100) and adds a sort, yielding total cost 1200. With interesting orderings, it recognizes the merge join's 'sorted by A.x' property saves 200 in sort costs, yielding optimal total cost 1100.
The key insight: A locally suboptimal choice can be globally optimal when physical properties propagate through the query plan.
| Approach | Local Choice | Sort Cost | Total Cost | Optimal? |
|---|---|---|---|---|
| Basic DP | Hash join (1000) | +200 | 1200 | No |
| With Interesting Orderings | Merge join (1100) | +0 | 1100 | Yes |
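The sketch below shows the reshaped DP state in miniature, reusing the hash-join/merge-join numbers from the table above; the dictionary layout and names are illustrative, not how any particular system stores its memo.

```python
# Best plan per (subset, output ordering) instead of per subset alone.
# Numbers from the example: hash join 1000 (unsorted), merge join 1100 (sorted by A.x),
# and an explicit sort on A.x costs an additional 200.
memo = {
    (frozenset({"A", "B"}), None):  {"plan": "HashJoin(A, B)",  "cost": 1000},
    (frozenset({"A", "B"}), "A.x"): {"plan": "MergeJoin(A, B)", "cost": 1100},
}

SORT_COST = 200
required_ordering = "A.x"            # from the query's ORDER BY A.x

def final_cost(state, plan):
    """Cost of satisfying ORDER BY: add a sort only if the plan is not already ordered."""
    _, ordering = state
    return plan["cost"] + (0 if ordering == required_ordering else SORT_COST)

best = min(memo.items(), key=lambda item: final_cost(*item))
print(best[0][1], final_cost(*best))   # A.x 1100: the merge join wins once the sort is priced in
```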
When restricting to left-deep plans, the dynamic programming approach simplifies. Instead of considering all subset partitions, we only consider extensions by single tables:
```
function LeftDeepDP(tables: Set<Table>):
    n = |tables|

    // Initialize single-table plans
    for each table T in tables:
        optPlan[{T}] = best access plan for T

    // Build left-deep plans by extending with one table at a time
    for size = 2 to n:
        for each subset S of tables where |S| = size:
            optPlan[S] = infinity

            // Try extending each (size-1) subset with one new table
            for each table T in S:
                S' = S - {T}   // The "left" part

                if not connectedByJoinPredicate(S', T):
                    continue

                for each join algorithm in {NestedLoop, HashJoin, MergeJoin}:
                    // Left-deep: intermediate ⋈ base table
                    cost = optPlan[S'].cost + accessCost(T)
                         + joinCost(optPlan[S'], T, algorithm)

                    if cost < optPlan[S].cost:
                        optPlan[S] = makePlan(optPlan[S'], T, algorithm)

    return optPlan[tables]
```

Complexity comparison:
| Approach | Number of Subsets | Partitions/Extensions per Subset | Total Evaluations |
|---|---|---|---|
| General DP (bushy) | 2^n | 2^(size-1) partitions | O(3^n) |
| Left-deep DP | 2^n | size single-table extensions | O(n × 2^n) |
The left-deep restriction reduces complexity from O(3^n) to O(n × 2^n), a significant improvement. For n = 10, that is 3^10 ≈ 59,049 evaluations versus 10 × 2^10 = 10,240.
This 6x reduction compounds with pruning to make left-deep enumeration substantially faster.
Even with left-deep restriction, we enumerate subsets rather than permutations because not all permutations are valid—the join graph constrains which tables can be added next. Subset-based DP naturally handles these constraints; we simply skip extensions that lack join predicates.
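As a small illustration of that constraint handling, here is a sketch of the predicate check for the chain query A-B, B-C, C-D used earlier (the function and variable names are illustrative, not from any particular system):

```python
# Join graph for the chain query A-B, B-C, C-D.
JOIN_GRAPH = {frozenset({"A", "B"}), frozenset({"B", "C"}), frozenset({"C", "D"})}

def connected_by_join_predicate(subset, table):
    """True if some table already in `subset` has a join predicate with `table`."""
    return any(frozenset({t, table}) in JOIN_GRAPH for t in subset)

# Valid single-table extensions of {A, B}: C connects through B, D does not.
current = {"A", "B"}
extensions = [t for t in ("C", "D") if connected_by_join_predicate(current, t)]
print(extensions)   # ['C']  -- extending with D would require a cross product, so it is skipped
```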
Even with dynamic programming's efficiency gains, evaluating all subset pairs becomes costly for large queries. Pruning strategies shrink the search space by eliminating provably suboptimal options early; branch-and-bound, illustrated below, is the canonical example.
Aggressive pruning can accidentally eliminate the optimal plan if pruning conditions are too strict or if cost estimates are inaccurate. Most systems err on the side of keeping more plans, accepting some redundant work to ensure good plans aren't missed.
Branch-and-bound in action:
Suppose we're optimizing a 6-table join and a greedy algorithm quickly finds a plan with cost 10,000. We use this as our upper bound.
Evaluating subset {A,B,C}:
Best plan so far: cost 8,000
Trying to extend to {A,B,C,D}:
Extension cost ≥ 3,000
Total ≥ 11,000 > 10,000 (bound)
→ PRUNE this branch
By establishing an upper bound early and pruning branches that can't improve it, we often evaluate only a small fraction of the full search space.
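In code, the pruning test itself is a single comparison against the running upper bound; the numbers below are the ones from the scenario above, and the lower bound stands in for whatever optimistic completion estimate the optimizer can compute cheaply:

```python
upper_bound = 10_000            # cost of the complete plan found by the quick greedy pass
partial_cost = 8_000            # best plan so far for the subset {A, B, C}
extension_lower_bound = 3_000   # optimistic (never over-estimated) cost of adding the remaining tables

def should_prune(partial_cost, extension_lower_bound, upper_bound):
    """Prune when even the most optimistic completion cannot beat the best plan found so far."""
    return partial_cost + extension_lower_bound >= upper_bound

print(should_prune(partial_cost, extension_lower_bound, upper_bound))   # True: 11,000 can never beat 10,000
```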
Effectiveness depends on how early a tight upper bound is established and on how reliable the cost estimates used in the comparison are.
Dynamic programming provides strong optimality guarantees—but these guarantees come with important caveats that practitioners must understand:
| Guarantee | Scope | Limitation |
|---|---|---|
| Finds optimal ordering | Within search space | May miss better orderings outside space (e.g., bushy if restricted to left-deep) |
| No redundant evaluation | Each subproblem solved once | Memory required for memoization can be significant |
| Polynomial in subsets | O(3^n) for n tables | Still exponential; impractical for n > 15-20 |
| Considers all join algorithms | If modeled in cost function | Unusual or exotic algorithms may not be considered |
| Handles arbitrary cost models | Any additive cost function | Non-additive costs (e.g., parallel resources) may violate assumptions |
The estimation dependency:
Perhaps the most critical limitation is that DP finds the optimal ordering according to the cost model's estimates. If cardinality estimates are wrong, the 'optimal' plan may be far from truly optimal.
Consider a join whose output the optimizer estimates to be tiny but which actually produces far more rows than predicted.
The optimizer positions this join early (thinking it reduces data), but at runtime it explodes intermediate result sizes. No amount of algorithmic sophistication can overcome fundamentally wrong estimates.
Research consistently shows that estimation errors—not algorithmic limitations—are the primary cause of suboptimal join orderings in practice. A simple algorithm with good estimates often outperforms a sophisticated algorithm with poor estimates. This drives ongoing research into adaptive query processing and learned optimizers.
Real database systems implement variations of the System R algorithm with numerous engineering refinements. Here's how major systems approach join optimization:
In PostgreSQL, for example, queries with fewer tables than geqo_threshold are planned with the standard dynamic-programming planner, while larger queries fall back to a genetic algorithm. Most databases expose configuration parameters for join optimization behavior: search depth limits, time budgets, algorithm selection, and parallelism thresholds. Understanding these parameters lets DBAs tune the optimization/execution time tradeoff for their workloads.
Common implementation patterns:
Memo-based optimization — Store optimal plans in a 'memo' structure indexed by (subset, physical properties). This generalizes the optPlan array to handle complex plan spaces.
Group-based enumeration — Represent logically equivalent expressions as 'groups' and enumerate physical implementations per group. This is the foundation of the Cascades/Volcano optimization framework.
Batch cost computations — Evaluate multiple plan alternatives in batch to amortize cost model overhead and enable vectorized cost comparisons.
Lazy enumeration — Don't enumerate all subsets upfront; generate them on demand as the search progresses. Reduces memory pressure and enables early termination.
Hybrid strategies — Use exhaustive DP for the first k tables (where k ≈ 8-10), then use heuristics to extend to the remaining tables. Balances optimality with tractability.
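A minimal sketch of that last pattern, assuming `exhaustive_dp` and `greedy_extend` are supplied by the optimizer and that an eight-table core is affordable (both are assumptions for illustration):

```python
DP_CORE_SIZE = 8   # assumed cutoff; real systems expose this as a tunable parameter

def hybrid_join_order(tables, exhaustive_dp, greedy_extend):
    """Optimize a core of the query exactly, then attach the remaining tables heuristically."""
    core, rest = tables[:DP_CORE_SIZE], tables[DP_CORE_SIZE:]
    plan = exhaustive_dp(core)               # optimal within the modeled search space
    for table in rest:
        plan = greedy_extend(plan, table)    # e.g. cheapest connected single-table extension
    return plan
```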
We've explored the foundational techniques for finding optimal join orderings. The key insights: optimal substructure lets us optimize over subsets instead of orderings; System R's bottom-up enumeration solves each subset once in O(3^n) time; interesting orderings let physical properties justify locally suboptimal choices; the left-deep restriction and pruning shrink the search further; and the resulting plan is only as good as the cardinality estimates behind it.
What's next:
Dynamic programming works well for moderate-sized queries but struggles with very large join counts. The next page explores heuristic approaches—greedy algorithms, genetic algorithms, and rule-based systems—that trade optimality guarantees for scalability to arbitrary query sizes.
You now understand how dynamic programming transforms join ordering from intractable enumeration to efficient computation. The principle of optimality, bottom-up subset construction, interesting orderings, and pruning strategies together form the foundation of virtually all production query optimizers. Next, we'll explore what happens when even DP isn't enough.