Given the astronomical number of possible join orderings, how do real database systems find optimal or near-optimal plans? The answer lies in a beautiful application of dynamic programming—a technique that transforms an apparently intractable problem into one solvable in polynomial time (within restricted search spaces).
The core insight dates back to IBM's System R project in the 1970s, and the algorithm remains foundational to virtually every commercial and open-source database system today. Understanding this algorithm reveals not just how optimizers work, but why they exhibit particular behaviors—why some query forms optimize well while others struggle.
By the end of this page, you will understand how dynamic programming transforms join ordering from exponential enumeration to efficient computation, how the principle of optimality enables this transformation, and how production systems implement these ideas. You'll be able to trace through the algorithm and understand its practical implications.
Dynamic programming works when a problem exhibits optimal substructure—when the optimal solution to the whole problem contains optimal solutions to its subproblems. For join ordering, this means:
If the optimal ordering for joining {A, B, C, D} pairs (AB, CD) and then joins these results, then AB must be the optimal ordering for {A, B} and CD must be the optimal ordering for {C, D}.
This principle is profound. It means we don't need to re-evaluate how to join A and B for every possible ordering of the larger query. We solve the subproblem once, record the optimal plan for {A, B}, and reuse it whenever we consider plans involving this subset.
The number of subsets of n tables is 2^n, while the number of orderings is super-exponential. By reasoning about subsets rather than orderings, dynamic programming reduces a super-exponential search to a merely exponential (and, for moderate n, manageable) one—a dramatic improvement.
Why does optimal substructure hold for join ordering?
The total cost of a join plan is the sum of the costs of its individual join operations, and each join's cost is determined by the two inputs it combines and the algorithm used to combine them.
Critically, the cost of joining AB with CD depends only on the cardinality and properties of AB and CD—not on how they were internally constructed. This independence makes optimal substructure valid.
The recurrence relation:
Let OPT(S) be the optimal cost for joining all tables in set S. Then:
OPT(S) = min over all partitions (S1, S2) of S:
OPT(S1) + OPT(S2) + cost(join(S1, S2))
This recurrence captures the essence of the algorithm: try all ways to split S into two parts, recursively optimize each part, add the cost of the final join, and keep the minimum.
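To make the recurrence concrete, here is a minimal top-down, memoized sketch in Python. The `tables`, `scan_cost`, and `join_cost` arguments are placeholders for whatever statistics and cost model the optimizer supplies; a real implementation would also record which split produced the minimum so the plan itself can be reconstructed.

```python
from functools import lru_cache
from itertools import combinations

def optimal_join_cost(tables, scan_cost, join_cost):
    """OPT(S): cheapest way to join every table in S, per the recurrence above."""

    @lru_cache(maxsize=None)
    def opt(subset):                       # subset: frozenset of table names
        if len(subset) == 1:
            return scan_cost[next(iter(subset))]
        best = float("inf")
        members = sorted(subset)
        # Try every split of `subset` into two nonempty halves (S1, S2).
        for k in range(1, len(members)):
            for left in combinations(members, k):
                s1 = frozenset(left)
                s2 = subset - s1
                best = min(best, opt(s1) + opt(s2) + join_cost(s1, s2))
        return best

    return opt(frozenset(tables))
```

Because each subset's answer is cached, the recursion solves every subproblem exactly once, which is the same work the bottom-up System R formulation below performs explicitly.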
IBM's System R introduced the foundational dynamic programming approach for join ordering in the late 1970s. The algorithm proceeds in phases, building larger optimal plans from smaller ones:
```
function SystemRJoinOptimization(tables: Set<Table>):
    n = |tables|

    // Phase 1: Single-table access plans
    for each table T in tables:
        optPlan[{T}] = best access plan for T
        // Consider: table scan, index scan, index-only scan
        // Record: cost, cardinality, interesting orderings

    // Phase 2: Build optimal plans for increasingly larger subsets
    for size = 2 to n:
        for each subset S of tables where |S| = size:
            optPlan[S] = infinity

            // Try all ways to partition S into (S1, join partner)
            for each proper subset S1 of S:
                S2 = S - S1

                // Skip if no join predicate connects S1 and S2
                if not connectedByJoinPredicate(S1, S2):
                    continue

                // Try joining S1 with S2
                for each join algorithm in {NestedLoop, HashJoin, MergeJoin}:
                    cost = optPlan[S1].cost + optPlan[S2].cost
                         + joinCost(optPlan[S1], optPlan[S2], algorithm)

                    if cost < optPlan[S].cost:
                        optPlan[S] = makePlan(optPlan[S1], optPlan[S2], algorithm)

    // Phase 3: Return optimal plan for full table set
    return optPlan[tables]
```

Key observations about the algorithm:
The algorithm evaluates O(3^n) subset pairs (each subset is partitioned in every possible way) and computes a join cost for each pair and each join algorithm, yielding O(3^n × |algorithms|) total evaluations. While still exponential, this is far better than evaluating all (2n-2)!/(n-1)! orderings directly. For 10 tables: 3^10 ≈ 59,000 versus roughly 17 billion orderings.
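As a quick sanity check on those numbers (not part of the original System R presentation), the two counts for n = 10 can be computed directly:

```python
from math import factorial

n = 10
dp_evaluations = 3 ** n                                    # subset/partition pairs examined by DP
join_orderings = factorial(2 * n - 2) // factorial(n - 1)  # (2n-2)!/(n-1)! distinct join trees

print(dp_evaluations)    # 59049
print(join_orderings)    # 17643225600, roughly 17.6 billion
```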
Let's trace through the algorithm for a 4-table query joining tables A, B, C, and D. Assume the following statistics and join conditions:
| Table | Rows | Best Access Cost | Output Cardinality |
|---|---|---|---|
| A | 10,000 | 100 | 10,000 |
| B | 50,000 | 500 | 50,000 |
| C | 5,000 | 50 | 5,000 |
| D | 1,000 | 10 | 1,000 |
Phase 1: Single-table plans
optPlan[{A}] = cost: 100, cardinality: 10,000
optPlan[{B}] = cost: 500, cardinality: 50,000
optPlan[{C}] = cost: 50, cardinality: 5,000
optPlan[{D}] = cost: 10, cardinality: 1,000
Phase 2a: Two-table plans
We only consider pairs with join predicates:
{A,B}: A.id = B.a_id
A ⋈ B: 100 + 500 + join(10K, 50K) → cost: 2,600, card: 10,000
B ⋈ A: 500 + 100 + join(50K, 10K) → cost: 2,100, card: 10,000
optPlan[{A,B}] = B ⋈ A, cost: 2,100
{B,C}: B.id = C.b_id
B ⋈ C: 500 + 50 + join(50K, 5K) → cost: 1,550, card: 5,000
C ⋈ B: 50 + 500 + join(5K, 50K) → cost: 800, card: 5,000
optPlan[{B,C}] = C ⋈ B, cost: 800
{C,D}: C.id = D.c_id
C ⋈ D: 50 + 10 + join(5K, 1K) → cost: 160, card: 1,000
D ⋈ C: 10 + 50 + join(1K, 5K) → cost: 110, card: 1,000
optPlan[{C,D}] = D ⋈ C, cost: 110
Phase 2b: Three-table plans
{A,B,C}: Must connect A-B and B-C
{A,B} ⋈ C: 2,100 + 50 + join(10K, 5K) → cost: 2,650, card: 5,000
{B,C} ⋈ A: 800 + 100 + join(5K, 10K) → cost: 1,400, card: 5,000
optPlan[{A,B,C}] = {B,C} ⋈ A, cost: 1,400
{B,C,D}: Must connect B-C and C-D
{B,C} ⋈ D: 800 + 10 + join(5K, 1K) → cost: 860, card: 1,000
{C,D} ⋈ B: 110 + 500 + join(1K, 50K) → cost: 660, card: 1,000
optPlan[{B,C,D}] = {C,D} ⋈ B, cost: 660
Phase 2c: Four-table plans (final)
{A,B,C,D}:
{A,B,C} ⋈ D: 1,400 + 10 + join(5K, 1K) → cost: 1,510
{B,C,D} ⋈ A: 660 + 100 + join(1K, 10K) → cost: 860
{A,B} ⋈ {C,D}: 2,100 + 110 + join(10K, 1K) → cost: 2,310
optPlan[{A,B,C,D}] = {B,C,D} ⋈ A, cost: 860
The optimal order is: ((D ⋈ C) ⋈ B) ⋈ A. We start with the smallest tables (D and C), progressively adding larger tables. This minimizes intermediate result sizes at each step—exactly the pattern we identified as crucial in the previous pages.
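The whole trace can also be checked mechanically. The sketch below is a compact bottom-up implementation of the Phase 1/2/3 structure; the `JOIN_COST` table simply hard-codes the per-direction join costs used in the worked example, so those numbers are assumptions carried over from the trace rather than output of a real cost model.

```python
from itertools import combinations

ACCESS = {"A": 100, "B": 500, "C": 50, "D": 10}
PREDICATES = {frozenset("AB"), frozenset("BC"), frozenset("CD")}   # join graph: A-B, B-C, C-D

# Cost of the final join for (left subset, right subset), taken from the worked example.
JOIN_COST = {
    ("A", "B"): 2000, ("B", "A"): 1500,
    ("B", "C"): 1000, ("C", "B"): 250,
    ("C", "D"): 100,  ("D", "C"): 50,
    ("AB", "C"): 500, ("BC", "A"): 500,
    ("BC", "D"): 50,  ("CD", "B"): 50,
    ("ABC", "D"): 100, ("BCD", "A"): 100, ("AB", "CD"): 100,
}

def key(s):                       # canonical string key for a subset, e.g. frozenset("CB") -> "BC"
    return "".join(sorted(s))

def connected(s1, s2):            # at least one join predicate links the two subsets
    return any(frozenset((a, b)) in PREDICATES for a in s1 for b in s2)

# Phase 1: single-table plans.
opt = {frozenset(t): (ACCESS[t], t) for t in ACCESS}       # subset -> (cost, plan string)

# Phase 2: larger subsets, built from all connected partitions that appear in the trace.
tables = sorted(ACCESS)
for size in range(2, len(tables) + 1):
    for combo in combinations(tables, size):
        s = frozenset(combo)
        best = None
        for k in range(1, size):
            for left in combinations(sorted(s), k):
                s1, s2 = frozenset(left), s - frozenset(left)
                if s1 not in opt or s2 not in opt or not connected(s1, s2):
                    continue
                pair = (key(s1), key(s2))
                if pair not in JOIN_COST:
                    continue
                cost = opt[s1][0] + opt[s2][0] + JOIN_COST[pair]
                if best is None or cost < best[0]:
                    best = (cost, f"({opt[s1][1]} ⋈ {opt[s2][1]})")
        if best is not None:
            opt[s] = best

# Phase 3: the full plan.
print(opt[frozenset("ABCD")])     # (860, '(((D ⋈ C) ⋈ B) ⋈ A)')
```

Running it reproduces the same subset-by-subset winners as the trace, ending with the ((D ⋈ C) ⋈ B) ⋈ A plan at cost 860.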
The basic algorithm tracks only the cheapest plan for each subset. But sometimes a slightly more expensive plan produces sorted output that eliminates a later sort operation—making the overall query cheaper.
System R introduced the concept of interesting orderings to handle this: an ordering is 'interesting' if a later operation could exploit it, for example an ORDER BY or GROUP BY clause, or a merge join on the same columns.
With interesting orderings, the DP state becomes (subset, ordering) pairs rather than just subsets. For each subset, we track not just the cheapest plan, but the cheapest plan for each interesting ordering. This multiplies state space by the number of orderings—typically manageable since interesting orderings are few.
Example: When interesting orderings matter
Consider:
SELECT * FROM A JOIN B ON A.x = B.x ORDER BY A.x
Two plans might exist for the join: a hash join costing 1,000 whose output is unsorted, and a merge join costing 1,100 whose output is already sorted by A.x.
Without tracking interesting orderings, the optimizer chooses the hash join (cost 1000 < 1100) and adds a sort, yielding total cost 1200. With interesting orderings, it recognizes the merge join's 'sorted by A.x' property saves 200 in sort costs, yielding optimal total cost 1100.
The key insight: A locally suboptimal choice can be globally optimal when physical properties propagate through the query plan.
| Approach | Local Choice | Sort Cost | Total Cost | Optimal? |
|---|---|---|---|---|
| Basic DP | Hash join (1000) | +200 | 1200 | No |
| With Interesting Orderings | Merge join (1100) | +0 | 1100 | Yes |
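The sketch below shows the reshaped DP state in miniature, reusing the hash-join/merge-join numbers from the table above; the dictionary layout and names are illustrative, not how any particular system stores its memo.

```python
# Best plan per (subset, output ordering) instead of per subset alone.
# Numbers from the example: hash join 1000 (unsorted), merge join 1100 (sorted by A.x),
# and an explicit sort on A.x costs an additional 200.
memo = {
    (frozenset({"A", "B"}), None):  {"plan": "HashJoin(A, B)",  "cost": 1000},
    (frozenset({"A", "B"}), "A.x"): {"plan": "MergeJoin(A, B)", "cost": 1100},
}

SORT_COST = 200
required_ordering = "A.x"            # from the query's ORDER BY A.x

def final_cost(state, plan):
    """Cost of satisfying ORDER BY: add a sort only if the plan is not already ordered."""
    _, ordering = state
    return plan["cost"] + (0 if ordering == required_ordering else SORT_COST)

best = min(memo.items(), key=lambda item: final_cost(*item))
print(best[0][1], final_cost(*best))   # A.x 1100: the merge join wins once the sort is priced in
```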
When restricting to left-deep plans, the dynamic programming approach simplifies. Instead of considering all subset partitions, we only consider extensions by single tables:
```
function LeftDeepDP(tables: Set<Table>):
    n = |tables|

    // Initialize single-table plans
    for each table T in tables:
        optPlan[{T}] = best access plan for T

    // Build left-deep plans by extending with one table at a time
    for size = 2 to n:
        for each subset S of tables where |S| = size:
            optPlan[S] = infinity

            // Try extending each (size-1) subset with one new table
            for each table T in S:
                S' = S - {T}   // The "left" part

                if not connectedByJoinPredicate(S', T):
                    continue

                for each join algorithm in {NestedLoop, HashJoin, MergeJoin}:
                    // Left-deep: intermediate ⋈ base table
                    cost = optPlan[S'].cost + accessCost(T)
                         + joinCost(optPlan[S'], T, algorithm)

                    if cost < optPlan[S].cost:
                        optPlan[S] = makePlan(optPlan[S'], T, algorithm)

    return optPlan[tables]
```

Complexity comparison:
| Approach | Number of Subsets | Partitions/Extensions per Subset | Total Evaluations |
|---|---|---|---|
| General DP (bushy) | 2^n | 2^(size-1) partitions | O(3^n) |
| Left-deep DP | 2^n | size single-table extensions | O(n × 2^n) |
The left-deep restriction reduces complexity from O(3^n) to O(n × 2^n), a significant improvement. For n = 10, that is 3^10 ≈ 59,049 evaluations versus 10 × 2^10 = 10,240.
This 6x reduction compounds with pruning to make left-deep enumeration substantially faster.
Even with left-deep restriction, we enumerate subsets rather than permutations because not all permutations are valid—the join graph constrains which tables can be added next. Subset-based DP naturally handles these constraints; we simply skip extensions that lack join predicates.
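As a small illustration of that constraint handling, here is a sketch of the predicate check for the chain query A-B, B-C, C-D used earlier (the function and variable names are illustrative, not from any particular system):

```python
# Join graph for the chain query A-B, B-C, C-D.
JOIN_GRAPH = {frozenset({"A", "B"}), frozenset({"B", "C"}), frozenset({"C", "D"})}

def connected_by_join_predicate(subset, table):
    """True if some table already in `subset` has a join predicate with `table`."""
    return any(frozenset({t, table}) in JOIN_GRAPH for t in subset)

# Valid single-table extensions of {A, B}: C connects through B, D does not.
current = {"A", "B"}
extensions = [t for t in ("C", "D") if connected_by_join_predicate(current, t)]
print(extensions)   # ['C']  -- extending with D would require a cross product, so it is skipped
```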
Even with dynamic programming's efficiency gains, evaluating all subset pairs becomes costly for large queries. Pruning strategies shrink the search space by eliminating provably suboptimal options early; branch-and-bound, illustrated below, is the canonical example.
Aggressive pruning can accidentally eliminate the optimal plan if pruning conditions are too strict or if cost estimates are inaccurate. Most systems err on the side of keeping more plans, accepting some redundant work to ensure good plans aren't missed.
Branch-and-bound in action:
Suppose we're optimizing a 6-table join and a greedy algorithm quickly finds a plan with cost 10,000. We use this as our upper bound.
Evaluating subset {A,B,C}:
Best plan so far: cost 8,000
Trying to extend to {A,B,C,D}:
Extension cost ≥ 3,000
Total ≥ 11,000 > 10,000 (bound)
→ PRUNE this branch
By establishing an upper bound early and pruning branches that can't improve it, we often evaluate only a small fraction of the full search space.
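In code, the pruning test itself is a single comparison against the running upper bound; the numbers below are the ones from the scenario above, and the lower bound stands in for whatever optimistic completion estimate the optimizer can compute cheaply:

```python
upper_bound = 10_000            # cost of the complete plan found by the quick greedy pass
partial_cost = 8_000            # best plan so far for the subset {A, B, C}
extension_lower_bound = 3_000   # optimistic (never over-estimated) cost of adding the remaining tables

def should_prune(partial_cost, extension_lower_bound, upper_bound):
    """Prune when even the most optimistic completion cannot beat the best plan found so far."""
    return partial_cost + extension_lower_bound >= upper_bound

print(should_prune(partial_cost, extension_lower_bound, upper_bound))   # True: 11,000 can never beat 10,000
```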
Effectiveness depends on how early a tight upper bound is established and on how reliable the cost estimates used in the comparison are.
Dynamic programming provides strong optimality guarantees—but these guarantees come with important caveats that practitioners must understand:
| Guarantee | Scope | Limitation |
|---|---|---|
| Finds optimal ordering | Within search space | May miss better orderings outside space (e.g., bushy if restricted to left-deep) |
| No redundant evaluation | Each subproblem solved once | Memory required for memoization can be significant |
| Polynomial in subsets | O(3^n) for n tables | Still exponential; impractical for n > 15-20 |
| Considers all join algorithms | If modeled in cost function | Unusual or exotic algorithms may not be considered |
| Handles arbitrary cost models | Any additive cost function | Non-additive costs (e.g., parallel resources) may violate assumptions |
The estimation dependency:
Perhaps the most critical limitation is that DP finds the optimal ordering according to the cost model's estimates. If cardinality estimates are wrong, the 'optimal' plan may be far from truly optimal.
Consider a join whose output the optimizer estimates to be tiny but which actually produces far more rows than predicted.
The optimizer positions this join early (thinking it reduces data), but at runtime it explodes intermediate result sizes. No amount of algorithmic sophistication can overcome fundamentally wrong estimates.
Research consistently shows that estimation errors—not algorithmic limitations—are the primary cause of suboptimal join orderings in practice. A simple algorithm with good estimates often outperforms a sophisticated algorithm with poor estimates. This drives ongoing research into adaptive query processing and learned optimizers.
Real database systems implement variations of the System R algorithm with numerous engineering refinements. Here's how major systems approach join optimization:
In PostgreSQL, for example, queries with fewer tables than geqo_threshold are planned with the standard dynamic-programming planner, while larger queries fall back to a genetic algorithm. Most databases expose configuration parameters for join optimization behavior: search depth limits, time budgets, algorithm selection, and parallelism thresholds. Understanding these parameters lets DBAs tune the optimization/execution time tradeoff for their workloads.
Common implementation patterns:
Memo-based optimization — Store optimal plans in a 'memo' structure indexed by (subset, physical properties). This generalizes the optPlan array to handle complex plan spaces.
Group-based enumeration — Represent logically equivalent expressions as 'groups' and enumerate physical implementations per group. This is the foundation of the Cascades/Volcano optimization framework.
Batch cost computations — Evaluate multiple plan alternatives in batch to amortize cost model overhead and enable vectorized cost comparisons.
Lazy enumeration — Don't enumerate all subsets upfront; generate them on demand as the search progresses. Reduces memory pressure and enables early termination.
Hybrid strategies — Use exhaustive DP for the first k tables (where k ≈ 8-10), then use heuristics to extend to the remaining tables. Balances optimality with tractability.
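A minimal sketch of that last pattern, assuming `exhaustive_dp` and `greedy_extend` are supplied by the optimizer and that an eight-table core is affordable (both are assumptions for illustration):

```python
DP_CORE_SIZE = 8   # assumed cutoff; real systems expose this as a tunable parameter

def hybrid_join_order(tables, exhaustive_dp, greedy_extend):
    """Optimize a core of the query exactly, then attach the remaining tables heuristically."""
    core, rest = tables[:DP_CORE_SIZE], tables[DP_CORE_SIZE:]
    plan = exhaustive_dp(core)               # optimal within the modeled search space
    for table in rest:
        plan = greedy_extend(plan, table)    # e.g. cheapest connected single-table extension
    return plan
```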
We've explored the foundational techniques for finding optimal join orderings. The key insights: optimal substructure lets us optimize over subsets instead of orderings; System R's bottom-up enumeration solves each subset once in O(3^n) time; interesting orderings let physical properties justify locally suboptimal choices; the left-deep restriction and pruning shrink the search further; and the resulting plan is only as good as the cardinality estimates behind it.
What's next:
Dynamic programming works well for moderate-sized queries but struggles with very large join counts. The next page explores heuristic approaches—greedy algorithms, genetic algorithms, and rule-based systems—that trade optimality guarantees for scalability to arbitrary query sizes.
You now understand how dynamic programming transforms join ordering from intractable enumeration to efficient computation. The principle of optimality, bottom-up subset construction, interesting orderings, and pruning strategies together form the foundation of virtually all production query optimizers. Next, we'll explore what happens when even DP isn't enough.