Here's a question that seems simple but has profound implications: How many different ways can you order 10 table joins?
If your intuition says '10 factorial' or about 3.6 million, you're underestimating by several orders of magnitude. The actual number of distinct join orderings for 10 tables exceeds 17.6 billion—nearly 5,000 times larger than a simple permutation count.
This exponential explosion lies at the heart of query optimization complexity. It explains why optimizers can't simply try all possibilities, why heuristics are essential, and why even sophisticated search algorithms must make difficult tradeoffs. Understanding this combinatorial landscape is crucial for anyone who wants to comprehend how database systems approach the join ordering problem.
By the end of this page, you will understand how the number of possible join orderings is calculated, why it grows faster than factorial, and what constraints optimizers impose to make the search space tractable. You'll develop intuition for the scale of the problem that query optimizers must solve.
To understand why join orderings are so numerous, we first need to recognize what we're actually counting. A join ordering isn't just a sequence of tables—it's a binary tree structure where:

- Leaf nodes are the base tables being joined
- Internal nodes are join operations, each combining exactly two inputs
This tree perspective is essential because joins are binary operations—each join takes exactly two inputs. When joining 4 tables (A, B, C, D), we don't simply specify 'join A, then B, then C, then D.' We specify a particular tree structure, such as:
```
        ⋈                     ⋈
       / \                   / \
      ⋈   D                 ⋈   ⋈
     / \          or       / \ / \
    ⋈   C                 A  B C  D
   / \
  A   B
```
These two structures represent fundamentally different join plans—the left is a left-deep tree, while the right is a bushy tree.
The difference between tree structures isn't just academic. Left-deep trees allow pipelining intermediate results (no materialization), while bushy trees can exploit parallelism but may require materializing intermediate results. The optimizer must consider both the ordering and the tree shape.
The components of join ordering:
When counting distinct join orderings for n tables, we must account for:

- The shape of the binary join tree (its topology)
- The assignment of the n tables to the tree's leaves

Join commutativity—swapping a join's left and right inputs—is already captured by the topology count, because an ordered binary tree distinguishes left children from right children. Both factors multiply the search space. Let's examine them in detail.
The number of distinct binary tree topologies with n leaves is given by the (n-1)th Catalan number, denoted C_{n-1}:
C_n = (2n)! / ((n+1)! × n!)
Alternatively, using the binomial coefficient:
C_n = C(2n, n) / (n + 1)
The Catalan numbers grow rapidly but are themselves just one component of the total ordering count.
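To make these numbers concrete, here's a minimal Python sketch (standard library only; the helper name `catalan` is ours) that reproduces the topology counts in the table below:

```python
from math import comb

def catalan(n: int) -> int:
    """n-th Catalan number: C(2n, n) / (n + 1), always an exact integer."""
    return comb(2 * n, n) // (n + 1)

# The number of tree topologies for n tables is the (n-1)-th Catalan number.
for tables in (2, 3, 4, 5, 10, 15, 20):
    print(f"{tables} tables: {catalan(tables - 1):,} topologies")
# ... 10 tables: 4,862 topologies ... 20 tables: 1,767,263,190 topologies
```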
| Tables (n) | Tree Topologies (C_{n-1}) | Growth Pattern |
|---|---|---|
| 2 | 1 | Base case |
| 3 | 2 | ×2 |
| 4 | 5 | ×2.5 |
| 5 | 14 | ×2.8 |
| 6 | 42 | ×3.0 |
| 7 | 132 | ×3.14 |
| 8 | 429 | ×3.25 |
| 9 | 1,430 | ×3.33 |
| 10 | 4,862 | ×3.4 |
| 15 | 2,674,440 | ~4x per table added |
| 20 | 1,767,263,190 | Rapidly growing |
Asymptotic behavior:
The Catalan numbers grow roughly as:
C_n ∼ 4^n / (n^{3/2} × √π)
This means tree topologies alone contribute exponential growth of approximately 4^n. But we're not done—we still need to assign tables to leaves.
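As a quick numeric check (plain Python, no dependencies), the approximation overshoots slightly and tightens as n grows:

```python
from math import comb, sqrt, pi

# Exact Catalan number versus the asymptotic estimate 4^n / (n^1.5 * sqrt(pi)).
for n in (10, 20, 40):
    exact = comb(2 * n, n) // (n + 1)
    approx = 4 ** n / (n ** 1.5 * sqrt(pi))
    print(f"n={n}: exact {exact:.3e}, approx {approx:.3e}")
```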
Catalan numbers count many combinatorial structures: balanced parentheses, binary search tree shapes, non-crossing partitions, and more. Join trees are full binary trees—every internal node has exactly two children—and the number of full binary trees with n leaves (and hence n-1 internal join nodes) is exactly C_{n-1}.
Now let's assemble the complete formula. For n tables:
Total orderings = Tree topologies × Leaf assignments

Combining these:

Total orderings = C_{n-1} × n!

Commutativity contributes no separate factor here: the Catalan count enumerates ordered trees, distinguishing left children from right children, so both input orders of every join are already included. Using the Catalan formula:

= [(2n-2)! / (n! × (n-1)!)] × n!

= (2n-2)! / (n-1)!

Equivalently, this is the product of all odd numbers from 1 to 2n-3, multiplied by 2^{n-1}, and it grows extremely fast: faster than n! by exactly the Catalan factor.
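A few lines of Python (standard library; the helper name `total_orderings` is ours) verify the formula against the table below:

```python
from math import comb, factorial

def total_orderings(n: int) -> int:
    """All ordered, labeled join trees for n tables: C_{n-1} * n!."""
    catalan = comb(2 * n - 2, n - 1) // n  # the (n-1)-th Catalan number
    return catalan * factorial(n)

for n in (4, 8, 10):
    print(f"{n} tables: {total_orderings(n):,} orderings")
# 4 tables: 120 orderings
# 8 tables: 17,297,280 orderings
# 10 tables: 17,643,225,600 orderings
```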
| Tables (n) | Total Orderings | Scientific Notation | Time to Enumerate (1μs each) |
|---|---|---|---|
| 2 | 2 | 2 × 10^0 | 2 microseconds |
| 3 | 12 | 1.2 × 10^1 | 12 microseconds |
| 4 | 120 | 1.2 × 10^2 | 120 microseconds |
| 5 | 1,680 | 1.68 × 10^3 | 1.7 milliseconds |
| 6 | 30,240 | 3.02 × 10^4 | 30 milliseconds |
| 7 | 665,280 | 6.65 × 10^5 | 0.67 seconds |
| 8 | 17,297,280 | 1.73 × 10^7 | 17 seconds |
| 9 | 518,918,400 | 5.19 × 10^8 | 8.6 minutes |
| 10 | 17,643,225,600 | 1.76 × 10^10 | 4.9 hours |
| 15 | ~3.5 × 10^18 | 3.5 × 10^18 | ~110,000 years |
Note that join orderings grow faster than n! because the n! leaf assignments are multiplied by the Catalan number of tree shapes. While 10! is about 3.6 million, actual 10-table orderings exceed 17 billion. This super-factorial growth is what makes exhaustive enumeration impossible for typical analytical queries.
Practical interpretation:
With modern hardware evaluating roughly 1 million orderings per second:

- 8 tables: about 17 seconds
- 10 tables: about 5 hours
- 12 tables: nearly a year
- 15 tables: on the order of 100,000 years
Since analytical queries commonly join 10-20+ tables, exhaustive enumeration is fundamentally infeasible for real-world workloads.
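Under the stated assumption of one microsecond per plan, a short sketch reproduces these figures (reusing the counting logic from above):

```python
from math import comb, factorial

def total_orderings(n: int) -> int:
    return comb(2 * n - 2, n - 1) // n * factorial(n)  # C_{n-1} * n!

RATE = 1_000_000  # orderings evaluated per second (assumed)
for n in (8, 10, 12, 15):
    seconds = total_orderings(n) / RATE
    print(f"{n} tables: {seconds:,.0f} seconds")
# 17 s; 17,643 s (~4.9 h); 2.8e7 s (~11 months); 3.5e12 s (~110,000 years)
```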
One of the most important techniques for taming the combinatorial explosion is restricting the search space to specific tree shapes. The most common restriction is to consider only left-deep trees.
Left-Deep Trees:
```
      ⋈
     / \
    ⋈   D
   / \
  ⋈   C
 / \
A   B
```
The right input to every join is a base table, while the left input is the intermediate result of the previous join (the very first join takes two base tables).
Bushy Trees:
```
    ⋈
   / \
  ⋈   ⋈
 / \ / \
A  B C  D
```
Intermediate results can appear on both sides of a join. Both subtrees are fully evaluated before the root join executes.
| Tables (n) | Left-Deep Only | All Trees (Bushy) | Reduction Factor |
|---|---|---|---|
| 4 | 24 | 120 | 5× |
| 5 | 120 | 1,680 | 14× |
| 6 | 720 | 30,240 | 42× |
| 8 | 40,320 | 17,297,280 | 429× |
| 10 | 3,628,800 | 17,643,225,600 | 4,862× |
| 12 | 479,001,600 | ~2.8 × 10^13 | 58,786× |
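The reduction factor is exactly the Catalan number C_{n-1}, since the full space is C_{n-1} × n! and the left-deep space is n!. A quick check in Python:

```python
from math import comb, factorial

for n in (8, 10, 12):
    all_trees = comb(2 * n - 2, n - 1) // n * factorial(n)  # C_{n-1} * n!
    left_deep = factorial(n)                                # one per permutation
    print(f"{n} tables: {all_trees // left_deep:,}x reduction")
# 8 tables: 429x reduction
# 10 tables: 4,862x reduction
# 12 tables: 58,786x reduction
```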
Why restrict to left-deep?
The reduction from O((2n-2)!/((n-1)!)) to O(n!) is dramatic—often several orders of magnitude for queries joining 8+ tables. But is this restriction safe? What do we lose?
Advantages of left-deep restriction:

- The search space shrinks from (2n-2)!/(n-1)! to n! orderings
- The inner (right) input of every join is a base table, which suits index nested-loop joins
- Each intermediate result can be pipelined directly into the next join without materialization

Disadvantages:

- The truly optimal plan may be bushy, and the restriction makes it unreachable
- Less opportunity for inter-operator parallelism
- Strategies that join two small intermediate results first, sometimes the cheapest option, are ruled out
Most commercial database systems default to left-deep enumeration but may consider bushy plans under specific conditions: when explicit parallelism is requested, when subquery decorrelation creates bushy opportunities, or when advanced hints are provided. This hybrid approach balances optimization tractability with plan quality.
While left-deep trees are the most common restriction, other tree shapes have specific use cases and trade-offs.
| Tree Shape | Enumeration Complexity | Execution Advantage | Best Use Case |
|---|---|---|---|
| Left-deep | n! | Pipelining, index NL joins | OLTP, indexed lookups |
| Right-deep | n! | Hash join cascades | Hash-join dominated plans |
| Bushy | (2n-2)!/((n-1)!) | Parallelism | Parallel execution, MPP |
| Balanced bushy | Special cases only | Minimal height | Perfectly sized queries |
The hash join connection:
Right-deep trees have an interesting property for hash joins. In a hash join:

- The build phase reads one input (ideally the smaller) and constructs an in-memory hash table on the join key
- The probe phase streams the other input through that hash table, looking up matches

In a right-deep tree with all base tables on the left, all hash tables are built first (on base tables), then a single probe pass streams an intermediate result through the entire cascade, as the sketch below illustrates. This can be very efficient when:

- Every build-side table is small enough that all hash tables fit in memory simultaneously
- The probe side is large, so streaming it once without materializing intermediate results pays off
However, this requires knowing which tables are small enough to build hash tables—which again depends on accurate cardinality estimates.
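As a toy illustration of the build-then-probe pattern (the table contents, names, and keys below are invented for this sketch), Python dictionaries stand in for hash tables:

```python
# Right-deep hash-join cascade in miniature: build hash tables on the two
# small base tables first, then stream the large table through both probes
# in a single pass. All table and column names are illustrative.
customers = {1: "Ada", 2: "Grace"}   # build side 1: cust_id -> name
products = {10: "disk", 11: "cpu"}   # build side 2: prod_id -> item

orders = [                            # large probe side, streamed once
    {"cust_id": 1, "prod_id": 11, "qty": 3},
    {"cust_id": 2, "prod_id": 10, "qty": 1},
    {"cust_id": 9, "prod_id": 10, "qty": 5},  # no matching customer
]

for order in orders:                           # single streaming pass
    name = customers.get(order["cust_id"])     # probe hash table 1
    item = products.get(order["prod_id"])      # probe hash table 2
    if name is not None and item is not None:  # inner-join semantics
        print(name, item, order["qty"])
```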
Sophisticated optimizers like those in PostgreSQL, SQL Server, and Oracle don't rigidly enforce tree shape restrictions. They may use left-deep enumeration as a baseline but consider bushy alternatives for specific subexpressions, especially when common table expressions, derived tables, or lateral joins are involved.
Beyond tree shape restrictions, optimizers employ additional techniques to manage the exponential search space. The most important is exploiting the structure of the join graph itself.
Real queries rarely allow arbitrary orderings. If table A only joins with B and C, but not D, then any sensible ordering must connect A to either B or C before involving D. The join graph structure often reduces the effective search space far below theoretical maximums.
Join graph topology effects:
The structure of join predicates critically affects search space size:

- Chain queries (A-B-C-D) admit very few connected orderings, collapsing the space dramatically
- Star queries (a central fact table joined to every dimension) still permit any ordering of the dimension tables
- Clique queries, where every table joins every other, realize the full combinatorial space
Most real-world schemas fall between these extremes, with partial connectivity that significantly constrains the effective search space.
| Topology | Join Pattern | Effective Search Space | Example |
|---|---|---|---|
| Chain | A-B-C-D (linear) | Polynomial | Time-series joins |
| Star | Center connected to all | O(n!) | Data warehouse facts/dims |
| Snowflake | Star with chains | Polynomial in chain count | Extended dim tables |
| Clique | All-to-all | Super-exponential | Rare/contrived |
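A brute-force sketch makes the contrast concrete (feasible only for tiny n; the function and edge encoding are ours): for five tables, a chain permits just 16 connected left-deep orders, while a clique permits all 120.

```python
from itertools import permutations

def connected_orderings(n: int, edges: set) -> int:
    """Count left-deep join orders where each newly joined table shares
    a join edge with some table already in the running prefix."""
    count = 0
    for perm in permutations(range(n)):
        joined = {perm[0]}
        ok = True
        for t in perm[1:]:
            if not any((t, j) in edges or (j, t) in edges for j in joined):
                ok = False  # this step would require a cross product
                break
            joined.add(t)
        count += ok
    return count

n = 5
chain = {(i, i + 1) for i in range(n - 1)}                    # A-B-C-D-E
clique = {(i, j) for i in range(n) for j in range(i + 1, n)}  # all-to-all
print(connected_orderings(n, chain))   # 16  (vs 5! = 120)
print(connected_orderings(n, clique))  # 120 (every permutation connects)
```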
To develop intuition for the scale of the join ordering problem, let's visualize how quickly it explodes.
| Tables | Orderings | Physical Analogy |
|---|---|---|
| 4 | 120 | Cards in two decks |
| 6 | 30,240 | A small sports stadium at capacity |
| 8 | 17 million | Population of a major metro area |
| 10 | 17 billion | More than double Earth's population |
| 12 | 28 trillion | Roughly the US national debt in dollars |
| 15 | 3.5 quintillion | About half the grains of sand on Earth's beaches |
| 20 | ~4 × 10^27 | Tens of thousands of times the stars in the observable universe |
Notice the transition around 10-12 tables. Below 8, optimization time is negligible. Between 8-10, optimization becomes noticeable but manageable. Above 12, exhaustive search transitions from 'impractical' to 'physically impossible.' This threshold shapes optimizer design decisions.
The pruning imperative:
Given these numbers, effective optimization requires pruning the search space by many orders of magnitude. Consider a 12-table query:

- Roughly 2.8 × 10^13 possible orderings in the unrestricted space
- At a million plans per second, exhaustive enumeration would take nearly a year

Yet real optimizers handle such queries in under a second. This is only possible through:

- Restricting tree shapes (for example, to left-deep trees)
- Exploiting join graph connectivity to skip disconnected subplans
- Dynamic programming over shared subproblems rather than complete plans
- Heuristic pruning of clearly inferior candidates
The next pages will explore how optimizers achieve this remarkable efficiency.
The combinatorial explosion has profound implications for how query optimizers are designed and implemented.
Commercial databases like Oracle, SQL Server, and PostgreSQL all use hybrid approaches. They combine exhaustive enumeration for small subproblems (≤8-10 tables) with heuristic or randomized search for larger problems. Configuration parameters often control the tradeoff between optimization time and plan quality.
The optimization budget concept:
Modern optimizers often work with an 'optimization budget'—a limit on how many plans can be explored. This budget might be expressed as:

- A maximum number of plan alternatives to enumerate
- A wall-clock limit on optimization time
- A cap on the memory consumed by optimizer data structures
When the budget is exhausted, the optimizer returns the best plan found so far, even if better plans theoretically exist. This approach:

- Bounds worst-case optimization time regardless of query complexity
- Trades potential plan quality for predictable compilation cost
- Works well in practice, since good plans are usually discovered early in the search
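A minimal sketch of budget-limited enumeration, with invented table names and a toy cost model (real optimizers use cardinality-based cost estimates, not this stand-in):

```python
from itertools import permutations

def optimize_with_budget(tables, cost_fn, budget=1_000):
    """Enumerate left-deep join orders until the plan budget is spent,
    returning the cheapest ordering seen so far (not necessarily optimal)."""
    best_order, best_cost = None, float("inf")
    for examined, order in enumerate(permutations(tables)):
        if examined >= budget:
            break  # budget exhausted: settle for the best plan found
        cost = cost_fn(order)
        if cost < best_cost:
            best_order, best_cost = order, cost
    return best_order, best_cost

# Toy cost model: plans are cheaper when highly selective tables come first.
selectivity = {"customers": 1, "products": 2, "orders": 3, "shipments": 4}

def toy_cost(order):
    # Descending positional weights reward placing selective tables early.
    return sum((len(order) - i) * selectivity[t] for i, t in enumerate(order))

print(optimize_with_budget(selectivity, toy_cost, budget=10))
```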
We've explored the daunting combinatorial landscape of join ordering. Let's consolidate the key insights:

- A join ordering is a binary tree: C_{n-1} topologies times n! leaf assignments gives (2n-2)!/(n-1)! total orderings
- Growth is super-factorial: 10 tables already exceed 17 billion orderings
- Restricting the search to left-deep trees shrinks the space to n!, at the cost of missing some bushy plans
- Join graph connectivity often reduces the effective space far below the theoretical maximum
- Beyond roughly 12 tables, exhaustive enumeration is physically impossible, so pruning and heuristics are mandatory
What's next:
Understanding the size of the search space sets the stage for understanding how optimizers navigate it. The next page explores techniques for finding optimal join orders—particularly dynamic programming, which finds provably optimal orderings within restricted search spaces.
You now understand the combinatorial explosion underlying join ordering. The super-exponential growth in possible orderings—driven by tree topologies, permutations, and commutativity—explains why optimizers can't try everything and must rely on smart algorithms, heuristics, and search restrictions. Next, we'll explore how to find optimal orderings within these constraints.