Here's a question that seems simple but has profound implications: How many different ways can you order 10 table joins?
If your intuition says '10 factorial' or about 3.6 million, you're underestimating by several orders of magnitude. The actual number of distinct join orderings for 10 tables exceeds 17.6 billion—nearly 5,000 times larger than a simple permutation count.
This exponential explosion lies at the heart of query optimization complexity. It explains why optimizers can't simply try all possibilities, why heuristics are essential, and why even sophisticated search algorithms must make difficult tradeoffs. Understanding this combinatorial landscape is crucial for anyone who wants to comprehend how database systems approach the join ordering problem.
By the end of this page, you will understand how the number of possible join orderings is calculated, why it grows faster than factorial, and what constraints optimizers impose to make the search space tractable. You'll develop intuition for the scale of the problem that query optimizers must solve.
To understand why join orderings are so numerous, we first need to recognize what we're actually counting. A join ordering isn't just a sequence of tables—it's a binary tree structure where:

- Leaf nodes are the base tables being joined
- Internal nodes are join operations, each combining exactly two inputs
This tree perspective is essential because joins are binary operations—each join takes exactly two inputs. When joining 4 tables (A, B, C, D), we don't simply specify 'join A, then B, then C, then D.' We specify a particular tree structure, such as:
```
        ⋈                     ⋈
       / \                   / \
      ⋈   D                 ⋈   ⋈
     / \          or       / \ / \
    ⋈   C                 A  B C  D
   / \
  A   B
```
These two structures represent fundamentally different join plans—the left is a left-deep tree, while the right is a bushy tree.
The difference between tree structures isn't just academic. Left-deep trees allow pipelining intermediate results (no materialization), while bushy trees can exploit parallelism but may require materializing intermediate results. The optimizer must consider both the ordering and the tree shape.
The components of join ordering:
When counting distinct join orderings for n tables, we must account for:

- The shape of the binary join tree (its topology)
- The assignment of the n tables to the tree's leaves

Join commutativity—swapping a join's left and right inputs—is already captured by the topology count, because an ordered binary tree distinguishes left children from right children. Both factors multiply the search space. Let's examine them in detail.
The number of distinct binary tree topologies with n leaves is given by the (n-1)th Catalan number, denoted C_{n-1}:
C_n = (2n)! / ((n+1)! × n!)
Alternatively, using the binomial coefficient:
C_n = C(2n, n) / (n + 1)
The Catalan numbers grow rapidly but are themselves just one component of the total ordering count.
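To make these numbers concrete, here's a minimal Python sketch (standard library only; the helper name `catalan` is ours) that reproduces the topology counts in the table below:

```python
from math import comb

def catalan(n: int) -> int:
    """n-th Catalan number: C(2n, n) / (n + 1), always an exact integer."""
    return comb(2 * n, n) // (n + 1)

# The number of tree topologies for n tables is the (n-1)-th Catalan number.
for tables in (2, 3, 4, 5, 10, 15, 20):
    print(f"{tables} tables: {catalan(tables - 1):,} topologies")
# ... 10 tables: 4,862 topologies ... 20 tables: 1,767,263,190 topologies
```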
| Tables (n) | Tree Topologies (C_{n-1}) | Growth Pattern |
|---|---|---|
| 2 | 1 | Base case |
| 3 | 2 | ×2 |
| 4 | 5 | ×2.5 |
| 5 | 14 | ×2.8 |
| 6 | 42 | ×3.0 |
| 7 | 132 | ×3.14 |
| 8 | 429 | ×3.25 |
| 9 | 1,430 | ×3.33 |
| 10 | 4,862 | ×3.4 |
| 15 | 2,674,440 | ~4x per table added |
| 20 | 1,767,263,190 | Rapidly growing |
Asymptotic behavior:
The Catalan numbers grow roughly as:
C_n ∼ 4^n / (n^{3/2} × √π)
This means tree topologies alone contribute exponential growth of approximately 4^n. But we're not done—we still need to assign tables to leaves.
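As a quick numeric check (plain Python, no dependencies), the approximation overshoots slightly and tightens as n grows:

```python
from math import comb, sqrt, pi

# Exact Catalan number versus the asymptotic estimate 4^n / (n^1.5 * sqrt(pi)).
for n in (10, 20, 40):
    exact = comb(2 * n, n) // (n + 1)
    approx = 4 ** n / (n ** 1.5 * sqrt(pi))
    print(f"n={n}: exact {exact:.3e}, approx {approx:.3e}")
```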
Catalan numbers count many combinatorial structures: balanced parentheses, binary search tree shapes, non-crossing partitions, and more. Join trees are full binary trees—every internal node has exactly two children—and the number of full binary trees with n leaves (and hence n-1 internal join nodes) is exactly C_{n-1}.
Now let's assemble the complete formula. For n tables:
Total orderings = Tree topologies × Leaf assignments

Combining these:

Total orderings = C_{n-1} × n!

Commutativity contributes no separate factor here: the Catalan count enumerates ordered trees, distinguishing left children from right children, so both input orders of every join are already included. Using the Catalan formula:

= [(2n-2)! / (n! × (n-1)!)] × n!

= (2n-2)! / (n-1)!

Equivalently, this is the product of all odd numbers from 1 to 2n-3, multiplied by 2^{n-1}, and it grows extremely fast: faster than n! by exactly the Catalan factor.
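A few lines of Python (standard library; the helper name `total_orderings` is ours) verify the formula against the table below:

```python
from math import comb, factorial

def total_orderings(n: int) -> int:
    """All ordered, labeled join trees for n tables: C_{n-1} * n!."""
    catalan = comb(2 * n - 2, n - 1) // n  # the (n-1)-th Catalan number
    return catalan * factorial(n)

for n in (4, 8, 10):
    print(f"{n} tables: {total_orderings(n):,} orderings")
# 4 tables: 120 orderings
# 8 tables: 17,297,280 orderings
# 10 tables: 17,643,225,600 orderings
```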
| Tables (n) | Total Orderings | Scientific Notation | Time to Enumerate (1μs each) |
|---|---|---|---|
| 2 | 2 | 2 × 10^0 | 2 microseconds |
| 3 | 12 | 1.2 × 10^1 | 12 microseconds |
| 4 | 120 | 1.2 × 10^2 | 120 microseconds |
| 5 | 1,680 | 1.68 × 10^3 | 1.7 milliseconds |
| 6 | 30,240 | 3.02 × 10^4 | 30 milliseconds |
| 7 | 665,280 | 6.65 × 10^5 | 0.67 seconds |
| 8 | 17,297,280 | 1.73 × 10^7 | 17 seconds |
| 9 | 518,918,400 | 5.19 × 10^8 | 8.6 minutes |
| 10 | 17,643,225,600 | 1.76 × 10^10 | 4.9 hours |
| 15 | ~3.5 × 10^18 | 3.5 × 10^18 | ~110,000 years |
Note that join orderings grow faster than n! because the n! leaf assignments are multiplied by the Catalan number of tree shapes. While 10! is about 3.6 million, actual 10-table orderings exceed 17 billion. This super-factorial growth is what makes exhaustive enumeration impossible for typical analytical queries.
Practical interpretation:
With modern hardware evaluating roughly 1 million orderings per second:

- 8 tables: about 17 seconds
- 10 tables: about 5 hours
- 12 tables: nearly a year
- 15 tables: on the order of 100,000 years
Since analytical queries commonly join 10-20+ tables, exhaustive enumeration is fundamentally infeasible for real-world workloads.
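Under the stated assumption of one microsecond per plan, a short sketch reproduces these figures (reusing the counting logic from above):

```python
from math import comb, factorial

def total_orderings(n: int) -> int:
    return comb(2 * n - 2, n - 1) // n * factorial(n)  # C_{n-1} * n!

RATE = 1_000_000  # orderings evaluated per second (assumed)
for n in (8, 10, 12, 15):
    seconds = total_orderings(n) / RATE
    print(f"{n} tables: {seconds:,.0f} seconds")
# 17 s; 17,643 s (~4.9 h); 2.8e7 s (~11 months); 3.5e12 s (~110,000 years)
```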
One of the most important techniques for taming the combinatorial explosion is restricting the search space to specific tree shapes. The most common restriction is to consider only left-deep trees.
Left-Deep Trees:
```
      ⋈
     / \
    ⋈   D
   / \
  ⋈   C
 / \
A   B
```
The right input to every join is a base table, while the left input is the intermediate result of the previous join (the very first join takes two base tables).
Bushy Trees:
```
    ⋈
   / \
  ⋈   ⋈
 / \ / \
A  B C  D
```
Intermediate results can appear on both sides of a join. Both subtrees are fully evaluated before the root join executes.
| Tables (n) | Left-Deep Only | All Trees (Bushy) | Reduction Factor |
|---|---|---|---|
| 4 | 24 | 120 | 5× |
| 5 | 120 | 1,680 | 14× |
| 6 | 720 | 30,240 | 42× |
| 8 | 40,320 | 17,297,280 | 429× |
| 10 | 3,628,800 | 17,643,225,600 | 4,862× |
| 12 | 479,001,600 | ~2.8 × 10^13 | 58,786× |
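The reduction factor is exactly the Catalan number C_{n-1}, since the full space is C_{n-1} × n! and the left-deep space is n!. A quick check in Python:

```python
from math import comb, factorial

for n in (8, 10, 12):
    all_trees = comb(2 * n - 2, n - 1) // n * factorial(n)  # C_{n-1} * n!
    left_deep = factorial(n)                                # one per permutation
    print(f"{n} tables: {all_trees // left_deep:,}x reduction")
# 8 tables: 429x reduction
# 10 tables: 4,862x reduction
# 12 tables: 58,786x reduction
```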
Why restrict to left-deep?
The reduction from O((2n-2)!/((n-1)!)) to O(n!) is dramatic—often several orders of magnitude for queries joining 8+ tables. But is this restriction safe? What do we lose?
Advantages of left-deep restriction:

- The search space shrinks from (2n-2)!/(n-1)! to n! orderings
- The inner (right) input of every join is a base table, which suits index nested-loop joins
- Each intermediate result can be pipelined directly into the next join without materialization

Disadvantages:

- The truly optimal plan may be bushy, and the restriction makes it unreachable
- Less opportunity for inter-operator parallelism
- Strategies that join two small intermediate results first, sometimes the cheapest option, are ruled out
Most commercial database systems default to left-deep enumeration but may consider bushy plans under specific conditions: when explicit parallelism is requested, when subquery decorrelation creates bushy opportunities, or when advanced hints are provided. This hybrid approach balances optimization tractability with plan quality.
While left-deep trees are the most common restriction, other tree shapes have specific use cases and trade-offs.
| Tree Shape | Enumeration Complexity | Execution Advantage | Best Use Case |
|---|---|---|---|
| Left-deep | n! | Pipelining, index NL joins | OLTP, indexed lookups |
| Right-deep | n! | Hash join cascades | Hash-join dominated plans |
| Bushy | (2n-2)!/((n-1)!) | Parallelism | Parallel execution, MPP |
| Balanced bushy | Special cases only | Minimal height | Perfectly sized queries |
The hash join connection:
Right-deep trees have an interesting property for hash joins. In a hash join:

- The build phase reads one input (ideally the smaller) and constructs an in-memory hash table on the join key
- The probe phase streams the other input through that hash table, looking up matches

In a right-deep tree with all base tables on the left, all hash tables are built first (on base tables), then a single probe pass streams an intermediate result through the entire cascade, as the sketch below illustrates. This can be very efficient when:

- Every build-side table is small enough that all hash tables fit in memory simultaneously
- The probe side is large, so streaming it once without materializing intermediate results pays off
However, this requires knowing which tables are small enough to build hash tables—which again depends on accurate cardinality estimates.
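As a toy illustration of the build-then-probe pattern (the table contents, names, and keys below are invented for this sketch), Python dictionaries stand in for hash tables:

```python
# Right-deep hash-join cascade in miniature: build hash tables on the two
# small base tables first, then stream the large table through both probes
# in a single pass. All table and column names are illustrative.
customers = {1: "Ada", 2: "Grace"}   # build side 1: cust_id -> name
products = {10: "disk", 11: "cpu"}   # build side 2: prod_id -> item

orders = [                            # large probe side, streamed once
    {"cust_id": 1, "prod_id": 11, "qty": 3},
    {"cust_id": 2, "prod_id": 10, "qty": 1},
    {"cust_id": 9, "prod_id": 10, "qty": 5},  # no matching customer
]

for order in orders:                           # single streaming pass
    name = customers.get(order["cust_id"])     # probe hash table 1
    item = products.get(order["prod_id"])      # probe hash table 2
    if name is not None and item is not None:  # inner-join semantics
        print(name, item, order["qty"])
```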
Sophisticated optimizers like those in PostgreSQL, SQL Server, and Oracle don't rigidly enforce tree shape restrictions. They may use left-deep enumeration as a baseline but consider bushy alternatives for specific subexpressions, especially when common table expressions, derived tables, or lateral joins are involved.
Beyond tree shape restrictions, optimizers employ additional techniques to manage the exponential search space. The most important is exploiting the structure of the join graph itself.
Real queries rarely allow arbitrary orderings. If table A only joins with B and C, but not D, then any sensible ordering must connect A to either B or C before involving D. The join graph structure often reduces the effective search space far below theoretical maximums.
Join graph topology effects:
The structure of join predicates critically affects search space size:

- Chain queries (A-B-C-D) admit very few connected orderings, collapsing the space dramatically
- Star queries (a central fact table joined to every dimension) still permit any ordering of the dimension tables
- Clique queries, where every table joins every other, realize the full combinatorial space
Most real-world schemas fall between these extremes, with partial connectivity that significantly constrains the effective search space.
| Topology | Join Pattern | Effective Search Space | Example |
|---|---|---|---|
| Chain | A-B-C-D (linear) | Polynomial | Time-series joins |
| Star | Center connected to all | O(n!) | Data warehouse facts/dims |
| Snowflake | Star with chains | Polynomial in chain count | Extended dim tables |
| Clique | All-to-all | Super-exponential | Rare/contrived |
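A brute-force sketch makes the contrast concrete (feasible only for tiny n; the function and edge encoding are ours): for five tables, a chain permits just 16 connected left-deep orders, while a clique permits all 120.

```python
from itertools import permutations

def connected_orderings(n: int, edges: set) -> int:
    """Count left-deep join orders where each newly joined table shares
    a join edge with some table already in the running prefix."""
    count = 0
    for perm in permutations(range(n)):
        joined = {perm[0]}
        ok = True
        for t in perm[1:]:
            if not any((t, j) in edges or (j, t) in edges for j in joined):
                ok = False  # this step would require a cross product
                break
            joined.add(t)
        count += ok
    return count

n = 5
chain = {(i, i + 1) for i in range(n - 1)}                    # A-B-C-D-E
clique = {(i, j) for i in range(n) for j in range(i + 1, n)}  # all-to-all
print(connected_orderings(n, chain))   # 16  (vs 5! = 120)
print(connected_orderings(n, clique))  # 120 (every permutation connects)
```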
To develop intuition for the scale of the join ordering problem, let's visualize how quickly it explodes.
| Tables | Orderings | Physical Analogy |
|---|---|---|
| 4 | 120 | Cards in two decks |
| 6 | 30,240 | A small sports stadium at capacity |
| 8 | 17 million | Population of a major metro area |
| 10 | 17 billion | More than double Earth's population |
| 12 | 28 trillion | Roughly the US national debt in dollars |
| 15 | 3.5 quintillion | About half the grains of sand on Earth's beaches |
| 20 | ~4 × 10^27 | Tens of thousands of times the stars in the observable universe |
Notice the transition around 10-12 tables. Below 8, optimization time is negligible. Between 8-10, optimization becomes noticeable but manageable. Above 12, exhaustive search transitions from 'impractical' to 'physically impossible.' This threshold shapes optimizer design decisions.
The pruning imperative:
Given these numbers, effective optimization requires pruning the search space by many orders of magnitude. Consider a 12-table query:

- Roughly 2.8 × 10^13 possible orderings in the unrestricted space
- At a million plans per second, exhaustive enumeration would take nearly a year

Yet real optimizers handle such queries in under a second. This is only possible through:

- Restricting tree shapes (for example, to left-deep trees)
- Exploiting join graph connectivity to skip disconnected subplans
- Dynamic programming over shared subproblems rather than complete plans
- Heuristic pruning of clearly inferior candidates
The next pages will explore how optimizers achieve this remarkable efficiency.
The combinatorial explosion has profound implications for how query optimizers are designed and implemented.
Commercial databases like Oracle, SQL Server, and PostgreSQL all use hybrid approaches. They combine exhaustive enumeration for small subproblems (≤8-10 tables) with heuristic or randomized search for larger problems. Configuration parameters often control the tradeoff between optimization time and plan quality.
The optimization budget concept:
Modern optimizers often work with an 'optimization budget'—a limit on how many plans can be explored. This budget might be expressed as:

- A maximum number of plan alternatives to enumerate
- A wall-clock limit on optimization time
- A cap on the memory consumed by optimizer data structures
When the budget is exhausted, the optimizer returns the best plan found so far, even if better plans theoretically exist. This approach:

- Bounds worst-case optimization time regardless of query complexity
- Trades potential plan quality for predictable compilation cost
- Works well in practice, since good plans are usually discovered early in the search
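A minimal sketch of budget-limited enumeration, with invented table names and a toy cost model (real optimizers use cardinality-based cost estimates, not this stand-in):

```python
from itertools import permutations

def optimize_with_budget(tables, cost_fn, budget=1_000):
    """Enumerate left-deep join orders until the plan budget is spent,
    returning the cheapest ordering seen so far (not necessarily optimal)."""
    best_order, best_cost = None, float("inf")
    for examined, order in enumerate(permutations(tables)):
        if examined >= budget:
            break  # budget exhausted: settle for the best plan found
        cost = cost_fn(order)
        if cost < best_cost:
            best_order, best_cost = order, cost
    return best_order, best_cost

# Toy cost model: plans are cheaper when highly selective tables come first.
selectivity = {"customers": 1, "products": 2, "orders": 3, "shipments": 4}

def toy_cost(order):
    # Descending positional weights reward placing selective tables early.
    return sum((len(order) - i) * selectivity[t] for i, t in enumerate(order))

print(optimize_with_budget(selectivity, toy_cost, budget=10))
```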
We've explored the daunting combinatorial landscape of join ordering. Let's consolidate the key insights:

- A join ordering is a binary tree: C_{n-1} topologies times n! leaf assignments gives (2n-2)!/(n-1)! total orderings
- Growth is super-factorial: 10 tables already exceed 17 billion orderings
- Restricting the search to left-deep trees shrinks the space to n!, at the cost of missing some bushy plans
- Join graph connectivity often reduces the effective space far below the theoretical maximum
- Beyond roughly 12 tables, exhaustive enumeration is physically impossible, so pruning and heuristics are mandatory
What's next:
Understanding the size of the search space sets the stage for understanding how optimizers navigate it. The next page explores techniques for finding optimal join orders—particularly dynamic programming, which finds provably optimal orderings within restricted search spaces.
You now understand the combinatorial explosion underlying join ordering. The super-exponential growth in possible orderings—driven by tree topologies, permutations, and commutativity—explains why optimizers can't try everything and must rely on smart algorithms, heuristics, and search restrictions. Next, we'll explore how to find optimal orderings within these constraints.