When joining multiple tables, the optimizer doesn't just choose which tables to join first—it also decides the shape of the join tree. Should all joins be chained linearly, forming a left-deep or right-deep tree? Or should intermediate results be combined in a balanced, bushy structure?
This seemingly abstract structural decision has profound practical implications for intermediate result sizes, memory use, pipelining, and parallelism.
This page explores the taxonomy of join tree shapes, their properties, and how optimizers choose between them—or explore all simultaneously.
By the end of this page, you will understand: (1) The definition of left-deep, right-deep, and bushy join trees, (2) The number of trees of each type, (3) Execution characteristics and trade-offs, (4) When each shape is advantageous, (5) How optimizers restrict or explore different tree shapes, and (6) Modern approaches that consider all shapes adaptively.
A join tree is a binary tree where leaf nodes are base tables and internal nodes are join operations. For n tables, there are n-1 joins and many possible tree structures.
Left-Deep Trees:
In a left-deep tree, all right children are base relations (leaf nodes):
⋈
/ \
⋈ D
/ \
⋈ C
/ \
A B
Execution order: ((A ⋈ B) ⋈ C) ⋈ D
The running join result accumulates along the left spine; each join combines it with a fresh base table.
Right-Deep Trees:
Symmetrically, all left children are base relations:
⋈
/ \
A ⋈
/ \
B ⋈
/ \
C D
Execution order: A ⋈ (B ⋈ (C ⋈ D))
Bushy Trees:
Both children of an internal node can be intermediate results:
⋈
/ \
⋈ ⋈
/ \ / \
A B C D
Execution order: (A ⋈ B) ⋈ (C ⋈ D)
Bushy trees are more general—they include left-deep and right-deep as special cases.
| Tree Type | Definition | Visual Pattern | Key Property |
|---|---|---|---|
| Left-Deep | Right children are always leaves | Leans left (chain) | Sequential/pipeline friendly |
| Right-Deep | Left children are always leaves | Leans right (chain) | Build-side ready for hash |
| Bushy | Either child can be intermediate | Balanced binary tree | Enables parallelism |
| Zig-Zag | Neither strictly left nor right deep | Irregular shape | Intermediate case |
The number of possible join trees varies dramatically by shape:
Left-Deep Trees:
For n tables, we choose which table goes first, which goes second, etc.—a permutation:
Number of left-deep trees = n!
For 10 tables: 10! = 3,628,800
All Binary Trees (Bushy):
The number of different binary tree shapes with n leaves is the (n-1)th Catalan number:
Cₙ₋₁ = (2(n-1))! / (n! × (n-1)!)
For each tree shape, we can assign tables to leaves in n! ways:
Number of bushy trees = Cₙ₋₁ × n! = (2n-2)! / (n-1)!
For 10 tables: C₉ × 10! = 4862 × 3,628,800 ≈ 17.6 billion trees!
| Tables (n) | Left-Deep (n!) | Bushy ((2n-2)!/(n-1)!) | Ratio |
|---|---|---|---|
| 3 | 6 | 12 | 2× |
| 4 | 24 | 120 | 5× |
| 5 | 120 | 1,680 | 14× |
| 6 | 720 | 30,240 | 42× |
| 8 | 40,320 | 17,297,280 | 429× |
| 10 | 3,628,800 | 17,643,225,600 | 4,862× |
| 12 | 479,001,600 | 2.8 × 10¹³ | 58,786× |
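These counts are easy to sanity-check; here is a minimal Python sketch of both formulas (the function names are illustrative):

```python
from math import factorial

def left_deep_count(n: int) -> int:
    # One left-deep tree per permutation of the n tables: n!
    return factorial(n)

def bushy_count(n: int) -> int:
    # Catalan(n-1) tree shapes times n! leaf labelings = (2n-2)! / (n-1)!
    return factorial(2 * n - 2) // factorial(n - 1)

for n in (3, 4, 5, 6, 8, 10, 12):
    print(n, left_deep_count(n), bushy_count(n),
          bushy_count(n) // left_deep_count(n))  # ratio = Catalan(n-1)
```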
Optimizer implications:
If optimization explores only left-deep trees, the search space is O(n!), already challenging but manageable with DP.
If exploring all bushy trees, the search space explodes—17 billion possibilities for just 10 tables.
This is why many optimizers restrict to left-deep trees by default.
Catalan numbers (1, 1, 2, 5, 14, 42, 132, ...) appear throughout combinatorics: binary trees, parenthesizations, non-crossing partitions. They grow approximately as 4^n / (n^(3/2) × √π), so exhaustive bushy enumeration becomes infeasible beyond ~10-12 tables without pruning.
Left-deep trees have execution properties that make them particularly suitable for traditional, single-threaded, disk-based database systems:
Property 1: Pipelining
In a left-deep tree with nested loop or hash joins, intermediate results can flow directly to the next join without full materialization:
((A ⋈ B) ⋈ C) ⋈ D
Execution:
1. For each tuple from A:
2. Find matching tuples in B (using index or hash)
3. For each match, find matching tuples in C
4. For each match, find matching tuples in D
5. Output result
No intermediate result is ever fully materialized to disk (unless memory overflows). This is streaming execution.
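Here is a minimal sketch of that streaming behavior using Python generators; the dict-based `hash_probe` stands in for an index or hash lookup, and the table layouts are invented for illustration:

```python
def hash_probe(table, key):
    # Build a dict "index" on key once; return a probe function.
    index = {}
    for row in table:
        index.setdefault(row[key], []).append(row)
    return lambda outer_row: index.get(outer_row[key], [])

def join(outer, probe):
    # Pipelined join: each outer tuple probes the inner side and
    # matches are emitted immediately -- nothing is materialized.
    for row in outer:
        for match in probe(row):
            yield {**row, **match}

A = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}]
B = [{"id": 1, "b": "p"}]
C = [{"id": 1, "c": "q"}]

# Left-deep pipeline: (A ⋈ B) ⋈ C, streaming end to end.
pipeline = join(join(iter(A), hash_probe(B, "id")), hash_probe(C, "id"))
print(list(pipeline))  # [{'id': 1, 'a': 'x', 'b': 'p', 'c': 'q'}]
```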
Property 2: Index Nested Loop Friendly
With a left-deep tree and index nested loop joins:
((A ⋈ B) ⋈ C) ⋈ D
idx idx idx
- For each tuple flowing through:
- Probe index on B → probe index on C → probe index on D
- Optimal when indexes exist on right-side tables
- Memory requirement: O(1) per join level (just input buffers)
System R (1979) restricted to left-deep trees partly due to limited memory and the prevalence of nested loop joins with indexes. As systems gained memory and hash joins became common, this restriction relaxed—but left-deep remains preferred for many OLTP workloads.
Despite larger search space and materialization needs, bushy trees offer significant advantages in specific scenarios:
Advantage 1: Smaller Intermediate Results
Consider four tables A, B, C, and D whose joins produce the intermediate sizes shown below:
Left-deep plan:
((A ⋈ B) ⋈ C) ⋈ D
Intermediate sizes:
A ⋈ B = 1,000 rows
(A ⋈ B) ⋈ C = 500,000 rows (nearly a cross product if C shares no selective predicate with the tables joined so far!)
final = ... depends on D
Bushy plan:
(A ⋈ B) ⋈ (C ⋈ D)
Intermediate sizes:
A ⋈ B = 1,000 rows
C ⋈ D = 100 rows
final = 1,000 × 100 × selectivity ≈ 500 rows (e.g., with selectivity ≈ 0.005)
The bushy plan independently reduces both branches before the final join, avoiding the massive intermediate.
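A back-of-envelope comparison using total intermediate rows as a crude cost proxy (numbers taken from the example above):

```python
# Sum of intermediate result sizes produced before the final join:
left_deep = 1_000 + 500_000   # A⋈B, then the massive (A⋈B)⋈C
bushy = 1_000 + 100           # A⋈B and C⋈D, each reduced independently
print(left_deep, bushy)       # 501000 vs 1100 -- roughly a 450x difference
```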
Advantage 2: Parallelism Within Query
In a bushy tree, independent branches can execute in parallel:
⋈ (wait for both)
/ \
⋈ ⋈
/| |\
A B C D
Worker 1 Worker 2
Computes A⋈B Computes C⋈D
↓ ↓
Wait for both...
↓
Final join
With 8 tables and a perfectly balanced bushy tree (4 leaves per branch), speedup can approach 2× for that stage. More branches enable more parallelism.
Advantage 3: Hash Join Build Side Optimization
Hash joins perform best when the build side (inner relation) is small. Bushy trees allow computing small intermediates before joining:
Bushy: a small intermediate (D ⋈ E) serves as the build side for the join with the larger (A ⋈ B ⋈ C)
→ Small hash table, efficient probing
Left-deep: Must build on base tables only
→ If all base tables are large, hash tables are huge
Right-deep trees are less common but have their place:
Right-Deep Structure:
⋈
/ \
A ⋈
/ \
B ⋈
/ \
C D
Execution: A ⋈ (B ⋈ (C ⋈ D))
Execution with hash joins:
In a right-deep tree with hash joins (building on each join's left input), every build side is a base table: hash tables are built on A, B, and C, and tuples from D then probe through the whole chain.
Advantages of right-deep for hash joins: all hash tables are built on base tables, so they can be built independently (even concurrently), and probe tuples stream through the entire chain without materializing intermediates.
When right-deep helps: star-schema queries where one large fact table probes several small dimension tables whose hash tables fit in memory.
The trade-off:
Right-deep requires building all hash tables before any output is produced—fully blocking on build phases. This increases memory pressure and latency to first result.
Zig-Zag and Mixed Trees:
Some plans are neither purely left-deep nor right-deep:
⋈
/ \
⋈ C
/ \
A ⋈
/ \
B D
(A ⋈ (B ⋈ D)) ⋈ C
These "zig-zag" plans combine properties of both. Modern optimizers explore these when beneficial.
Right-deep trees can implement "hash join pipelines" where probe tuples flow through multiple hash tables in sequence. This is efficient for analytical workloads with one large fact table (probe side) joining many small dimensions (build sides).
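A minimal sketch of such a pipeline, assuming each dimension's hash table fits in memory (the table and key names are invented for illustration):

```python
def build(table, key):
    # Build phase: one in-memory hash table per dimension (blocking).
    index = {}
    for row in table:
        index.setdefault(row[key], []).append(row)
    return index

def probe_chain(fact, dims):
    # Probe phase: each fact tuple flows through every hash table in turn.
    for row in fact:
        partial = [row]
        for index, key in dims:
            partial = [{**r, **m} for r in partial
                       for m in index.get(r[key], [])]
        yield from partial

dim_region = [{"region_id": 1, "region": "east"}]
dim_channel = [{"channel_id": 7, "channel": "retail"}]
fact_sales = [{"region_id": 1, "channel_id": 7, "amount": 42}]

dims = [(build(dim_region, "region_id"), "region_id"),
        (build(dim_channel, "channel_id"), "channel_id")]
print(list(probe_chain(fact_sales, dims)))  # one fully joined row
```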
Optimizers use several strategies to control which tree shapes are considered:
Strategy 1: Restrict to Left-Deep
Only generate left-deep plans during enumeration:
for each permutation P of tables:
plan = ((P[0] ⋈ P[1]) ⋈ P[2]) ⋈ ... ⋈ P[n-1]
cost = evaluate(plan)
update best if cheaper
Search space: O(n!) plans, kept tractable for moderate n by dynamic programming over subsets.
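A runnable version of Strategy 1 under a toy cost model (the uniform selectivity and table sizes below are assumptions, not part of the original example):

```python
from itertools import permutations

SELECTIVITY = 0.01  # assumed uniform join selectivity

def size_and_cost(plan, sizes):
    # Toy cost model: an intermediate's size is |L| * |R| * selectivity;
    # plan cost is the total size of all intermediates produced.
    if isinstance(plan, str):
        return sizes[plan], 0.0
    ls, lc = size_and_cost(plan[0], sizes)
    rs, rc = size_and_cost(plan[1], sizes)
    out = ls * rs * SELECTIVITY
    return out, lc + rc + out

def best_left_deep(tables, sizes):
    best_plan, best_cost = None, float("inf")
    for perm in permutations(tables):
        plan = perm[0]
        for t in perm[1:]:
            plan = (plan, t)  # right child is always a base table
        _, cost = size_and_cost(plan, sizes)
        if cost < best_cost:
            best_plan, best_cost = plan, cost
    return best_plan, best_cost

sizes = {"A": 10, "B": 1000, "C": 50, "D": 5}
print(best_left_deep(list(sizes), sizes))
```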
Strategy 2: Bushy with Pruning
Explore bushy trees but aggressively prune:
for each subset S in increasing size:
for each partition (S₁, S₂) of S:
if estimate(S₁ ⋈ S₂) > current_best × threshold:
skip (prune)
else:
evaluate and update
Pruning makes bushy exploration tractable for moderate query sizes.
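A self-contained sketch of Strategy 2 as subset dynamic programming with threshold pruning (same toy cost model as the previous sketch; `prune_factor` is an invented knob):

```python
from itertools import combinations

SELECTIVITY = 0.01  # assumed uniform join selectivity

def proper_splits(subset):
    # Yield every non-empty proper subset (each split is seen from both sides).
    items = sorted(subset)
    for k in range(1, len(items)):
        for combo in combinations(items, k):
            yield frozenset(combo)

def best_bushy(sizes, prune_factor=10.0):
    # best[S] = (plan, output_size, cost) for every non-empty table subset S.
    best = {frozenset([t]): (t, sizes[t], 0.0) for t in sizes}
    tables = list(sizes)
    for k in range(2, len(tables) + 1):
        for subset in map(frozenset, combinations(tables, k)):
            champion = None
            for left in proper_splits(subset):
                lp, ls, lc = best[left]
                rp, rs, rc = best[subset - left]
                out = ls * rs * SELECTIVITY
                cost = lc + rc + out
                # Prune: skip splits far costlier than the current champion.
                if champion is not None and cost > champion[2] * prune_factor:
                    continue
                if champion is None or cost < champion[2]:
                    champion = ((lp, rp), out, cost)
            best[subset] = champion
    return best[frozenset(tables)]

print(best_bushy({"A": 10, "B": 1000, "C": 50, "D": 5}))
```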
Strategy 3: Shape Heuristics
Use query characteristics to select shape:
if query is star schema:
prefer bushy (dimension joins first)
else if query has chain structure:
prefer left-deep following chain
else if parallel execution:
prefer bushy for parallelism
else:
default to left-deep
| System | Default Behavior | Bushy Support | Control Mechanism |
|---|---|---|---|
| PostgreSQL | Left-deep preferred | Considers some bushy | GEQO for large queries |
| MySQL | Left-deep (historically) | Recent versions explore more | Optimizer hints |
| SQL Server | Explores bushy | Full support | Search levels and timeouts |
| Oracle | Flexible | Full support | Optimizer mode settings |
| Spark SQL | Bushy by default | Native parallel | Adaptive query execution |
Many systems allow hints to force tree shape. Oracle's /*+ ORDERED */ hint enforces left-deep following FROM clause order. SQL Server's OPTION (FORCE ORDER) does similarly. Use sparingly—the optimizer usually knows best.
The relationship between tree shape and parallelism is crucial for modern analytics systems:
Intra-Query Parallelism with Bushy Trees:
⋈ (parallel merge)
/ \
⋈ ⋈
/ \ / \
A B C D
[Worker Pool 1] [Worker Pool 2]
Processes A⋈B Processes C⋈D
↓ ↓
Result 1 Result 2
↓
Final Join
Bushy structure naturally exposes independent work units. A perfectly balanced bushy tree of 2^k leaves can achieve up to k levels of binary parallelism.
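A schematic of the two independent branches running concurrently, using a thread pool as a stand-in for worker pools (all data and key names are illustrative; in CPython the GIL limits true CPU parallelism, and a real engine parallelizes at the operator level):

```python
from concurrent.futures import ThreadPoolExecutor

def hash_join(left, right, key):
    # Simple in-memory hash join standing in for a real operator.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

def run_bushy(A, B, C, D):
    with ThreadPoolExecutor(max_workers=2) as pool:
        left = pool.submit(hash_join, A, B, "k")    # worker pool 1: A⋈B
        right = pool.submit(hash_join, C, D, "k")   # worker pool 2: C⋈D
        # The final join waits for both branches.
        return hash_join(left.result(), right.result(), "k")

A = [{"k": 1, "a": 1}]; B = [{"k": 1, "b": 2}]
C = [{"k": 1, "c": 3}]; D = [{"k": 1, "d": 4}]
print(run_bushy(A, B, C, D))  # one fully joined row
```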
Pipeline Parallelism with Left-Deep:
Left-deep trees enable a different form of parallelism:
((A ⋈ B) ⋈ C) ⋈ D
Stage 1 Stage 2 Stage 3
A ⋈ B → _ ⋈ C → _ ⋈ D
↘ tuple flow ↘
Tuples pipeline through stages;
each stage can use multiple workers for its operation.
This is pipeline parallelism: stages operate concurrently on different tuples.
Distributed Systems Considerations:
| Aspect | Left-Deep | Bushy |
|---|---|---|
| Data movement | Sequential: result of join n ships to node(s) for join n+1 | May ship multiple intermediates in parallel |
| Parallelism within query | Limited to worker parallelism per stage | Multiple independent stages run simultaneously |
| Network traffic | Potentially lower (one intermediate at a time) | Potentially higher (multiple intermediates) |
| Load balancing | Easier (single flow) | More complex (multiple flows) |
| Fault tolerance | Simpler checkpointing | More complex state management |
Modern MPP systems (Spark, Presto, Snowflake):
These systems aggressively use bushy plans: their execution model splits a query into independent stages scheduled across the cluster, and bushy trees align naturally with that model, enabling full cluster utilization.
The cost model adapts:
In parallel systems, the cost model accounts for data movement between workers, the degree of parallelism available at each stage, and synchronization points, not just I/O and CPU.
Parallel plans introduce 'exchange' operators that repartition or broadcast data. These appear at plan boundaries—where data must move between parallel workers. Exchange costs are major factors in parallel plan optimization.
Given the trade-offs, how do real optimizers choose tree shapes? Here's a practical decision framework:
Consider Left-Deep When:
- Indexes exist on the join columns of the inner tables (index nested loop pipelines)
- The workload is single-node OLTP with modest join counts
- Memory is constrained and streaming, pipelined execution is desirable
Consider Bushy When:
- Cardinality estimates predict huge intermediates that independent branches could shrink first
- The system executes queries in parallel (multi-core or MPP)
- The query has a star or snowflake shape with several small dimension joins
Adaptive approaches:
Modern optimizers don't commit to one shape dogmatically:
- Costing considers both: enumerate left-deep AND promising bushy plans; let cost decide
- Bushy sampling: for large queries, sample random bushy plans and keep the best
- Heuristic triggers: switch to bushy exploration when cardinality estimates suggest left-deep will have huge intermediates
- Runtime adaptation: start with left-deep; if intermediate materialization is needed anyway (memory overflow), consider restructuring
Example decision process:
1. Estimate cardinalities for all joins
2. Compute left-deep plan cost (quick since O(n!) is tractable)
3. If left-deep has acceptable cost → use it
4. If intermediates are huge → explore bushy alternatives
5. If parallel execution → weight bushy for parallelism benefits
6. Compare and select
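The same process as a schematic function; the thresholds and inputs are invented for illustration:

```python
def choose_shape(left_deep_cost, max_intermediate_rows, parallel_workers,
                 cost_budget=1e6, intermediate_limit=1e7):
    # Steps 1-2 (estimate cardinalities, cost the left-deep plan) happen upstream.
    if left_deep_cost <= cost_budget and max_intermediate_rows <= intermediate_limit:
        return "left-deep"  # step 3: acceptable cost, keep it simple
    shapes = ["left-deep", "bushy"]  # step 4: huge intermediates -> explore bushy
    if parallel_workers > 1:
        shapes.append("bushy, weighted for parallelism")  # step 5
    return "compare and select: " + ", ".join(shapes)  # step 6
```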
For single-node OLTP: left-deep is usually fine. For analytics on data warehouses or MPP systems: bushy trees often win significantly. When in doubt, let the optimizer decide—but understand WHY it chose what it did when analyzing slow queries.
We've explored the structural dimension of query optimization. The key insights: tree shape is a decision distinct from join order; left-deep trees favor pipelining, index nested loops, and a tractable search space; bushy trees trade a vastly larger search space for smaller intermediates and intra-query parallelism; and modern optimizers mix shapes adaptively rather than committing to one.
What's next:
With tree shapes understood, the final piece of cost-based optimization is the search strategy itself. The next page explores Search Strategies—how optimizers balance thoroughness with speed, employing techniques from greedy algorithms to genetic optimization to find excellent plans within time budgets.
You now understand the taxonomy of join tree shapes and their trade-offs. This structural dimension—distinct from join ordering and algorithm selection—critically affects execution performance. Recognizing when each shape excels helps explain optimizer behavior and guides performance tuning.