Imagine you're planning a road trip visiting 10 cities. How many different routes could you take? The answer—over 3.6 million—hints at a combinatorial explosion that query optimizers face with every query. But the optimizer's challenge is far worse: it's not just choosing an order, but also selecting how to travel between each city, what vehicle to use, whether to make stops, and countless other decisions.
The search space is the complete set of all possible execution plans for a query. Understanding its structure, size, and properties is essential for understanding how optimizers work—and why they sometimes fail. In this page, we'll map this vast landscape and understand the mathematical principles that govern it.
By the end of this page, you will understand how the search space of execution plans is structured, why it grows explosively with query complexity, the key factors that determine search space size, and how different plan topologies (linear vs. bushy trees) affect optimization. This knowledge is foundational for understanding the enumeration and pruning strategies covered in subsequent pages.
The search space for a query is the set of all possible execution plans that correctly implement the query's semantics. Each point in this space represents a distinct way of executing the query—a unique combination of operators, algorithms, orderings, and access methods.
Formally, we can define the search space as:
$$S(Q) = \{\, P \mid P \text{ is a valid execution plan for query } Q \,\}$$
where a valid plan P is one that, for every database instance, produces exactly the result set defined by Q's semantics.
The search space is implicit—it's not explicitly enumerated in memory but is defined by the grammar of possible plan structures. The optimizer explores this space using generation rules that produce valid plans.
Every execution plan is defined by choices across multiple dimensions. These dimensions are largely independent, so the total search space is approximately the Cartesian product of the choices along each one:
| Dimension | Description | Example Choices | Typical Options |
|---|---|---|---|
| Join Order | Sequence in which tables are joined | R ⋈ S ⋈ T vs R ⋈ T ⋈ S | n! / 2 orderings |
| Join Algorithms | Physical algorithm for each join | Hash, Merge, Nested-Loop | 3-5 per join |
| Access Methods | How to read each base table | Table scan, Index scan, Index-only | 2-5 per table |
| Tree Shape | Overall structure of the join tree | Left-deep, right-deep, or bushy | Catalan(n-1) shapes |
| Operator Placement | Where to apply filters, projections | Push down vs. post-join | Multiple positions |
| Parallelism | Degree and partitioning strategy | Sequential, hash-partition, broadcast | Varies by system |
Because dimensions are largely independent, their choices multiply together. For a 4-table join (and thus 3 joins), with 10 candidate join orders, 3 algorithms per join, and 2 access methods per table, the space explodes: 10 × 3³ × 2⁴ = 10 × 27 × 16 = 4,320 plans. Each additional dimension multiplies the total.
Among all dimensions of the search space, join ordering typically dominates complexity. The order in which tables are joined can affect query performance by orders of magnitude, and the number of possible orderings grows super-exponentially with the number of tables.
Consider a simple example with three tables:
Query: Find orders from premium customers for electronics products.
SELECT o.order_id, c.name, p.product_name
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN products p ON o.product_id = p.id
WHERE c.tier = 'premium' AND p.category = 'electronics';
The filter on customers yields ~5,000 rows (10%). The filter on products yields ~1,000 rows (10%). But orders has no direct filter.
A plan that filters customers and products early and joins the small results with orders touches only tens of thousands of rows; a plan that joins the full tables first and filters at the end pushes millions of rows through every join. The resulting cost gap is approximately 20× in this example, and real-world differences can exceed 1000×. This enormous variance makes join order optimization the primary focus of most query optimizers.
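To make the 20× figure concrete, here is a back-of-the-envelope sketch. The orders row count of 1,000,000 is an assumed figure; the 10% selectivities follow the text, and orders are assumed uniform across customers and products:

```python
# Back-of-the-envelope comparison of two plans for the query above.
ORDERS = 1_000_000
SEL_PREMIUM = 0.10        # fraction of customers (hence of orders) that are premium
SEL_ELECTRONICS = 0.10    # fraction of products (hence of orders) that are electronics

# Good plan: push filters down, join the filtered customers with orders first.
join1_out = ORDERS * SEL_PREMIUM           # 100,000 rows survive the first join
join2_out = join1_out * SEL_ELECTRONICS    # 10,000 rows survive the second join
work_good = join1_out + join2_out          # tuples flowing between operators

# Bad plan: join the unfiltered tables first and filter at the very end.
work_bad = ORDERS + ORDERS                 # both joins pass ~1,000,000 rows each

print(f"good: {work_good:,.0f}  bad: {work_bad:,.0f}  ratio: {work_bad / work_good:.0f}x")
```

Under these assumptions the bad plan processes roughly 18× more intermediate tuples, in line with the approximate 20× figure above.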
For n tables, how many join orderings exist? The answer depends on whether we consider linear (left-deep/right-deep) or bushy trees:
Linear Trees (Left-Deep): In a left-deep tree, the inner relation of each join is always a base table. The number of left-deep orderings is simply the number of table permutations:
$$\text{Left-deep orderings} = n!$$
For commutative joins (where A ⋈ B = B ⋈ A), this reduces to n!/2.
Bushy Trees: Bushy trees allow arbitrary nesting—any intermediate result can join with any other. The number of distinct bushy tree shapes follows the Catalan numbers:
$$C_n = \frac{1}{n+1}\binom{2n}{n} = \frac{(2n)!}{(n+1)!n!}$$
A bushy tree over n tables has n−1 internal join nodes, so there are Catalan(n−1) distinct shapes. Multiplying by the n!/2 table orderings gives the bushy totals in the table below. The Catalan factor alone grows approximately as 4^n / n^1.5, so the product grows super-exponentially.
| Tables (n) | Left-Deep (n!/2) | Catalan(n-1) | Bushy Total | Comparison |
|---|---|---|---|---|
| 2 | 1 | 1 | 1 | Trivial |
| 3 | 3 | 2 | 6 | Manageable |
| 4 | 12 | 5 | 60 | Small |
| 5 | 60 | 14 | 840 | Moderate |
| 6 | 360 | 42 | 15,120 | Large |
| 8 | 20,160 | 429 | 8.6M | Challenging |
| 10 | 1.8M | 4,862 | 8.8B | Extreme |
| 12 | 239M | 58,786 | ~1.4×10^13 | Impossible |
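These counts can be verified directly. A minimal Python sketch, using the table's conventions (n!/2 left-deep orderings, Catalan(n−1) tree shapes):

```python
from math import comb, factorial

def catalan(n: int) -> int:
    """C_n = binom(2n, n) / (n + 1): binary tree shapes with n internal nodes."""
    return comb(2 * n, n) // (n + 1)

def left_deep(n: int) -> int:
    """Left-deep orderings for n tables, halved for the commutative bottom join."""
    return factorial(n) // 2

def bushy(n: int) -> int:
    """Bushy total: Catalan(n - 1) tree shapes times the left-deep orderings."""
    return catalan(n - 1) * left_deep(n)

for n in (2, 3, 4, 5, 6, 8, 10, 12):
    print(f"{n:>2} tables: left-deep {left_deep(n):>14,}   bushy {bushy(n):>18,}")
```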
Even if we could evaluate one plan per nanosecond, exhaustively searching ~10^13 plans would take hours for a single query; at a more realistic microsecond per plan, it would take months. No optimization technique eliminates this fundamental explosion; we can only navigate it intelligently through pruning and heuristics.
Beyond join ordering, the optimizer must choose which algorithm to use for each operator. This creates another multiplicative factor in the search space.
Most database systems support multiple join algorithms, each optimal for different data characteristics:
| Algorithm | Best When | Strengths | Weaknesses |
|---|---|---|---|
| Nested Loop | Small outer, indexed inner | Simple, works with any join condition | O(n×m) worst case |
| Block Nested Loop | Medium tables, limited memory | Better cache utilization | Still O(n×m) |
| Index Nested Loop | Indexed inner table | Efficient for selective joins | Requires suitable index |
| Hash Join | Equi-joins, one small table | O(n+m) time | Requires memory for hash table |
| Sort-Merge Join | Pre-sorted data, range joins | Handles duplicates well | Requires sorting if not sorted |
| Grace Hash Join | Large tables, limited memory | Disk-based, predictable | Multiple passes over data |
For a query with k joins, if each has a choice of j algorithms, this dimension contributes j^k plans.
For each base table, the optimizer chooses how to retrieve data:
| Access Method | Description | Cost Characteristics |
|---|---|---|
| Sequential Scan | Read entire table | O(pages), good for large result sets |
| Index Scan | Use B-tree/hash index | O(log n + k), good for selective queries |
| Index-Only Scan | Answer from index alone | Fastest when applicable |
| Bitmap Scan | Combine multiple indexes | Good for OR conditions |
| TID Scan | Direct tuple access | Used for specific row access |
With an average of a access methods per table and t tables, this dimension contributes a^t plans.
The total algorithm selection space for a query with t tables and t-1 joins, with j join algorithms and a access methods:
$$\text{Algorithm Space} = a^t \times j^{t-1}$$
For 5 tables with 3 access methods each and 4 join algorithms: $$3^5 \times 4^4 = 243 \times 256 = 62,208 \text{ algorithm combinations}$$
This multiplies with join orderings. A 5-table query might have: $$60 \text{ (join orders)} \times 62,208 \text{ (algorithms)} = 3.7 \text{ million plans}$$
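A minimal sketch of the multiplication, under the same simplifying assumptions (left-deep orderings only, independent per-join and per-table choices):

```python
from math import factorial

def plan_space(tables: int, join_algos: int, access_methods: int) -> int:
    """Size of the left-deep plan space under the simplified model above."""
    join_orders = factorial(tables) // 2          # left-deep orderings, n!/2
    algo_choices = join_algos ** (tables - 1)     # one algorithm per join
    access_choices = access_methods ** tables     # one access path per table
    return join_orders * algo_choices * access_choices

print(f"{plan_space(5, 4, 3):,}")   # 60 * 256 * 243 = 3,732,480 plans
```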
In practice, many algorithm choices can be pruned early. If hash join always beats nested-loop for a particular join (given statistics), nested-loop is eliminated without full plan enumeration. This 'dominance pruning' dramatically reduces the effective search space.
The shape of the execution plan tree is itself a choice that affects both the search space size and the execution characteristics. Understanding plan topologies is crucial for optimization strategy.
In a left-deep tree, the right input of every join is always a base table. The left input is either a base table (for the first join) or the result of a previous join.
      ⋈
     / \
    ⋈   T4
   / \
  ⋈   T3
 / \
T1  T2
Characteristics:
- The inner (right) input of every join is a base table, so existing indexes can drive index nested-loop joins.
- Execution pipelines naturally up the left spine, with at most one intermediate result in flight.
- The ordering space is the n! permutation space discussed above.
The mirror image—left input of every join is a base table:
  ⋈
 / \
T1  ⋈
   / \
  T2  ⋈
     / \
    T3  T4
Characteristics:
- The mirror constraint: intermediate results always feed the right input.
- Well suited to pipelined hash joins, where hash tables are built from all the base tables and probed in a single pass, at the price of holding several hash tables in memory at once.
- Same ordering count as left-deep trees: n! permutations.
Bushy trees allow any structure—intermediate results can join with other intermediate results:
      ⋈
    /   \
   ⋈     ⋈
  / \   / \
 T1  T2 T3  T4
Characteristics:
- Independent subtrees (here T1 ⋈ T2 and T3 ⋈ T4) can execute in parallel.
- Joining two small intermediate results can beat any linear plan when the base tables are large but their filtered joins are small.
- The largest search space of the three: Catalan(n−1) tree shapes times table orderings.
Despite bushy trees' flexibility, most commercial optimizers focus on left-deep trees:
- The search space is far smaller: n! orderings instead of Catalan(n−1) × n!.
- A base-table inner input lets every join exploit existing indexes.
- Pipelined execution keeps at most one intermediate result materialized at a time, simplifying memory management.
However, modern analytical systems with abundant memory and parallel hardware increasingly explore bushy plans.
Beyond joins, the placement of other operators—selections, projections, aggregations—creates additional search space dimensions.
Filters can often be applied at multiple points in the plan:
Example Query:
SELECT * FROM orders o JOIN customers c ON o.cust_id = c.id
WHERE c.country = 'USA' AND o.amount > 1000;
Placement Options:
- Push down: Filter(customers, country='USA') ⋈ Filter(orders, amount>1000)
- Post-join: Filter(customers ⋈ orders, country='USA' AND amount>1000)

Generally, pushing selections down reduces intermediate result sizes, improving performance. But there are exceptions: an expensive predicate (such as a user-defined function) can be cheaper to evaluate after a selective join has shrunk its input, which is why some optimizers treat predicate placement as a cost-based choice.
Projections remove unneeded columns. Like selections, they can be placed at various points:
Early Projection Benefits:
- Narrower tuples shrink intermediate results, hash tables, and sort runs, reducing memory pressure and I/O.
- More tuples fit per page and per cache line, improving throughput.
Early Projection Risks:
- Dropping a column that a later predicate, join, or ORDER BY still needs yields an incorrect plan, so the optimizer must track which columns remain live.
- In pipelined row-at-a-time execution, extra projection operators can add CPU overhead that outweighs the savings.
For queries with GROUP BY, aggregation can sometimes be pushed down:
SELECT dept_id, SUM(salary)
FROM employees e JOIN departments d ON e.dept_id = d.id
GROUP BY dept_id;
Options:
- Join, then aggregate: join employees with departments, then group the joined result by dept_id.
- Pre-aggregate (eager aggregation): group employees by dept_id first, then join the much smaller per-department partial sums with departments.
Pre-aggregation is legal only for decomposable aggregates (SUM, COUNT, MIN, MAX) and can dramatically reduce join input size.
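To see why decomposability matters, here is a toy sketch with assumed data, showing that pre-aggregating SUM below the join produces the same result as aggregating after it:

```python
from collections import defaultdict

# Toy data, assumed for illustration: (dept_id, salary) rows, and a set of
# department ids standing in for the departments table.
employees = [(1, 100), (1, 200), (2, 50), (2, 75), (2, 25)]
departments = {1, 2}

# Option 1: join first, aggregate after the join.
late = defaultdict(int)
for dept_id, salary in employees:
    if dept_id in departments:       # the join with departments
        late[dept_id] += salary      # the post-join aggregation

# Option 2: eager aggregation, legal because SUM is decomposable.
pre = defaultdict(int)
for dept_id, salary in employees:
    pre[dept_id] += salary           # aggregate first, shrinking the join input
early = {d: s for d, s in pre.items() if d in departments}

assert early == dict(late)           # identical results, smaller join input
print(early)                         # {1: 300, 2: 150}
```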
While operator placement is often handled by heuristic rewrite rules (Module 3: Heuristic Optimization), it also creates search space dimensions. Advanced optimizers consider placement alternatives during cost-based search, potentially finding placements that heuristics miss.
Not all mathematically possible plans are valid. Various constraints restrict the search space:
The join graph of a query defines which tables can be joined directly. A join between tables A and B is only possible if there's a join predicate connecting them.
-- Join Graph:
-- Customers ── Orders ── Products
-- (customers.id = orders.cust_id)
-- (orders.prod_id = products.id)
Without a direct predicate, joining Customers with Products first would create a Cartesian product—almost always disastrous. The optimizer typically constrains the search space to avoid intermediate Cartesian products.
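A sketch of that connectivity test, with the edge set mirroring the join graph above (the function and names are hypothetical illustrations):

```python
# Join predicates as undirected edges between tables.
JOIN_EDGES = {
    frozenset({"customers", "orders"}),   # customers.id = orders.cust_id
    frozenset({"orders", "products"}),    # orders.prod_id = products.id
}

def can_join(left: set[str], right: set[str]) -> bool:
    """True if some predicate connects a table in `left` to a table in `right`."""
    return any(frozenset({a, b}) in JOIN_EDGES for a in left for b in right)

print(can_join({"customers"}, {"orders"}))              # True
print(can_join({"customers"}, {"products"}))            # False: Cartesian product
print(can_join({"customers", "orders"}, {"products"}))  # True, via orders
```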
Outer joins (LEFT, RIGHT, FULL) are not freely reorderable like inner joins:
SELECT * FROM A LEFT JOIN B ON ... LEFT JOIN C ON ...;
The order of outer joins affects semantics: in general, A LEFT JOIN B LEFT JOIN C is not equivalent to A LEFT JOIN C LEFT JOIN B, because NULL-extended rows produced by one join can change what the next join matches. This constrains which orderings are semantically valid.
| Constraint Type | Example | Space Reduction |
|---|---|---|
| No Cartesian Products | Require join predicate connection | Often 90%+ reduction |
| Outer Join Order | Preserve semantic ordering | Significant for outer-heavy queries |
| Index Availability | Only indexed paths where indexes exist | Varies by schema |
| Memory Bounds | Exclude plans exceeding limits | Depends on statistics |
| Parallel Compatibility | Not all operators parallelize | Moderate reduction |
Far from being limitations, constraints dramatically help the optimizer by eliminating massive regions of the search space. A well-constrained query (many indexes, clear join structure, memory bounds) optimizes faster and more reliably than an unconstrained one.
Conceptualizing the search space helps developers understand optimization behavior. While the actual space has as many dimensions as there are choices, we can visualize simplified representations:
Imagine the search space as a mountainous terrain where:
- each point in the terrain is a complete execution plan,
- elevation is the plan's estimated cost (lower is better), and
- the optimizer's task is to find the lowest point it can reach within its time budget.
Landscape Characteristics:
- Many local minima: plans cheaper than all of their neighbors but far from the global optimum.
- Cliffs: a single change, such as swapping one join algorithm, can shift cost by orders of magnitude.
- Plateaus: broad regions of plans with nearly indistinguishable estimated costs.
Two plans are "neighbors" if one can be reached from the other by a single transformation: swapping two adjacent joins, changing one join's algorithm, or switching one table's access method.
The optimizer explores by moving through neighborhoods. Different search strategies (exhaustive, greedy, simulated annealing) navigate differently:
| Strategy | Behavior | Strengths | Risks |
|---|---|---|---|
| Exhaustive | Visit every point | Finds global optimum | Infeasible for large spaces |
| Greedy | Always move downhill | Fast convergence | Trapped in local minima |
| Hill Climbing | Move to best neighbor | Simple, effective | Local minima traps |
| Simulated Annealing | Probabilistic uphill moves | Escapes local minima | Slower convergence |
| Dynamic Programming | Optimal substructure | Polynomial for some dimensions | Requires problem structure |
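For intuition, here is a toy hill-climbing sketch over left-deep join orders, with assumed cardinalities and a deliberately crude cost function. It moves to the best neighbor (one adjacent swap) until no move improves the plan, which is exactly how it can get stuck in a local minimum:

```python
# Toy cost model: assumed per-table cardinalities; each join's output size is
# crudely estimated, and the plan pays for every intermediate result.
CARD = {"A": 1_000, "B": 500, "C": 10, "D": 2_000, "E": 40}

def cost(order: list[str]) -> int:
    rows, total = CARD[order[0]], 0
    for t in order[1:]:
        rows = min(rows, CARD[t]) * 2   # toy estimate of the join output size
        total += rows                   # pay for each intermediate result
    return total

def hill_climb(order: list[str]) -> list[str]:
    """Repeatedly move to the cheapest neighbor; stop at a local minimum."""
    while True:
        neighbors = []
        for i in range(len(order) - 1):
            n = order[:]
            n[i], n[i + 1] = n[i + 1], n[i]   # one adjacent-swap move
            neighbors.append(n)
        best = min(neighbors, key=cost)
        if cost(best) >= cost(order):
            return order                # no downhill move left
        order = best

print(hill_climb(["A", "B", "C", "D", "E"]))
```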
The 'elevation' in our metaphor—the cost function—is itself an estimate. The true landscape (actual execution time) differs from the estimated landscape (optimizer's cost model). This mismatch explains why optimizers sometimes choose 'optimal' plans that perform poorly: they're finding the minimum of the wrong function.
Understanding the search space informs how optimizers are designed:
The mathematical structure of different search dimensions enables different optimization strategies:
Join Ordering (exhibits optimal substructure): → Dynamic programming is effective because the best plan for joining tables {A,B,C} can be built from the best plans for {A,B}, {A,C}, and {B,C}; a minimal sketch follows this list.
Algorithm Selection (per-operator choice): → Can be decided locally once join order is fixed, often using simple cost comparison.
Operator Placement (rule-based transformations): → Heuristics work well because 'push down selections' is almost always beneficial.
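A minimal sketch of that optimal substructure, with hypothetical cardinalities and toy cost and cardinality estimates (real optimizers use far richer models):

```python
from itertools import combinations

CARD = {"A": 1_000, "B": 100, "C": 10, "D": 50}   # assumed base-table sizes

def best_plans(tables: str):
    """Dynamic programming over table subsets, smallest subsets first.

    The best plan for a set like {A,B,C} is assembled from the best plans
    already computed for its proper subsets, so each subset is solved once
    instead of re-exploring every complete plan.
    """
    best = {frozenset(t): (0, CARD[t], t) for t in tables}   # cost, rows, plan
    for size in range(2, len(tables) + 1):
        for subset in map(frozenset, combinations(tables, size)):
            for k in range(1, size):
                for left in map(frozenset, combinations(sorted(subset), k)):
                    right = subset - left
                    lcost, lrows, lplan = best[left]
                    rcost, rrows, rplan = best[right]
                    cost = lcost + rcost + lrows + rrows   # toy: read both inputs
                    rows = min(lrows, rrows)               # toy join-size estimate
                    if subset not in best or cost < best[subset][0]:
                        best[subset] = (cost, rows, (lplan, rplan))
    return best

print(best_plans("ABCD")[frozenset("ABCD")])   # cheapest bushy plan for A,B,C,D
```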
Real optimizers don't explore the full theoretical space. Reduction techniques include:
- Restricting the search to left-deep trees, shrinking the space from Catalan(n−1) × n! to n! plans.
- Forbidding intermediate Cartesian products unless the query demands them.
- Dominance pruning: discarding any partial plan already worse than a known alternative for the same table set.
- Falling back to heuristic or randomized search (greedy construction, genetic algorithms) once the number of tables exceeds a threshold.
For some queries, even reduced search spaces are too large:
Symptoms:
- Optimization (planning) time that rivals or exceeds execution time.
- The optimizer hitting its time or memory budget and falling back to heuristic mode.
- Plan quality degrading sharply as tables are added to the query.
Causes:
- Queries joining many tables, especially with dense join graphs.
- Views or generated SQL that expand into large join graphs.
- Rich predicate and index structure multiplying per-operator choices.
Responses:
- Decompose the query, materializing stable intermediate results.
- Constrain the search with hints or optimizer settings where the system supports them.
- Keep statistics current so that pruning can eliminate bad regions early.
An EXPLAIN output or optimizer trace showing excessive optimization time, fallback to heuristic mode, or dramatically suboptimal plans often indicates search space explosion. Recognizing this pattern is essential for effective query tuning.
The search space of possible execution plans is the arena in which query optimization takes place. Understanding its structure and growth patterns is essential for understanding optimizer behavior and limitations.
What's Next:
Now that we understand the vast landscape the optimizer must navigate, the next page examines Plan Enumeration—the algorithms and strategies optimizers use to systematically explore the search space. We'll see how dynamic programming, heuristics, and randomization combine to find good plans within time budgets.
You now understand the structure and size of the query optimization search space. This knowledge is foundational for understanding the enumeration algorithms, cost models, and selection strategies covered in the remaining pages of this module.