When you execute a SQL query, you're writing a declaration of intent, not a program. You tell the database what data you want—not how to retrieve it. This distinction is SQL's greatest strength and its most profound source of complexity.
Behind every SELECT statement lies a sophisticated engine that transforms your declarative request into an imperative execution strategy. This transformation produces what we call an execution plan—a step-by-step recipe the database follows to retrieve your data.
Understanding execution plans is the single most important skill for optimizing database performance. Without this knowledge, you're optimizing blind—guessing at solutions rather than diagnosing root causes.
By the end of this page, you will understand: the complete query processing lifecycle; how the optimizer evaluates alternative plans; the components and operators that make up execution plans; how cost estimation works; and why the same query can perform radically differently on identical data. This knowledge forms the foundation for all query optimization work.
Before a single row is read, every SQL query passes through a multi-stage pipeline. Understanding this pipeline reveals why execution plans exist and how the database makes decisions about your queries.
The Five Stages of Query Processing:

1. Parsing: the SQL text is checked for syntax and converted into a parse tree
2. Semantic analysis (binding): table and column names are resolved against the system catalog
3. Rewriting: views are expanded and rule-based transformations are applied
4. Optimization: alternative plans are enumerated and costed, and the cheapest is chosen
5. Execution: the chosen plan runs and produces result rows
For complex queries, the optimization phase alone can take longer than execution. The optimizer may evaluate thousands or millions of alternative plans before selecting one. This upfront investment pays dividends—a well-chosen plan can be orders of magnitude faster than a poorly-chosen one.
Why This Architecture Matters:
The separation between what (SQL) and how (execution plan) enables the database to adapt. The same SQL query can produce different execution plans based on:

- Current table statistics (row counts, value distributions)
- Available indexes
- Configuration parameters and available memory
- In some databases, the specific parameter values supplied at execution time
This adaptability is powerful, but it means you cannot assume a query will always execute the same way. The same query on the same table can use different plans as data grows or changes.
An execution plan is a tree-structured recipe where each node represents an operator that performs a specific task. Data flows from leaf nodes (which access base tables) up through intermediate nodes (which transform data) to the root node (which produces the final result).
Understanding Plan Structure:
Every non-trivial query plan contains three types of operators:
| Category | Purpose | Examples | I/O Pattern |
|---|---|---|---|
| Access Operators | Read data from base tables or indexes | Sequential Scan, Index Scan, Index Only Scan, Bitmap Scan | Typically I/O-bound |
| Join Operators | Combine rows from multiple sources | Nested Loop, Hash Join, Merge Join, Semi-Join | CPU or I/O-bound depending on algorithm |
| Other Operators | Filter, sort, aggregate, and transform data | Filter, Sort, Aggregate, Limit, Gather, Materialize | CPU or memory-bound |
The Tree Execution Model:
Execution plans follow a pull-based iterator model (also called Volcano model). Each operator implements three methods:
- Open(): Initialize the operator
- Next(): Return the next result row (or indicate completion)
- Close(): Release resources

The root operator calls Next() on its child, which recursively calls Next() on its children, propagating down to the leaves. This lazy evaluation means rows flow through the pipeline one at a time, enabling early termination (e.g., LIMIT 1 can stop after finding one row).
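To make the pull-based model concrete, here is a minimal Python sketch of the iterator interface. The operator classes and method names are illustrative, not any real engine's API:

```python
class SeqScan:
    """Leaf operator: yields rows from an in-memory 'table'."""
    def __init__(self, rows):
        self.rows = rows
    def open(self):
        self.it = iter(self.rows)
    def next(self):
        return next(self.it, None)   # None signals completion
    def close(self):
        self.it = None

class Filter:
    """Pulls from its child, passing on only rows that satisfy the predicate."""
    def __init__(self, child, predicate):
        self.child, self.predicate = child, predicate
    def open(self):
        self.child.open()
    def next(self):
        while (row := self.child.next()) is not None:
            if self.predicate(row):
                return row
        return None
    def close(self):
        self.child.close()

class Limit:
    """Stops pulling after n rows: early termination in action."""
    def __init__(self, child, n):
        self.child, self.n = child, n
    def open(self):
        self.child.open()
        self.seen = 0
    def next(self):
        if self.seen >= self.n:
            return None
        self.seen += 1
        return self.child.next()
    def close(self):
        self.child.close()

# Limit(1) stops the scan as soon as one matching row is found.
plan = Limit(Filter(SeqScan([1, 2, 3, 4, 5]), lambda r: r % 2 == 0), 1)
plan.open()
rows = []
while (row := plan.next()) is not None:
    rows.append(row)
plan.close()
# rows == [2]; the scan never touched rows 3, 4, 5
```

Because each call to Next() pulls exactly one row through the tree, the Limit operator simply stops calling its child, and everything below it stops working immediately.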
```
-- Query
SELECT c.name, COUNT(o.id) as order_count
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE c.status = 'active'
GROUP BY c.id, c.name
HAVING COUNT(o.id) > 5
ORDER BY order_count DESC
LIMIT 10;

-- Conceptual Plan Tree (simplified)
Limit (10 rows)
 └── Sort (order_count DESC)
     └── Filter (count > 5)
         └── Aggregate (GROUP BY c.id, c.name; COUNT o.id)
             └── Hash Join (c.id = o.customer_id)
                 ├── Seq Scan on customers (filter: status = 'active')
                 └── Seq Scan on orders
```

Reading Plans Bottom-Up:
Execution plans are read from the leaves upward because that's how data flows. The innermost (deepest) operators execute first, feeding data to their parents. When analyzing a plan:

1. Find the deepest operators to see which tables are accessed and by which access method
2. Follow the data upward through joins, aggregates, and sorts
3. Compare estimated and actual row counts at each step to spot misestimates early
Width and Depth:
Plan width (how many tables are joined) grows with query complexity. Plan depth increases with the number of operations applied to data. Deep, narrow plans often indicate complex transformations on a single table. Wide, shallow plans typically indicate multi-table joins with minimal post-processing.
A bushy plan (wide at multiple levels) suggests complex multi-way joins. A left-deep plan (each join has one base table input) is common for sequential processing. Right-deep plans are rare but useful for fully pipelined execution. Understanding these shapes helps you quickly categorize queries.
Access operators are the foundation of every execution plan. They determine how the database reads data from storage, and their choice often dominates query performance. Understanding when and why each access method is used is essential for optimization.
| Access Method | Best For | Avoid When | I/O Pattern |
|---|---|---|---|
| Sequential Scan | Full table reads, small tables, non-selective filters | Highly selective queries on large tables | Sequential, predictable |
| Index Scan | Highly selective queries (<5-10% of rows) | Low selectivity or no useful index | Random (table pages) + sequential (index) |
| Index Only Scan | Index covers all needed columns | Need columns not in index, visibility issues | Sequential (index only) |
| Bitmap Scan | Medium selectivity (1-20%), multiple OR conditions | Very high or very low selectivity | Bitmap build + sequential heap scan |
The Selectivity Crossover Point:
One of the most important concepts in access method selection is the crossover point—the selectivity threshold where one access method becomes cheaper than another.
For a typical table, index scans beat sequential scans only when retrieving a small fraction of rows. This fraction depends on:

- Row width (narrower rows pack more rows per page, favoring sequential scans)
- Storage characteristics (the random I/O penalty is far larger on spinning disks than on SSDs)
- Correlation between index order and physical row order
- How much of the table is already cached in memory
Common rule of thumb: Index scans typically win for queries selecting less than 5-15% of rows. Beyond that, sequential scans often beat random I/O from index lookups.
A common misconception is that index scans are always faster than sequential scans. This is false. For large result sets, the random I/O pattern of index scans can make them 10-100x slower than sequential scans. The optimizer considers this, but stale statistics can lead to wrong choices.
Visibility and Index Only Scans:
In MVCC databases (PostgreSQL, MySQL/InnoDB, Oracle), index only scans face a complication: the index doesn't track row visibility. A row might be in the index but not visible to the current transaction.
PostgreSQL handles this with a visibility map—a bitmap tracking which pages contain only visible-to-all rows. Index only scans check this map; if a page is all-visible, no table access is needed. Otherwise, the database must check the heap.
Implication: Recently modified tables have poor visibility map coverage, making index only scans less effective. Running VACUUM updates the visibility map, improving index only scan efficiency.
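The visibility-map check can be sketched in a few lines of Python. The data layout here (index entries as value/page pairs, a set of all-visible page numbers, a heap-check callback) is a hypothetical simplification of the real on-disk structures:

```python
def index_only_scan(index_entries, all_visible, heap_check):
    """index_entries: (value, page_no) pairs from the index.
    all_visible: set of page numbers known to contain only visible rows.
    heap_check: fallback visibility test that costs a heap page read."""
    results, heap_fetches = [], 0
    for value, page in index_entries:
        if page in all_visible:
            results.append(value)      # answered from the index alone
        else:
            heap_fetches += 1          # must visit the heap page
            if heap_check(value, page):
                results.append(value)
    return results, heap_fetches

entries = [(10, 1), (20, 1), (30, 2), (40, 3)]
visible_pages = {1, 3}                 # page 2 was recently modified
res, fetches = index_only_scan(entries, visible_pages,
                               lambda v, p: True)
# res == [10, 20, 30, 40]; fetches == 1 (only the row on page 2)
```

The more pages fall out of the all-visible set, the more an "index only" scan degrades into an ordinary index scan with heap fetches, which is exactly why VACUUM helps.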
Joins are where query execution gets interesting—and where performance problems most often occur. The choice of join algorithm can make the difference between a 10ms query and a 10-minute query. Understanding the three primary join algorithms is essential for interpreting execution plans.
| Scenario | Best Algorithm | Why |
|---|---|---|
| Small outer, indexed inner | Nested Loop | Index allows O(log N) lookup per outer row |
| Large tables, no useful indexes | Hash Join | Linear time beats quadratic nested loop |
| Pre-sorted inputs | Merge Join | Skip sort step, exploit existing order |
| Memory pressure | Merge Join | Can spill to disk gracefully |
| Range conditions (e.g., date ranges) | Merge Join | Hash join only supports equality |
| Very large hash build side | Merge Join | Hash table spilling hurts performance |
Join Order Matters Enormously:
For multi-table joins, the order in which tables are joined dramatically affects performance. Consider joining tables A (1M rows), B (10K rows), and C (100 rows): joining A and B first can produce an enormous intermediate result that must then be joined with C, while starting with C and B keeps every intermediate small. The final result is identical either way; the work required to produce it is not.
The optimizer explores different join orders and estimates the intermediate result sizes. With N tables, there are N! possible join orders (for 10 tables: 3.6 million orders). Optimizers use heuristics to prune this space, but for complex queries, they may not find the optimal plan.
Practical Impact: In cases where the optimizer chooses a poor join order, you may need to use hints or rewrite the query to force a better order.
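A quick sketch makes the join-order effect tangible. The table sizes match the A/B/C example above; the uniform join selectivity is an assumption chosen purely for illustration:

```python
from math import factorial

# Hypothetical table sizes from the example above.
sizes = {"A": 1_000_000, "B": 10_000, "C": 100}

def join_size(left_rows, right_rows, selectivity):
    # Classic estimate: |L| * |R| * selectivity of the join predicate.
    return left_rows * right_rows * selectivity

sel = 1e-4  # assumed selectivity of every join predicate (illustrative)

# Order 1: (A join B) join C -- large intermediate result
ab = join_size(sizes["A"], sizes["B"], sel)    # 1,000,000-row intermediate
abc = join_size(ab, sizes["C"], sel)

# Order 2: (C join B) join A -- small intermediate result
cb = join_size(sizes["C"], sizes["B"], sel)    # 100-row intermediate
cba = join_size(cb, sizes["A"], sel)

print(f"A-B intermediate: {ab:,.0f} rows vs C-B intermediate: {cb:,.0f} rows")
print(f"possible join orders for 10 tables: {factorial(10):,}")
```

Both orders produce the same final row count, but order 1 materializes a million-row intermediate where order 2 materializes a hundred rows, which is exactly the asymmetry the optimizer is searching for.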
In hash joins, it matters which table is the 'build' side (creates hash table) and which is the 'probe' side. Building on the smaller table minimizes memory usage. If the optimizer picks the wrong side due to cardinality misestimation, performance can degrade severely. Look for 'Hash' nodes in plans and verify the build side is the smaller input.
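The build/probe asymmetry is easy to see in a toy hash join. This is a textbook in-memory sketch, not any engine's implementation, and it ignores spilling:

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Build a hash table on the (ideally smaller) build side,
    then stream the probe side through it."""
    table = defaultdict(list)
    for row in build_rows:                 # build phase: O(|build|) memory
        table[build_key(row)].append(row)
    out = []
    for row in probe_rows:                 # probe phase: streams, no memory growth
        for match in table.get(probe_key(row), []):
            out.append((match, row))
    return out

customers = [(1, "Ada"), (2, "Grace")]     # small side -> build
orders = [(101, 1), (102, 1), (103, 2)]    # large side -> probe
joined = hash_join(customers, orders,
                   build_key=lambda c: c[0],
                   probe_key=lambda o: o[1])
# 3 joined pairs; the hash table held only 2 entries
```

Swapping the arguments still returns the same matches, but the hash table would then hold the larger input, which is precisely the misestimation failure mode described above.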
The optimizer's job is to find the lowest-cost execution plan. But what exactly is "cost," and how is it calculated? Understanding cost estimation reveals why the optimizer makes certain choices—and why it sometimes makes wrong ones.
The Cost Model:
Database cost models attempt to predict the resources (time, I/O, CPU) a plan will consume. Costs are typically expressed in abstract units that correlate with execution time. The optimizer doesn't try to predict exact wall-clock time—it produces relative costs for comparing alternatives.
Key cost components:

- Sequential page reads (cheap, predictable I/O)
- Random page reads (historically far more expensive than sequential reads)
- CPU cost per row processed
- CPU cost per index entry and per operator or expression evaluated
PostgreSQL, for example, has configuration parameters like seq_page_cost = 1.0, random_page_cost = 4.0, and cpu_tuple_cost = 0.01 that weight these factors. Tuning these for your hardware (especially random_page_cost on SSDs) significantly affects plan choices.
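These parameters combine into simple linear formulas. The sketch below approximates the cost of a plain sequential scan in the PostgreSQL style (pages times seq_page_cost plus rows times cpu_tuple_cost); treat it as a simplification that omits filter-expression costs:

```python
SEQ_PAGE_COST = 1.0
CPU_TUPLE_COST = 0.01

def seq_scan_total_cost(pages, rows):
    # One sequential read per page, plus per-row CPU cost.
    return pages * SEQ_PAGE_COST + rows * CPU_TUPLE_COST

# 10,000-page, 1M-row table: I/O and CPU happen to contribute equally here.
print(seq_scan_total_cost(10_000, 1_000_000))  # 20000.0
```

Note that the result is an abstract number, useful only for comparing against the cost of an alternative plan for the same query, never as a wall-clock prediction.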
Selectivity Estimation:
The optimizer must estimate how many rows will pass each filter (the filter's selectivity). This drives cost calculations throughout the plan.
Common selectivity heuristics:
Common selectivity heuristics:

- column = constant: Use the most-common-values (MCV) list if the value is common; otherwise, assume 1/n_distinct selectivity
- column > constant: Use a histogram to estimate what fraction of values satisfy the condition
- column LIKE 'prefix%': Similar to a range condition, using prefix bounds from the histogram
- column1 = column2: Assume 1/max(n_distinct_1, n_distinct_2)
- complex_expression: Default to a magic constant (often 0.1% or 1%)

The problem with compound conditions:
For WHERE a = 1 AND b = 2, the optimizer typically assumes independence: selectivity(a=1) × selectivity(b=2). But if a and b are correlated (e.g., city and country), this assumption produces wildly wrong estimates. Some databases address this with multi-column statistics or extended statistics.
A 10x error in row count estimate at the bottom of a plan can become a 1000x error at the top because errors compound through joins. If the optimizer thinks a join produces 1,000 rows but it produces 1,000,000, every subsequent operation uses the wrong algorithm or resources. This is the single most common cause of slow queries.
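A short sketch shows how the independence assumption goes wrong on correlated columns. The percentages are hypothetical:

```python
def estimated_selectivity(*sels):
    # Independence assumption: multiply the individual selectivities.
    est = 1.0
    for s in sels:
        est *= s
    return est

# Correlated predicates: city = 'Paris' AND country = 'France'.
# Hypothetically, 1% of rows are Paris and 2% are France,
# but every Paris row is also a France row.
est = estimated_selectivity(0.01, 0.02)   # optimizer's estimate: 0.0002
actual = 0.01                             # true combined selectivity

rows = 10_000_000
print(f"estimated: {rows * est:,.0f} rows, actual: {rows * actual:,.0f} rows")
# ~2,000 estimated vs 100,000 actual: a 50x underestimate
# before any join has had the chance to compound it
```

Feed that 50x-too-small estimate into a join and the optimizer may pick a nested loop where a hash join was needed, which is how a bottom-of-plan estimation error becomes a top-of-plan disaster.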
Startup Cost vs. Total Cost:
Execution plans often show two cost numbers: the startup cost (work that must be done before the first row can be returned) and the total cost (work required to return every row).
This distinction matters for queries with LIMIT. A sort operation has high startup cost (must read all input) but once sorted, returns rows quickly. A sequential scan has low startup cost (can return rows immediately) but continues scanning.
For SELECT ... LIMIT 10, the optimizer weighs startup cost heavily because only the first rows matter.
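One way to sketch that weighting: charge the full startup cost, but only the fraction of the run cost needed for the first n rows. This mirrors the general idea used by real optimizers, though the specific plan numbers below are invented:

```python
def cost_for_first_n(startup, total, plan_rows, n):
    """Approximate cost of fetching only n of plan_rows output rows:
    pay the startup cost, then a proportional slice of the run cost."""
    fraction = min(n / plan_rows, 1.0)
    return startup + (total - startup) * fraction

# Two hypothetical plans for the same ORDER BY ... LIMIT 10 query:
sort_plan = cost_for_first_n(startup=9000, total=9100, plan_rows=100_000, n=10)
scan_plan = cost_for_first_n(startup=0,    total=20000, plan_rows=100_000, n=10)
print(f"sort: {sort_plan:.2f}, scan: {scan_plan:.2f}")
# sort ~ 9000 (must sort everything first), scan ~ 2 (returns rows immediately)
```

Without the LIMIT, the sort plan's total cost (9100) easily beats the scan's (20000); with LIMIT 10, the low-startup plan wins by three orders of magnitude.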
Query optimization is expensive—exploring thousands of alternative plans takes time. To amortize this cost, databases cache execution plans for reuse. Plan caching shapes how you should design queries and explains some surprising performance behaviors.
| Database | Caching Strategy | Key Characteristics |
|---|---|---|
| PostgreSQL | Prepared statement caching with generic/custom plans | Starts with custom plans per parameter value; switches to generic plan after 5 executions if generic is not worse |
| MySQL | Query cache (deprecated), prepared statement handles | Query cache stored results, not plans; InnoDB typically re-plans each query |
| SQL Server | Plan cache with parameterization | Aggressive plan caching; ad-hoc queries auto-parameterized |
| Oracle | Shared pool cursor caching | Extensive plan reuse with adaptive cursor sharing for varying parameter values |
The Generic Plan Problem:
When the optimizer caches a plan without knowing specific parameter values, it must choose a plan that works reasonably for all values. This generic plan may be suboptimal for any specific value.
Example scenario:
PREPARE find_by_status AS SELECT * FROM orders WHERE status = $1;
If status = 'pending' matches 0.1% of rows and status = 'completed' matches 80%, the optimal plan differs: an index scan is ideal for 'pending', while a sequential scan is ideal for 'completed'.
A generic plan can't optimize for both. The database must choose a compromise or use different plans for different parameter values.
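The custom-vs-generic decision can be sketched as a simple heuristic in the spirit of PostgreSQL's approach (replan for the first few executions, then switch to the generic plan if it isn't costlier than the custom plans have averaged). The costs and threshold here are illustrative:

```python
def choose_plan(custom_costs, generic_cost, threshold_execs=5):
    """Replan per parameter value for the first few executions, then
    switch to the cached generic plan if it is no worse on average."""
    if len(custom_costs) < threshold_execs:
        return "custom"          # still sampling custom plans
    avg_custom = sum(custom_costs) / len(custom_costs)
    return "generic" if generic_cost <= avg_custom else "custom"

# Skewed parameter: cheap index-scan plans make the generic plan look bad,
# so the database keeps replanning per value.
assert choose_plan([10, 10, 10], generic_cost=5000) == "custom"
assert choose_plan([10, 10, 10, 10, 10], generic_cost=5000) == "custom"

# Uniformly expensive parameters: the generic plan is competitive,
# so the database stops paying for per-execution optimization.
assert choose_plan([4000, 6000, 5000, 5500, 4500], generic_cost=5000) == "generic"
```

The failure mode follows directly: if the first five parameter values happen to be unrepresentative, the cached decision can be wrong for every execution that follows.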
If a query is fast sometimes and slow other times, plan instability is often the cause. Compare EXPLAIN output from fast and slow executions. Look for changes in access methods (index vs seq scan), join algorithms, or join order. Use EXPLAIN with actual execution statistics to compare estimated vs actual row counts.
Modern databases can execute a single query across multiple CPU cores, dividing work among parallel workers. Understanding parallel execution is essential because it changes how plans are structured and interpreted.
How Parallel Execution Works:
In parallel execution, the query is divided among a leader process and multiple worker processes. Each worker processes a portion of the data, and results are gathered by the leader.
Key operators in parallel plans:

- Gather / Gather Merge: collect rows from the workers (Gather Merge also preserves sort order)
- Parallel Seq Scan / Parallel Index Scan: each worker reads its own share of the pages
- Partial Aggregate and Finalize Aggregate: workers aggregate their share; the leader combines the partial results
Not all operations parallelize:
Some operations are inherently sequential:

- The final merge or gathering of worker results
- Statements that write data or have other side effects (in many databases)
- Functions and expressions not marked safe for parallel execution
- Operations that must see the entire input in a single process, such as some window and cursor operations
The query plan must structure parallel and sequential portions appropriately.
| Factor | Impact on Parallelism |
|---|---|
| Query complexity | Simple, scan-heavy queries benefit most; complex expressions may limit parallelism |
| Table size | Small tables don't justify worker startup overhead |
| Available workers | Limited pool shared across concurrent queries |
| Memory | Each worker needs memory for joins, sorts |
| I/O bandwidth | Parallelism helps CPU-bound work more than I/O-bound work |
In parallel plans, costs shown below a Gather node are per-worker costs. The actual total work is the per-worker cost times the number of workers plus gathering overhead. Row counts below Gather are also per-worker. Don't multiply by workers—the Gather node's output shows the combined row count.
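A tiny sketch of that arithmetic, assuming the common case where the leader process also takes a share of the work (the figures are invented plan numbers):

```python
def gather_total_rows(per_worker_rows, workers, leader_participates=True):
    """Row counts below a Gather node are per-process; the Gather node's
    output is the combined total across workers (and usually the leader)."""
    processes = workers + (1 if leader_participates else 0)
    return per_worker_rows * processes

# A plan shows "rows=250000" under Gather with 3 launched workers:
print(gather_total_rows(250_000, 3))  # 1,000,000 combined rows at the Gather
```

In other words, when comparing a parallel plan against its sequential equivalent, compare at or above the Gather node, where the figures describe the whole result again.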
Query execution plans are the bridge between your SQL declarations and actual data retrieval. Mastering plan analysis is foundational to all query optimization work. Let's consolidate the essential knowledge:

- SQL is declarative; the optimizer translates it into an imperative, tree-structured plan
- Plans are read bottom-up, following the data flow from leaf access operators to the root
- Access method choice hinges on selectivity; index scans win only for small fractions of rows
- Join algorithm and join order dominate multi-table query performance
- Costs are estimates built on statistics, and cardinality estimation errors compound through the plan
- Cached and generic plans trade optimization time for potential plan instability
- Parallel plans divide work across workers, with per-worker figures shown below the Gather node
What's Next:
Now that you understand execution plan structure and components, the next page dives into EXPLAIN analysis—the practical skill of reading and interpreting execution plans. You'll learn the specific syntax for major databases, how to extract critical information, and how to identify optimization opportunities from plan output.
You now understand how databases transform SQL queries into execution plans. This conceptual foundation enables you to interpret any execution plan output and understand why the optimizer made specific choices. Next, we'll apply this knowledge through hands-on EXPLAIN analysis.