When you execute a SQL query, you're writing a declaration of intent, not a program. You tell the database what data you want—not how to retrieve it. This distinction is SQL's greatest strength and its most profound source of complexity.
Behind every SELECT statement lies a sophisticated engine that transforms your declarative request into an imperative execution strategy. This transformation produces what we call an execution plan—a step-by-step recipe the database follows to retrieve your data.
Understanding execution plans is the single most important skill for optimizing database performance. Without this knowledge, you're optimizing blind—guessing at solutions rather than diagnosing root causes.
By the end of this page, you will understand: the complete query processing lifecycle; how the optimizer evaluates alternative plans; the components and operators that make up execution plans; how cost estimation works; and why the same query can perform radically differently on identical data. This knowledge forms the foundation for all query optimization work.
Before a single row is read, every SQL query passes through a multi-stage pipeline. Understanding this pipeline reveals why execution plans exist and how the database makes decisions about your queries.
The Five Stages of Query Processing:

1. Parsing: the SQL text is checked for syntax and converted into a parse tree
2. Semantic analysis (binding): table and column names are resolved against the system catalog
3. Rewriting: views are expanded and rule-based transformations are applied
4. Optimization: alternative plans are enumerated and costed, and the cheapest is chosen
5. Execution: the chosen plan runs and produces result rows
For complex queries, the optimization phase alone can take longer than execution. The optimizer may evaluate thousands or millions of alternative plans before selecting one. This upfront investment pays dividends—a well-chosen plan can be orders of magnitude faster than a poorly-chosen one.
Why This Architecture Matters:
The separation between what (SQL) and how (execution plan) enables the database to adapt. The same SQL query can produce different execution plans based on:

- Current table statistics (row counts, value distributions)
- Available indexes
- Configuration parameters and available memory
- In some databases, the specific parameter values supplied at execution time
This adaptability is powerful, but it means you cannot assume a query will always execute the same way. The same query on the same table can use different plans as data grows or changes.
An execution plan is a tree-structured recipe where each node represents an operator that performs a specific task. Data flows from leaf nodes (which access base tables) up through intermediate nodes (which transform data) to the root node (which produces the final result).
Understanding Plan Structure:
Every non-trivial query plan contains three types of operators:
| Category | Purpose | Examples | I/O Pattern |
|---|---|---|---|
| Access Operators | Read data from base tables or indexes | Sequential Scan, Index Scan, Index Only Scan, Bitmap Scan | Typically I/O-bound |
| Join Operators | Combine rows from multiple sources | Nested Loop, Hash Join, Merge Join, Semi-Join | CPU or I/O-bound depending on algorithm |
| Other Operators | Filter, sort, aggregate, and transform data | Filter, Sort, Aggregate, Limit, Gather, Materialize | CPU or memory-bound |
The Tree Execution Model:
Execution plans follow a pull-based iterator model (also called Volcano model). Each operator implements three methods:
- Open(): Initialize the operator
- Next(): Return the next result row (or indicate completion)
- Close(): Release resources

The root operator calls Next() on its child, which recursively calls Next() on its children, propagating down to the leaves. This lazy evaluation means rows flow through the pipeline one at a time, enabling early termination (e.g., LIMIT 1 can stop after finding one row).
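To make the pull-based model concrete, here is a minimal Python sketch of the iterator interface. The operator classes and method names are illustrative, not any real engine's API:

```python
class SeqScan:
    """Leaf operator: yields rows from an in-memory 'table'."""
    def __init__(self, rows):
        self.rows = rows
    def open(self):
        self.it = iter(self.rows)
    def next(self):
        return next(self.it, None)   # None signals completion
    def close(self):
        self.it = None

class Filter:
    """Pulls from its child, passing on only rows that satisfy the predicate."""
    def __init__(self, child, predicate):
        self.child, self.predicate = child, predicate
    def open(self):
        self.child.open()
    def next(self):
        while (row := self.child.next()) is not None:
            if self.predicate(row):
                return row
        return None
    def close(self):
        self.child.close()

class Limit:
    """Stops pulling after n rows: early termination in action."""
    def __init__(self, child, n):
        self.child, self.n = child, n
    def open(self):
        self.child.open()
        self.seen = 0
    def next(self):
        if self.seen >= self.n:
            return None
        self.seen += 1
        return self.child.next()
    def close(self):
        self.child.close()

# Limit(1) stops the scan as soon as one matching row is found.
plan = Limit(Filter(SeqScan([1, 2, 3, 4, 5]), lambda r: r % 2 == 0), 1)
plan.open()
rows = []
while (row := plan.next()) is not None:
    rows.append(row)
plan.close()
# rows == [2]; the scan never touched rows 3, 4, 5
```

Because each call to Next() pulls exactly one row through the tree, the Limit operator simply stops calling its child, and everything below it stops working immediately.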
```
-- Query
SELECT c.name, COUNT(o.id) as order_count
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE c.status = 'active'
GROUP BY c.id, c.name
HAVING COUNT(o.id) > 5
ORDER BY order_count DESC
LIMIT 10;

-- Conceptual Plan Tree (simplified)
Limit (10 rows)
 └── Sort (order_count DESC)
     └── Filter (count > 5)
         └── Aggregate (GROUP BY c.id, c.name; COUNT o.id)
             └── Hash Join (c.id = o.customer_id)
                 ├── Seq Scan on customers (filter: status = 'active')
                 └── Seq Scan on orders
```

Reading Plans Bottom-Up:
Execution plans are read from the leaves upward because that's how data flows. The innermost (deepest) operators execute first, feeding data to their parents. When analyzing a plan:

1. Find the deepest operators to see which tables are accessed and by which access method
2. Follow the data upward through joins, aggregates, and sorts
3. Compare estimated and actual row counts at each step to spot misestimates early
Width and Depth:
Plan width (how many tables are joined) grows with query complexity. Plan depth increases with the number of operations applied to data. Deep, narrow plans often indicate complex transformations on a single table. Wide, shallow plans typically indicate multi-table joins with minimal post-processing.
A bushy plan (wide at multiple levels) suggests complex multi-way joins. A left-deep plan (each join has one base table input) is common for sequential processing. Right-deep plans are rare but useful for fully pipelined execution. Understanding these shapes helps you quickly categorize queries.
Access operators are the foundation of every execution plan. They determine how the database reads data from storage, and their choice often dominates query performance. Understanding when and why each access method is used is essential for optimization.
| Access Method | Best For | Avoid When | I/O Pattern |
|---|---|---|---|
| Sequential Scan | Full table reads, small tables, non-selective filters | Highly selective queries on large tables | Sequential, predictable |
| Index Scan | Highly selective queries (<5-10% of rows) | Low selectivity or no useful index | Random (table pages) + sequential (index) |
| Index Only Scan | Index covers all needed columns | Need columns not in index, visibility issues | Sequential (index only) |
| Bitmap Scan | Medium selectivity (1-20%), multiple OR conditions | Very high or very low selectivity | Bitmap build + sequential heap scan |
The Selectivity Crossover Point:
One of the most important concepts in access method selection is the crossover point—the selectivity threshold where one access method becomes cheaper than another.
For a typical table, index scans beat sequential scans only when retrieving a small fraction of rows. This fraction depends on:

- Row width (narrower rows pack more rows per page, favoring sequential scans)
- Storage characteristics (the random I/O penalty is far larger on spinning disks than on SSDs)
- Correlation between index order and physical row order
- How much of the table is already cached in memory
Common rule of thumb: Index scans typically win for queries selecting less than 5-15% of rows. Beyond that, sequential scans often beat random I/O from index lookups.
A common misconception is that index scans are always faster than sequential scans. This is false. For large result sets, the random I/O pattern of index scans can make them 10-100x slower than sequential scans. The optimizer considers this, but stale statistics can lead to wrong choices.
Visibility and Index Only Scans:
In MVCC databases (PostgreSQL, MySQL/InnoDB, Oracle), index only scans face a complication: the index doesn't track row visibility. A row might be in the index but not visible to the current transaction.
PostgreSQL handles this with a visibility map—a bitmap tracking which pages contain only visible-to-all rows. Index only scans check this map; if a page is all-visible, no table access is needed. Otherwise, the database must check the heap.
Implication: Recently modified tables have poor visibility map coverage, making index only scans less effective. Running VACUUM updates the visibility map, improving index only scan efficiency.
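The visibility-map check can be sketched in a few lines of Python. The data layout here (index entries as value/page pairs, a set of all-visible page numbers, a heap-check callback) is a hypothetical simplification of the real on-disk structures:

```python
def index_only_scan(index_entries, all_visible, heap_check):
    """index_entries: (value, page_no) pairs from the index.
    all_visible: set of page numbers known to contain only visible rows.
    heap_check: fallback visibility test that costs a heap page read."""
    results, heap_fetches = [], 0
    for value, page in index_entries:
        if page in all_visible:
            results.append(value)      # answered from the index alone
        else:
            heap_fetches += 1          # must visit the heap page
            if heap_check(value, page):
                results.append(value)
    return results, heap_fetches

entries = [(10, 1), (20, 1), (30, 2), (40, 3)]
visible_pages = {1, 3}                 # page 2 was recently modified
res, fetches = index_only_scan(entries, visible_pages,
                               lambda v, p: True)
# res == [10, 20, 30, 40]; fetches == 1 (only the row on page 2)
```

The more pages fall out of the all-visible set, the more an "index only" scan degrades into an ordinary index scan with heap fetches, which is exactly why VACUUM helps.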
Joins are where query execution gets interesting—and where performance problems most often occur. The choice of join algorithm can make the difference between a 10ms query and a 10-minute query. Understanding the three primary join algorithms is essential for interpreting execution plans.
| Scenario | Best Algorithm | Why |
|---|---|---|
| Small outer, indexed inner | Nested Loop | Index allows O(log N) lookup per outer row |
| Large tables, no useful indexes | Hash Join | Linear time beats quadratic nested loop |
| Pre-sorted inputs | Merge Join | Skip sort step, exploit existing order |
| Memory pressure | Merge Join | Can spill to disk gracefully |
| Range conditions (e.g., date ranges) | Merge Join | Hash join only supports equality |
| Very large hash build side | Merge Join | Hash table spilling hurts performance |
Join Order Matters Enormously:
For multi-table joins, the order in which tables are joined dramatically affects performance. Consider joining tables A (1M rows), B (10K rows), and C (100 rows): joining A and B first can produce an enormous intermediate result that must then be joined with C, while starting with C and B keeps every intermediate small. The final result is identical either way; the work required to produce it is not.
The optimizer explores different join orders and estimates the intermediate result sizes. With N tables, there are N! possible join orders (for 10 tables: 3.6 million orders). Optimizers use heuristics to prune this space, but for complex queries, they may not find the optimal plan.
Practical Impact: In cases where the optimizer chooses a poor join order, you may need to use hints or rewrite the query to force a better order.
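A quick sketch makes the join-order effect tangible. The table sizes match the A/B/C example above; the uniform join selectivity is an assumption chosen purely for illustration:

```python
from math import factorial

# Hypothetical table sizes from the example above.
sizes = {"A": 1_000_000, "B": 10_000, "C": 100}

def join_size(left_rows, right_rows, selectivity):
    # Classic estimate: |L| * |R| * selectivity of the join predicate.
    return left_rows * right_rows * selectivity

sel = 1e-4  # assumed selectivity of every join predicate (illustrative)

# Order 1: (A join B) join C -- large intermediate result
ab = join_size(sizes["A"], sizes["B"], sel)    # 1,000,000-row intermediate
abc = join_size(ab, sizes["C"], sel)

# Order 2: (C join B) join A -- small intermediate result
cb = join_size(sizes["C"], sizes["B"], sel)    # 100-row intermediate
cba = join_size(cb, sizes["A"], sel)

print(f"A-B intermediate: {ab:,.0f} rows vs C-B intermediate: {cb:,.0f} rows")
print(f"possible join orders for 10 tables: {factorial(10):,}")
```

Both orders produce the same final row count, but order 1 materializes a million-row intermediate where order 2 materializes a hundred rows, which is exactly the asymmetry the optimizer is searching for.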
In hash joins, it matters which table is the 'build' side (creates hash table) and which is the 'probe' side. Building on the smaller table minimizes memory usage. If the optimizer picks the wrong side due to cardinality misestimation, performance can degrade severely. Look for 'Hash' nodes in plans and verify the build side is the smaller input.
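The build/probe asymmetry is easy to see in a toy hash join. This is a textbook in-memory sketch, not any engine's implementation, and it ignores spilling:

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Build a hash table on the (ideally smaller) build side,
    then stream the probe side through it."""
    table = defaultdict(list)
    for row in build_rows:                 # build phase: O(|build|) memory
        table[build_key(row)].append(row)
    out = []
    for row in probe_rows:                 # probe phase: streams, no memory growth
        for match in table.get(probe_key(row), []):
            out.append((match, row))
    return out

customers = [(1, "Ada"), (2, "Grace")]     # small side -> build
orders = [(101, 1), (102, 1), (103, 2)]    # large side -> probe
joined = hash_join(customers, orders,
                   build_key=lambda c: c[0],
                   probe_key=lambda o: o[1])
# 3 joined pairs; the hash table held only 2 entries
```

Swapping the arguments still returns the same matches, but the hash table would then hold the larger input, which is precisely the misestimation failure mode described above.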
The optimizer's job is to find the lowest-cost execution plan. But what exactly is "cost," and how is it calculated? Understanding cost estimation reveals why the optimizer makes certain choices—and why it sometimes makes wrong ones.
The Cost Model:
Database cost models attempt to predict the resources (time, I/O, CPU) a plan will consume. Costs are typically expressed in abstract units that correlate with execution time. The optimizer doesn't try to predict exact wall-clock time—it produces relative costs for comparing alternatives.
Key cost components:

- Sequential page reads (cheap, predictable I/O)
- Random page reads (historically far more expensive than sequential reads)
- CPU cost per row processed
- CPU cost per index entry and per operator or expression evaluated
PostgreSQL, for example, has configuration parameters like seq_page_cost = 1.0, random_page_cost = 4.0, and cpu_tuple_cost = 0.01 that weight these factors. Tuning these for your hardware (especially random_page_cost on SSDs) significantly affects plan choices.
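These parameters combine into simple linear formulas. The sketch below approximates the cost of a plain sequential scan in the PostgreSQL style (pages times seq_page_cost plus rows times cpu_tuple_cost); treat it as a simplification that omits filter-expression costs:

```python
SEQ_PAGE_COST = 1.0
CPU_TUPLE_COST = 0.01

def seq_scan_total_cost(pages, rows):
    # One sequential read per page, plus per-row CPU cost.
    return pages * SEQ_PAGE_COST + rows * CPU_TUPLE_COST

# 10,000-page, 1M-row table: I/O and CPU happen to contribute equally here.
print(seq_scan_total_cost(10_000, 1_000_000))  # 20000.0
```

Note that the result is an abstract number, useful only for comparing against the cost of an alternative plan for the same query, never as a wall-clock prediction.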
Selectivity Estimation:
The optimizer must estimate how many rows will pass each filter (the filter's selectivity). This drives cost calculations throughout the plan.
Common selectivity heuristics:
Common selectivity heuristics:

- column = constant: Use the most-common-values (MCV) list if the value is common; otherwise, assume 1/n_distinct selectivity
- column > constant: Use a histogram to estimate what fraction of values satisfy the condition
- column LIKE 'prefix%': Similar to a range condition, using prefix bounds from the histogram
- column1 = column2: Assume 1/max(n_distinct_1, n_distinct_2)
- complex_expression: Default to a magic constant (often 0.1% or 1%)

The problem with compound conditions:
For WHERE a = 1 AND b = 2, the optimizer typically assumes independence: selectivity(a=1) × selectivity(b=2). But if a and b are correlated (e.g., city and country), this assumption produces wildly wrong estimates. Some databases address this with multi-column statistics or extended statistics.
A 10x error in row count estimate at the bottom of a plan can become a 1000x error at the top because errors compound through joins. If the optimizer thinks a join produces 1,000 rows but it produces 1,000,000, every subsequent operation uses the wrong algorithm or resources. This is the single most common cause of slow queries.
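A short sketch shows how the independence assumption goes wrong on correlated columns. The percentages are hypothetical:

```python
def estimated_selectivity(*sels):
    # Independence assumption: multiply the individual selectivities.
    est = 1.0
    for s in sels:
        est *= s
    return est

# Correlated predicates: city = 'Paris' AND country = 'France'.
# Hypothetically, 1% of rows are Paris and 2% are France,
# but every Paris row is also a France row.
est = estimated_selectivity(0.01, 0.02)   # optimizer's estimate: 0.0002
actual = 0.01                             # true combined selectivity

rows = 10_000_000
print(f"estimated: {rows * est:,.0f} rows, actual: {rows * actual:,.0f} rows")
# ~2,000 estimated vs 100,000 actual: a 50x underestimate
# before any join has had the chance to compound it
```

Feed that 50x-too-small estimate into a join and the optimizer may pick a nested loop where a hash join was needed, which is how a bottom-of-plan estimation error becomes a top-of-plan disaster.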
Startup Cost vs. Total Cost:
Execution plans often show two cost numbers: the startup cost (work that must be done before the first row can be returned) and the total cost (work required to return every row).
This distinction matters for queries with LIMIT. A sort operation has high startup cost (must read all input) but once sorted, returns rows quickly. A sequential scan has low startup cost (can return rows immediately) but continues scanning.
For SELECT ... LIMIT 10, the optimizer weighs startup cost heavily because only the first rows matter.
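One way to sketch that weighting: charge the full startup cost, but only the fraction of the run cost needed for the first n rows. This mirrors the general idea used by real optimizers, though the specific plan numbers below are invented:

```python
def cost_for_first_n(startup, total, plan_rows, n):
    """Approximate cost of fetching only n of plan_rows output rows:
    pay the startup cost, then a proportional slice of the run cost."""
    fraction = min(n / plan_rows, 1.0)
    return startup + (total - startup) * fraction

# Two hypothetical plans for the same ORDER BY ... LIMIT 10 query:
sort_plan = cost_for_first_n(startup=9000, total=9100, plan_rows=100_000, n=10)
scan_plan = cost_for_first_n(startup=0,    total=20000, plan_rows=100_000, n=10)
print(f"sort: {sort_plan:.2f}, scan: {scan_plan:.2f}")
# sort ~ 9000 (must sort everything first), scan ~ 2 (returns rows immediately)
```

Without the LIMIT, the sort plan's total cost (9100) easily beats the scan's (20000); with LIMIT 10, the low-startup plan wins by three orders of magnitude.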
Query optimization is expensive—exploring thousands of alternative plans takes time. To amortize this cost, databases cache execution plans for reuse. Plan caching shapes how you should design queries and explains some surprising performance behaviors.
| Database | Caching Strategy | Key Characteristics |
|---|---|---|
| PostgreSQL | Prepared statement caching with generic/custom plans | Starts with custom plans per parameter value; switches to generic plan after 5 executions if generic is not worse |
| MySQL | Query cache (deprecated), prepared statement handles | Query cache stored results, not plans; InnoDB typically re-plans each query |
| SQL Server | Plan cache with parameterization | Aggressive plan caching; ad-hoc queries auto-parameterized |
| Oracle | Shared pool cursor caching | Extensive plan reuse with adaptive cursor sharing for varying parameter values |
The Generic Plan Problem:
When the optimizer caches a plan without knowing specific parameter values, it must choose a plan that works reasonably for all values. This generic plan may be suboptimal for any specific value.
Example scenario:
PREPARE find_by_status AS SELECT * FROM orders WHERE status = $1;
If status = 'pending' matches 0.1% of rows and status = 'completed' matches 80%, the optimal plan differs: an index scan is ideal for 'pending', while a sequential scan is ideal for 'completed'.
A generic plan can't optimize for both. The database must choose a compromise or use different plans for different parameter values.
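The custom-vs-generic decision can be sketched as a simple heuristic in the spirit of PostgreSQL's approach (replan for the first few executions, then switch to the generic plan if it isn't costlier than the custom plans have averaged). The costs and threshold here are illustrative:

```python
def choose_plan(custom_costs, generic_cost, threshold_execs=5):
    """Replan per parameter value for the first few executions, then
    switch to the cached generic plan if it is no worse on average."""
    if len(custom_costs) < threshold_execs:
        return "custom"          # still sampling custom plans
    avg_custom = sum(custom_costs) / len(custom_costs)
    return "generic" if generic_cost <= avg_custom else "custom"

# Skewed parameter: cheap index-scan plans make the generic plan look bad,
# so the database keeps replanning per value.
assert choose_plan([10, 10, 10], generic_cost=5000) == "custom"
assert choose_plan([10, 10, 10, 10, 10], generic_cost=5000) == "custom"

# Uniformly expensive parameters: the generic plan is competitive,
# so the database stops paying for per-execution optimization.
assert choose_plan([4000, 6000, 5000, 5500, 4500], generic_cost=5000) == "generic"
```

The failure mode follows directly: if the first five parameter values happen to be unrepresentative, the cached decision can be wrong for every execution that follows.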
If a query is fast sometimes and slow other times, plan instability is often the cause. Compare EXPLAIN output from fast and slow executions. Look for changes in access methods (index vs seq scan), join algorithms, or join order. Use EXPLAIN with actual execution statistics to compare estimated vs actual row counts.
Modern databases can execute a single query across multiple CPU cores, dividing work among parallel workers. Understanding parallel execution is essential because it changes how plans are structured and interpreted.
How Parallel Execution Works:
In parallel execution, the query is divided among a leader process and multiple worker processes. Each worker processes a portion of the data, and results are gathered by the leader.
Key operators in parallel plans:

- Gather / Gather Merge: collect rows from the workers (Gather Merge also preserves sort order)
- Parallel Seq Scan / Parallel Index Scan: each worker reads its own share of the pages
- Partial Aggregate and Finalize Aggregate: workers aggregate their share; the leader combines the partial results
Not all operations parallelize:
Some operations are inherently sequential:

- The final merge or gathering of worker results
- Statements that write data or have other side effects (in many databases)
- Functions and expressions not marked safe for parallel execution
- Operations that must see the entire input in a single process, such as some window and cursor operations
The query plan must structure parallel and sequential portions appropriately.
| Factor | Impact on Parallelism |
|---|---|
| Query complexity | Simple, scan-heavy queries benefit most; complex expressions may limit parallelism |
| Table size | Small tables don't justify worker startup overhead |
| Available workers | Limited pool shared across concurrent queries |
| Memory | Each worker needs memory for joins, sorts |
| I/O bandwidth | Parallelism helps CPU-bound work more than I/O-bound work |
In parallel plans, costs shown below a Gather node are per-worker costs. The actual total work is the per-worker cost times the number of workers plus gathering overhead. Row counts below Gather are also per-worker. Don't multiply by workers—the Gather node's output shows the combined row count.
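A tiny sketch of that arithmetic, assuming the common case where the leader process also takes a share of the work (the figures are invented plan numbers):

```python
def gather_total_rows(per_worker_rows, workers, leader_participates=True):
    """Row counts below a Gather node are per-process; the Gather node's
    output is the combined total across workers (and usually the leader)."""
    processes = workers + (1 if leader_participates else 0)
    return per_worker_rows * processes

# A plan shows "rows=250000" under Gather with 3 launched workers:
print(gather_total_rows(250_000, 3))  # 1,000,000 combined rows at the Gather
```

In other words, when comparing a parallel plan against its sequential equivalent, compare at or above the Gather node, where the figures describe the whole result again.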
Query execution plans are the bridge between your SQL declarations and actual data retrieval. Mastering plan analysis is foundational to all query optimization work. Let's consolidate the essential knowledge:

- SQL is declarative; the optimizer translates it into an imperative, tree-structured plan
- Plans are read bottom-up, following the data flow from leaf access operators to the root
- Access method choice hinges on selectivity; index scans win only for small fractions of rows
- Join algorithm and join order dominate multi-table query performance
- Costs are estimates built on statistics, and cardinality estimation errors compound through the plan
- Cached and generic plans trade optimization time for potential plan instability
- Parallel plans divide work across workers, with per-worker figures shown below the Gather node
What's Next:
Now that you understand execution plan structure and components, the next page dives into EXPLAIN analysis—the practical skill of reading and interpreting execution plans. You'll learn the specific syntax for major databases, how to extract critical information, and how to identify optimization opportunities from plan output.
You now understand how databases transform SQL queries into execution plans. This conceptual foundation enables you to interpret any execution plan output and understand why the optimizer made specific choices. Next, we'll apply this knowledge through hands-on EXPLAIN analysis.