Every SQL query you write can be executed in dozens, hundreds, or even millions of different ways. The query optimizer's job is to find the best way—but what does "best" even mean? How can a database system predict which execution strategy will be fastest without actually running each one?
The answer lies in cost estimation: the mathematical art of predicting how expensive a query execution plan will be before executing it. Cost-based optimizers don't guess or rely solely on rules—they build sophisticated models that estimate the resources each plan will consume, then choose the plan with the lowest predicted cost.
This page explores the foundation of cost-based optimization: how optimizers assign numerical costs to plans, what those costs represent, and why accurate estimation is both critical and extraordinarily challenging.
By the end of this page, you will understand: (1) What cost estimation is and why it matters, (2) The components of cost models used by database optimizers, (3) How I/O costs, CPU costs, and memory costs are calculated, (4) The role of statistics and cardinality estimates in cost formulas, and (5) Why cost estimation is inherently imprecise and how optimizers cope with uncertainty.
Cost estimation is the process of assigning a numerical value to a query execution plan that represents the predicted resources required to execute it. This cost is not measured in real time or actual resource consumption—it's a prediction based on mathematical models and statistical information about the data.
The fundamental problem:
Given a SQL query, there are typically many logically equivalent ways to execute it. Consider a simple join of three tables A, B, and C: the optimizer can evaluate it as (A ⋈ B) ⋈ C, (A ⋈ C) ⋈ B, or (B ⋈ C) ⋈ A, and for each ordering it can choose different join algorithms and access paths.
With just three tables, there might be hundreds of possible execution plans. With ten tables, there can be millions. The optimizer cannot execute all of them to find the fastest—it would take longer to find the plan than to just pick one randomly!
The cost estimation solution:
Instead of executing plans, the optimizer estimates how expensive each plan would be if executed.
This works because estimating cost is vastly cheaper than actually executing plans. A cost estimate might take microseconds; executing the plan might take minutes or hours.
Cost units are deliberately abstract. A plan with cost 1000 is not necessarily twice as slow as a plan with cost 500. Cost values are relative measures for comparison, not absolute predictions of execution time. The goal is ordering plans correctly, not predicting elapsed time precisely.
The cost function abstraction:
Formally, a cost model defines a function:
Cost(P) = f(operators in P, cardinalities, physical characteristics)
Where the inputs are the physical operators chosen for plan P, the estimated cardinalities (row counts) flowing between them, and physical characteristics of the data such as table sizes, available indexes, and memory. The optimizer evaluates this function for each candidate plan and selects the one with minimum cost.
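As a concrete (if heavily simplified) illustration of this abstraction, the sketch below scores hypothetical plans as a weighted sum of estimated pages and rows. The operator names, fields, and weights are invented for illustration; they are not taken from any real optimizer.

```python
# A minimal sketch of the cost-function abstraction:
# Cost(P) = f(operators in P, cardinalities, physical characteristics).
from dataclasses import dataclass

@dataclass
class Operator:
    name: str            # e.g. "SeqScan", "HashJoin"
    est_rows: float      # estimated cardinality flowing out of this operator
    est_pages: float     # estimated pages this operator reads

def plan_cost(operators, io_weight=1.0, cpu_weight=0.01):
    """Sum a simple per-operator cost: pages weighted as I/O, rows as CPU."""
    return sum(io_weight * op.est_pages + cpu_weight * op.est_rows
               for op in operators)

# Two hypothetical plans for the same query: the optimizer keeps the cheaper one.
plan_a = [Operator("SeqScan", est_rows=1_000_000, est_pages=50_000)]
plan_b = [Operator("IndexScan", est_rows=100, est_pages=8)]
best = min((plan_a, plan_b), key=plan_cost)
print("chosen plan:", [op.name for op in best], "cost:", plan_cost(best))
```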
A comprehensive cost model considers multiple resource types. The total cost of a plan is typically a weighted combination of these components:
| Cost Component | What It Measures | Typical Importance |
|---|---|---|
| I/O Cost | Disk reads and writes (pages transferred) | Often dominant for disk-based systems |
| CPU Cost | Tuple processing, comparisons, computations | Significant for complex predicates and functions |
| Memory Cost | Buffer usage, hash table sizes, sort memory | Critical when memory is constrained |
| Network Cost | Data transfer in distributed systems | Dominant in distributed and cloud databases |
| Startup Cost | Initial overhead before first tuple produced | Important for pipelined execution |
The total cost formula:
Most optimizers express total cost as:
Total_Cost = w₁ × IO_Cost + w₂ × CPU_Cost + w₃ × Memory_Cost + ...
Where w₁, w₂, w₃ are weights that reflect the relative costs of each resource. These weights are often configurable parameters:
Example from PostgreSQL:
PostgreSQL uses parameters like:
- seq_page_cost = 1.0 (baseline for sequential I/O)
- random_page_cost = 4.0 (random I/O is 4× more expensive)
- cpu_tuple_cost = 0.01 (processing each tuple)
- cpu_operator_cost = 0.0025 (evaluating predicates)

These weights can be tuned based on actual hardware characteristics. If you move to SSDs, you might reduce random_page_cost, since random reads are nearly as fast as sequential reads on SSDs.
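To see how such weights combine, here is a rough sketch of PostgreSQL-style scan costing. It is deliberately simplified (the real planner adds startup costs, correlation adjustments, and more), and the index_scan_cost helper is a hypothetical approximation rather than PostgreSQL's actual formula.

```python
# Rough sketch of how PostgreSQL-style weights combine into a scan cost.
seq_page_cost = 1.0
random_page_cost = 4.0
cpu_tuple_cost = 0.01
cpu_operator_cost = 0.0025

def seq_scan_cost(pages, rows, n_quals=1):
    # Read every page sequentially, process every tuple, evaluate each predicate.
    return (pages * seq_page_cost
            + rows * cpu_tuple_cost
            + rows * n_quals * cpu_operator_cost)

def index_scan_cost(index_pages, heap_pages, matching_rows):
    # Treat index and heap page fetches as random I/O (pessimistic assumption).
    return ((index_pages + heap_pages) * random_page_cost
            + matching_rows * cpu_tuple_cost)

# A 50,000-page, 1M-row table: full scan vs. an index touching ~8 pages.
print("seq scan:  ", seq_scan_cost(50_000, 1_000_000))
print("index scan:", index_scan_cost(3, 5, 100))
```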
Optimal cost weights depend on hardware. Cloud databases often auto-tune these parameters based on observed performance. On-premise systems may require manual calibration. Incorrect weights lead to suboptimal plan choices—the optimizer might prefer a disk-heavy plan when CPU is the actual bottleneck.
For disk-based database systems, I/O cost is often the dominant factor. Reading and writing pages to disk is orders of magnitude slower than in-memory operations. A well-designed cost model pays careful attention to I/O.
Key I/O concepts:
- b_R — the number of pages (blocks) occupied by relation R
- n_R — the number of tuples in relation R
- h — the height of an index (levels traversed per lookup)
- s — the selectivity of the predicate being applied
- M — the number of buffer pages available in memory
- Sequential page reads are much cheaper than random reads, so how pages are accessed matters as much as how many are read.

Basic I/O cost formulas:
| Operation | I/O Cost Formula | Variables |
|---|---|---|
| Full Table Scan | b_R | b_R = number of pages in relation R |
| Index Scan (Clustered) | h + ⌈s × b_R⌉ | h = index height, s = selectivity |
| Index Scan (Unclustered) | h + ⌈s × n_R⌉ | n_R = number of tuples (worst case) |
| Nested Loop Join | b_R + n_R × b_S | Scan R, for each tuple scan S |
| Block Nested Loop | b_R + ⌈b_R / M⌉ × b_S | M = memory pages available |
| Sort-Merge Join | 3 × (b_R + b_S) | Sort both, then merge (approx.) |
| Hash Join | 3 × (b_R + b_S) | Build hash, partition if needed |
Example: Comparing scan vs. index access
Consider a table Orders with:
- n_R = 1,000,000 rows
- b_R = 50,000 pages
- An index on customer_id with height 3

Query: SELECT * FROM Orders WHERE customer_id = 12345
Assume the selectivity is 0.01% (100 matching rows).
Full table scan cost:
Cost = b_R = 50,000 page reads
Clustered index scan cost:
Cost = h + ⌈s × b_R⌉
= 3 + ⌈0.0001 × 50,000⌉
= 3 + 5 = 8 page reads
Unclustered index scan cost (worst case):
Cost = h + ⌈s × n_R⌉
= 3 + ⌈0.0001 × 1,000,000⌉
= 3 + 100 = 103 page reads
The index is clearly better, but notice how a clustered index (8 pages) dramatically outperforms an unclustered index (103 pages) because matching rows are stored together.
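The arithmetic above is easy to check mechanically. Below is a small sketch, assuming the formulas from the table, that reproduces the three costs for this example:

```python
# Sketch of the I/O formulas above, reproducing the Orders example
# (b_R = 50,000 pages, n_R = 1,000,000 rows, index height h = 3, s = 0.0001).
import math

def full_table_scan(b_r):
    return b_r

def clustered_index_scan(h, s, b_r):
    return h + math.ceil(s * b_r)

def unclustered_index_scan(h, s, n_r):
    return h + math.ceil(s * n_r)   # worst case: one page per matching tuple

b_r, n_r, h, s = 50_000, 1_000_000, 3, 0.0001
print("full scan:        ", full_table_scan(b_r))              # 50000
print("clustered index:  ", clustered_index_scan(h, s, b_r))   # 8
print("unclustered index:", unclustered_index_scan(h, s, n_r)) # 103
```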
Unclustered index costs can vary wildly. If matching rows happen to be on the same pages (good correlation), actual I/O is much lower than the worst case. Many optimizers track "clustering factor" or "correlation" statistics to refine these estimates.
While I/O often dominates, CPU cost becomes increasingly important as more of the working set fits in memory, as faster storage (SSD/NVMe) shrinks I/O latency, and as queries involve complex predicates, expressions, and user-defined functions.

Components of CPU cost include per-tuple processing, predicate evaluation, hashing and probing, sort comparisons, aggregate updates, and producing output tuples.

CPU cost formulas:
CPU_Cost(Scan R with predicate P) = n_R × (tuple_cost + predicate_cost(P))
CPU_Cost(Hash Join) = n_R × hash_cost + n_S × probe_cost + matched × output_cost
CPU_Cost(Sort) = n_R × log₂(n_R) × comparison_cost
CPU_Cost(Aggregation) = n_R × (grouping_cost + aggregate_update_cost)
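The sketch below evaluates a few of these formulas with made-up per-operation constants (TUPLE_COST, CMP_COST, and HASH_COST are illustrative values, not taken from any real system), mainly to show how per-tuple and per-predicate costs add up:

```python
# Sketch of the CPU cost formulas above, using invented per-operation constants.
import math

TUPLE_COST = 0.01        # touching one tuple
CMP_COST = 0.0025        # one comparison
HASH_COST = 0.005        # hashing or probing one key (hypothetical constant)

def scan_cpu(n_r, predicate_cost):
    return n_r * (TUPLE_COST + predicate_cost)

def sort_cpu(n_r):
    return n_r * math.log2(n_r) * CMP_COST

def hash_join_cpu(n_r, n_s, matched):
    return n_r * HASH_COST + n_s * HASH_COST + matched * TUPLE_COST

n = 1_000_000
print("scan, cheap predicate:    ", scan_cpu(n, predicate_cost=0.0025))
print("scan, expensive predicate:", scan_cpu(n, predicate_cost=0.25))  # ~100x costlier check
print("sort:", round(sort_cpu(n), 1))
print("hash join (1M build, 10M probe, 2.5M matches):",
      hash_join_cpu(1_000_000, 10_000_000, 2_500_000))
```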
Predicate complexity matters:
Not all predicates are equal. Consider these WHERE clauses:
-- Simple: Just an integer comparison
WHERE order_id = 12345
-- Moderate: String comparison with potential collation
WHERE customer_name = 'Smith'
-- Expensive: Pattern matching (may scan entire string)
WHERE customer_name LIKE '%Smith%'
-- Very expensive: Regular expression
WHERE customer_name ~ '^[A-Z][a-z]+\s[A-Z][a-z]+$'
A sophisticated optimizer assigns different costs to each predicate type. The regex might be 100× more expensive than the integer comparison, significantly affecting which plan is optimal.
Modern in-memory databases like SAP HANA, MemSQL (SingleStore), and VoltDB make CPU cost the primary concern. When all data is in RAM, optimizers must accurately model CPU costs—traditional I/O-centric cost models produce poor plans.
Cost formulas depend critically on cardinality estimates—predictions of how many rows will flow through each operator. These estimates come from statistics collected about the data.
Without statistics, optimization is blind:
Consider estimating the cost of: SELECT * FROM Orders WHERE status = 'pending'
If 'pending' matches 0.1% of orders, an index lookup is ideal; if it matches 60%, a full table scan is cheaper. The optimizer must know the data distribution to make the right decision.
| Statistic Type | Description | Usage in Cost Estimation |
|---|---|---|
| Table Cardinality (n_R) | Total number of rows in table R | Base for all tuple-based costs |
| Table Size (b_R) | Number of pages/blocks in table R | Base for all I/O costs |
| Column Cardinality (V(A,R)) | Number of distinct values in column A | Selectivity under uniformity assumption |
| Min/Max Values | Range of values in a column | Range query selectivity |
| Histograms | Distribution of values in buckets | Non-uniform selectivity estimation |
| Null Fraction | Percentage of NULL values | IS NULL / IS NOT NULL selectivity |
| Correlation | How column order matches physical order | Index scan cost refinement |
| Most Common Values | Frequent values and their frequencies | Accurate estimation for skewed data |
Selectivity estimation:
The selectivity of a predicate is the fraction of rows that satisfy it:
Selectivity(predicate) = |rows satisfying predicate| / |total rows|
Common selectivity formulas (assuming uniform distribution):
Selectivity(A = value) = 1 / V(A,R)
Selectivity(A > value) = (max(A) - value) / (max(A) - min(A))
Selectivity(A BETWEEN low AND high) = (high - low) / (max(A) - min(A))
Selectivity(A IN (v1, v2, ..., vk)) = k / V(A,R)
Combining selectivities (independence assumption):
Selectivity(P1 AND P2) = Selectivity(P1) × Selectivity(P2)
Selectivity(P1 OR P2) = Selectivity(P1) + Selectivity(P2) - Selectivity(P1) × Selectivity(P2)
Selectivity(NOT P) = 1 - Selectivity(P)
These formulas make strong assumptions (uniformity, independence) that are often violated in real data, leading to estimation errors.
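Here is a minimal sketch of these estimators under the uniformity and independence assumptions just described; the example predicate on hypothetical status and amount columns is illustrative only:

```python
# Sketch of the textbook selectivity estimators (uniformity + independence).
def eq_sel(distinct_values):                 # A = value
    return 1.0 / distinct_values

def gt_sel(value, lo, hi):                   # A > value
    return max(0.0, min(1.0, (hi - value) / (hi - lo)))

def between_sel(low, high, lo, hi):          # A BETWEEN low AND high
    return max(0.0, min(1.0, (high - low) / (hi - lo)))

def and_sel(s1, s2):                         # independence assumption
    return s1 * s2

def or_sel(s1, s2):
    return s1 + s2 - s1 * s2

# status = 'pending' AND amount > 900, with 5 distinct statuses and amount in [0, 1000]
s = and_sel(eq_sel(5), gt_sel(900, 0, 1000))
print("combined selectivity:", s)            # 0.2 * 0.1 = 0.02
print("estimated rows out of 10M:", int(s * 10_000_000))
```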
Cardinality estimation is the Achilles' heel of query optimization. Studies show estimates can be off by orders of magnitude, especially for complex predicates, joins, and correlated columns. A 10× estimation error can lead to choosing a plan that is 1000× slower than optimal.
Estimating the size of join results is particularly challenging and crucial because join ordering decisions depend heavily on intermediate result sizes.
The fundamental join formula:
For a join R ⋈ S on attribute A:
|R ⋈ S| = |R| × |S| / max(V(A,R), V(A,S))
This assumes containment of value sets (every join value in the relation with fewer distinct values also appears in the other relation) and a uniform distribution of tuples across those values.
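A quick sketch of the formula, using illustrative table sizes (a 100,000-row Customers table joined to a 10,000,000-row Orders table on customer_id):

```python
# Sketch of the join-size formula: |R join S| = |R| * |S| / max(V(A,R), V(A,S)).
def join_size(n_r, n_s, v_a_r, v_a_s):
    return n_r * n_s / max(v_a_r, v_a_s)

# 100k customers (100k distinct customer_id) joined to 10M orders
# (~100k distinct customer_id): every order matches exactly one customer.
print(join_size(100_000, 10_000_000, 100_000, 100_000))  # 10,000,000.0
```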
The error propagation problem:
In multi-way joins, estimation errors compound multiplicatively:
Query: R ⋈ S ⋈ T ⋈ U
If each join estimate has 2× error:
- After R ⋈ S: 2× off
- After R ⋈ S ⋈ T: 4× off
- After R ⋈ S ⋈ T ⋈ U: 8× off
With 10 tables, even small per-join errors can produce estimates that are millions of times wrong!
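The compounding is purely multiplicative, which a two-line sketch makes concrete (the 5× per-join error is an arbitrary illustrative figure):

```python
# Sketch: multiplicative error growth across a chain of joins.
def compounded_error(per_join_error, num_joins):
    return per_join_error ** num_joins

print(compounded_error(2, 3))    # 3 joins (4 tables), 2x each  -> 8x
print(compounded_error(5, 9))    # 9 joins (10 tables), 5x each -> ~2 million x
```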
Advanced techniques for better join estimation include multi-column and join statistics, runtime sampling, and machine-learned cardinality models. Research systems like Naru (learned cardinality estimation) and production features like Oracle's SQL Plan Management use historical execution data to improve estimates. The trend is toward adaptive optimization that learns from actual performance.
Given that cost estimates are inherently imprecise, how do optimizers cope? Several strategies have emerged to handle uncertainty:
1. Robust Plan Selection
Instead of choosing the plan with minimum estimated cost, choose a plan that performs reasonably well across a range of possible cardinalities:
Robust_Plan = argmin_P max_c∈C Cost(P, c)
Where C is the set of possible cardinality scenarios. This "minimax" approach avoids plans that are optimal for the estimate but catastrophic if the estimate is wrong.
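A minimal sketch of this minimax selection, with invented cost functions standing in for two candidate plans and a hand-picked set of cardinality scenarios:

```python
# Sketch of minimax ("robust") plan selection: for each candidate plan, take its
# worst cost over a set of plausible cardinality scenarios, then pick the plan
# whose worst case is smallest. The cost functions below are made up.
plans = {
    "index_nested_loop": lambda rows: 8 + rows * 4,        # great for tiny inputs
    "hash_join":         lambda rows: 50_000 + rows * 0.1, # flat cost, scales well
}
scenarios = [100, 10_000, 1_000_000]   # possible cardinalities of the outer input

def robust_plan(plans, scenarios):
    return min(plans, key=lambda name: max(plans[name](c) for c in scenarios))

print(robust_plan(plans, scenarios))   # hash_join: worse at 100 rows, far safer at 1M
```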
2. Plan Bouquets
Pre-compute multiple plans for different cardinality regions. At runtime, take a sample to refine the estimate, then choose the appropriate plan:
If estimated_rows < 100: use index nested loop plan
If estimated_rows in [100, 10000]: use hash join plan
If estimated_rows > 10000: use sort-merge join plan
3. Adaptive Query Execution
Monitor actual cardinalities during execution and switch plans mid-query if estimates prove wrong:
Oracle's Adaptive Query Optimization, SQL Server's Adaptive Joins, and Spark SQL's Adaptive Query Execution all implement forms of runtime adaptation. The future of query optimization increasingly involves runtime feedback, not just static estimation.
Let's walk through a complete cost estimation example to solidify these concepts.
Scenario:
Two tables:
- Customers: 100,000 rows, 5,000 pages, index on customer_id
- Orders: 10,000,000 rows, 500,000 pages, index on customer_id

Query:
SELECT c.name, COUNT(*) as order_count
FROM Customers c
JOIN Orders o ON c.customer_id = o.customer_id
WHERE c.country = 'USA'
GROUP BY c.name;
Statistics:
- 25% of customers have country = 'USA' (about 25,000 customers)
- Each customer has 100 orders on average
- Index heights are 3 (Plan A additionally relies on an index on Customers.country)
Plan A: Hash Join with Customer filter first
1. Index scan on Customers.country = 'USA'
- Pages: 3 (index) + 0.25 × 5,000 = 1,253 pages
- Rows output: 25,000 customers
2. Build hash table on filtered customers
- CPU: 25,000 × hash_cost
- Memory: ~25,000 × tuple_size
3. Scan Orders and probe hash table
- Pages: 500,000
- CPU: 10,000,000 × probe_cost
- Matches: 25,000 × 100 = 2,500,000 rows
4. Aggregate by name
- CPU: 2,500,000 × aggregate_cost
Total I/O: 1,253 + 500,000 = 501,253 pages
Plan B: Index Nested Loop Join
1. Scan all Orders
- Pages: 500,000
- Rows: 10,000,000
2. For each order, index lookup on Customers
- Per lookup: 3 pages (index) + 1 page (data)
- Total lookups: 10,000,000
- But with caching, unique customers accessed: 100,000
- Estimated pages: 100,000 × 4 = 400,000 (optimistic with caching)
3. Filter on country = 'USA'
- 25% pass: 2,500,000 rows
4. Aggregate by name
- CPU: 2,500,000 × aggregate_cost
Total I/O: 500,000 + 400,000 = 900,000 pages
Comparison:
| Plan | I/O Cost | Notes |
|---|---|---|
| Plan A (Hash Join) | 501,253 pages | Filter early reduces hash table size |
| Plan B (Index NL) | 900,000 pages | Index lookups expensive despite caching |
Winner: Plan A — the hash join with early filtering is substantially cheaper because it avoids repeated index probes.
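As a sanity check, the sketch below reproduces the I/O arithmetic for both plans from the numbers above (the 4-pages-per-lookup and caching assumptions for Plan B are carried over from the plan description):

```python
# Sketch reproducing the I/O arithmetic for Plan A and Plan B above.
import math

customers_pages, customers_rows = 5_000, 100_000
orders_pages, orders_rows = 500_000, 10_000_000
usa_selectivity, index_height = 0.25, 3

def plan_a_io():
    # Index scan on country, then one full scan of Orders to probe the hash table.
    filtered_customer_pages = index_height + math.ceil(usa_selectivity * customers_pages)
    return filtered_customer_pages + orders_pages

def plan_b_io():
    # Full scan of Orders plus (optimistically cached) index lookups into Customers.
    lookups = customers_rows * 4           # 3 index pages + 1 data page per customer
    return orders_pages + lookups

print("Plan A:", plan_a_io())   # 501,253
print("Plan B:", plan_b_io())   # 900,000
```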
If only 0.1% of customers (about 100 customers) were in the USA, an index nested loop join driven from the filtered Customers side might win—only about a hundred index probes into Orders instead of a full 500,000-page scan. The optimal plan depends critically on selectivity estimates. This is why accurate statistics matter!
We've explored the foundation of cost-based query optimization. Let's consolidate the key concepts:
- Cost estimation assigns abstract, relative costs to plans so they can be compared without being executed.
- Total cost is typically a weighted combination of I/O, CPU, memory, network, and startup costs, with weights tuned to the hardware.
- Cost formulas depend on cardinality and selectivity estimates derived from statistics such as table sizes, distinct-value counts, and histograms.
- Uniformity and independence assumptions keep estimation tractable but introduce errors that compound multiplicatively across joins.
- Optimizers cope with this uncertainty through robust plan selection, plan bouquets, and adaptive execution.
What's next:
With cost estimation in place, the optimizer must efficiently compare alternative plans. The next page explores Plan Comparison—how optimizers enumerate candidate plans, prune the search space, and select winners using the cost estimates we've learned to compute.
You now understand the mathematical foundation of cost-based optimization. Cost estimation transforms the abstract problem of finding the best plan into a concrete optimization: minimize a cost function. Next, we'll see how optimizers efficiently search through the vast space of possible plans.