Imagine you're planning a cross-country trip. You could drive, fly, take a train, or some combination. Each option has different costs: money, time, convenience, environmental impact. To make a rational choice, you need to estimate these costs before committing to a route.
Database query optimizers face a similar challenge. For a single query, there might be dozens or hundreds of possible execution strategies. Different join orders, different access methods, different algorithms. The optimizer must choose one strategy before executing the query—it can't try them all and pick the fastest.
To make this choice, optimizers rely on cost estimates—predictions of how expensive each strategy will be. These estimates appear in EXPLAIN output as abstract numbers that confuse many developers. But understanding cost estimation transforms you from someone who reads execution plans to someone who reasons about why plans are chosen and how to influence optimizer decisions.
By the end of this page, you will understand what cost numbers represent, how databases calculate them, the inputs that influence cost (statistics, configuration parameters), why costs can be wrong, and how to use cost information effectively for query optimization.
Let's start by demystifying what cost numbers in execution plans actually represent.
Cost Is NOT Time
The most important thing to understand: cost values are not milliseconds, seconds, or any real time unit. They are abstract, relative measures calibrated to make comparisons possible.
The Fundamental Unit: Sequential Page Reads
Most database cost models are calibrated relative to a single sequential page read from disk. This is assigned a cost of 1.0. All other operations are measured relative to this baseline:
| Operation | Typical Cost Multiplier |
|---|---|
| Sequential page read | 1.0 (baseline) |
| Random page read | 4.0 (random I/O is slower) |
| CPU processing per row | 0.01 |
| CPU processing per operator | 0.0025 |
| Parallel worker communication | variable |
Why Abstract Units?
Real I/O times vary enormously: HDDs vs. SSDs, cached vs. cold data, sequential vs. random access. Rather than predict actual time (which is impossible), cost models predict relative resource consumption. A plan with cost 1000 should be roughly 10x more expensive than a plan with cost 100.
```sql
-- EXPLAIN output with cost breakdown
EXPLAIN (COSTS ON) SELECT * FROM employees WHERE salary > 50000;

-- QUERY PLAN
-- -----------------------------------------------------------------
-- Seq Scan on employees  (cost=0.00..289.00 rows=3330 width=64)
--   Filter: (salary > 50000)

-- Cost breakdown: cost=startup..total
--   startup=0.00 : Cost before first row is returned
--   total=289.00 : Cost to return all rows

-- How 289.00 is calculated (approximately):
--   - Table has 145 pages (blocks)
--   - seq_page_cost = 1.0 (per page read)
--   - cpu_tuple_cost = 0.01 (per row processed)
--   - cpu_operator_cost = 0.0025 (per row for filter)
--
-- Total ≈ (145 pages × 1.0) + (10000 rows × 0.01) + (10000 rows × 0.0025)
-- Total ≈ 145 + 100 + 25 = 270 (approximately)
-- (Exact formula varies by version and statistics)
```

Never say "this query costs 289 milliseconds." Instead, say "this query has an estimated cost of 289 units" or compare "Plan A costs 3x more than Plan B." Cost is meaningful only for relative comparison, not absolute performance prediction.
Every operation in an execution plan has two cost components shown as (cost=startup..total):
Startup Cost: Work that must be completed before the first row can be returned. This represents blocking operations or initialization overhead.
Total Cost: Work to complete the entire operation and return all rows.
The distinction matters enormously for understanding query behavior:
| Operator | Startup Cost | Why It Matters |
|---|---|---|
| Seq Scan | Near zero | Can return first row immediately as it reads |
| Index Scan | Low | Small overhead to navigate index before first row |
| Sort | High (≈ total) | Must read ALL input before outputting ANY row |
| Hash (for join) | High | Must build entire hash table before probing |
| Hash Join | Moderate | Build phase completes before matches flow |
| Merge Join | Depends | Low if sorted input, high if Sort added |
| Aggregate | Near total | Must process all input to compute aggregates |
```sql
-- Low startup cost: Streaming operation
EXPLAIN SELECT * FROM orders WHERE status = 'pending';
-- Seq Scan on orders  (cost=0.00..1250.00 rows=500 width=80)
--                           ^^^^
-- Startup=0: First row available immediately as scan begins

-- High startup cost: Sorting blocks output
EXPLAIN SELECT * FROM orders ORDER BY created_at DESC;
-- Sort  (cost=9876.89..10126.89 rows=100000 width=80)
--             ^^^^^^^
-- Startup=9876.89, Total=10126.89
-- Must accumulate ~98% of work before ANY output

-- High startup cost: Hash build phase
EXPLAIN SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id;
-- Hash Join  (cost=458.00..2589.00 rows=100000 width=160)
--   Hash Cond: (o.customer_id = c.id)
--   ->  Seq Scan on orders o  (cost=0.00..1250.00 ...)
--   ->  Hash  (cost=333.00..333.00 rows=10000 width=80)
--                   ^^^^^^  ^^^^^^
--       Startup = Total for Hash (must build completely)
--       ->  Seq Scan on customers c  (cost=0.00..283.00 ...)
```

Why Startup Cost Matters:
Time to First Row: For interactive queries where users see results progressively, high startup cost means they wait with blank screens.
LIMIT Optimization: If only a few rows are needed, low startup + streaming can be faster than 'cheaper' plans requiring full processing:
SELECT * FROM huge_table ORDER BY indexed_column LIMIT 10
Pipeline Breaking: High startup operators 'break' the data pipeline—nothing flows until they complete. Multiple such operators in sequence multiply delays.
Cursor/Stream Behavior: Applications using cursors to fetch rows incrementally are penalized by high startup costs but benefit from low-startup streaming operations.
When using LIMIT, prefer execution plans with low startup costs even if total cost is higher. A plan that can produce the first N rows without processing all data wins. The optimizer usually handles this, but understanding why helps when it doesn't.
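The optimizer's LIMIT costing can be sketched as linear interpolation between a plan's startup and total cost. The toy model below (function name and plan numbers are illustrative, loosely echoing the Sort example above, not planner code) shows why a high-startup plan loses badly at LIMIT 10 but wins for a full result:

```python
# Sketch: estimated cost of producing the first n rows of a plan,
# interpolating between startup and total cost. Illustrative only.

def cost_to_fetch(startup, total, plan_rows, n):
    """Startup work happens regardless; the rest scales with the row fraction."""
    n = min(n, plan_rows)
    return startup + (total - startup) * (n / plan_rows)

# Plan A: index scan already in ORDER BY order — streams immediately,
# but random heap I/O makes reading everything expensive
plan_a = dict(startup=0.29, total=20000.0, plan_rows=100_000)
# Plan B: seq scan + explicit sort — high startup, cheaper total
plan_b = dict(startup=9876.89, total=10126.89, plan_rows=100_000)

for n in (10, 100_000):
    a = cost_to_fetch(**plan_a, n=n)
    b = cost_to_fetch(**plan_b, n=n)
    print(f"LIMIT {n}: plan A ~ {a:.2f}, plan B ~ {b:.2f}")
# Plan A wins decisively at LIMIT 10; plan B wins for the full result set.
```

This is why the optimizer can legitimately pick a plan with a higher total cost when a LIMIT is present.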
The cost model requires several inputs to calculate estimates. Understanding these inputs explains why costs are sometimes wildly wrong.
Table and Index Statistics:
The optimizer maintains statistics about data distribution:
| Statistic | Description | Impact on Cost |
|---|---|---|
| Row count (reltuples) | Estimated number of rows in table | Fundamental input to all estimates |
| Page count (relpages) | Number of disk pages table occupies | Drives I/O cost calculation |
| Distinct values (n_distinct) | Unique values per column | Affects GROUP BY and join estimates |
| Most common values (MCV) | Frequent values and their frequencies | Improves selective filter estimates |
| Histogram | Distribution of values in buckets | Range query selectivity |
| Correlation | Physical ordering vs. logical ordering | Index scan random vs. sequential I/O |
```sql
-- View table-level statistics
SELECT relname AS table_name,
       reltuples AS estimated_rows,
       relpages AS pages_on_disk,
       pg_size_pretty(relpages * 8192) AS estimated_size
FROM pg_class
WHERE relname = 'orders';

-- View column-level statistics
SELECT attname AS column_name,
       n_distinct,
       most_common_vals,
       most_common_freqs,
       histogram_bounds,
       correlation
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'status';

-- Sample output:
-- column_name       | status
-- n_distinct        | 5      (5 unique values)
-- most_common_vals  | {pending,completed,shipped,cancelled,refunded}
-- most_common_freqs | {0.35,0.30,0.20,0.10,0.05}
-- correlation       | 0.15   (weak physical-logical ordering)

-- Force statistics update
ANALYZE orders;

-- Increase statistics granularity for a column
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;
ANALYZE orders;
```

Configuration Parameters Affecting Cost:
Databases have tunable parameters that adjust cost calculations:
PostgreSQL Cost Parameters:
| Parameter | Default | Effect |
|---|---|---|
| seq_page_cost | 1.0 | Cost per sequential page read |
| random_page_cost | 4.0 | Cost per random page read (index access) |
| cpu_tuple_cost | 0.01 | Cost per row processed |
| cpu_index_tuple_cost | 0.005 | Cost per index entry processed |
| cpu_operator_cost | 0.0025 | Cost per operator evaluation |
| effective_cache_size | 4GB typical | Assumed available cache (affects index preference) |
| parallel_tuple_cost | 0.1 | Cost of passing tuples to parallel workers |
Tuning Strategy:
These parameters should reflect your hardware reality:
- On SSDs, lower random_page_cost (maybe 1.1-2.0), since random reads are nearly as fast as sequential ones
- Set effective_cache_size to reflect the memory actually available for caching (shared buffers plus OS cache)

Adjusting cost parameters without understanding the effects can make optimization worse across your workload. Tune only after profiling and understanding your specific hardware characteristics. Changes affect ALL query planning.
Understanding the mechanics of cost calculation helps explain optimizer behavior and diagnose when estimates go wrong.
Sequential Scan Cost:
Cost = (disk pages × seq_page_cost) + (rows × cpu_tuple_cost)
+ (rows × filter evaluation cost)
Example: a table with 1,000 pages and 100,000 rows costs (1,000 × 1.0) + (100,000 × 0.01) + (100,000 × 0.0025) = 1,000 + 1,000 + 250 = 2,250 units.
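The same arithmetic as a quick sanity check in code (seq_page_cost, cpu_tuple_cost, and cpu_operator_cost are real PostgreSQL defaults; the helper function itself is illustrative, not planner code):

```python
# Sketch of the sequential scan formula, using PostgreSQL's default
# cost constants. Illustrative only.

SEQ_PAGE_COST = 1.0
CPU_TUPLE_COST = 0.01
CPU_OPERATOR_COST = 0.0025

def seq_scan_cost(pages, rows, filter_clauses=1):
    io = pages * SEQ_PAGE_COST                        # read every page once
    cpu = rows * CPU_TUPLE_COST                       # process every row
    qual = rows * CPU_OPERATOR_COST * filter_clauses  # evaluate the filter per row
    return io + cpu + qual

# Table with 1,000 pages and 100,000 rows, one filter clause
print(f"{seq_scan_cost(1000, 100_000):.1f}")  # prints 2250.0
```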
Index Scan Cost:
Startup = index height traversal
Per-row = (index_tuple_cost + random_page_cost * correlation_factor)
+ cpu_tuple_cost for heap fetch
The correlation factor is crucial: highly correlated data (physically ordered like the index) means sequential I/O; low correlation means random I/O.
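A simplified sketch of that blending, loosely modeled on how PostgreSQL's cost_index interpolates heap I/O cost using the squared correlation (the helper is illustrative, not the planner's actual formula):

```python
# Sketch: blend index-scan heap I/O cost between the best case
# (correlation ±1.0 → sequential reads) and the worst case
# (correlation 0.0 → fully random reads). Illustrative only.

SEQ_PAGE_COST = 1.0
RANDOM_PAGE_COST = 4.0

def index_heap_io_cost(pages_fetched, correlation):
    min_io = pages_fetched * SEQ_PAGE_COST     # perfectly correlated
    max_io = pages_fetched * RANDOM_PAGE_COST  # uncorrelated
    c2 = correlation * correlation
    return max_io + c2 * (min_io - max_io)

print(index_heap_io_cost(1000, 1.0))  # 1000.0 — treated as sequential
print(index_heap_io_cost(1000, 0.0))  # 4000.0 — treated as random
print(index_heap_io_cost(1000, 0.5))  # 3250.0 — in between
```

Note how quickly the cost approaches the random-I/O ceiling: at correlation 0.5, the estimate is already much closer to fully random than to sequential.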
```sql
-- Comparing Seq Scan vs Index Scan costs
-- Table: 100,000 rows, 1000 pages

-- Scenario: WHERE id = 12345 (returns 1 row)
-- Index Scan is chosen:
-- Index Scan using pk_table on mytable  (cost=0.29..8.31 rows=1 width=80)
--   Index Cond: (id = 12345)
-- Cost breakdown:
--   startup: ~0.29 (index tree traversal)
--   per row: ~8.02 (one random page + CPU)
-- Very cheap because the filter is highly selective (one row)

-- Seq Scan would be:
--   (cost=0.00..1500.00 rows=1 width=80)
-- Complete table scan to find 1 row = wasteful

-- Scenario: WHERE status = 'active' (returns 60,000 rows)
-- Seq Scan is chosen:
-- Seq Scan on mytable  (cost=0.00..1750.00 rows=60000 width=80)
--   Filter: (status = 'active')
-- Why not index? 60% of rows means:
--   - Index scan: 60,000 random page fetches
--   - Seq scan: 1000 sequential page reads
-- Sequentially reading 1000 pages beats 60,000 random fetches

-- Viewing optimizer's decision factors
SET enable_seqscan = off;  -- Force index consideration
EXPLAIN SELECT * FROM mytable WHERE status = 'active';
-- Bitmap Heap Scan  (cost=685.00..2250.00 rows=60000 width=80)
-- Higher cost! Optimizer was right to prefer Seq Scan.
```

| Join Type | Cost Formula (Simplified) | When Optimal |
|---|---|---|
| Nested Loop | outer_rows × (inner_startup + inner_per_row) | Small outer, indexed inner |
| Hash Join | build_cost + probe_cost | Medium tables, equi-join |
| Merge Join | sort_cost (if needed) + merge_cost | Pre-sorted inputs |
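Toy versions of those simplified formulas make the tradeoff concrete (the functions, per-row hashing cost, and all numbers here are illustrative assumptions, not planner code):

```python
# Sketch of the simplified join cost formulas from the table above.
# Illustrative only: real planners add memory limits, batching,
# rescan caching, and many other refinements.

CPU_OPERATOR_COST = 0.0025

def nested_loop_cost(outer_rows, inner_startup, inner_per_row_cost):
    # The inner side is evaluated once per outer row
    return outer_rows * (inner_startup + inner_per_row_cost)

def hash_join_cost(build_scan_cost, build_rows, probe_scan_cost, probe_rows):
    build = build_scan_cost + build_rows * CPU_OPERATOR_COST  # hash each build row
    probe = probe_scan_cost + probe_rows * CPU_OPERATOR_COST  # hash/probe each row
    return build + probe

# Small outer (10 rows) with an indexed inner (~8.3 units per lookup):
# the nested loop is extremely cheap
small = nested_loop_cost(10, inner_startup=0.29, inner_per_row_cost=8.02)
# Two medium tables (10k and 100k rows): hashing both sides beats
# doing 100,000 index lookups
medium = hash_join_cost(283.0, 10_000, 1250.0, 100_000)
print(f"nested loop, 10 outer rows: {small:.1f}")    # 83.1
print(f"hash join, 10k x 100k rows: {medium:.1f}")   # 1808.0
```

Run the nested loop with 100,000 outer rows instead of 10 and its cost explodes past 800,000 units, which is exactly why the optimizer switches strategies as estimated row counts grow.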
The Selectivity Factor:
Selectivity is the estimated fraction of rows passing a filter. It's crucial for cost calculation:
- WHERE id = 12345 on a unique column → selectivity = 1/total_rows
- WHERE status = 'active' → selectivity from the MCV list or histogram
- WHERE flag = true (boolean) → often estimated as 50% if no statistics exist

When Selectivity Estimation Fails:
| Situation | Problem | Symptom |
|---|---|---|
| Skewed data | Histogram doesn't capture unusual distribution | Wrong row estimates |
| Correlated columns | (a = 1 AND b = 2) estimated as independent | Underestimate |
| Functions on columns | WHERE UPPER(name) = 'SMITH' | Can't use column stats |
| Parameterized queries | Value unknown at plan time | Generic estimate used |
| Cross-database comparisons | Foreign tables lack local statistics | Wild guessing |
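A toy equality-selectivity estimator shows how MCV-based estimation works, and why values missing from the MCV list (the "skewed data" row above) fall back to a generic guess (all names and numbers here are illustrative):

```python
# Toy equality-selectivity estimator mirroring the MCV-based rules
# described above. Illustrative only, not PostgreSQL's actual code.

def eq_selectivity(value, mcv_vals, mcv_freqs, n_distinct):
    if value in mcv_vals:
        # Value is tracked directly: use its observed frequency
        return mcv_freqs[mcv_vals.index(value)]
    # Otherwise spread the leftover frequency evenly across the
    # remaining distinct values — rare values get overestimated here
    remaining_freq = 1.0 - sum(mcv_freqs)
    remaining_distinct = n_distinct - len(mcv_vals)
    return remaining_freq / remaining_distinct

mcv_vals = ["US", "DE", "FR", "UK"]
mcv_freqs = [0.40, 0.20, 0.15, 0.10]

# Tracked value: accurate estimate
print(eq_selectivity("US", mcv_vals, mcv_freqs, n_distinct=200))

# Untracked rare value: gets the generic "average leftover" estimate,
# even if it actually matches almost no rows
sel = eq_selectivity("Luxembourg", mcv_vals, mcv_freqs, n_distinct=200)
print(f"selectivity={sel:.6f}, estimated rows out of 1M={1_000_000 * sel:.0f}")
```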
A 10x error in selectivity creates a 10x error in row estimates, which cascades through joins (10x × 10x = 100x error) and affects all subsequent cost calculations. This is why cardinality errors are the primary cause of bad plans.
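The compounding effect is plain arithmetic (all numbers illustrative):

```python
# Sketch of how row-estimate errors compound through a join:
# a 10x misestimate on each input becomes a ~100x error in the
# join output estimate, which everything planned on top inherits.

def join_estimate(rows_a, rows_b, join_selectivity):
    return rows_a * rows_b * join_selectivity

sel = 1e-5
accurate = join_estimate(1_000, 2_000, sel)     # inputs estimated correctly
inflated = join_estimate(10_000, 20_000, sel)   # both inputs 10x too high

print(f"{accurate:.0f} vs {inflated:.0f} rows: {inflated / accurate:.0f}x error")
# prints "20 vs 2000 rows: 100x error"
```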
Cost estimates are predictions, and predictions can be wrong. Understanding failure modes helps you diagnose and fix optimizer mistakes.
Category 1: Stale Statistics
The most common problem. Statistics describe data as it was when they were last gathered. If data changes significantly afterward (bulk loads, mass deletes, rapid growth), estimates drift from reality until the next ANALYZE.
Category 2: Correlated Columns

The optimizer assumes columns are independent. A filter like (region = 'US' AND language = 'English') is estimated as independent (0.3 × 0.4 = 0.12) when the real combined selectivity is 0.28, because the columns are strongly correlated.

Category 3: Functions on Columns

A predicate like WHERE YEAR(created_at) = 2024 can't use the created_at histogram, so the optimizer guesses.
```sql
-- Using EXPLAIN ANALYZE to find estimate errors
EXPLAIN ANALYZE
SELECT * FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE c.country = 'Luxembourg';

-- Hash Join  (cost=125.00..2589.00 rows=15000 width=160)
--            (actual time=0.5..45.2 rows=23 loops=1)
--                               ^^^^^^^^      ^^^^
--                               ESTIMATED     ACTUAL
--
-- 15000 vs 23 = 652x overestimate!!
--
-- Why? Optimizer estimated Luxembourg has many customers
-- but actually it's a tiny country with few customers.

-- Check the statistics
SELECT most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'customers' AND attname = 'country';

-- Might show: Luxembourg not in most_common_vals
-- Optimizer used default selectivity = 1/n_distinct
-- If n_distinct=200 countries, estimated 0.5% → wrong

-- Solution: Update statistics with more detail
ALTER TABLE customers ALTER COLUMN country SET STATISTICS 1000;
ANALYZE customers;

-- Or create extended statistics for correlated columns
CREATE STATISTICS stats_country_lang (dependencies)
ON country, language FROM customers;
ANALYZE customers;
```

When EXPLAIN ANALYZE shows actual rows differing from estimated by more than 10x, the optimizer is working with significantly wrong information. This is your cue to investigate statistics freshness, missing histograms, or correlated column assumptions.
While cost isn't a time measurement, it's extremely valuable for comparing alternative approaches to the same query.
Effective Comparison Techniques:
1. Compare Alternative Query Formulations:
Write the same logic multiple ways and compare costs:
```sql
-- Approach 1: Subquery
EXPLAIN SELECT * FROM orders
WHERE customer_id IN (SELECT id FROM customers WHERE region = 'EU');
-- Cost: 2458.00

-- Approach 2: JOIN
EXPLAIN SELECT o.* FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE c.region = 'EU';
-- Cost: 1872.00  ← Lower cost, likely faster

-- Approach 3: EXISTS
EXPLAIN SELECT * FROM orders o
WHERE EXISTS (
  SELECT 1 FROM customers c
  WHERE c.id = o.customer_id AND c.region = 'EU'
);
-- Cost: 1756.00  ← Even lower!

-- Verify with ANALYZE
EXPLAIN ANALYZE SELECT * FROM orders o ...  (each version)
-- Actual execution confirms EXISTS is fastest for this data
```

2. Compare With and Without Indexes:
-- Before index
EXPLAIN SELECT * FROM orders WHERE customer_id = 123;
-- Seq Scan (cost=0.00..15000.00)
CREATE INDEX idx_customer ON orders(customer_id);
-- After index
EXPLAIN SELECT * FROM orders WHERE customer_id = 123;
-- Index Scan (cost=0.29..8.31)
-- Cost improvement: ~1800x
3. Test Configuration Changes:
-- Current: prefer hash joins
EXPLAIN SELECT ... -- Hash Join cost=2500
SET enable_hashjoin = off;
EXPLAIN SELECT ... -- Merge Join cost=3200
-- Confirms hash join is better for this query
RESET enable_hashjoin;
4. Evaluate Materialized View Benefits:
-- Complex aggregation query
EXPLAIN SELECT region, SUM(amount) FROM orders
GROUP BY region;
-- Cost: 45000 (full table scan + aggregation)
-- With materialized view
EXPLAIN SELECT * FROM mv_regional_sales;
-- Cost: 5.00 (just read pre-computed results)
Cost comparison identifies likely winners, but EXPLAIN ANALYZE confirms actual performance. Sometimes lower-cost plans are slower due to caching effects, parameter variations, or cardinality errors. Use cost for screening, actual timing for final decisions.
For expert-level optimization, consider these advanced aspects of cost estimation:
Cost and Parallelism:
Parallel query execution changes cost calculations:
- Work divides across workers, shrinking the per-worker scan and CPU costs
- Coordination overhead is added through parallel_tuple_cost and parallel_setup_cost

EXPLAIN SELECT COUNT(*) FROM huge_table;
-- Without parallel: cost=500000
-- With parallel (4 workers): cost=125000 + coordination
-- Wall time: ~4x faster despite similar 'logical' cost
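A rough sketch of that accounting (parallel_setup_cost and parallel_tuple_cost are real PostgreSQL parameters with these defaults; the divide-by-workers formula is a simplification of the planner's):

```python
# Sketch of parallel cost accounting: the serial cost divides across
# workers, but setup and per-tuple transfer overheads are added back.
# Illustrative only — the real planner also models partial leader
# participation and per-node parallel safety.

PARALLEL_SETUP_COST = 1000.0   # PostgreSQL default
PARALLEL_TUPLE_COST = 0.1      # PostgreSQL default

def parallel_cost(serial_cost, rows_sent_to_leader, workers):
    divided = serial_cost / workers
    overhead = PARALLEL_SETUP_COST + rows_sent_to_leader * PARALLEL_TUPLE_COST
    return divided + overhead

# COUNT(*) sends only one partial-aggregate row per worker to the leader,
# so transfer overhead is tiny and parallelism pays off
print(parallel_cost(500_000.0, rows_sent_to_leader=4, workers=4))

# A query returning every row pays transfer cost for each tuple,
# which eats into the benefit
print(parallel_cost(500_000.0, rows_sent_to_leader=1_000_000, workers=4))
```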
Cost and Caching:
The cost model essentially assumes cold cache (data read from disk). Reality often differs: hot tables and indexes sit in shared buffers or the OS page cache, making "random" reads far cheaper than modeled. The effective_cache_size parameter nudges the planner toward index scans when caching is likely, but estimated I/O costs still overstate the work for well-cached data.
Cost Doesn't Include:
| Excluded Factor | Impact |
|---|---|
| Lock wait time | Query may wait for locks (not predictable) |
| Network transfer | Returning large results to client |
| Concurrent workload | Resource contention with other queries |
| Connection overhead | Connecting, authentication |
| Trigger execution | Cascading effects of DML |
| Situation | Trust Level | Reason |
|---|---|---|
| Comparing similar plans | High | Errors affect both similarly |
| Fresh statistics | High | Inputs are accurate |
| Simple queries | High | Few places for errors to cascade |
| Complex multi-join queries | Medium | Errors compound through joins |
| After bulk data changes | Low | Statistics likely stale |
| Parameterized queries | Low | Generic plans may not fit specific parameters |
| Histograms missing | Low | Filter selectivity is guesswork |
| Cross-database joins | Very Low | Foreign statistics unavailable |
```sql
-- Check if autovacuum is updating statistics
SELECT relname,
       last_vacuum, last_autovacuum,
       last_analyze, last_autoanalyze,
       n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'orders';

-- If last_analyze is old + high n_dead_tup → stale stats

-- Check histogram coverage
SELECT attname,
       null_frac,
       n_distinct,
       most_common_vals IS NOT NULL AS has_mcv,
       histogram_bounds IS NOT NULL AS has_histogram
FROM pg_stats
WHERE tablename = 'orders';

-- Columns with has_histogram = false have poor selectivity estimates

-- Monitor prepared statement behavior
SELECT * FROM pg_prepared_statements;
-- Check if generic_plans are being used (suboptimal for skewed params)
```

Even with perfect statistics, cost is a model, a simplification of reality. Hardware, caching, concurrency, and data distribution all create real-world variations. Use cost as a strong guide, but validate important decisions with actual measurements.
We've developed a comprehensive understanding of cost estimates in execution plans. Here are the essential insights:
- Cost is an abstract, relative unit calibrated to sequential page reads, not a time measurement.
- Startup vs. total cost explains streaming vs. blocking operators and LIMIT behavior.
- Statistics (row counts, MCVs, histograms, correlation) are the inputs; stale or missing statistics produce bad estimates.
- random_page_cost should reflect your hardware reality (SSD vs. HDD).
- Use cost to screen alternative plans; confirm important decisions with EXPLAIN ANALYZE.

What's Next:
With a deep understanding of cost estimates, the final page of this module covers plan comparison—systematic techniques for comparing execution plans, understanding why the optimizer makes specific choices, and influencing plan selection when needed.
You now understand the mechanics of cost estimation—what costs mean, how they're calculated, why they go wrong, and how to use them effectively. This knowledge transforms EXPLAIN output from mysterious numbers into actionable performance insights.