Throughout this module, we have examined the spectrum of selection implementation strategies: linear scans that read everything, index scans that access specific tuples, bitmap operations that combine multiple access paths, and specialized handling for conjunctive and disjunctive predicates. But how does the database choose among these options?
The answer lies in cost-based optimization—the systematic comparison of estimated costs for each viable access method. The optimizer evaluates all applicable strategies, estimates their resource consumption, and selects the one with the lowest total cost.
This page synthesizes our understanding of individual access methods into a unified framework for cost comparison. You will understand the complete picture: how costs are estimated, what factors influence the decision, why the optimizer sometimes makes suboptimal choices, and how to diagnose and correct these situations.
By the end of this page, you will master the unified cost model for selection operations, understand the decision framework that optimizers use, learn to interpret and analyze query execution plans, and develop skills for diagnosing and resolving access path problems in production systems.
All selection access methods share a common cost structure composed of I/O cost and CPU cost. Understanding this unified model enables direct comparison across different strategies.
The Fundamental Cost Equation:
Total Cost = I/O Cost + CPU Cost
Where:
- I/O Cost: pages read × per-page cost (sequential reads are charged at seq_page_cost, random reads at random_page_cost)
- CPU Cost: tuples and index entries processed × per-tuple costs, plus predicate evaluations charged at cpu_operator_cost
The relative weights of these components are system-configurable parameters that reflect hardware characteristics.
| Parameter | Default Value | Meaning | Typical Tuning |
|---|---|---|---|
| seq_page_cost | 1.0 | Cost of sequential page read | Baseline unit |
| random_page_cost | 4.0 | Cost of random page read | 1.0-1.5 for SSD, 4.0 for HDD |
| cpu_tuple_cost | 0.01 | Cost to process one tuple | Rarely changed |
| cpu_index_tuple_cost | 0.005 | Cost to process index entry | Rarely changed |
| cpu_operator_cost | 0.0025 | Cost per operator evaluation | Rarely changed |
| parallel_tuple_cost | 0.1 | Inter-process tuple transfer | Depends on parallelism |
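These defaults can also be inspected together rather than one SHOW statement at a time; a minimal sketch using PostgreSQL's pg_settings catalog view:

```sql
-- List the planner's cost parameters and their current values
SELECT name, setting, short_desc
FROM pg_settings
WHERE name LIKE '%cost%' OR name = 'effective_cache_size'
ORDER BY name;
```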
```sql
-- Cost Formula for Each Access Method

-- Sequential Scan:
-- I/O: B pages × seq_page_cost (all pages read sequentially)
-- CPU: T tuples × cpu_tuple_cost + T × cpu_operator_cost (predicate)
-- Total_seq = B × 1.0 + T × (0.01 + 0.0025)

-- Index Scan (standard):
-- I/O: H pages (tree traversal) × random_page_cost
--      + L pages (leaf scan) × seq_page_cost
--      + M pages (heap access) × random_page_cost
-- CPU: Matching entries × (cpu_index_tuple_cost + cpu_tuple_cost)
-- Total_idx = H × 4.0 + L × 1.0 + M × 4.0 + ...

-- Index-Only Scan:
-- I/O: H pages × random_page_cost + L pages × seq_page_cost
-- CPU: Matching entries × cpu_index_tuple_cost
-- (No heap access!)
-- Total_idx_only = H × 4.0 + L × 1.0 + ...

-- Bitmap Index Scan:
-- I/O: (Index pages × random_page_cost) + (Heap pages × seq_page_cost)
-- CPU: Building bitmap + recheck operations
-- Total_bitmap = Index_IO + Heap_IO + bitmap_overhead

-- Example calculation for 10,000 page table, 100 matching rows:
-- Sequential: 10,000 × 1.0 = 10,000
-- Index: 4 × 4.0 + 100 × 4.0 = 16 + 400 = 416
-- Ratio: 10,000 / 416 = 24× cheaper to use index

SHOW seq_page_cost;
SHOW random_page_cost;
SHOW cpu_tuple_cost;
```

Key Cost Factors by Access Method:
| Access Method | Primary I/O Cost | Primary CPU Cost | Critical Variable |
|---|---|---|---|
| Sequential Scan | All pages (sequential) | All tuples + predicates | Table size (B) |
| Index Scan | Random heap access | Matching tuples | Selectivity × random_page_cost |
| Index-Only Scan | Index leaf pages | Index entries | Index size, visibility |
| Bitmap Index Scan | Heap pages (sequential) | Bitmap ops + recheck | Union/intersection size |
| Bitmap AND | Intersection pages | Both index scans | Intersection selectivity |
| Bitmap OR | Union pages | All index scans | Union selectivity |
The ratio of random_page_cost to seq_page_cost (default 4:1) is the most critical parameter influencing access method selection. On modern SSDs, random I/O is only 3-5× slower than sequential (not 40-100× like HDDs), so SSD systems should reduce random_page_cost to 1.1-1.5 for accurate cost estimation.
Selectivity—the fraction of tuples satisfying a predicate—is the single most important factor in access method selection. It determines how many index entries must be processed, how many heap pages must be accessed, and ultimately whether index-based access is worthwhile.
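Actual selectivity for a given predicate can be measured directly by comparing matching rows to total rows. A minimal sketch, assuming the orders table used in the examples that follow:

```sql
-- Measure the actual selectivity of a predicate (fraction of rows matching)
SELECT count(*) FILTER (WHERE status = 'pending')::numeric
       / NULLIF(count(*), 0) AS selectivity
FROM orders;
```

The optimizer does not scan the table to obtain this number; it estimates the fraction from stored statistics, as shown below.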
The Break-Even Selectivity:
There exists a selectivity threshold below which index access is cheaper and above which sequential scan is cheaper. This break-even point depends on table characteristics and cost parameters.
```sql
-- Break-Even Selectivity Analysis

-- For index scan to be cheaper than sequential scan:
-- Index_Cost < Sequential_Cost

-- Sequential: B × seq_page_cost = B × 1.0 = B
-- Index: H × random_page_cost + s×T × random_page_cost
--        = 4H + 4×s×T (assuming each tuple on a different page)

-- For index to win: 4H + 4×s×T < B

-- Simplified (ignoring tree traversal, H ≈ 4):
-- 4×s×T < B
-- s < B / (4×T)

-- For table with T = 1,000,000 tuples, B = 10,000 pages:
-- s < 10,000 / (4 × 1,000,000) = 0.0025 = 0.25%

-- Interpretation: Index scan wins only if < 0.25% of rows match
-- This is quite selective! (2,500 rows out of 1 million)

-- With clustered data (correlation = 0.9):
-- Heap access approaches sequential, reducing random I/O penalty
-- Effective random_page_cost closer to 1.5
-- s < 10,000 / (1.5 × 1,000,000) = 0.0067 = 0.67%

-- With SSD (random_page_cost = 1.1):
-- s < 10,000 / (1.1 × 1,000,000) = 0.0091 = 0.91%

-- Real-world break-even typically 5-15% depending on:
-- - Correlation (clustering factor)
-- - Hardware (SSD vs HDD)
-- - Index type (B-tree depth, covering potential)
```

Selectivity Estimation Methods:
The optimizer must estimate selectivity to calculate expected costs. Several techniques are employed:
```sql
-- Viewing Selectivity Statistics in PostgreSQL

-- Table-level statistics:
SELECT relname,
       reltuples AS estimated_rows,
       relpages AS pages,
       reltuples / NULLIF(relpages, 0) AS rows_per_page
FROM pg_class
WHERE relname = 'orders';

-- Column-level statistics:
SELECT attname AS column_name,
       n_distinct,   -- Distinct values (negative = fraction)
       null_frac,    -- NULL fraction
       avg_width,    -- Average bytes per value
       correlation   -- Physical ordering correlation
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'customer_id';

-- MCV (Most Common Values):
SELECT attname,
       most_common_vals,    -- The values
       most_common_freqs    -- Their frequencies
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'status';

-- Histogram buckets:
SELECT attname, histogram_bounds
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'order_date';

-- Force statistics refresh:
ANALYZE orders;

-- Increase statistics precision (1-10000):
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;
ANALYZE orders;
```

Selectivity estimation is only as good as the underlying statistics. After bulk loads, deletes, or significant data changes, statistics become stale and estimates diverge from reality. Regular ANALYZE/UPDATE STATISTICS is essential for accurate cost-based optimization.
The query optimizer follows a systematic process to select the best access method for each selection operation. Understanding this process reveals why certain choices are made and how to influence them.
Step 1: Enumerate Viable Access Paths
For each selection, the optimizer identifies all applicable access methods:
```sql
-- Access Path Enumeration Example

-- Query:
SELECT * FROM orders
WHERE customer_id = 100
  AND order_date > '2024-01-01';

-- Available indexes on orders:
--   idx_customer (customer_id)
--   idx_date (order_date)
--   idx_cust_date (customer_id, order_date)

-- Optimizer enumerates viable paths:

-- Path 1: Sequential Scan
--   + Applicable to any query
--   Cost: B × seq_page_cost + T × predicate_cost

-- Path 2: Index Scan on idx_customer
--   + Predicate includes customer_id equality
--   Cost: Index_traversal + Matching_entries × heap_access
--         + Filter(order_date) CPU cost

-- Path 3: Index Scan on idx_date
--   + Predicate includes order_date range
--   Cost: Index_traversal + Matching_entries × heap_access
--         + Filter(customer_id) CPU cost

-- Path 4: Index Scan on idx_cust_date
--   + Both predicates covered by composite index
--   Cost: Index_traversal + Matching_entries × heap_access
--         (no additional filter needed)

-- Path 5: Bitmap AND (idx_customer AND idx_date)
--   + Combine bitmaps from both indexes
--   Cost: Two index scans + bitmap ops + intersection heap scan

-- Optimizer calculates cost for each and picks minimum
```

Step 2: Estimate Selectivity and Cardinality
For each path, estimate how many rows will be accessed at each stage:
```sql
-- Cardinality Estimation

-- Given:
--   Total rows: 10,000,000
--   customer_id = 100: n_distinct = 100,000 → s1 = 1/100,000 = 0.001%
--   order_date > '2024-01-01': histogram → s2 = 15%

-- Path 2 (Index on customer_id):
--   Index entries matching: 10M × 0.001% = 100 rows
--   After filter (order_date): 100 × 15% = 15 rows
--   Heap pages accessed: ~100 (each customer order on a different page)

-- Path 3 (Index on order_date):
--   Index entries matching: 10M × 15% = 1,500,000 rows
--   After filter (customer_id): 1.5M × 0.001% = 15 rows
--   Heap pages accessed: ~500,000 (range matches many pages)

-- Path 4 (Composite index):
--   Index entries fully matching: 10M × 0.001% × 15% = 15 rows
--   Heap pages accessed: ~15
--   → CLEARLY THE WINNER

-- Path 5 (Bitmap AND):
--   Bitmap1: 100 rows → ~50 pages
--   Bitmap2: 1.5M rows → ~300,000 pages
--   Intersection: ~10 pages
--   Heap pages accessed: ~10
--   High overhead for small gain vs Path 4
```

Step 3: Calculate and Compare Costs
Apply cost formulas to each path and select the minimum:
| Access Path | Index I/O | Heap I/O | Total Cost | Winner? |
|---|---|---|---|---|
| Sequential Scan | 0 | 100,000 pages | 100,000 | No |
| idx_customer | 4 pages | 100 × 4.0 = 400 | 404 | Maybe |
| idx_date | 50 pages | 500,000 × 4.0 | 2,000,050 | No |
| idx_cust_date | 4 pages | 15 × 4.0 = 60 | 64 | YES |
| Bitmap AND | 54 pages | 10 × 1.0 = 10 | 64 + overhead | Close |
In this example, the composite index (idx_cust_date) wins decisively because it evaluates BOTH predicates during index traversal, minimizing heap access to exactly the matching rows. This illustrates why composite index design is critical for multi-predicate queries.
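To confirm this on a live system, one could create the composite index and compare plans. A sketch assuming the orders table and index names from the example above:

```sql
-- Create the composite index covering both predicates
CREATE INDEX idx_cust_date ON orders (customer_id, order_date);

-- Refresh statistics so the planner costs the new path accurately
ANALYZE orders;

-- Verify that the planner now chooses the composite index
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders
WHERE customer_id = 100
  AND order_date > '2024-01-01';
-- Expected: Index Scan using idx_cust_date with both predicates in the
-- Index Cond and no separate Filter line removing rows
```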
Execution plan analysis is the primary tool for understanding and validating optimizer decisions. All major databases provide EXPLAIN functionality that reveals the chosen access path and its estimated costs.
Reading EXPLAIN Output:
```sql
-- EXPLAIN Output Analysis

-- Basic EXPLAIN (estimates only):
EXPLAIN SELECT * FROM orders WHERE customer_id = 100;

-- Output:
-- Index Scan using idx_customer on orders (cost=0.43..8.45 rows=1 width=164)
--   Index Cond: (customer_id = 100)

-- Breaking down the output:
--   "Index Scan using idx_customer" → Access method chosen
--   "cost=0.43..8.45"               → Startup cost..Total cost (arbitrary units)
--   "rows=1"                        → Estimated output rows
--   "width=164"                     → Estimated bytes per output row
--   "Index Cond"                    → Predicate applied via index

-- EXPLAIN ANALYZE (actual execution):
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 100;

-- Output:
-- Index Scan using idx_customer on orders
--   (cost=0.43..8.45 rows=1 width=164)
--   (actual time=0.025..0.027 rows=1 loops=1)
--   Index Cond: (customer_id = 100)
--   Buffers: shared hit=4

-- Additional fields:
--   "actual time=0.025..0.027" → Time to first row..Time to all rows (ms)
--   "rows=1"                   → ACTUAL rows returned (compare to estimated!)
--   "loops=1"                  → Number of times this node executed
--   "Buffers: shared hit=4"    → Pages read from buffer pool

-- Comparing estimated vs actual:
--   Estimated rows: 1, Actual rows: 1 → Good estimate!
--   Cost: 8.45, Actual time: 0.027ms → Very fast
```

Key Metrics to Examine:
```sql
-- Common EXPLAIN Patterns and Interpretation

-- Pattern 1: Filter with high removal (inefficient index use)
-- Index Scan using idx_status on orders (cost=0.42..1542.00 rows=10 width=164)
--   (actual time=0.025..156.234 rows=10 loops=1)
--   Index Cond: (status = 'active')
--   Filter: (amount > 1000)
--   Rows Removed by Filter: 49990
--
-- Problem: Index returns 50,000 rows, but only 10 match the full predicate.
-- Consider: Composite index on (status, amount) or different index.

-- Pattern 2: Sequential scan on large table (missing index)
-- Seq Scan on customers (cost=0.00..458236.00 rows=5 width=87)
--   (actual time=5234.123..5234.789 rows=5 loops=1)
--   Filter: (email = 'user@example.com')
--   Rows Removed by Filter: 9999995
--
-- Problem: Scanning 10M rows to find 5.
-- Solution: CREATE INDEX idx_email ON customers(email);

-- Pattern 3: Bitmap becoming lossy (memory pressure)
-- Bitmap Heap Scan on orders (cost=456.78..12345.00 rows=50000 width=164)
--   Heap Blocks: exact=1234 lossy=5678
--   Recheck Cond: (region = 'US')
--
-- Observation: 5678 lossy blocks = bitmaps too large for work_mem
-- Consider: Increase work_mem or accept lossy scan overhead

-- Pattern 4: Parallel scan efficiency
-- Gather (cost=1000.00..15000.00 rows=100000 width=164)
--   Workers Planned: 4
--   Workers Launched: 4
--   -> Parallel Seq Scan on large_table (...)
--      Actual time=123..456 rows=25000 loops=4 (each worker)
--
-- Good: 4 workers, each processing ~25% of work
-- Check: (time × loops) should be less than single-threaded time
```

An estimated row count differing from actual by more than 10× is a strong signal that the optimizer is working with bad information. This commonly leads to wrong join orders, wrong access methods, and poor memory allocation. Investigate statistics for the involved columns.
Even with cost-based optimization, various factors can lead to suboptimal access path selection. Recognizing these patterns enables quick diagnosis and resolution.
Mistake 1: Stale Statistics
```sql
-- Mistake 1: Stale Statistics

-- Symptom: Estimated rows wildly differ from actual
EXPLAIN ANALYZE SELECT * FROM orders WHERE status = 'pending';
-- Estimated rows: 100, Actual rows: 50,000
-- 500× error!

-- Cause: Table was bulk-loaded since last ANALYZE

-- Fix: Update statistics
ANALYZE orders;

-- For critical tables, increase statistics target:
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 1000;
ANALYZE orders;

-- PostgreSQL: Track autovacuum statistics freshness
SELECT schemaname, relname, last_analyze, last_autoanalyze,
       n_live_tup, n_dead_tup, n_mod_since_analyze
FROM pg_stat_user_tables
WHERE n_mod_since_analyze > 0.1 * n_live_tup;  -- >10% changed
```

Mistake 2: Wrong Cost Parameters
```sql
-- Mistake 2: Wrong Cost Parameters (HDD settings on SSD)

-- Symptom: Optimizer avoids indexes that would be beneficial
-- Default random_page_cost = 4.0 is for spinning disk
-- On SSD, random I/O is only ~1.1-1.5× slower than sequential

-- Check current settings:
SHOW random_page_cost;
SHOW seq_page_cost;

-- For SSD storage, reduce random_page_cost:
SET random_page_cost = 1.1;

-- Verify the change affects plan selection:
EXPLAIN SELECT * FROM orders WHERE customer_id = 100;
-- Should now prefer index scan more often

-- Make permanent in postgresql.conf:
-- random_page_cost = 1.1

-- For in-memory tables (buffer pool can hold entire table):
-- Consider even lower values:
-- effective_cache_size = <large value matching RAM>
```

Mistake 3: Correlated Predicates Without Extended Statistics
```sql
-- Mistake 3: Correlated Predicates

-- Query:
SELECT * FROM addresses
WHERE city = 'San Francisco' AND state = 'CA';

-- Estimated: city selectivity × state selectivity = 0.001 × 0.05 = 0.00005
-- Actual: All San Francisco addresses ARE in CA, so actual = 0.001

-- The independence assumption fails because city determines state

-- Fix: Create extended statistics for correlated columns
-- PostgreSQL 10+:
CREATE STATISTICS addr_city_state (dependencies)
  ON city, state FROM addresses;
ANALYZE addresses;

-- Now optimizer knows: P(state='CA' | city='San Francisco') = 1.0
-- Estimate becomes accurate

-- Check extended statistics:
SELECT * FROM pg_statistic_ext WHERE stxname = 'addr_city_state';
```

| Mistake | Symptom | Diagnosis | Fix |
|---|---|---|---|
| Stale statistics | Estimated ≠ Actual rows | Check last_analyze, n_mod_since_analyze | ANALYZE table |
| Wrong cost params | Prefers scan over index | Compare costs on known-good queries | Adjust random_page_cost |
| Predicate correlation | Underestimates conjunction | Multiply selectivities manually | Extended statistics |
| Function in predicate | Full scans | Function prevents index use | Expression index |
| Type mismatch | Index not used | Implicit cast prevents index | Cast explicitly or fix schema |
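The last two rows of the table have no worked example above. A brief sketch of both fixes, using hypothetical table and column names:

```sql
-- Function in predicate: wrapping the indexed column in a function
-- prevents a plain B-tree index on that column from matching.
-- Slow: WHERE lower(email) = 'user@example.com' with only an index on email
-- Fix: an expression index that matches the function call exactly
CREATE INDEX idx_customers_lower_email ON customers (lower(email));
ANALYZE customers;

EXPLAIN SELECT * FROM customers WHERE lower(email) = 'user@example.com';
-- Expected: Index Scan using idx_customers_lower_email

-- Type mismatch: if the literal's type forces a cast of the column
-- (for example, comparing a character column against a numeric literal in
-- systems that allow the implicit cast), the column's index cannot be used.
-- Fix: write the literal in the column's declared type, cast it explicitly,
-- or correct the column's type in the schema.
```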
While most databases support hints to force specific access methods (INDEX, FULL, etc.), hints are maintenance burdens that bypass the optimizer's adaptability. Always prefer fixing the underlying cause (statistics, indexes, cost parameters) over applying hints. Use hints only as a last resort for known optimizer bugs.
Modern hardware evolution has dramatically shifted the balance between access methods. Understanding hardware characteristics enables both accurate cost parameter tuning and informed access method expectations.
Storage Technology Impact:
| Characteristic | HDD (Spinning) | SATA SSD | NVMe SSD | Impact on Selection |
|---|---|---|---|---|
| Sequential read (MB/s) | 100-200 | 400-550 | 3,000-7,000 | High seq throughput favors scans on all |
| Random read (IOPS) | 100-200 | 50,000-100,000 | 500,000-1,000,000 | Higher random IOPS favor index access |
| Random/Sequential ratio | 1:100+ | 1:10 | 1:3 | Lower ratio → prefer random access (indexes) |
| Recommended random_page_cost | 4.0 | 1.5-2.0 | 1.0-1.2 | Lower = more index-friendly |
```sql
-- Hardware-Aware Cost Parameter Tuning

-- For NVMe SSD storage:
SET random_page_cost = 1.1;
SET seq_page_cost = 1.0;

-- For large RAM (data mostly cached):
SET effective_cache_size = '32GB';  -- Or your actual RAM
SET random_page_cost = 1.0;         -- Buffer pool hits are ~equal speed

-- For parallel processing capability:
SET max_parallel_workers_per_gather = 4;
SET parallel_tuple_cost = 0.01;

-- Force re-evaluation after tuning:
EXPLAIN ANALYZE <your_query>;
-- Compare old vs new execution time

-- Example: Query benefiting from SSD tuning
-- Before (random_page_cost = 4.0):
--   Seq Scan (prefers sequential due to high random cost)

-- After (random_page_cost = 1.1):
--   Index Scan (now recognizes random I/O is cheap)
```

Memory and Buffer Pool Impact:
When data resides in the buffer pool, I/O costs effectively become CPU costs. The optimizer considers cache effectiveness through the effective_cache_size parameter:
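A minimal sketch of the relevant settings (values are illustrative, not recommendations):

```sql
-- Tell the planner roughly how much data the OS cache plus buffer pool can
-- hold; this is a planning hint, not a memory allocation
SET effective_cache_size = '24GB';  -- illustrative value

-- When most pages are expected to be cached, random and sequential page
-- reads cost about the same, so the ratio can approach 1
SET random_page_cost = 1.0;

-- Inspect the current values
SHOW effective_cache_size;
SHOW random_page_cost;
```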
Benchmark your actual storage to determine accurate cost ratios. Use tools like fio or sysbench to measure sequential vs. random IOPS, then set random_page_cost based on the measured ratio. This one-time calibration dramatically improves optimizer accuracy.
With comprehensive understanding of cost factors, let us synthesize a practical decision framework for evaluating selection access methods. This framework guides both query analysis and index design.
The Access Method Decision Flowchart:
```
                   ┌─────────────────────┐
                   │   Selection Query   │
                   │  with Predicate(s)  │
                   └──────────┬──────────┘
                              │
                   ┌──────────▼──────────┐
                   │  Estimate Combined  │
                   │     Selectivity     │
                   └──────────┬──────────┘
                              │
             ┌────────────────┼────────────────┐
             │                │                │
      ┌──────▼──────┐  ┌──────▼──────┐  ┌──────▼──────┐
      │ Selectivity │  │ Selectivity │  │ Selectivity │
      │  > 30-50%   │  │    5-30%    │  │    < 5%     │
      └──────┬──────┘  └──────┬──────┘  └──────┬──────┘
             │                │                │
      ┌──────▼──────┐         │         ┌──────▼──────┐
      │  Sequential │         │         │ Index Exists│
      │ Scan Likely │         │         │  Covering   │
      │ Best Choice │         │         │ Predicate?  │
      └─────────────┘         │         └──────┬──────┘
                              │           Yes ─┴─ No
                              │                    │
                     ┌────────▼───────┐     ┌─────▼─────┐
                     │ Multi-Predicate│     │ Sequential│
                     │ Bitmap OR/AND  │     │  Scan or  │
                     │  May Be Best   │     │ Create Idx│
                     └────────────────┘     └───────────┘
```

Legend:
- High selectivity (>30%): Sequential scan (read everything anyway)
- Medium selectivity (5-30%): Bitmap may win (sorted heap access)
- Low selectivity (<5%): Index scan wins (surgical access)
- Always consider: correlation, covering indexes, hardware

Quick Decision Rules:
- Predicate matches more than roughly 30% of rows: expect a sequential scan.
- Selectivity in the 5-30% range, or several AND/OR predicates on separately indexed columns: expect a bitmap scan.
- Selectivity under roughly 5% with a suitable index: expect an index scan, or an index-only scan if the index covers every referenced column.
- Multi-predicate queries benefit most from a composite index that covers all predicates.
- Adjust expectations for correlation (clustering) and for hardware (SSD vs. HDD cost ratios).
These rules of thumb match optimizer behavior under reasonable conditions. Always verify your understanding with EXPLAIN ANALYZE. When the optimizer disagrees with your expectation, investigate—either you're missing something, or the optimizer is working with bad information.
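One way to investigate such a disagreement is to temporarily discourage the plan the optimizer chose and compare actual run times. A sketch using PostgreSQL's planner method settings (session-local, for diagnosis only):

```sql
-- Baseline: the optimizer's preferred plan
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 100;

-- Discourage sequential scans for this session and re-plan
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 100;

-- Restore default behavior
RESET enable_seqscan;
```

If the discouraged plan turns out to be genuinely faster, fix the statistics or cost parameters rather than leaving the setting disabled.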
We have synthesized the complete picture of selection access method cost comparison, from the unified cost model through the optimizer's decision process to practical diagnosis techniques.
Module Completion:
With cost comparison mastered, you have completed the Selection Implementation module. You now possess comprehensive understanding of how databases implement selection operations—from fundamental linear scans through sophisticated index strategies to complex predicate handling—and how the optimizer chooses among these methods.
These skills enable you to write efficient queries, design effective indexes, diagnose performance problems, and understand database behavior at a deep level. As you proceed to study join algorithms and other physical operators, this foundation in selection implementation will prove invaluable.
You have achieved expert-level mastery of selection implementation—the fundamental building block of query execution. From linear scans through index strategies to cost-based optimization, you now understand how databases filter data efficiently and why certain choices are made. This knowledge is essential for query tuning, index design, and understanding overall database performance.