When a doctor reads an MRI scan, they recognize specific anatomical structures—the hippocampus, the corpus callosum, the prefrontal cortex. Each structure has a name, a function, and associations with specific conditions. Without this vocabulary, the scan is meaningless noise.
Execution plan operators are the anatomical vocabulary of database performance. When you see "Hash Join" or "Index Scan" or "Sort" in a plan, each term should immediately evoke what the operator does, what it costs, and when it signals trouble.
This page is your comprehensive reference for plan operators. We'll cover access methods (how data is retrieved), join algorithms (how tables are combined), and processing operators (how data is transformed). Master this vocabulary, and execution plans become instantly readable.
By the end of this page, you will understand every major operator type you'll encounter in execution plans. For each operator, you'll learn what it does, when it's optimal, when it's a red flag, and how to respond when you see it in a problematic query.
Data access operators are the leaf nodes of execution plans—they're responsible for actually retrieving data from storage. The choice of access method dramatically impacts query performance.
Sequential Scan (Seq Scan / Table Scan / Full Table Scan)
A sequential scan reads every row in a table from beginning to end, in the physical order they're stored on disk.
Mechanics:

- Read every page of the table in stored order, applying the filter condition to each row
- No index structures involved: pure sequential I/O, which disks and caches handle efficiently

When It's Appropriate:

- Small tables, where index overhead isn't worth it
- Queries returning a large fraction of the table (low-selectivity filters)
- No index exists on the filtered column

When It's a Red Flag:

- A large table combined with a highly selective filter that an index could serve
```sql
-- Seq Scan is optimal here: selecting most of the table
EXPLAIN SELECT * FROM employees WHERE salary > 0;
-- Seq Scan on employees  (cost=0.00..289.00 rows=9900 width=64)
--   Filter: (salary > 0)
-- Nearly all rows qualify, index wouldn't help

-- Seq Scan is PROBLEMATIC here: highly selective filter
EXPLAIN SELECT * FROM orders WHERE order_id = 12345;
-- Seq Scan on orders  (cost=0.00..45892.00 rows=1 width=128)
--   Filter: (order_id = 12345)
-- Scanning 2 million rows to find 1! Need an index.

-- Why optimizer might choose Seq Scan even with index:
-- 1. Small table: Index overhead not worth it
-- 2. Low selectivity: Filter returns >10-20% of rows
-- 3. Statistics indicate index isn't helpful
-- 4. Random I/O cost exceeds sequential scan cost
-- 5. No index exists on filtered column
```

Many developers assume Seq Scan is always wrong. It isn't. For small tables or queries returning most rows, Seq Scan is optimal. The concern is Seq Scan on LARGE tables with SELECTIVE filters—that's where indexes should be used.
Index scans use index structures to efficiently locate specific rows without scanning the entire table. There are several variants:
Index Scan
Uses an index to find row locations, then fetches the actual rows from the table (known as "table access by ROWID" in Oracle terminology).
Mechanics:

- Navigate the index (typically a B-tree) from root to leaf to find entries matching the condition
- Follow each entry's row pointer to fetch the full row from the table

Key Characteristics:
| Operator | Description | When Used |
|---|---|---|
| Index Scan | Navigate index, fetch rows from table | Index exists, query is selective, need non-indexed columns |
| Index Only Scan | Read all data from index without table access | All required columns are in the index (covering index) |
| Bitmap Index Scan | Build bitmap of matching row locations | Multi-column conditions or OR conditions on same table |
| Bitmap Heap Scan | Fetch rows using bitmap from Bitmap Index Scan | Follows Bitmap Index Scan to get actual rows |
| Index Seek (SQL Server) | Navigate B-tree + range scan | Seeks to a start point, then scans the range |
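To make the Bitmap Index Scan / BitmapOr / Bitmap Heap Scan pipeline concrete, here is a minimal sketch in Python. It is a simplified model: real engines use compressed per-page bitmaps, not Python sets, and the table and index names are illustrative.

```python
def bitmap_index_scan(index, value):
    """Return the set of row ids whose indexed column equals value."""
    return set(index.get(value, ()))

def bitmap_or(*bitmaps):
    """Combine bitmaps from several index scans (OR conditions)."""
    result = set()
    for b in bitmaps:
        result |= b
    return result

def bitmap_heap_scan(table, bitmap):
    """Fetch actual rows for every row id in the bitmap, in id order
    so heap pages are visited sequentially rather than randomly."""
    return [table[rid] for rid in sorted(bitmap)]

# Toy data: orders table as a list, status index as value -> row ids
orders = [
    {"id": 0, "status": "pending"},
    {"id": 1, "status": "shipped"},
    {"id": 2, "status": "processing"},
    {"id": 3, "status": "pending"},
]
idx_status = {"pending": [0, 3], "processing": [2], "shipped": [1]}

bitmap = bitmap_or(bitmap_index_scan(idx_status, "pending"),
                   bitmap_index_scan(idx_status, "processing"))
rows = bitmap_heap_scan(orders, bitmap)
# rows contains orders 0, 2, 3, matching either status
```

The sorted fetch in `bitmap_heap_scan` is the point of the bitmap: it converts scattered index hits into an ordered pass over the table.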
```sql
-- Standard Index Scan: Uses index, fetches from table
EXPLAIN SELECT * FROM orders WHERE customer_id = 12345;
-- Index Scan using idx_customer on orders  (cost=0.42..12.47 rows=8 width=128)
--   Index Cond: (customer_id = 12345)

-- Index Only Scan: All columns in index, no table access
EXPLAIN SELECT customer_id, order_date FROM orders WHERE customer_id = 12345;
-- Requires: CREATE INDEX idx_covering ON orders(customer_id, order_date);
-- Index Only Scan using idx_covering on orders  (cost=0.42..8.47 rows=8 width=12)
--   Index Cond: (customer_id = 12345)

-- Bitmap Index Scan/Heap Scan: OR conditions
EXPLAIN SELECT * FROM orders WHERE status = 'pending' OR status = 'processing';
-- Bitmap Heap Scan on orders  (cost=23.45..789.23 rows=1500 width=128)
--   Recheck Cond: ((status = 'pending') OR (status = 'processing'))
--   -> BitmapOr  (cost=23.45..23.45 rows=1500 width=0)
--        -> Bitmap Index Scan on idx_status  (cost=0.00..11.34 rows=750 width=0)
--             Index Cond: (status = 'pending')
--        -> Bitmap Index Scan on idx_status  (cost=0.00..11.34 rows=750 width=0)
--             Index Cond: (status = 'processing')
```

Index Only Scan avoids the random I/O of fetching heap pages entirely. When designing high-performance queries, consider covering indexes that include all columns needed by the query. The savings in I/O can be dramatic.
Join operators combine rows from multiple tables. The optimizer chooses among different algorithms based on table sizes, available indexes, and statistics. Understanding these algorithms is crucial for diagnosing join performance issues.
Nested Loop Join (NLJ)
The simplest join algorithm: for each row in the outer table, scan the inner table for matching rows.
Mechanics:
for each row R1 in outer_table:
for each row R2 in inner_table:
if R1 and R2 satisfy join condition:
output (R1, R2)
Time Complexity: O(N × M) where N = outer rows, M = inner rows
When It's Optimal:

- Small outer input with an indexed inner table, so each inner lookup is a cheap index probe
- Highly selective queries that return only a few outer rows

When It's a Disaster:

- Large outer input with an unindexed inner table: every outer row triggers a full scan of the inner table
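The pseudocode above can be run directly, and one extra function shows why an index on the inner table changes everything. This is a sketch with illustrative table and column names, using a dict as a stand-in for a B-tree index.

```python
def nested_loop_join(outer, inner, key_outer, key_inner):
    """Naive O(N*M): rescan the entire inner table for every outer row."""
    out = []
    for r1 in outer:
        for r2 in inner:                      # full inner scan per outer row
            if r1[key_outer] == r2[key_inner]:
                out.append((r1, r2))
    return out

def indexed_nested_loop_join(outer, inner, key_outer, key_inner):
    """With an index (here a dict) on the inner key, each probe is O(1)."""
    index = {}
    for r2 in inner:
        index.setdefault(r2[key_inner], []).append(r2)
    out = []
    for r1 in outer:
        for r2 in index.get(r1[key_outer], ()):  # only matching rows
            out.append((r1, r2))
    return out

customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Linus"}]
orders = [{"customer_id": 1, "total": 50}, {"customer_id": 1, "total": 75}]
assert nested_loop_join(customers, orders, "id", "customer_id") == \
       indexed_nested_loop_join(customers, orders, "id", "customer_id")
```

Both versions produce identical output; only the cost of the inner lookup differs, which is exactly what the `loops` count in EXPLAIN ANALYZE exposes.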
```sql
-- GOOD: Nested Loop with indexed inner table
EXPLAIN ANALYZE
SELECT c.name, o.order_date
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE c.id = 12345;

-- Nested Loop  (actual time=0.045..0.123 rows=8 loops=1)
--   -> Index Scan using pk_customers on customers c
--        (actual time=0.012..0.015 rows=1 loops=1)
--        Index Cond: (id = 12345)
--   -> Index Scan using idx_orders_customer on orders o
--        (actual time=0.018..0.095 rows=8 loops=1)  -- loops=1!
--        Index Cond: (customer_id = 12345)

-- BAD: Nested Loop with sequential scan inner
EXPLAIN ANALYZE
SELECT c.name, COUNT(o.id)
FROM customers c                        -- 10,000 customers
JOIN orders o ON c.id = o.customer_id   -- 1M orders, no index
GROUP BY c.name;

-- GroupAggregate  (actual time=4523.234..4589.123 rows=10000)
--   -> Nested Loop  (actual time=0.045..4412.567 rows=1000000)
--        -> Seq Scan on customers c  (rows=10000 loops=1)
--        -> Seq Scan on orders o  (rows=100 loops=10000)  -- DISASTER!
--             Filter: (customer_id = c.id)
-- 10,000 loops × full scan of orders = CATASTROPHE
```

In EXPLAIN ANALYZE, the 'loops' value on the inner side of a Nested Loop tells you how many times that operation ran. High loop counts with expensive inner operations (Seq Scan, complex subqueries) indicate performance disasters waiting to happen.
Hash Join is often the most efficient algorithm for joining medium to large tables without useful indexes.
Hash Join Mechanics:

Phase 1: Build (hash table construction)

- Scan the smaller input and build an in-memory hash table keyed on its join column

Phase 2: Probe (matching rows)

- Scan the larger input, hash each row's join key, and emit any matches found in the hash table
Time Complexity: O(N + M) — linear in both table sizes!
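The two phases can be sketched in a few lines of Python. This is a minimal model with illustrative names: a real hash join also partitions into batches when the build side exceeds work_mem, which this sketch omits.

```python
def hash_join(build_rows, probe_rows, build_key, probe_key):
    # Phase 1: Build -- hash the smaller table on its join key.
    table = {}
    for row in build_rows:
        table.setdefault(row[build_key], []).append(row)
    # Phase 2: Probe -- stream the larger table, emit matches.
    out = []
    for row in probe_rows:
        for match in table.get(row[probe_key], ()):   # O(1) expected lookup
            out.append((match, row))
    return out

departments = [{"id": 10, "name": "Eng"}, {"id": 20, "name": "Sales"}]
employees = [{"dept_id": 10, "name": "Ada"}, {"dept_id": 20, "name": "Bo"},
             {"dept_id": 10, "name": "Cy"}]
joined = hash_join(departments, employees, "id", "dept_id")
# Every employee row matched a department: 3 output pairs
```

Note that the build input is consumed completely before the first output row appears, which is why Hash Join has a high startup cost, and why the smaller table should be the build side.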
Memory Requirements:
| Aspect | Behavior | Implication |
|---|---|---|
| Build Input | Fully consumed before probe starts | High startup cost; smaller table should be build side |
| Memory Usage | Proportional to build table size | Large build tables may spill to disk |
| Disk Spill | Multi-batch or temp files if memory insufficient | Significant performance degradation |
| Join Condition | Requires equality conditions (hash-based) | Cannot handle inequality or range joins |
| Row Order | Output is NOT ordered | May require additional Sort if ORDER BY needed |
| Scalability | Excellent for large equi-joins | Linear time complexity beats nested loop |
```sql
-- Hash Join: Typical pattern for joining medium/large tables
EXPLAIN (ANALYZE, BUFFERS)
SELECT e.name, d.department_name
FROM employees e                         -- 50,000 rows
JOIN departments d ON e.dept_id = d.id;  -- 100 rows (smaller = build side)

-- Hash Join  (cost=3.25..789.25 rows=50000 width=40)
--   (actual time=0.045..15.234 rows=50000 loops=1)
--   Hash Cond: (e.dept_id = d.id)
--   Buffers: shared hit=512
--   -> Seq Scan on employees e           -- PROBE side (larger)
--        (actual time=0.008..8.234 rows=50000 loops=1)
--   -> Hash  (cost=2.00..2.00 rows=100 width=20)
--        (actual time=0.032..0.032 rows=100 loops=1)
--        Buckets: 128  Memory Usage: 12kB  -- Fits in memory!
--        -> Seq Scan on departments d    -- BUILD side (smaller)
--             (actual time=0.004..0.015 rows=100 loops=1)

-- WARNING SIGNS in Hash Join:
-- "Batches: 16" - Multiple batches means memory pressure
-- "temp read=X written=Y" - Spilling to disk (very slow)
-- Large table as build side - Wrong order, memory issues
```

In EXPLAIN output, look for Memory Usage < work_mem (good) and Batches: 1 (good). Multiple batches or temp file usage indicates the hash table doesn't fit in memory. Consider increasing work_mem for complex queries or restructuring to reduce build side size.
Merge Join (also called Sort-Merge Join) is highly efficient when both inputs are already sorted on the join key.
Merge Join Mechanics:
advance pointers in both sorted inputs simultaneously:
if left < right: advance left
if right < left: advance right
if left == right: output match, handle duplicates
Key Requirements:

- Both inputs must be sorted on the join key, either arriving pre-sorted (e.g., from an index scan) or via an explicit Sort operator
Time Complexity: O(N log N + M log M) including sorts, or O(N + M) if pre-sorted
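The merge loop above, including duplicate-key handling, can be made runnable. This is a sketch that assumes both inputs are already sorted on their join keys; the row and column names are illustrative.

```python
def merge_join(left, right, lkey, rkey):
    """Join two inputs pre-sorted on lkey/rkey by advancing two pointers."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lv, rv = left[i][lkey], right[j][rkey]
        if lv < rv:
            i += 1                      # advance the side with smaller key
        elif rv < lv:
            j += 1
        else:
            # Duplicate handling: find the run of right rows sharing this
            # key, then pair it with every left row sharing the key.
            j_end = j
            while j_end < len(right) and right[j_end][rkey] == lv:
                j_end += 1
            while i < len(left) and left[i][lkey] == lv:
                out.extend((left[i], right[k]) for k in range(j, j_end))
                i += 1
            j = j_end
    return out

orders = [{"id": 1}, {"id": 2}]                        # sorted by id
items = [{"order_id": 1, "sku": "a"}, {"order_id": 1, "sku": "b"},
         {"order_id": 2, "sku": "c"}]                  # sorted by order_id
pairs = merge_join(orders, items, "id", "order_id")
# 3 matched pairs: order 1 with items a and b, order 2 with item c
```

Because each pointer only moves forward, the join itself is a single linear pass over both inputs; the cost of sorting, when needed, dominates.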
When Merge Join Excels:

- Both inputs arrive pre-sorted from index scans, so no Sort operators are needed
- Large inputs where a hash table would not fit in memory
- Queries whose ORDER BY matches the join key, so one sort serves both purposes
```sql
-- Merge Join with pre-sorted inputs (optimal case)
EXPLAIN ANALYZE
SELECT o.order_id, oi.product_id
FROM orders o                              -- Has index on order_id
JOIN order_items oi ON o.id = oi.order_id  -- Has index on order_id
ORDER BY o.id;

-- Merge Join  (actual time=0.045..245.123 rows=500000 loops=1)
--   Merge Cond: (o.id = oi.order_id)
--   -> Index Scan using pk_orders on orders o
--        (actual time=0.012..45.234 rows=100000 loops=1)
--   -> Index Scan using idx_items_order on order_items oi
--        (actual time=0.008..85.456 rows=500000 loops=1)
-- Both inputs come pre-sorted from index scans - no Sort needed!

-- Merge Join with Sort operators (less optimal)
EXPLAIN ANALYZE
SELECT c.name, o.order_date
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE o.total > 1000;

-- Merge Join  (actual time=125.234..234.567 rows=15000)
--   Merge Cond: (c.id = o.customer_id)
--   -> Sort  (actual time=45.234..52.345 rows=10000)  -- Explicit sort!
--        Sort Key: c.id
--        Sort Method: quicksort  Memory: 1024kB
--        -> Seq Scan on customers c  (rows=10000)
--   -> Sort  (actual time=78.456..95.678 rows=25000)  -- Another sort!
--        Sort Key: o.customer_id
--        Sort Method: external merge  Disk: 2048kB  -- Spilled to disk!
--        -> Seq Scan on orders o  (rows=25000)
--             Filter: (total > 1000)
```

| Aspect | Merge Join | Hash Join |
|---|---|---|
| Pre-requisite | Sorted inputs (or add Sort) | None |
| Memory usage | Minimal if pre-sorted | Build hash table (can be large) |
| Inequality joins | Supported | Not supported |
| Best for | Pre-sorted large datasets | Unsorted equi-joins |
| Startup cost | Low if sorted, high if Sort needed | High (must build hash) |
| Streaming | Yes (row-by-row) | No (build phase completes first) |
If your query has ORDER BY on the join column, Merge Join can be very efficient because the sort required for the join also satisfies the ORDER BY. The optimizer considers this when choosing join strategies.
Processing operators transform data after it's been retrieved and joined. Sort and Aggregate are among the most common and most resource-intensive.
Sort Operator
Reorders rows according to ORDER BY, DISTINCT, or for Merge Join preparation.
Mechanics:

- Collect input rows, sort them by the specified key(s), then emit them in order
- Blocking operation: no rows are output until the entire sort completes

Memory Considerations:

- Sorts that fit within work_mem use fast in-memory algorithms
- Larger sorts spill to temporary disk files (external merge), which is dramatically slower

Sort Method indicators in EXPLAIN:
- `quicksort Memory: XkB` — Good, fit in memory
- `top-N heapsort` — Only keeping top N, very efficient
- `external merge Disk: XkB` — Spilled to disk, slow
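The top-N heapsort idea is worth seeing directly: for ORDER BY ... LIMIT N, the executor keeps only the current best N rows in a heap instead of sorting the whole input. A sketch using Python's heapq, with illustrative data:

```python
import heapq

def top_n_by(rows, key, n):
    """Return the n largest rows by key, holding only ~n rows in memory.

    heapq.nlargest maintains a bounded min-heap while streaming the
    input, which mirrors the executor's top-N heapsort strategy.
    """
    return heapq.nlargest(n, rows, key=key)

orders = [{"order_date": d} for d in (3, 1, 4, 1, 5, 9, 2, 6)]
top3 = top_n_by(orders, key=lambda r: r["order_date"], n=3)
# top3 holds the rows with order_date 9, 6, 5, in descending order
```

This is why a LIMIT query over a huge table can report a tiny Sort memory footprint: memory scales with N, not with the input size.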
```sql
-- Sort operator variations
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM products ORDER BY price DESC;

-- Sort  (cost=1250.89..1276.14 rows=10100 width=64)
--   (actual time=12.345..15.678 rows=10100 loops=1)
--   Sort Key: price DESC
--   Sort Method: quicksort  Memory: 856kB  -- In-memory, good!
--   Buffers: shared hit=145
--   -> Seq Scan on products  (rows=10100)

-- Limit optimization with Sort
EXPLAIN ANALYZE
SELECT * FROM orders ORDER BY order_date DESC LIMIT 10;

-- Limit  (actual time=0.145..0.198 rows=10)
--   -> Sort  (actual time=0.142..0.155 rows=10)
--        Sort Key: order_date DESC
--        Sort Method: top-N heapsort  Memory: 26kB  -- Only tracks top 10!
--        -> Seq Scan on orders  (rows=100000)

-- Aggregate operators
EXPLAIN ANALYZE
SELECT department_id, COUNT(*), AVG(salary)
FROM employees
GROUP BY department_id;

-- HashAggregate  (actual time=15.234..15.456 rows=50 loops=1)
--   Group Key: department_id
--   Batches: 1  Memory Usage: 24kB  -- Fit in memory
--   -> Seq Scan on employees  (rows=50000)
```

| Operator | How It Works | Memory Usage |
|---|---|---|
| HashAggregate | Hash groups, compute aggregates in hash table | Proportional to distinct groups |
| GroupAggregate | Requires sorted input, aggregates consecutive groups | Minimal (streaming) |
| Aggregate | Single aggregate over entire input (no GROUP BY) | Minimal (single state) |
| WindowAgg | Window functions over partitions | Varies by window definition |
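The contrast between the first two rows of this table can be sketched in Python: HashAggregate builds a dict keyed by group and accepts input in any order, while GroupAggregate streams sorted input and finishes each group when the key changes. Column names here are illustrative.

```python
from itertools import groupby

def hash_aggregate(rows, key, value):
    """One hash entry per distinct group; memory grows with group count."""
    acc = {}
    for r in rows:
        s, n = acc.get(r[key], (0, 0))
        acc[r[key]] = (s + r[value], n + 1)
    return {k: (n, s / n) for k, (s, n) in acc.items()}  # (count, avg)

def group_aggregate(rows_sorted, key, value):
    """Requires sorted input; holds only the current group's state."""
    out = {}
    for k, grp in groupby(rows_sorted, key=lambda r: r[key]):
        vals = [r[value] for r in grp]
        out[k] = (len(vals), sum(vals) / len(vals))
    return out

emps = [{"dept": 1, "salary": 100}, {"dept": 2, "salary": 50},
        {"dept": 1, "salary": 200}]
by_dept = sorted(emps, key=lambda r: r["dept"])
# Both strategies compute the same (count, avg) per department
assert hash_aggregate(emps, "dept", "salary") == \
       group_aggregate(by_dept, "dept", "salary")
```

The trade-off mirrors the table: GroupAggregate's streaming memory profile comes at the price of a sorted input, which may itself require an expensive Sort.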
When you see 'external merge Disk: XkB' in a Sort, the operation spilled to disk. This can make a 50ms sort take 5000ms. Solutions: increase work_mem, add index to avoid sort, restructure query to reduce intermediate result size.
Beyond core access, join, and processing operators, you'll encounter specialized operators for specific situations:
Materialize
Caches the output of a subplan for reuse.
| Operator | Purpose | Watch For |
|---|---|---|
| Materialize | Cache subplan output for reuse | Large materializations consuming memory |
| Subquery Scan | Wrap subquery as a scan source | Usually benign; check underlying operations |
| Append | Concatenate outputs (UNION ALL, partitions) | Many children = many partition scans |
| Unique | Remove duplicates from sorted input | Requires pre-sorted input |
| SetOp | INTERSECT, EXCEPT operations | Requires sorted inputs; may add Sorts |
| Limit | Restrict output row count | Good when early (stops child operations early) |
| Result | Evaluate constant expressions | No table access; very cheap |
| Gather/Gather Merge | Collect parallel worker output | Indicates parallel query execution |
| Parallel Seq Scan | Seq Scan distributed across workers | Good for large scans on multi-core |
| CTE Scan | Read from Common Table Expression | May materialize CTE; watch for repeated reads |
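The Gather / Partial Aggregate shape in this table follows a simple pattern: each worker aggregates its slice of the table, and the leader combines the partial results. A sketch for a parallel COUNT(*), with a thread pool standing in for parallel workers and illustrative data:

```python
from concurrent.futures import ThreadPoolExecutor

def partial_count(rows, predicate):
    """What each parallel worker computes over its share of the scan."""
    return sum(1 for r in rows if predicate(r))

def parallel_count(rows, predicate, workers=3):
    chunks = [rows[i::workers] for i in range(workers)]  # split the scan
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda c: partial_count(c, predicate), chunks)
    return sum(partials)   # Finalize Aggregate: combine partial results

rows = [{"status": "active" if i % 2 == 0 else "inactive"}
        for i in range(100)]
total = parallel_count(rows, lambda r: r["status"] == "active")
# 50 of the 100 rows are active
```

This decomposition only works because COUNT partial states combine with a simple sum; aggregates like AVG need richer partial states (sum and count) for the finalize step.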
```sql
-- Materialize: Caching subquery results
EXPLAIN ANALYZE
SELECT * FROM products p
WHERE p.id IN (
    SELECT product_id FROM popular_products
);

-- Hash Semi Join
--   -> Seq Scan on products p
--   -> Hash
--        -> Materialize  (actual time=0.01..0.45 rows=100)
--             -> Seq Scan on popular_products

-- Parallel Query Execution (PostgreSQL)
EXPLAIN ANALYZE
SELECT COUNT(*) FROM very_large_table WHERE status = 'active';

-- Finalize Aggregate  (actual time=234.567..234.568 rows=1)
--   -> Gather  (actual time=234.123..234.456 rows=3)
--        Workers Planned: 2
--        Workers Launched: 2
--        -> Partial Aggregate  (actual time=230.123..230.124 rows=1)
--             -> Parallel Seq Scan on very_large_table
--                  (actual time=0.034..225.456 rows=333333 loops=3)
--                  Filter: (status = 'active')
-- 3 loops = 1 leader + 2 parallel workers

-- Append for partitioned tables
EXPLAIN ANALYZE
SELECT * FROM orders_partitioned WHERE order_date >= '2024-01-01';

-- Append  (actual time=0.012..45.234 rows=50000)
--   -> Seq Scan on orders_2024_q1  (rows=25000)
--   -> Seq Scan on orders_2024_q2  (rows=25000)
--   -> Seq Scan on orders_2024_q3  (rows=0)  -- Partition pruned or empty
--   -> Seq Scan on orders_2024_q4  (rows=0)  -- Not yet populated
```

When you see Gather/Parallel operators, the database is using multiple CPU cores. Ensure parallel workers actually launched (Workers Launched: N). If workers planned but not launched, check max_parallel_workers and system load.
You now have a comprehensive vocabulary for reading execution plans. Here's a quick reference for the most important operators and their implications:
| Operator | Good Sign | Warning Sign |
|---|---|---|
| Seq Scan | Small table, need most rows | Large table with selective filter |
| Index Scan | Selective filter uses index | Loops > 1000 (check nested loops) |
| Index Only Scan | All columns from index (covering) | Rare - this is almost always good |
| Nested Loop | Small outer, indexed inner | Large outer, Seq Scan inner |
| Hash Join | Memory Usage << work_mem, Batches: 1 | Disk spill, many batches |
| Merge Join | Pre-sorted inputs (from indexes) | External Sort added for join |
| Sort | quicksort in memory, top-N heapsort | external merge Disk: XkB |
| HashAggregate | Batches: 1, fits in memory | Many batches, disk spill |
What's Next:
With operator vocabulary mastered, the next page examines cost estimates in depth—how the optimizer calculates costs, what the numbers mean, and how to use cost information to compare alternative query approaches and predict performance.
You now have a comprehensive understanding of execution plan operators. You can identify access methods, evaluate join strategies, and recognize processing operations. Next, we'll explore how the optimizer calculates and uses cost estimates.