When a doctor reads an MRI scan, they recognize specific anatomical structures—the hippocampus, the corpus callosum, the prefrontal cortex. Each structure has a name, a function, and associations with specific conditions. Without this vocabulary, the scan is meaningless noise.
Execution plan operators are the anatomical vocabulary of database performance. When you see "Hash Join" or "Index Scan" or "Sort" in a plan, each term should immediately evoke what the operator does, what it costs, and when it signals trouble.
This page is your comprehensive reference for plan operators. We'll cover access methods (how data is retrieved), join algorithms (how tables are combined), and processing operators (how data is transformed). Master this vocabulary, and execution plans become instantly readable.
By the end of this page, you will understand every major operator type you'll encounter in execution plans. For each operator, you'll learn what it does, when it's optimal, when it's a red flag, and how to respond when you see it in a problematic query.
Data access operators are the leaf nodes of execution plans—they're responsible for actually retrieving data from storage. The choice of access method dramatically impacts query performance.
Sequential Scan (Seq Scan / Table Scan / Full Table Scan)
A sequential scan reads every row in a table from beginning to end, in the physical order they're stored on disk.
Mechanics:

- Read every page of the table in stored order, applying the filter condition to each row
- No index structures involved: pure sequential I/O, which disks and caches handle efficiently

When It's Appropriate:

- Small tables, where index overhead isn't worth it
- Queries returning a large fraction of the table (low-selectivity filters)
- No index exists on the filtered column

When It's a Red Flag:

- A large table combined with a highly selective filter that an index could serve
```sql
-- Seq Scan is optimal here: selecting most of the table
EXPLAIN SELECT * FROM employees WHERE salary > 0;
-- Seq Scan on employees  (cost=0.00..289.00 rows=9900 width=64)
--   Filter: (salary > 0)
-- Nearly all rows qualify, index wouldn't help

-- Seq Scan is PROBLEMATIC here: highly selective filter
EXPLAIN SELECT * FROM orders WHERE order_id = 12345;
-- Seq Scan on orders  (cost=0.00..45892.00 rows=1 width=128)
--   Filter: (order_id = 12345)
-- Scanning 2 million rows to find 1! Need an index.

-- Why optimizer might choose Seq Scan even with index:
-- 1. Small table: Index overhead not worth it
-- 2. Low selectivity: Filter returns >10-20% of rows
-- 3. Statistics indicate index isn't helpful
-- 4. Random I/O cost exceeds sequential scan cost
-- 5. No index exists on filtered column
```

Many developers assume Seq Scan is always wrong. It isn't. For small tables or queries returning most rows, Seq Scan is optimal. The concern is Seq Scan on LARGE tables with SELECTIVE filters—that's where indexes should be used.
Index scans use index structures to efficiently locate specific rows without scanning the entire table. There are several variants:
Index Scan
Uses an index to find row locations, then fetches the actual rows from the table (known as "table access by ROWID" in Oracle terminology).
Mechanics:

- Navigate the index (typically a B-tree) from root to leaf to find entries matching the condition
- Follow each entry's row pointer to fetch the full row from the table

Key Characteristics:
| Operator | Description | When Used |
|---|---|---|
| Index Scan | Navigate index, fetch rows from table | Index exists, query is selective, need non-indexed columns |
| Index Only Scan | Read all data from index without table access | All required columns are in the index (covering index) |
| Bitmap Index Scan | Build bitmap of matching row locations | Multi-column conditions or OR conditions on same table |
| Bitmap Heap Scan | Fetch rows using bitmap from Bitmap Index Scan | Follows Bitmap Index Scan to get actual rows |
| Index Seek (SQL Server) | Navigate B-tree + range scan | Seeks to a start point, then scans the range |
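To make the Bitmap Index Scan / BitmapOr / Bitmap Heap Scan pipeline concrete, here is a minimal sketch in Python. It is a simplified model: real engines use compressed per-page bitmaps, not Python sets, and the table and index names are illustrative.

```python
def bitmap_index_scan(index, value):
    """Return the set of row ids whose indexed column equals value."""
    return set(index.get(value, ()))

def bitmap_or(*bitmaps):
    """Combine bitmaps from several index scans (OR conditions)."""
    result = set()
    for b in bitmaps:
        result |= b
    return result

def bitmap_heap_scan(table, bitmap):
    """Fetch actual rows for every row id in the bitmap, in id order
    so heap pages are visited sequentially rather than randomly."""
    return [table[rid] for rid in sorted(bitmap)]

# Toy data: orders table as a list, status index as value -> row ids
orders = [
    {"id": 0, "status": "pending"},
    {"id": 1, "status": "shipped"},
    {"id": 2, "status": "processing"},
    {"id": 3, "status": "pending"},
]
idx_status = {"pending": [0, 3], "processing": [2], "shipped": [1]}

bitmap = bitmap_or(bitmap_index_scan(idx_status, "pending"),
                   bitmap_index_scan(idx_status, "processing"))
rows = bitmap_heap_scan(orders, bitmap)
# rows contains orders 0, 2, 3, matching either status
```

The sorted fetch in `bitmap_heap_scan` is the point of the bitmap: it converts scattered index hits into an ordered pass over the table.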
```sql
-- Standard Index Scan: Uses index, fetches from table
EXPLAIN SELECT * FROM orders WHERE customer_id = 12345;
-- Index Scan using idx_customer on orders  (cost=0.42..12.47 rows=8 width=128)
--   Index Cond: (customer_id = 12345)

-- Index Only Scan: All columns in index, no table access
EXPLAIN SELECT customer_id, order_date FROM orders WHERE customer_id = 12345;
-- Requires: CREATE INDEX idx_covering ON orders(customer_id, order_date);
-- Index Only Scan using idx_covering on orders  (cost=0.42..8.47 rows=8 width=12)
--   Index Cond: (customer_id = 12345)

-- Bitmap Index Scan/Heap Scan: OR conditions
EXPLAIN SELECT * FROM orders WHERE status = 'pending' OR status = 'processing';
-- Bitmap Heap Scan on orders  (cost=23.45..789.23 rows=1500 width=128)
--   Recheck Cond: ((status = 'pending') OR (status = 'processing'))
--   -> BitmapOr  (cost=23.45..23.45 rows=1500 width=0)
--        -> Bitmap Index Scan on idx_status  (cost=0.00..11.34 rows=750 width=0)
--             Index Cond: (status = 'pending')
--        -> Bitmap Index Scan on idx_status  (cost=0.00..11.34 rows=750 width=0)
--             Index Cond: (status = 'processing')
```

Index Only Scan avoids the random I/O of fetching heap pages entirely. When designing high-performance queries, consider covering indexes that include all columns needed by the query. The savings in I/O can be dramatic.
Join operators combine rows from multiple tables. The optimizer chooses among different algorithms based on table sizes, available indexes, and statistics. Understanding these algorithms is crucial for diagnosing join performance issues.
Nested Loop Join (NLJ)
The simplest join algorithm: for each row in the outer table, scan the inner table for matching rows.
Mechanics:
for each row R1 in outer_table:
for each row R2 in inner_table:
if R1 and R2 satisfy join condition:
output (R1, R2)
Time Complexity: O(N × M) where N = outer rows, M = inner rows
When It's Optimal:

- Small outer input with an indexed inner table, so each inner lookup is a cheap index probe
- Highly selective queries that return only a few outer rows

When It's a Disaster:

- Large outer input with an unindexed inner table: every outer row triggers a full scan of the inner table
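The pseudocode above can be run directly, and one extra function shows why an index on the inner table changes everything. This is a sketch with illustrative table and column names, using a dict as a stand-in for a B-tree index.

```python
def nested_loop_join(outer, inner, key_outer, key_inner):
    """Naive O(N*M): rescan the entire inner table for every outer row."""
    out = []
    for r1 in outer:
        for r2 in inner:                      # full inner scan per outer row
            if r1[key_outer] == r2[key_inner]:
                out.append((r1, r2))
    return out

def indexed_nested_loop_join(outer, inner, key_outer, key_inner):
    """With an index (here a dict) on the inner key, each probe is O(1)."""
    index = {}
    for r2 in inner:
        index.setdefault(r2[key_inner], []).append(r2)
    out = []
    for r1 in outer:
        for r2 in index.get(r1[key_outer], ()):  # only matching rows
            out.append((r1, r2))
    return out

customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Linus"}]
orders = [{"customer_id": 1, "total": 50}, {"customer_id": 1, "total": 75}]
assert nested_loop_join(customers, orders, "id", "customer_id") == \
       indexed_nested_loop_join(customers, orders, "id", "customer_id")
```

Both versions produce identical output; only the cost of the inner lookup differs, which is exactly what the `loops` count in EXPLAIN ANALYZE exposes.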
```sql
-- GOOD: Nested Loop with indexed inner table
EXPLAIN ANALYZE
SELECT c.name, o.order_date
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE c.id = 12345;

-- Nested Loop  (actual time=0.045..0.123 rows=8 loops=1)
--   -> Index Scan using pk_customers on customers c
--        (actual time=0.012..0.015 rows=1 loops=1)
--        Index Cond: (id = 12345)
--   -> Index Scan using idx_orders_customer on orders o
--        (actual time=0.018..0.095 rows=8 loops=1)  -- loops=1!
--        Index Cond: (customer_id = 12345)

-- BAD: Nested Loop with sequential scan inner
EXPLAIN ANALYZE
SELECT c.name, COUNT(o.id)
FROM customers c                        -- 10,000 customers
JOIN orders o ON c.id = o.customer_id   -- 1M orders, no index
GROUP BY c.name;

-- GroupAggregate  (actual time=4523.234..4589.123 rows=10000)
--   -> Nested Loop  (actual time=0.045..4412.567 rows=1000000)
--        -> Seq Scan on customers c  (rows=10000 loops=1)
--        -> Seq Scan on orders o  (rows=100 loops=10000)  -- DISASTER!
--             Filter: (customer_id = c.id)
-- 10,000 loops × full scan of orders = CATASTROPHE
```

In EXPLAIN ANALYZE, the 'loops' value on the inner side of a Nested Loop tells you how many times that operation ran. High loop counts with expensive inner operations (Seq Scan, complex subqueries) indicate performance disasters waiting to happen.
Hash Join is often the most efficient algorithm for joining medium to large tables without useful indexes.
Hash Join Mechanics:

Phase 1: Build (hash table construction)

- Scan the smaller input and build an in-memory hash table keyed on its join column

Phase 2: Probe (matching rows)

- Scan the larger input, hash each row's join key, and emit any matches found in the hash table
Time Complexity: O(N + M) — linear in both table sizes!
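The two phases can be sketched in a few lines of Python. This is a minimal model with illustrative names: a real hash join also partitions into batches when the build side exceeds work_mem, which this sketch omits.

```python
def hash_join(build_rows, probe_rows, build_key, probe_key):
    # Phase 1: Build -- hash the smaller table on its join key.
    table = {}
    for row in build_rows:
        table.setdefault(row[build_key], []).append(row)
    # Phase 2: Probe -- stream the larger table, emit matches.
    out = []
    for row in probe_rows:
        for match in table.get(row[probe_key], ()):   # O(1) expected lookup
            out.append((match, row))
    return out

departments = [{"id": 10, "name": "Eng"}, {"id": 20, "name": "Sales"}]
employees = [{"dept_id": 10, "name": "Ada"}, {"dept_id": 20, "name": "Bo"},
             {"dept_id": 10, "name": "Cy"}]
joined = hash_join(departments, employees, "id", "dept_id")
# Every employee row matched a department: 3 output pairs
```

Note that the build input is consumed completely before the first output row appears, which is why Hash Join has a high startup cost, and why the smaller table should be the build side.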
Memory Requirements:
| Aspect | Behavior | Implication |
|---|---|---|
| Build Input | Fully consumed before probe starts | High startup cost; smaller table should be build side |
| Memory Usage | Proportional to build table size | Large build tables may spill to disk |
| Disk Spill | Multi-batch or temp files if memory insufficient | Significant performance degradation |
| Join Condition | Requires equality conditions (hash-based) | Cannot handle inequality or range joins |
| Row Order | Output is NOT ordered | May require additional Sort if ORDER BY needed |
| Scalability | Excellent for large equi-joins | Linear time complexity beats nested loop |
```sql
-- Hash Join: Typical pattern for joining medium/large tables
EXPLAIN (ANALYZE, BUFFERS)
SELECT e.name, d.department_name
FROM employees e                         -- 50,000 rows
JOIN departments d ON e.dept_id = d.id;  -- 100 rows (smaller = build side)

-- Hash Join  (cost=3.25..789.25 rows=50000 width=40)
--   (actual time=0.045..15.234 rows=50000 loops=1)
--   Hash Cond: (e.dept_id = d.id)
--   Buffers: shared hit=512
--   -> Seq Scan on employees e           -- PROBE side (larger)
--        (actual time=0.008..8.234 rows=50000 loops=1)
--   -> Hash  (cost=2.00..2.00 rows=100 width=20)
--        (actual time=0.032..0.032 rows=100 loops=1)
--        Buckets: 128  Memory Usage: 12kB  -- Fits in memory!
--        -> Seq Scan on departments d    -- BUILD side (smaller)
--             (actual time=0.004..0.015 rows=100 loops=1)

-- WARNING SIGNS in Hash Join:
-- "Batches: 16" - Multiple batches means memory pressure
-- "temp read=X written=Y" - Spilling to disk (very slow)
-- Large table as build side - Wrong order, memory issues
```

In EXPLAIN output, look for Memory Usage < work_mem (good) and Batches: 1 (good). Multiple batches or temp file usage indicates the hash table doesn't fit in memory. Consider increasing work_mem for complex queries or restructuring to reduce build side size.
Merge Join (also called Sort-Merge Join) is highly efficient when both inputs are already sorted on the join key.
Merge Join Mechanics:
advance pointers in both sorted inputs simultaneously:
if left < right: advance left
if right < left: advance right
if left == right: output match, handle duplicates
Key Requirements:

- Both inputs must be sorted on the join key, either arriving pre-sorted (e.g., from an index scan) or via an explicit Sort operator
Time Complexity: O(N log N + M log M) including sorts, or O(N + M) if pre-sorted
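The merge loop above, including duplicate-key handling, can be made runnable. This is a sketch that assumes both inputs are already sorted on their join keys; the row and column names are illustrative.

```python
def merge_join(left, right, lkey, rkey):
    """Join two inputs pre-sorted on lkey/rkey by advancing two pointers."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lv, rv = left[i][lkey], right[j][rkey]
        if lv < rv:
            i += 1                      # advance the side with smaller key
        elif rv < lv:
            j += 1
        else:
            # Duplicate handling: find the run of right rows sharing this
            # key, then pair it with every left row sharing the key.
            j_end = j
            while j_end < len(right) and right[j_end][rkey] == lv:
                j_end += 1
            while i < len(left) and left[i][lkey] == lv:
                out.extend((left[i], right[k]) for k in range(j, j_end))
                i += 1
            j = j_end
    return out

orders = [{"id": 1}, {"id": 2}]                        # sorted by id
items = [{"order_id": 1, "sku": "a"}, {"order_id": 1, "sku": "b"},
         {"order_id": 2, "sku": "c"}]                  # sorted by order_id
pairs = merge_join(orders, items, "id", "order_id")
# 3 matched pairs: order 1 with items a and b, order 2 with item c
```

Because each pointer only moves forward, the join itself is a single linear pass over both inputs; the cost of sorting, when needed, dominates.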
When Merge Join Excels:

- Both inputs arrive pre-sorted from index scans, so no Sort operators are needed
- Large inputs where a hash table would not fit in memory
- Queries whose ORDER BY matches the join key, so one sort serves both purposes
```sql
-- Merge Join with pre-sorted inputs (optimal case)
EXPLAIN ANALYZE
SELECT o.order_id, oi.product_id
FROM orders o                              -- Has index on order_id
JOIN order_items oi ON o.id = oi.order_id  -- Has index on order_id
ORDER BY o.id;

-- Merge Join  (actual time=0.045..245.123 rows=500000 loops=1)
--   Merge Cond: (o.id = oi.order_id)
--   -> Index Scan using pk_orders on orders o
--        (actual time=0.012..45.234 rows=100000 loops=1)
--   -> Index Scan using idx_items_order on order_items oi
--        (actual time=0.008..85.456 rows=500000 loops=1)
-- Both inputs come pre-sorted from index scans - no Sort needed!

-- Merge Join with Sort operators (less optimal)
EXPLAIN ANALYZE
SELECT c.name, o.order_date
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE o.total > 1000;

-- Merge Join  (actual time=125.234..234.567 rows=15000)
--   Merge Cond: (c.id = o.customer_id)
--   -> Sort  (actual time=45.234..52.345 rows=10000)  -- Explicit sort!
--        Sort Key: c.id
--        Sort Method: quicksort  Memory: 1024kB
--        -> Seq Scan on customers c  (rows=10000)
--   -> Sort  (actual time=78.456..95.678 rows=25000)  -- Another sort!
--        Sort Key: o.customer_id
--        Sort Method: external merge  Disk: 2048kB  -- Spilled to disk!
--        -> Seq Scan on orders o  (rows=25000)
--             Filter: (total > 1000)
```

| Aspect | Merge Join | Hash Join |
|---|---|---|
| Pre-requisite | Sorted inputs (or add Sort) | None |
| Memory usage | Minimal if pre-sorted | Build hash table (can be large) |
| Inequality joins | Supported | Not supported |
| Best for | Pre-sorted large datasets | Unsorted equi-joins |
| Startup cost | Low if sorted, high if Sort needed | High (must build hash) |
| Streaming | Yes (row-by-row) | No (build phase completes first) |
If your query has ORDER BY on the join column, Merge Join can be very efficient because the sort required for the join also satisfies the ORDER BY. The optimizer considers this when choosing join strategies.
Processing operators transform data after it's been retrieved and joined. Sort and Aggregate are among the most common and most resource-intensive.
Sort Operator
Reorders rows according to ORDER BY, DISTINCT, or for Merge Join preparation.
Mechanics:

- Collect input rows, sort them by the specified key(s), then emit them in order
- Blocking operation: no rows are output until the entire sort completes

Memory Considerations:

- Sorts that fit within work_mem use fast in-memory algorithms
- Larger sorts spill to temporary disk files (external merge), which is dramatically slower

Sort Method indicators in EXPLAIN:
- `quicksort Memory: XkB` — Good, fit in memory
- `top-N heapsort` — Only keeping top N, very efficient
- `external merge Disk: XkB` — Spilled to disk, slow
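The top-N heapsort idea is worth seeing directly: for ORDER BY ... LIMIT N, the executor keeps only the current best N rows in a heap instead of sorting the whole input. A sketch using Python's heapq, with illustrative data:

```python
import heapq

def top_n_by(rows, key, n):
    """Return the n largest rows by key, holding only ~n rows in memory.

    heapq.nlargest maintains a bounded min-heap while streaming the
    input, which mirrors the executor's top-N heapsort strategy.
    """
    return heapq.nlargest(n, rows, key=key)

orders = [{"order_date": d} for d in (3, 1, 4, 1, 5, 9, 2, 6)]
top3 = top_n_by(orders, key=lambda r: r["order_date"], n=3)
# top3 holds the rows with order_date 9, 6, 5, in descending order
```

This is why a LIMIT query over a huge table can report a tiny Sort memory footprint: memory scales with N, not with the input size.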
```sql
-- Sort operator variations
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM products ORDER BY price DESC;

-- Sort  (cost=1250.89..1276.14 rows=10100 width=64)
--   (actual time=12.345..15.678 rows=10100 loops=1)
--   Sort Key: price DESC
--   Sort Method: quicksort  Memory: 856kB  -- In-memory, good!
--   Buffers: shared hit=145
--   -> Seq Scan on products  (rows=10100)

-- Limit optimization with Sort
EXPLAIN ANALYZE
SELECT * FROM orders ORDER BY order_date DESC LIMIT 10;

-- Limit  (actual time=0.145..0.198 rows=10)
--   -> Sort  (actual time=0.142..0.155 rows=10)
--        Sort Key: order_date DESC
--        Sort Method: top-N heapsort  Memory: 26kB  -- Only tracks top 10!
--        -> Seq Scan on orders  (rows=100000)

-- Aggregate operators
EXPLAIN ANALYZE
SELECT department_id, COUNT(*), AVG(salary)
FROM employees
GROUP BY department_id;

-- HashAggregate  (actual time=15.234..15.456 rows=50 loops=1)
--   Group Key: department_id
--   Batches: 1  Memory Usage: 24kB  -- Fit in memory
--   -> Seq Scan on employees  (rows=50000)
```

| Operator | How It Works | Memory Usage |
|---|---|---|
| HashAggregate | Hash groups, compute aggregates in hash table | Proportional to distinct groups |
| GroupAggregate | Requires sorted input, aggregates consecutive groups | Minimal (streaming) |
| Aggregate | Single aggregate over entire input (no GROUP BY) | Minimal (single state) |
| WindowAgg | Window functions over partitions | Varies by window definition |
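The contrast between the first two rows of this table can be sketched in Python: HashAggregate builds a dict keyed by group and accepts input in any order, while GroupAggregate streams sorted input and finishes each group when the key changes. Column names here are illustrative.

```python
from itertools import groupby

def hash_aggregate(rows, key, value):
    """One hash entry per distinct group; memory grows with group count."""
    acc = {}
    for r in rows:
        s, n = acc.get(r[key], (0, 0))
        acc[r[key]] = (s + r[value], n + 1)
    return {k: (n, s / n) for k, (s, n) in acc.items()}  # (count, avg)

def group_aggregate(rows_sorted, key, value):
    """Requires sorted input; holds only the current group's state."""
    out = {}
    for k, grp in groupby(rows_sorted, key=lambda r: r[key]):
        vals = [r[value] for r in grp]
        out[k] = (len(vals), sum(vals) / len(vals))
    return out

emps = [{"dept": 1, "salary": 100}, {"dept": 2, "salary": 50},
        {"dept": 1, "salary": 200}]
by_dept = sorted(emps, key=lambda r: r["dept"])
# Both strategies compute the same (count, avg) per department
assert hash_aggregate(emps, "dept", "salary") == \
       group_aggregate(by_dept, "dept", "salary")
```

The trade-off mirrors the table: GroupAggregate's streaming memory profile comes at the price of a sorted input, which may itself require an expensive Sort.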
When you see 'external merge Disk: XkB' in a Sort, the operation spilled to disk. This can make a 50ms sort take 5000ms. Solutions: increase work_mem, add index to avoid sort, restructure query to reduce intermediate result size.
Beyond core access, join, and processing operators, you'll encounter specialized operators for specific situations:
Materialize
Caches the output of a subplan for reuse.
| Operator | Purpose | Watch For |
|---|---|---|
| Materialize | Cache subplan output for reuse | Large materializations consuming memory |
| Subquery Scan | Wrap subquery as a scan source | Usually benign; check underlying operations |
| Append | Concatenate outputs (UNION ALL, partitions) | Many children = many partition scans |
| Unique | Remove duplicates from sorted input | Requires pre-sorted input |
| SetOp | INTERSECT, EXCEPT operations | Requires sorted inputs; may add Sorts |
| Limit | Restrict output row count | Good when early (stops child operations early) |
| Result | Evaluate constant expressions | No table access; very cheap |
| Gather/Gather Merge | Collect parallel worker output | Indicates parallel query execution |
| Parallel Seq Scan | Seq Scan distributed across workers | Good for large scans on multi-core |
| CTE Scan | Read from Common Table Expression | May materialize CTE; watch for repeated reads |
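The Gather / Partial Aggregate shape in this table follows a simple pattern: each worker aggregates its slice of the table, and the leader combines the partial results. A sketch for a parallel COUNT(*), with a thread pool standing in for parallel workers and illustrative data:

```python
from concurrent.futures import ThreadPoolExecutor

def partial_count(rows, predicate):
    """What each parallel worker computes over its share of the scan."""
    return sum(1 for r in rows if predicate(r))

def parallel_count(rows, predicate, workers=3):
    chunks = [rows[i::workers] for i in range(workers)]  # split the scan
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda c: partial_count(c, predicate), chunks)
    return sum(partials)   # Finalize Aggregate: combine partial results

rows = [{"status": "active" if i % 2 == 0 else "inactive"}
        for i in range(100)]
total = parallel_count(rows, lambda r: r["status"] == "active")
# 50 of the 100 rows are active
```

This decomposition only works because COUNT partial states combine with a simple sum; aggregates like AVG need richer partial states (sum and count) for the finalize step.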
```sql
-- Materialize: Caching subquery results
EXPLAIN ANALYZE
SELECT * FROM products p
WHERE p.id IN (
    SELECT product_id FROM popular_products
);

-- Hash Semi Join
--   -> Seq Scan on products p
--   -> Hash
--        -> Materialize  (actual time=0.01..0.45 rows=100)
--             -> Seq Scan on popular_products

-- Parallel Query Execution (PostgreSQL)
EXPLAIN ANALYZE
SELECT COUNT(*) FROM very_large_table WHERE status = 'active';

-- Finalize Aggregate  (actual time=234.567..234.568 rows=1)
--   -> Gather  (actual time=234.123..234.456 rows=3)
--        Workers Planned: 2
--        Workers Launched: 2
--        -> Partial Aggregate  (actual time=230.123..230.124 rows=1)
--             -> Parallel Seq Scan on very_large_table
--                  (actual time=0.034..225.456 rows=333333 loops=3)
--                  Filter: (status = 'active')
-- 3 loops = 1 leader + 2 parallel workers

-- Append for partitioned tables
EXPLAIN ANALYZE
SELECT * FROM orders_partitioned WHERE order_date >= '2024-01-01';

-- Append  (actual time=0.012..45.234 rows=50000)
--   -> Seq Scan on orders_2024_q1  (rows=25000)
--   -> Seq Scan on orders_2024_q2  (rows=25000)
--   -> Seq Scan on orders_2024_q3  (rows=0)  -- Partition pruned or empty
--   -> Seq Scan on orders_2024_q4  (rows=0)  -- Not yet populated
```

When you see Gather/Parallel operators, the database is using multiple CPU cores. Ensure parallel workers actually launched (Workers Launched: N). If workers planned but not launched, check max_parallel_workers and system load.
You now have a comprehensive vocabulary for reading execution plans. Here's a quick reference for the most important operators and their implications:
| Operator | Good Sign | Warning Sign |
|---|---|---|
| Seq Scan | Small table, need most rows | Large table with selective filter |
| Index Scan | Selective filter uses index | Loops > 1000 (check nested loops) |
| Index Only Scan | All columns from index (covering) | Rare - this is almost always good |
| Nested Loop | Small outer, indexed inner | Large outer, Seq Scan inner |
| Hash Join | Memory Usage << work_mem, Batches: 1 | Disk spill, many batches |
| Merge Join | Pre-sorted inputs (from indexes) | External Sort added for join |
| Sort | quicksort in memory, top-N heapsort | external merge Disk: XkB |
| HashAggregate | Batches: 1, fits in memory | Many batches, disk spill |
What's Next:
With operator vocabulary mastered, the next page examines cost estimates in depth—how the optimizer calculates costs, what the numbers mean, and how to use cost information to compare alternative query approaches and predict performance.
You now have a comprehensive understanding of execution plan operators. You can identify access methods, evaluate join strategies, and recognize processing operations. Next, we'll explore how the optimizer calculates and uses cost estimates.