If table scans are the brute-force approach to data retrieval, index scans are the surgical precision alternative. An index scan uses a pre-built index structure (typically a B+ tree) to locate exactly which rows satisfy a query predicate, then retrieves only those rows from the base table.
Index scans enable databases to answer queries like 'find user with ID 12345 among 100 million users' by examining only 3-4 index pages and 1 data page, rather than scanning all 100 million rows. This is the fundamental technique that makes transactional databases practical at scale.
By the end of this page, you will understand the complete anatomy of index scans, including how the index is traversed, how matching data rows are retrieved, when index scans outperform table scans (and vice versa), and the critical role of selectivity in access method decisions.
An index scan is a two-phase operation: first, traverse the index to find pointers to matching rows; second, follow those pointers to retrieve the actual row data from the base table. Let's examine each phase in detail.
Phase 1: Index Traversal
For a B+ tree index (the most common type), traversal proceeds as follows:
Root Page Access: Start at the root page of the B+ tree. This page contains separator keys that direct the search to the appropriate child subtree.
Internal Node Navigation: At each internal node, perform binary search on the separator keys to find the correct child pointer. Follow the pointer to the next level.
Leaf Page Arrival: Upon reaching a leaf page, the actual index entries are found. Each leaf entry contains the indexed column value(s) plus a row identifier (RID) that points to the corresponding row in the base table.
Range Scan (if applicable): For range queries or multi-row results, the scan continues along the leaf level using sibling pointers (B+ tree leaves are linked as a doubly-linked list).
```
function IndexScan(index, search_key, predicate):
    results = []

    // Phase 1: Traverse index to find starting leaf
    current_page = index.root_page
    while current_page.is_internal():
        // Binary search within the page for correct child
        child_idx = binary_search(current_page.keys, search_key)
        current_page = buffer_pool.fetch(current_page.children[child_idx])

    // current_page is now a leaf page
    // Phase 1b: Find starting position within leaf
    entry_idx = binary_search(current_page.entries, search_key)

    // Phase 2: Collect matching RIDs and fetch rows
    while current_page != NULL:
        for each entry in current_page.entries[entry_idx:]:
            if not index_key_matches_predicate(entry.key, predicate):
                // Past the range, we're done
                return results

            // Fetch the actual row from the base table
            row_page = buffer_pool.fetch(entry.rid.page_id)
            row = row_page.get_row(entry.rid.slot)

            // Final predicate check (non-indexed columns)
            if full_predicate_matches(row, predicate):
                results.append(row)

        // Move to next leaf page (range scan)
        current_page = buffer_pool.fetch(current_page.next_leaf)
        entry_idx = 0

    return results
```

Phase 2: Row Retrieval (Heap Fetches)
The index provides RIDs, but we still need the actual row data. This involves:
Page Fetch: The RID contains a page ID and slot number. The database fetches the page from disk (or buffer pool) containing the row.
Slot Access: Within the page, a slot directory maps slot numbers to actual byte offsets where rows are stored.
Row Extraction: The complete row is read, including all columns (not just indexed columns).
This phase is where index scans can become expensive. If matching rows are scattered across many different pages, each row retrieval becomes a separate random I/O operation.
Index scans trade sequential I/O (reading consecutive pages) for random I/O (jumping to different pages for each row). While the index itself can be traversed with 3-4 I/Os regardless of table size, fetching matching rows may require one random I/O per row. This is why index scans are not always faster than table scans.
Accurate cost estimation for index scans requires modeling both index traversal and row retrieval phases.
Index Traversal Cost
For a B+ tree with N leaf pages and branching factor F, the height is:
Height = ⌈log_F(N)⌉
With F typically 100-500 (depending on key size and page size), even indexes with billions of entries have heights of 3-5. The traversal cost is:
Traversal Cost = Height × t_random_I/O
This is nearly constant regardless of table size—the logarithmic magic of B+ trees.
| Leaf Pages | Approximate Rows | Tree Height | Traversal I/Os |
|---|---|---|---|
| 200 | 20,000 | 1 | 2 (root + 1 leaf) |
| 40,000 | 4,000,000 | 2 | 3 (root + 1 internal + 1 leaf) |
| 8,000,000 | 800,000,000 | 3 | 4 (root + 2 internal + 1 leaf) |
| 1,600,000,000 | 160 billion | 4 | 5 (root + 3 internal + 1 leaf) |
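The heights in the table above can be reproduced in a few lines of Python. This is an illustrative sketch assuming a uniform fanout of F = 200 (which matches the table's figures); real B+ tree pages have variable fan-out depending on key sizes.

```python
def btree_height(leaf_pages: int, fanout: int = 200) -> int:
    """Number of internal levels above the leaves: ceil(log_F(N)).

    Computed by repeated multiplication rather than math.log to avoid
    floating-point rounding at exact powers of the fanout.
    """
    height, reach = 0, 1  # reach = leaf pages addressable at this height
    while reach < leaf_pages:
        reach *= fanout
        height += 1
    return height

for leaves in (200, 40_000, 8_000_000, 1_600_000_000):
    # Traversal reads one page per level, plus the leaf itself
    print(leaves, btree_height(leaves), btree_height(leaves) + 1)
```

Note how adding another level multiplies capacity by the fanout: that is why an extra factor of 200 in table size costs only one extra I/O.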
Row Retrieval Cost
After finding matching index entries, each row must be fetched. If the query matches M rows and they are scattered across P distinct pages:
Retrieval Cost = P × t_random_I/O
In the worst case (rows uniformly distributed across all pages), P approaches M (one page per row). In the best case (rows clustered together), P could be much smaller than M.
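The gap between the two cases is easy to see with a quick simulation (a sketch, not tied to any particular engine): the same 1,000 matching rows touch far fewer pages when they sit next to each other.

```python
import random

ROWS_PER_PAGE = 100  # assumed layout: 1,000,000 rows -> 10,000 pages

def pages_touched(matching_rows: list[int]) -> int:
    """Count distinct pages holding the matching rows (P in the text)."""
    return len({row // ROWS_PER_PAGE for row in matching_rows})

n_rows, m = 1_000_000, 1_000
clustered = list(range(500_000, 500_000 + m))   # contiguous row numbers
random.seed(42)
scattered = random.sample(range(n_rows), m)     # uniformly spread

print(pages_touched(clustered))   # 10 pages: P << M
print(pages_touched(scattered))   # nearly 1,000 pages: P ~ M
```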
The Clustering Factor
The clustering factor (or correlation) measures how well the physical order of rows matches the index order. It ranges from the number of table pages (consecutive index entries point to the same or adjacent pages—perfect clustering) up to the number of rows (every consecutive entry points to a different page—no clustering).
The clustering factor is critical for index scan cost estimation:
```
= Cost Components =

Index Traversal Cost:
  C_traverse = Height × t_random_IO
             ≈ log_F(N) × t_random_IO
             ≈ 3-5 random I/Os for any practical table size

Leaf Scan Cost (for range queries):
  C_leaf_scan = Matching_Leaf_Pages × t_sequential_IO

Row Retrieval Cost:
  Best case (perfect clustering):
    C_retrieval = Matching_Pages × t_random_IO
  Worst case (no clustering):
    C_retrieval = Matching_Rows × t_random_IO
  General formula (using clustering factor CF):
    C_retrieval ≈ Selectivity × CF × t_random_IO
    where CF is between NumLeafPages and NumRows

= Total Index Scan Cost =

C_total = C_traverse + C_leaf_scan + C_retrieval

Example: Finding 1,000 rows in a 1,000,000-row table
- Height = 3, t_random = 5ms, t_seq = 0.1ms
- Selectivity = 0.1%
- CF = 100,000 (moderately clustered)

C_total = 3 × 5ms + 10 × 0.1ms + 0.001 × 100,000 × 5ms
        = 15ms + 1ms + 500ms
        = 516ms

Same query via table scan:
C_table_scan = 10,000 pages × 0.1ms = 1,000ms

Index scan wins! (516ms vs 1,000ms)
```

A well-clustered index (CF close to number of pages) makes index scans extremely efficient because multiple matching rows share the same pages. A poorly-clustered index (CF close to number of rows) can make index scans more expensive than table scans even at low selectivity.
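The worked example can be checked numerically. Here is a sketch of the cost formula using the same assumed parameters (5 ms random reads, 0.1 ms sequential reads):

```python
def index_scan_cost_ms(height, leaf_pages_read, selectivity, clustering_factor,
                       t_rand_ms=5.0, t_seq_ms=0.1):
    """C_total = C_traverse + C_leaf_scan + C_retrieval (all in ms)."""
    c_traverse = height * t_rand_ms
    c_leaf_scan = leaf_pages_read * t_seq_ms
    c_retrieval = selectivity * clustering_factor * t_rand_ms
    return c_traverse + c_leaf_scan + c_retrieval

def table_scan_cost_ms(table_pages, t_seq_ms=0.1):
    """Full scan: every page read sequentially."""
    return table_pages * t_seq_ms

# Parameters from the example: 1,000 of 1,000,000 rows, CF = 100,000
index_cost = index_scan_cost_ms(height=3, leaf_pages_read=10,
                                selectivity=0.001, clustering_factor=100_000)
table_cost = table_scan_cost_ms(10_000)
print(index_cost, table_cost)
```

Varying `clustering_factor` between 10,000 (the page count) and 1,000,000 (the row count) shows how the same query swings from trivially cheap to slower than a full scan.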
Selectivity is the fraction of rows that match a query predicate. It's the single most important factor in choosing between index scan and table scan.
Let:
- N = number of rows in the table
- B = number of pages in the table
- s = selectivity (fraction of rows matching the predicate)
- Height = height of the index (3-5 in practice)
- t_seq = time for a sequential page read
- t_rand = time for a random page read
Table Scan Cost:
C_table = B × t_seq
Index Scan Cost (simplified, worst-case clustering):
C_index = Height × t_rand + s × N × t_rand
Crossover Point:
Index scan is cheaper when:
Height × t_rand + s × N × t_rand < B × t_seq
Solving for s:
s < (B × t_seq - Height × t_rand) / (N × t_rand)
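Plugging the inequality into code (a sketch using the worst-case clustering model, with Height × t_rand as the traversal term and the same assumed parameters as the earlier example: B = 10,000 pages, N = 1,000,000 rows, Height = 3):

```python
def crossover_selectivity(table_pages, n_rows, height, t_rand_ms, t_seq_ms):
    """Selectivity below which a worst-case index scan beats a table scan:
    s < (B * t_seq - Height * t_rand) / (N * t_rand)."""
    return (table_pages * t_seq_ms - height * t_rand_ms) / (n_rows * t_rand_ms)

# HDD-style costs: 5 ms random reads, 0.1 ms sequential reads
s = crossover_selectivity(10_000, 1_000_000, 3, t_rand_ms=5.0, t_seq_ms=0.1)
print(f"{s:.4%}")  # roughly 0.02%: with no clustering at all, the index
                   # pays off only for very selective predicates
```

The table below shows far higher crossover points in practice because clustering lets many matching rows share a page, cutting the s × N random reads substantially.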
| Scenario | t_rand/t_seq Ratio | Typical Crossover Selectivity |
|---|---|---|
| HDD, poor clustering | 50× (5ms vs 0.1ms) | ~2-5% |
| HDD, good clustering | 50× but pages shared | ~15-25% |
| SSD, poor clustering | 5× (0.2ms vs 0.04ms) | ~10-20% |
| SSD, good clustering | 5× but pages shared | ~30-40% |
| Fully cached (memory) | 1× (no I/O difference) | ~40-60% |
Practical Guidelines:
- At very low selectivity (below roughly 1%), prefer an index scan on any storage medium.
- At high selectivity (above roughly 25-50%, depending on clustering and medium), prefer a table scan.
- In between, the decision hinges on clustering, storage medium, and cache state—see the crossover table above.
These are rules of thumb. Real optimizers compute actual costs using statistics.
Query optimizers use collected statistics (histograms, distinct value counts, clustering factors) to estimate selectivity and compute expected costs. When an optimizer chooses a table scan over an available index, it often means the optimizer estimated high selectivity. Trust the optimizer—but verify with EXPLAIN when results seem wrong.
Database systems implement several variants of index scans to handle different query patterns.
1. Unique Index Scan
When scanning a unique index for an equality predicate, at most one row can match. The scan terminates immediately after finding (or not finding) the key:
SELECT * FROM users WHERE user_id = 12345; -- Unique index on user_id
Cost: Height + 1 (one row fetch), regardless of table size.
2. Range Scan
For range predicates, the index locates the starting point, then scans leaf pages until the range ends:
SELECT * FROM orders WHERE order_date BETWEEN '2024-01-01' AND '2024-01-31';
Cost: Height + Leaf Pages in Range + Row Fetches.
```sql
-- Unique Index Scan (equality on unique column)
EXPLAIN SELECT * FROM users WHERE user_id = 12345;
/*
Index Scan using users_pkey on users (cost=0.42..8.44 rows=1 width=...)
  Index Cond: (user_id = 12345)
*/

-- Range Scan (range predicate)
EXPLAIN SELECT * FROM orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-01-31';
/*
Index Scan using idx_orders_date on orders (cost=0.43..1523.45 rows=12000 width=...)
  Index Cond: ((order_date >= '2024-01-01') AND (order_date <= '2024-01-31'))
*/

-- Backward Scan (DESC order matches reverse index traversal)
EXPLAIN SELECT * FROM orders ORDER BY order_date DESC LIMIT 10;
/*
Index Scan Backward using idx_orders_date on orders (cost=0.43..50.23 rows=10 width=...)
*/

-- Index scan with filter (index used for partial predicate)
EXPLAIN SELECT * FROM orders
WHERE order_date = '2024-01-15' AND status = 'shipped';
/*
Index Scan using idx_orders_date on orders (cost=0.43..200.12 rows=50 width=...)
  Index Cond: (order_date = '2024-01-15')
  Filter: (status = 'shipped')  -- Applied after row fetch
  Rows Removed by Filter: 150
*/
```

3. Multi-Column Index Scan
Composite indexes on multiple columns enable efficient scans when the query matches a prefix of the index columns:
-- Index on (customer_id, order_date)
SELECT * FROM orders WHERE customer_id = 123 AND order_date > '2024-01-01'; -- Uses full index
SELECT * FROM orders WHERE customer_id = 123; -- Uses index (prefix)
SELECT * FROM orders WHERE order_date > '2024-01-01'; -- Cannot use index efficiently!
4. Index Skip Scan
Some databases (Oracle, recent PostgreSQL) can perform 'skip scans' when the first column of a composite index is not in the predicate. The scan jumps through distinct values of the first column:
-- Index on (gender, age)
SELECT * FROM people WHERE age = 25;
-- Skip scan: find age=25 for gender='M', then gender='F', etc.
Skip scans are only efficient when the skipped column has low cardinality.
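A toy model makes the mechanics concrete. In this sketch the composite index on (gender, age) is modeled as a sorted list of (gender, age, rid) tuples—nothing engine-specific—and each distinct leading value gets one probe:

```python
from bisect import bisect_left

def skip_scan(index, age):
    """Probe (leading_value, age) once per distinct leading value."""
    rids = []
    # Enumerating leading values is cheap only when cardinality is low
    leading_values = sorted({entry[0] for entry in index})
    for lv in leading_values:
        i = bisect_left(index, (lv, age))  # one "traversal" per value
        while i < len(index) and index[i][:2] == (lv, age):
            rids.append(index[i][2])
            i += 1
    return rids

# Index on (gender, age): sorted leaf entries with row identifiers
idx = sorted([("F", 22, 101), ("F", 25, 102), ("M", 25, 103), ("M", 30, 104)])
print(skip_scan(idx, 25))  # [102, 103]
```

With two distinct genders, the scan costs two index probes; with a million distinct leading values it would degenerate into a million probes, which is why low cardinality is the precondition.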
The efficiency of index scans depends heavily on the relationship between logical order (as defined by the index) and physical order (how rows are actually stored on disk).
Heap Tables vs Clustered Tables
In a heap table, rows are stored in insertion order (or wherever free space exists). The physical order has no relationship to any index's logical order. Index scans on heap tables typically exhibit poor locality—consecutive index entries point to rows scattered across the entire table.
In a clustered table (or an index-organized table in Oracle), rows are physically stored in the order of a specific index (usually the primary key). Index scans using the clustering index have excellent locality—consecutive entries often point to the same or adjacent pages.
Implications for Index Design
Primary key clustering: In MySQL InnoDB, tables are always clustered by primary key. This makes PK-based index scans extremely efficient.
Secondary index overhead: Secondary indexes in clustered tables contain the clustering key (not just a RID), which adds storage but enables efficient row lookup.
Index correlation: Even in heap tables, indexes on columns correlated with insertion order (like auto-increment IDs, timestamps) will have better clustering factors.
CLUSTER command: Some databases (PostgreSQL) allow explicitly reordering heap tables to match an index, improving that index's clustering factor (until new inserts disorder it again).
In PostgreSQL, check pg_stats.correlation for a column to see its clustering factor (-1 to 1, where ±1 means perfect clustering). In Oracle, DBA_INDEXES.CLUSTERING_FACTOR shows the absolute value. In SQL Server, analyze the page-level RIDs in sys.dm_db_index_physical_stats.
Modern databases employ several optimizations to accelerate index scans and mitigate their random I/O characteristics.
1. Index Prefetching
During index scans, databases can analyze upcoming leaf entries and initiate asynchronous reads for the pages they reference. This hides I/O latency by overlapping fetch operations:
Time without prefetch: [Index Entry 1] → [Wait for I/O 1] → [Process Row 1] → [Entry 2] → [Wait for I/O 2]...
Time with prefetch: [Index Entry 1] → [Wait I/O 1] → [Process Row 1] → [Process Row 2 (already prefetched)]...
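The overlap can be sketched with a thread pool standing in for asynchronous reads. Everything here is illustrative: `fetch_page` is a hypothetical page read that simulates 10 ms of I/O latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_page(page_id):
    """Hypothetical page read: simulate 10 ms of I/O latency."""
    time.sleep(0.01)
    return f"page-{page_id}"

def scan_no_prefetch(page_ids):
    return [fetch_page(p) for p in page_ids]   # one full wait per page

def scan_with_prefetch(page_ids, depth=8):
    with ThreadPoolExecutor(max_workers=depth) as pool:
        futures = [pool.submit(fetch_page, p) for p in page_ids]  # reads in flight
        return [f.result() for f in futures]   # waits overlap

pages = list(range(16))
t0 = time.perf_counter(); scan_no_prefetch(pages); t1 = time.perf_counter()
scan_with_prefetch(pages); t2 = time.perf_counter()
print(f"serial {t1 - t0:.3f}s, prefetch {t2 - t1:.3f}s")
```

Real engines bound the prefetch depth (the `depth` parameter here) so a scan does not flood the I/O subsystem.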
2. Batched Key Lookup
Instead of fetching rows one at a time, collect a batch of RIDs, sort them by page ID, then fetch in page order. This converts random I/O into more sequential pattern and allows page reuse:
```
function BatchedIndexScan(index, predicate, batch_size=1000):
    results = []
    rid_batch = []

    // Collect RIDs from index
    for each entry in index.range_scan(predicate):
        rid_batch.append(entry.rid)

        if len(rid_batch) >= batch_size:
            // Sort RIDs by page ID to minimize random I/O
            rid_batch.sort(by=lambda rid: rid.page_id)

            // Fetch rows in page order
            for rid in rid_batch:
                row = fetch_row(rid)
                if full_predicate_matches(row, predicate):
                    results.append(row)
            rid_batch = []

    // Process remaining batch
    // ... same sorting and fetching logic

    return results
```

3. Bitmap Heap Scan (PostgreSQL)
A hybrid approach where the index is scanned to build an in-memory bitmap of qualifying page IDs, then pages are fetched in physical order:
EXPLAIN SELECT * FROM orders WHERE customer_id IN (1, 2, 3, 4, 5);
/*
Bitmap Heap Scan on orders
Recheck Cond: (customer_id = ANY ('{1,2,3,4,5}'))
-> Bitmap Index Scan on idx_orders_customer
Index Cond: (customer_id = ANY ('{1,2,3,4,5}'))
*/
This approach is covered in detail in the Bitmap Scan page.
4. Index-Only Scans
If all columns needed by the query are present in the index, the row fetch phase can be skipped entirely. This is covered in the Index-Only Scan page.
Understanding how index scans appear in query plans across different database systems is essential for performance analysis.
```sql
-- Create sample table and index
CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    category_id INTEGER NOT NULL,
    name VARCHAR(100),
    price DECIMAL(10,2)
);
CREATE INDEX idx_products_category ON products(category_id);

-- Analyze query using index
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM products WHERE category_id = 42;

/*
Index Scan using idx_products_category on products
  (cost=0.42..248.56 rows=500 width=48)
  (actual time=0.023..1.234 rows=487 loops=1)
  Index Cond: (category_id = 42)
  Buffers: shared hit=156 read=12
Planning Time: 0.089 ms
Execution Time: 1.456 ms

Key observations:
- "Index Scan using idx_products_category" = index scan access method
- "Index Cond: (category_id = 42)" = condition pushed to index
- "Buffers: shared hit=156 read=12" = 156 pages from cache, 12 from disk
- "rows=500" estimated vs "rows=487" actual = good cardinality estimate
*/

-- Range scan example
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM products WHERE category_id BETWEEN 10 AND 20;

/*
Index Scan using idx_products_category on products
  (cost=0.42..3452.78 rows=5500 width=48)
  (actual time=0.034..45.678 rows=5423 loops=1)
  Index Cond: ((category_id >= 10) AND (category_id <= 20))
  Buffers: shared hit=1234 read=567
*/
```
```sql
-- MySQL EXPLAIN output
EXPLAIN SELECT * FROM products WHERE category_id = 42;

/*
+----+-------------+----------+------+-----------------------+-----------------------+---------+-------+------+-------+
| id | select_type | table    | type | possible_keys         | key                   | key_len | ref   | rows | Extra |
+----+-------------+----------+------+-----------------------+-----------------------+---------+-------+------+-------+
|  1 | SIMPLE      | products | ref  | idx_products_category | idx_products_category | 4       | const | 500  | NULL  |
+----+-------------+----------+------+-----------------------+-----------------------+---------+-------+------+-------+

Key columns:
- type="ref": index lookup (eq_ref for unique, range for range scans)
- key="idx_products_category": index being used
- key_len="4": bytes of index being used (useful for composite indexes)
- rows="500": estimated rows to examine
*/

-- MySQL 8.0+ EXPLAIN ANALYZE for actual execution stats
EXPLAIN ANALYZE SELECT * FROM products WHERE category_id = 42;

/*
-> Index lookup on products using idx_products_category (category_id=42)
   (cost=50.25 rows=500) (actual time=0.045..1.234 rows=487 loops=1)
*/
```

In execution plans, 'Index Cond' (PostgreSQL) or the index key usage (MySQL) indicates conditions evaluated during index traversal. 'Filter' indicates conditions evaluated after fetching rows. Filters mean the index wasn't fully utilized, and extra rows were fetched then discarded.
You now have comprehensive knowledge of index scans—the primary alternative to table scans for selective queries. Here are the essential takeaways:
- An index scan is a two-phase operation: index traversal (3-5 I/Os, nearly constant regardless of table size) followed by row retrieval (potentially one random I/O per matching row).
- Selectivity is the dominant factor in the index-vs-table-scan decision; the crossover point depends on storage medium and clustering.
- The clustering factor determines how many distinct pages the row fetches touch, and can make even a low-selectivity index scan slower than a table scan.
- Engines mitigate random I/O with prefetching, batched key lookups, and bitmap heap scans.
What's Next:
The next page covers Index-Only Scans—a powerful optimization that eliminates the row fetch phase entirely by storing all needed columns within the index itself. This approach can dramatically accelerate queries when applicable.
You now understand index scans at a deep level—their mechanics, cost models, selectivity crossover points, and optimization strategies. This knowledge is essential for understanding when indexes help, when they hurt, and how to design effective indexing strategies.