Loading learning content...
In the previous page, we established that covering indexes allow databases to answer queries without accessing the underlying table. But how exactly does the database execute this optimization? How does the query optimizer decide to use an index-only scan? And what hidden requirements must be satisfied for index-only scans to actually work?
This page takes you inside the database engine to understand index-only scans—the execution mechanism that transforms covering indexes from a theoretical concept into measurable performance gains. We'll examine the complete lifecycle: detection, planning, execution, and verification.
By the end of this page, you will understand how the query optimizer identifies index-only scan opportunities, the role of visibility maps in MVCC databases, platform-specific behaviors across PostgreSQL, SQL Server, MySQL, and Oracle, and how to verify and diagnose index-only scan execution using EXPLAIN plans.
An index-only scan is a query execution strategy where the database engine retrieves all required data directly from an index, completely bypassing the heap (table) pages. This is fundamentally different from a regular index scan, which uses the index to locate rows and then fetches those rows from the table.
The Execution Flow Comparison
Key Characteristics of Index-Only Scans
| Characteristic | Regular Index Scan | Index-Only Scan |
|---|---|---|
| Data Source | Index + Heap | Index only |
| I/O Pattern | Sequential (index) + Random (heap) | Sequential (index) only |
| Row Visibility Check | Via heap page | Via visibility map (MVCC) or lock |
| Memory Pressure | High (heap pages in buffer) | Low (only index pages) |
| Scale Behavior | Degrades with result size | Constant relative to result size |
| Prerequisite | Suitable index exists | Covering index + visibility requirements |
The critical insight: index-only scans don't just reduce I/O—they eliminate an entire category of I/O (random heap access). This transforms query performance from being dominated by the most expensive operation to using only the most efficient operation (sequential index access).
The query optimizer evaluates multiple execution plans and selects the plan with the lowest estimated cost. For index-only scans to be chosen, several conditions must be met.
The Optimizer's Decision Tree
Detailed Decision Criteria
1. Column Coverage Check
The optimizer first identifies all columns referenced in the query:
SELECT list columns (output)WHERE clause columns (filtering)JOIN condition columnsORDER BY columns (sorting)GROUP BY columns (grouping)HAVING clause columnsIf any required column is missing from the index, index-only scan is immediately ruled out.
2. Index Selectivity Analysis
Even with a covering index, the optimizer compares costs:
Index-Only Scan Cost = (Index Pages to Read) × (Page Read Cost)
Regular Index Scan Cost = (Index Pages) × (Page Read Cost) +
(Expected Rows) × (Random I/O Cost)
Table Scan Cost = (Table Pages) × (Sequential Read Cost)
The optimizer chooses the plan with the lowest total cost. For highly selective queries (few matching rows), even a non-covering index might be preferred if the table lookup cost is minimal.
123456789101112131415161718192021
-- Analyze how PostgreSQL evaluates index-only scan opportunity-- Step 1: Create a covering indexCREATE INDEX idx_orders_covering ON orders (customer_id, order_date) INCLUDE (total_amount, status); -- Step 2: Examine the execution plan with costsEXPLAIN (ANALYZE, COSTS, BUFFERS) SELECT customer_id, order_date, total_amount, statusFROM ordersWHERE customer_id = 12345AND order_date >= '2024-01-01'; -- Example output showing index-only scan:-- Index Only Scan using idx_orders_covering on orders-- Index Cond: ((customer_id = 12345) AND (order_date >= '2024-01-01'))-- Heap Fetches: 0 <-- KEY INDICATOR: Zero heap access!-- Buffers: shared hit=4 -- Heap Fetches > 0 indicates visibility map misses-- requiring fallback to heap for row visibility verificationIn databases using Multi-Version Concurrency Control (MVCC)—including PostgreSQL, MySQL InnoDB, and Oracle—there's a subtle but critical challenge for index-only scans: row visibility determination.
The MVCC Visibility Problem
MVCC databases may have multiple versions of the same row, each visible to different transactions. When reading data, the database must determine which version (if any) is visible to the current transaction.
Normally, visibility is determined by examining the row's heap tuple header, which contains:
But here's the problem: index entries don't contain visibility information. The index only stores the key values and a pointer (TID/RID) to the heap tuple.
If we must access the heap to check visibility, we lose the entire benefit of index-only scans! The database would need to read heap pages anyway, negating the I/O savings. This was a fundamental limitation that prevented true index-only scans in early MVCC implementations.
The Visibility Map Solution (PostgreSQL)
PostgreSQL solved this with the visibility map—a compact auxiliary structure that tracks which heap pages contain only tuples visible to all transactions.
Visibility Map Structure:
How Index-Only Scan Uses the Visibility Map:
Critical Implication: Index-only scans work best on tables that have been vacuumed recently. VACUUM updates the visibility map, marking pages as "all-visible" after dead tuples are removed.
12345678910111213141516171819202122232425262728293031
-- Analyze visibility map status and its impact on index-only scans -- Check visibility map coverage using pg_visibility extensionCREATE EXTENSION IF NOT EXISTS pg_visibility; -- View visibility map statistics for a tableSELECT * FROM pg_visibility_map_summary('orders');-- Returns: all_visible (pages marked visible), all_frozen (pages frozen) -- Detailed per-page visibilitySELECT * FROM pg_visibility('orders') LIMIT 10;-- Returns: blkno, all_visible, all_frozen -- Check EXPLAIN output for heap fetchesEXPLAIN (ANALYZE, BUFFERS) SELECT customer_id, order_date, total_amountFROM ordersWHERE customer_id BETWEEN 1 AND 1000; -- "Heap Fetches: 1523" means 1523 tuples required heap access-- due to visibility map indicating non-visible pages -- Solution: VACUUM to update visibility mapVACUUM ANALYZE orders; -- Re-run EXPLAIN - should see reduced or zero heap fetchesEXPLAIN (ANALYZE, BUFFERS) SELECT customer_id, order_date, total_amountFROM ordersWHERE customer_id BETWEEN 1 AND 1000;-- "Heap Fetches: 0" - true index-only scan achieved!A high 'Heap Fetches' count in an index-only scan indicates poor visibility map coverage. This typically means the table has high write activity without sufficient vacuuming. For write-heavy tables, consider more aggressive autovacuum settings to maintain index-only scan efficiency.
Each major database platform implements index-only scans differently, with unique terminology, capabilities, and limitations.
| Aspect | PostgreSQL | SQL Server | MySQL InnoDB | Oracle |
|---|---|---|---|---|
| Terminology | Index Only Scan | Index Seek/Scan (no Key Lookup) | Using index | INDEX FAST FULL SCAN / INDEX RANGE SCAN |
| INCLUDE Clause | Yes (v11+) | Yes | No (columns in key) | No (must use key columns) |
| Visibility Handling | Visibility Map | Lock-based (less overhead) | Implicit (clustered key) | Consistent read via undo |
| EXPLAIN Indicator | Index Only Scan, Heap Fetches | No Key Lookup in plan | Extra: Using index | Rowsource shows INDEX access |
| Prerequisites | VACUUM for visibility map | None special | All columns in index key | Index must have all columns |
PostgreSQL Index-Only Scan Details
Best Practices:
pg_stat_user_indexes.idx_scan vs idx_tup_fetchVACUUM FREEZE for read-heavy, rarely-modified tablesReading and interpreting execution plans is essential for verifying that your covering indexes are actually enabling index-only scans. Let's examine what to look for in each platform's EXPLAIN output.
12345678910111213141516171819202122232425262728
-- PostgreSQL: Confirming Index-Only Scan -- Query with covering indexEXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)SELECT customer_id, email, signup_dateFROM customersWHERE customer_id BETWEEN 1000 AND 2000; -- POSITIVE OUTPUT (Index-Only Scan):-- Index Only Scan using idx_customers_covering on customers-- Index Cond: ((customer_id >= 1000) AND (customer_id <= 2000))-- Heap Fetches: 0-- Buffers: shared hit=12-- → "Index Only Scan" = correct plan type-- → "Heap Fetches: 0" = no table access needed-- → Only index buffers accessed -- NEGATIVE OUTPUT (Fallback to heap):-- Index Only Scan using idx_customers_covering on customers-- Index Cond: ((customer_id >= 1000) AND (customer_id <= 2000))-- Heap Fetches: 487-- → Heap Fetches > 0 means visibility map check failed for some pages-- → Solution: VACUUM ANALYZE customers; -- REALLY NEGATIVE OUTPUT (Not index-only at all):-- Index Scan using idx_customers_partial on customers-- Index Cond: ...-- → "Index Scan" (not "Index Only Scan") = table access requiredWhile index-only scans are powerful, they have limitations and edge cases that can prevent their use or reduce their effectiveness.
SELECT *, you request all table columns. Unless your index includes every column (rarely practical), index-only scan is impossible.SELECT UPPER(name) might work, but complex expressions might not.IS NULL conditions may not use the index, or the index may not cover rows with NULL values.12345678910111213141516171819202122232425262728293031323334353637
-- Example 1: SELECT * breaks index-only scan-- Even with a "covering" index, SELECT * failsCREATE INDEX idx_users_covering ON users (user_id) INCLUDE (email, name, created_at); -- This WILL use index-only scan:EXPLAIN SELECT user_id, email, name, created_at FROM users WHERE user_id = 100;-- "Index Only Scan" -- This will NOT (assumes table has more columns):EXPLAIN SELECT * FROM users WHERE user_id = 100;-- "Index Scan" (not "Index Only Scan") -- Example 2: TOAST column limitation-- Large text columns stored in TOAST tablesCREATE TABLE documents ( doc_id INT PRIMARY KEY, title VARCHAR(255), content TEXT -- Likely TOAST-ed if large); CREATE INDEX idx_docs_covering ON documents (doc_id) INCLUDE (title, content); -- Index-only scan will still need TOAST access for large content values-- The index stores a pointer to TOAST, not the actual content -- Example 3: Expression preventing index-only scanCREATE INDEX idx_users_email ON users (email) INCLUDE (name); -- Works (simple column reference):EXPLAIN SELECT email, name FROM users WHERE email = 'test@example.com'; -- May not work (function on column):EXPLAIN SELECT email, name, LENGTH(email) FROM users WHERE email LIKE 'test%';-- LENGTH(email) might require heap access depending on PostgreSQL versionAvoiding SELECT * is one of the most impactful query optimization habits. By explicitly listing only needed columns, you enable index coverage, reduce network transfer, and make your code more maintainable. Modern ORMs often generate SELECT *, making this a common performance pitfall.
We've taken a deep dive into the execution mechanism behind covering indexes. Here are the key insights:
What's Next
Now that you understand how index-only scans work, we'll explore the performance benefits in depth—quantifying the I/O savings, memory efficiency improvements, and real-world benchmarks that demonstrate the transformative impact of well-designed covering indexes.
You now understand the complete lifecycle of index-only scans: how optimizers detect coverage, how visibility is verified in MVCC databases, platform-specific behaviors, and how to diagnose index-only scan execution. This knowledge enables you to design, verify, and troubleshoot covering indexes in production systems.