Loading content...
The most expensive part of an index scan is typically not the index traversal—it's fetching the actual rows from the base table. Each row fetch can require a random I/O operation, and when retrieving thousands of rows, this overhead dominates query execution time.
But what if you could skip the row fetch entirely? What if the index itself contained all the data needed to answer the query? This is the essence of an index-only scan (also called a covering index scan or index-only access)—one of the most powerful performance optimizations in relational databases.
By the end of this page, you will understand how index-only scans work, why they provide such dramatic performance improvements, how to design covering indexes, and the subtle requirements that must be met for an index-only scan to be possible.
A standard B+ tree index stores index keys (the columns being indexed) and row identifiers (pointers to the actual rows in the base table). When a query needs columns beyond the index keys, the database must:
Step 3 is the bottleneck—potentially one random I/O per row.
The Index-Only Alternative:
If the index includes all columns referenced by the query (in SELECT, WHERE, ORDER BY, GROUP BY, etc.), the database can:
12345678910111213141516171819202122
-- Table: orders (order_id, customer_id, order_date, total_amount, status)-- Index: idx_orders_customer_date ON orders(customer_id, order_date) -- Query 1: Requires table access (needs 'status' column not in index)SELECT customer_id, order_date, status FROM orders WHERE customer_id = 1234;-- Execution: Index Scan → For each match, fetch row → Extract status -- Query 2: Index-only possible (all columns in index)SELECT customer_id, order_date FROM orders WHERE customer_id = 1234;-- Execution: Index Scan → Return data directly from index leaf entries-- NO TABLE ACCESS REQUIRED! -- Query 3: Aggregation with index-onlySELECT customer_id, COUNT(*), MIN(order_date), MAX(order_date)FROM orders GROUP BY customer_id;-- If idx_orders_customer_date covers this, entire aggregation runs -- without touching the base tableIndex-only scans can be 10× to 100× faster than equivalent index scans with row fetches, especially when the table has wide rows (many columns) or poor clustering. You read only the compact index pages instead of full row pages.
Several conditions must be met for an index-only scan to be possible:
Requirement 1: All Query Columns Must Be In The Index
Every column referenced anywhere in the query must be present in the index. This includes:
12345678910111213141516171819202122
-- Index: (customer_id, order_date, total_amount) -- ✓ Index-only possible: all columns coveredSELECT customer_id, SUM(total_amount)FROM ordersWHERE order_date > '2024-01-01'GROUP BY customer_id; -- ✗ NOT index-only: 'status' not in indexSELECT customer_id, statusFROM ordersWHERE order_date > '2024-01-01'; -- ✗ NOT index-only: 'shipping_address' in expressionSELECT customer_idFROM ordersWHERE LENGTH(shipping_address) > 100; -- ✓ Index-only possible: expression uses only indexed columnsSELECT customer_idFROM ordersWHERE total_amount * 1.1 > 100;Requirement 2: Visibility Information (PostgreSQL-specific)
In PostgreSQL's MVCC implementation, each row has visibility information (transaction IDs indicating when the row was created/deleted). This information is stored in the heap (table), not in indexes.
For an index-only scan to work, PostgreSQL must be able to verify that matching rows are visible to the current transaction without reading the heap. This is possible only when:
This is why VACUUM is critical for PostgreSQL index-only scan performance—it sets visibility map bits.
| Database | Column Coverage | Additional Requirements |
|---|---|---|
| PostgreSQL | All columns in index | Visibility map must confirm row visibility (run VACUUM regularly) |
| MySQL InnoDB | All columns in index (including PK for secondary indexes) | Secondary indexes include PK columns automatically; covering index must include all needed columns |
| Oracle | All columns in index | Null values require index to store NULLs (some index types don't) |
| SQL Server | All columns in index or INCLUDE clause | Columnstore indexes have different rules |
SELECT * defeats index-only scans because it references all columns, most of which aren't indexed. Even if you end up using only a few columns in application code, * forces the database to fetch full rows. Always specify explicit column lists for queries that could benefit from index-only scans.
A covering index is an index designed specifically to enable index-only scans for particular queries. The term emphasizes that the index 'covers' all columns needed by the query.
Design Strategy: Query-Driven Indexing
To create a covering index:
Example: E-commerce Order Analytics
1234567891011121314151617181920212223242526272829
-- Frequent analytics query:SELECT customer_id, DATE_TRUNC('month', order_date) as order_month, COUNT(*) as order_count, SUM(total_amount) as total_spentFROM ordersWHERE order_date >= '2024-01-01'GROUP BY customer_id, DATE_TRUNC('month', order_date); -- Columns needed:-- - WHERE: order_date (for filtering)-- - SELECT/GROUP BY: customer_id, order_date (for grouping)-- - SELECT: total_amount (for aggregation) -- Covering index design:CREATE INDEX idx_orders_covering ON orders( order_date, -- Leading column for WHERE filter (range scan) customer_id, -- GROUP BY column (secondary ordering helps grouping) total_amount -- Aggregate source (no ordering needed)); -- Alternatively, in PostgreSQL 11+ or SQL Server, use INCLUDE:CREATE INDEX idx_orders_covering_v2 ON orders(order_date, customer_id)INCLUDE (total_amount); -- INCLUDE columns are stored in leaf pages but not in non-leaf pages,-- reducing index size and maintenance cost for columns only needed-- in SELECT (not WHERE or ORDER BY)The INCLUDE Clause (PostgreSQL 11+, SQL Server)
The INCLUDE clause allows adding columns to an index's leaf entries without including them in the index's sort order. Benefits:
Key vs Included Column Tradeoffs
Covering indexes are larger than minimal indexes because they store additional columns. This increases storage requirements and write overhead (every INSERT/UPDATE must update more index data). Design covering indexes for your most critical queries, not for every possible query.
The cost advantage of index-only scans comes from eliminating the row fetch phase entirely. Let's quantify this benefit.
Standard Index Scan Cost:
C_standard = Height + LeafPages + RowFetches × t_random
Index-Only Scan Cost:
C_index_only = Height + LeafPages
The savings equal the eliminated row fetch cost:
Savings = RowFetches × t_random
| Scenario | Matching Rows | Standard Index Scan | Index-Only Scan | Speedup |
|---|---|---|---|---|
| Point query | 1 row | 4 pages + 1 fetch ≈ 5 I/Os | 4 pages ≈ 4 I/Os | 1.25× |
| Small range | 100 rows, 80 pages | 4 + 80 random ≈ 84 I/Os | 4 pages ≈ 4 I/Os | 21× |
| Medium range | 10,000 rows, 5,000 pages | 4 + 5,000 random ≈ 5,004 I/Os | 50 leaf pages ≈ 50 I/Os | 100× |
| Large range (aggregation) | 1M rows, all pages | 4 + ~50,000 random | 1,000 leaf pages ≈ 1,000 I/Os | 50×+ |
Why Such Dramatic Improvements?
The speedup comes from multiple factors:
Example Comparison:
123456789101112131415161718192021222324252627282930313233343536
-- Table: transactions (1 million rows)-- Row size: 500 bytes-- Table pages: 62,500 pages (at 8KB/page, ~80 rows/page) -- Index on (account_id, transaction_date, amount)-- Index entry size: ~30 bytes-- Index leaf pages: ~4,500 pages -- Query: Monthly account summarySELECT account_id, DATE_TRUNC('month', transaction_date) as month, SUM(amount) as totalFROM transactionsWHERE transaction_date >= '2024-01-01'GROUP BY account_id, DATE_TRUNC('month', transaction_date); -- Assume 500,000 matching rows (6 months of data) === Standard Index Scan (non-covering index) ===Index traversal: 3 pagesIndex leaf scan: 2,250 pages (half the index)Row fetches: 500,000 rows × potentially different pages ≈ 15,000-40,000 random I/Os (depending on clustering) Total I/O: ~20,000+ I/OsAt 0.2ms/random I/O: ~4,000ms = 4 seconds === Index-Only Scan (covering index) ===Index traversal: 3 pagesIndex leaf scan: 2,250 pages (sequential)Row fetches: 0 (not needed!) Total I/O: 2,253 sequential I/OsAt 0.05ms/seq I/O: ~113ms = 0.1 seconds Speedup: ~35× faster!Index-only scans provide the greatest benefit for aggregation queries over large data sets. These queries touch many rows but only need specific columns. Without index-only access, they would require massive random I/O; with it, they zip through compact index pages.
PostgreSQL's MVCC implementation creates a unique challenge for index-only scans. Let's understand this in detail.
The Problem: Tuple Visibility
In PostgreSQL, each row (tuple) has visibility attributes (xmin, xmax) that determine which transactions can see it. This information is stored in the heap (table), not in indexes. When reading an index entry, PostgreSQL doesn't inherently know if the corresponding row is visible to the current transaction.
The Solution: Visibility Map
The visibility map is a companion structure for each table—a bitmap with one bit per table page. If the bit is set, all tuples on that page are visible to all transactions (the page is 'all-visible'). This means:
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647
-- Examine visibility map statisticsSELECT relname, n_tup_ins as inserts, n_tup_upd as updates, n_tup_del as deletes, n_live_tup as live_tuples, n_dead_tup as dead_tuplesFROM pg_stat_user_tablesWHERE relname = 'orders'; -- Check index-only scan usage in execution planEXPLAIN (ANALYZE, BUFFERS) SELECT customer_id, order_date FROM orders WHERE customer_id = 1234; /*Index Only Scan using idx_orders_customer_date on orders (cost=0.43..45.12 rows=200 width=12) (actual time=0.023..0.456 rows=189 loops=1) Index Cond: (customer_id = 1234) Heap Fetches: 23 -- <-- CRITICAL METRIC! Buffers: shared hit=15*/ -- "Heap Fetches: 23" means 23 tuples required visibility check via heap-- Ideal: Heap Fetches = 0 (100% index-only)-- If Heap Fetches ≈ actual rows, visibility map is unhelpful -- Force visibility map updateVACUUM orders; -- Re-run query after VACUUMEXPLAIN (ANALYZE, BUFFERS) SELECT customer_id, order_date FROM orders WHERE customer_id = 1234; /*Index Only Scan using idx_orders_customer_date on orders (cost=0.43..45.12 rows=200 width=12) (actual time=0.019..0.234 rows=189 loops=1) Index Cond: (customer_id = 1234) Heap Fetches: 0 -- <-- Perfect! Buffers: shared hit=8 -- Fewer buffers needed!*/VACUUM and Index-Only Scans
The visibility map is updated by VACUUM:
If you skip VACUUM:
This is why autovacuum is critical for PostgreSQL performance.
Each major database implements index-only scans differently. Understanding these differences is essential for cross-platform development.
| Database | Feature Name | INCLUDE Clause | Special Considerations |
|---|---|---|---|
| PostgreSQL | Index Only Scan | Yes (v11+) | Visibility map required; check Heap Fetches metric |
| MySQL InnoDB | Using index (covering) | No | Secondary indexes include PK implicitly; covering index must include all columns |
| Oracle | INDEX FAST FULL SCAN | No (but function-based indexes possible) | Must handle NULLs correctly; IOT (index-organized tables) are inherently covering |
| SQL Server | Index Scan (non-clustered, covering) | Yes | INCLUDE clause for non-key columns; columnstore has different rules |
| SQLite | COVERING INDEX | No | Automatically recognized when index covers query |
1234567891011121314151617181920212223242526272829303132333435
=== MySQL ===-- Check if query uses covering indexEXPLAIN SELECT customer_id, order_date FROM orders WHERE customer_id = 1234; -- Look for "Using index" in Extra column:-- +----+...+-------+------+---------------+------------------------+-- | id |...| type | key | Extra |-- +----+...+-------+------+---------------+------------------------+-- | 1 |...| ref | idx | Using index | <-- Covering index!-- +----+...+-------+------+---------------+------------------------+ -- MySQL InnoDB: secondary indexes include primary key columns-- If PK is (id), and index is (customer_id), the index actually stores (customer_id, id)-- This means PK is always "covered" by secondary indexes === SQL Server ===-- Create covering index with INCLUDECREATE NONCLUSTERED INDEX idx_orders_covering ON orders (customer_id, order_date)INCLUDE (total_amount, status); -- Check execution plan for "Index Scan" without "Key Lookup"-- Key Lookup = heap/clustered fetch = NOT index-only === Oracle ===-- Look for "INDEX FAST FULL SCAN" or absence of "TABLE ACCESS" stepsEXPLAIN PLAN FORSELECT customer_id, order_date FROM orders WHERE customer_id = 1234; SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY); -- If plan shows only index operations, no TABLE ACCESS BY INDEX ROWID, -- it's an index-only executionIn InnoDB, secondary indexes automatically include primary key columns because they use the PK to locate rows in the clustered index. This means an index on (a, b) for a table with PK (id) actually stores (a, b, id). Queries needing only a, b, and id can use index-only access without explicitly including id in the index.
Designing effective covering indexes requires balancing query performance against storage and maintenance costs.
Pattern 1: Query-Specific Covering Index
Create a single index that covers a specific critical query:
1234567891011121314
-- Critical dashboard query runs every 5 seconds:SELECT status, COUNT(*) as count, AVG(processing_time_ms) as avg_timeFROM jobsWHERE created_at > NOW() - INTERVAL '1 hour'GROUP BY status; -- Covering index for this exact query:CREATE INDEX idx_jobs_dashboard ON jobs(created_at, status)INCLUDE (processing_time_ms); -- Result: Query runs in <10ms instead of >500msPattern 2: Multi-Query Composite Covering Index
Design an index that covers multiple related queries:
12345678910111213
-- Multiple queries against orders table:-- Q1: Orders by customer in date range-- Q2: Order count by customer-- Q3: Total spent by customer by month -- Unified covering index:CREATE INDEX idx_orders_multi ON orders( customer_id, -- First: equality conditions in all queries order_date, -- Second: range conditions, ORDER BY total_amount -- Covered: for SUM/AVG aggregations); -- This single index enables index-only scans for all three queriesTrade-offs to Consider:
Creating a covering index for every possible query leads to index explosion. Focus on: (1) queries that run frequently, (2) queries that are performance-critical, (3) queries over large result sets where row fetch overhead is substantial. Let non-critical queries use standard index scans.
You now understand index-only scans—one of the most powerful query optimization techniques available. Here are the essential takeaways:
What's Next:
The next page covers Bitmap Scans—a hybrid access method that combines the selectivity benefits of indexes with the sequential access efficiency of table scans. Bitmap scans are particularly useful when multiple indexes can be combined or when index scans would otherwise have poor clustering.
You now have mastery of index-only scans—understanding their mechanics, requirements, cost benefits, and design patterns. This knowledge enables you to design covering indexes that can transform query performance from seconds to milliseconds.