In traditional index usage, the database follows a two-step process: first, navigate the index to find matching row locations, then fetch the actual rows from the table. This second step—the table lookup or bookmark lookup—often dominates query execution time, especially when many rows match or the table is poorly clustered.
A covering index eliminates this second step entirely. By including all columns needed by the query within the index itself, the database can satisfy the query using only the index—never touching the main table. The result can be 10x, 100x, or even 1000x performance improvements for suitable queries.
By the end of this page, you will understand what makes an index 'covering', how to design covering indexes effectively, the trade-offs involved, included columns (non-key columns in indexes), and techniques for analyzing whether your queries can benefit from covering indexes.
To appreciate covering indexes, we must first understand the problem they solve. When a query uses a non-clustered index, the typical execution flow is:
Step 1: Index Navigation
The database traverses the B-tree index to find entries matching the WHERE clause. This is highly efficient—O(log n) for the initial lookup, then sequential reads for range scans.
Step 2: Table Lookup (The Problem)
For each matching index entry, the database must fetch the full row from the table to retrieve the columns the query needs that are not stored in the index, evaluate any remaining predicates, and (in MVCC databases) check row visibility.
This step is called table lookup, bookmark lookup, key lookup, or heap fetch depending on the database.
```
Query: SELECT customer_id, name, email FROM customers WHERE country = 'USA';
Index: idx_country (country)   -- Only contains country + row pointer

Execution with table lookup:
============================
1. Navigate index to country = 'USA'
   Read: 3 index pages (root → internal → leaf)
2. Index entries found: 50,000 matching rows
3. For EACH matching row:
   - Read row pointer from index
   - Fetch actual row from table (RANDOM I/O!)
   - Extract customer_id, name, email
4. Total I/O:
   - Index: ~3 pages initially + ~500 leaf pages for 50K entries
   - Table: Up to 50,000 random page reads (if rows scattered)!

The table lookups dominate the query cost.
50,000 random I/Os at 0.1ms each = 5 seconds of I/O time!
```
Why Table Lookups Are Expensive
Table lookups suffer from multiple performance issues:
1. Random I/O Pattern Unless the table happens to be ordered by the index key, each lookup fetches a different page. Random I/O is 10-100x slower than sequential I/O.
2. Read Amplification Each row lookup reads an entire page (typically 8KB) even though you only need one row. If you need 1000 scattered rows, you read up to 1000 pages.
3. Buffer Pool Thrashing Massive random reads may evict useful cached pages, hurting concurrent queries.
4. Multiplied by Result Size The cost scales linearly with the number of matching rows—10x more matches means 10x more lookups.
The table lookup overhead explains why optimizers sometimes prefer full table scans over index scans. Beyond a certain selectivity threshold (often 5-20% of rows), the random I/O from table lookups exceeds the cost of sequentially scanning the entire table. Covering indexes eliminate this tipping point.
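This tipping point can be sketched with a toy cost model. In the Python sketch below, every constant is an illustrative assumption rather than a measurement; the 4.0 random-to-sequential ratio mirrors PostgreSQL's default `random_page_cost` relative to `seq_page_cost`.

```python
# Toy cost model: full table scan vs. index scan with table lookups
# vs. covering-index scan. All constants are illustrative assumptions.

ROWS = 10_000_000        # rows in the table
ROWS_PER_PAGE = 100      # rows per 8 KB table page
SEQ_PAGE_COST = 1.0      # relative cost of one sequential page read
RANDOM_PAGE_COST = 4.0   # relative cost of one random page read

def full_scan_cost():
    # A full scan reads every table page sequentially.
    return (ROWS / ROWS_PER_PAGE) * SEQ_PAGE_COST

def index_scan_cost(selectivity):
    # Worst case: one random table-page read per matching row.
    return ROWS * selectivity * RANDOM_PAGE_COST

def covering_scan_cost(selectivity, entries_per_leaf=400):
    # A covering index touches only index leaf pages, sequentially.
    return (ROWS * selectivity / entries_per_leaf) * SEQ_PAGE_COST

for sel in (0.001, 0.01, 0.05, 0.25):
    print(f"selectivity {sel:>5}: full={full_scan_cost():>10,.0f}  "
          f"index+lookup={index_scan_cost(sel):>11,.0f}  "
          f"covering={covering_scan_cost(sel):>8,.0f}")
```

Under these assumptions the plain index scan already loses to the full scan below 1% selectivity; real systems cross over higher thanks to caching and prefetching, but the shape is the same. The covering scan stays far cheaper at every selectivity, which is what "eliminates this tipping point" means in practice.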
An index is covering for a query if it contains all columns needed to execute that query: every column in the SELECT list, every column referenced in WHERE predicates, and any columns used in ORDER BY, GROUP BY, or join conditions.
When all required columns exist in the index, the query can be satisfied with an Index Only Scan (PostgreSQL), a covering index scan (MySQL's "Using index"), or a plan with no Key Lookups at all (SQL Server).
```sql
-- Table: orders
-- Columns: order_id, customer_id, order_date, status, total, shipping_address, notes

-- Query pattern to optimize:
SELECT customer_id, order_date, total
FROM orders
WHERE customer_id = 12345
ORDER BY order_date DESC;

-- Non-covering index (requires table lookup):
CREATE INDEX idx_customer ON orders(customer_id);
-- Index contains: customer_id, order_id (row pointer)
-- Missing: order_date, total → table lookup required

-- Covering index (no table lookup needed):
CREATE INDEX idx_customer_covering ON orders(customer_id, order_date DESC, total);
-- Index contains all columns: customer_id, order_date, total
-- Query satisfied entirely from index!

-- Execution comparison:
-- Non-covering: 4,000 index entries → 4,000 table lookups → ~2 seconds
-- Covering: 4,000 index entries → 0 table lookups → ~50 milliseconds
```
Identifying Covering Index Opportunities
Examine the execution plan to determine if table lookups are occurring:
```sql
-- PostgreSQL: Look for "Index Only Scan" vs "Index Scan"
EXPLAIN SELECT customer_id, order_date FROM orders WHERE customer_id = 12345;

-- Index Only Scan (covering):
-- "Index Only Scan using idx_customer_date on orders (cost=0.43..45.50 rows=100...)"

-- Index Scan (not covering, has heap fetches):
-- "Index Scan using idx_customer on orders (cost=0.43..450.00 rows=100...)"
-- "Heap Fetches: 100"  <- These are the table lookups!

-- MySQL: Check "Extra" column for "Using index"
EXPLAIN SELECT customer_id, order_date FROM orders WHERE customer_id = 12345;
-- Extra: "Using index" means covering (index only)
-- No "Using index" means table lookups needed

-- SQL Server: Look for "Key Lookup" operator in execution plan
-- If you see Key Lookup / RID Lookup → not covering
-- If you only see Index Seek/Scan → covering
```
Remember that in storage engines that cluster rows by primary key (InnoDB, SQL Server tables with a clustered index), the primary key columns are implicitly included in every non-clustered index, because they serve as the row pointer. So if your query only needs the indexed columns plus the primary key, the index is effectively covering without explicitly including the PK.
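The same check is easy to run locally without a server. The sqlite3 sketch below (table and index names are made up for illustration) uses SQLite's EXPLAIN QUERY PLAN, which reports USING COVERING INDEX when a query is satisfied from the index alone:

```python
import sqlite3

# Demonstrate covering vs. non-covering index access in SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY, customer_id INT, order_date TEXT)""")

def plan(sql):
    # Each plan row is (id, parent, notused, detail); detail holds the text.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)

query = "SELECT customer_id, order_date FROM orders WHERE customer_id = 12345"

# Narrow index: seek on customer_id, then fetch the row for order_date.
conn.execute("CREATE INDEX idx_customer ON orders(customer_id)")
non_covering_plan = plan(query)

# Wide index: both columns present, satisfied from the index alone.
conn.execute("CREATE INDEX idx_customer_date ON orders(customer_id, order_date)")
covering_plan = plan(query)

print(non_covering_plan)  # e.g. SEARCH orders USING INDEX idx_customer ...
print(covering_plan)      # e.g. SEARCH orders USING COVERING INDEX idx_customer_date ...
```

The phrase "USING COVERING INDEX" in the second plan plays the same role as PostgreSQL's Index Only Scan or MySQL's "Using index".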
Creating covering indexes requires careful thought about column selection and ordering. The key principle: balance coverage with index size and maintenance cost.
Column Ordering Strategy
The order of columns in a covering index matters significantly:
1. Equality predicates (`=` in WHERE) first
2. Range predicates (`>`, `<`, `BETWEEN`) next
3. Sort columns (ORDER BY / GROUP BY)
4. Select-only columns last
```sql
-- Query to optimize:
SELECT product_name, price, stock_quantity
FROM products
WHERE category_id = 5
  AND price BETWEEN 10 AND 100
ORDER BY stock_quantity DESC
LIMIT 20;

-- Covering index design:
CREATE INDEX idx_products_covering ON products(
    category_id,          -- 1. Equality predicate: seek straight to category 5
    price,                -- 2. Range predicate: scan 10-100 within the category
    stock_quantity DESC,  -- 3. Sort column
    product_name          -- 4. Select-only column, needed only for output
);

-- The query can:
-- ✓ Seek to category_id = 5
-- ✓ Range scan price 10-100
-- ✓ Get stock_quantity and product_name from the index (no table lookup!)

-- Caveat: because price is a RANGE predicate, entries are ordered by
-- stock_quantity only within each price value, so most engines still
-- sort the matching rows for the ORDER BY. If avoiding that sort matters
-- more than the tight range scan (e.g., to stop early at LIMIT 20),
-- order the key as (category_id, stock_quantity DESC, price, product_name)
-- and let the engine filter price while reading in sorted order.
```
Coverage vs. Selectivity Trade-off
A common mistake is making indexes too wide in an attempt to cover every query at once.
Identify the top 5-10 most expensive or most frequent queries. Design covering indexes for those specific patterns. Don't try to cover every possible query—the maintenance overhead outweighs the benefits for rarely-executed queries.
Modern databases offer a powerful feature for covering indexes: included columns (SQL Server) or INCLUDE clause (PostgreSQL). These are columns stored in the index leaf pages but NOT in the index key structure.
Why Included Columns Matter
When you add a column to the index key, it affects the size of every level of the B-tree (key values are copied into internal nodes), the sort order of the index, and the cost of every key comparison during navigation.
Included columns are only stored in leaf pages. This keeps internal nodes small, so the tree stays shallow and each page holds more keys, while the leaf level still provides full coverage.
```sql
-- SQL Server: INCLUDE clause
CREATE INDEX idx_orders_customer ON orders(customer_id, order_date)
INCLUDE (total, status, shipping_address);

-- Key columns: customer_id, order_date
--   Used for seeking/scanning, determines order
-- Included columns: total, status, shipping_address
--   Only stored in leaf pages for coverage

-- PostgreSQL: INCLUDE clause (PostgreSQL 11+)
CREATE INDEX idx_orders_customer ON orders(customer_id, order_date)
INCLUDE (total, status, shipping_address);

-- Query that benefits:
SELECT customer_id, order_date, total, status, shipping_address
FROM orders
WHERE customer_id = 12345
ORDER BY order_date DESC;

-- Without INCLUDE: Would need 5 columns in index key
-- With INCLUDE: Only 2 columns in key, 3 in leaf pages
-- Result: Smaller index, better performance, full coverage
```
Key Columns vs. Included Columns: Decision Guide
| Use As Key Column When... | Use As Included Column When... |
|---|---|
| Column appears in WHERE with equality/range | Column only appears in SELECT list |
| Column appears in ORDER BY | Column only needed for coverage |
| Column appears in GROUP BY | Column is wide (text, large varchar) |
| Column used for index seeks | You want to minimize index key width |
| Column determines useful sort order | Column doesn't affect filtering or sorting |
```sql
-- Example: Search products, return description
SELECT product_id, name, description
FROM products
WHERE category_id = 5 AND is_active = 1;

-- Option A: All columns in key
CREATE INDEX idx_opt_a ON products(category_id, is_active, name, description);
-- Problems:
-- - description might be 1000+ characters
-- - Makes internal nodes very large
-- - Wastes space duplicating large text in B-tree levels

-- Option B: Filtering columns in key, output in INCLUDE
CREATE INDEX idx_opt_b ON products(category_id, is_active)
INCLUDE (name, description);
-- Benefits:
-- - Small, efficient B-tree structure
-- - Fast seeks on category_id and is_active
-- - Description stored only in leaves
-- - Full coverage, no table lookup

-- Option B is typically 3-5x smaller and faster!
```
MySQL does not support INCLUDE columns as of MySQL 8.0. To create covering indexes in MySQL, you must add all columns to the index key, which makes the width trade-offs above even more important to weigh.
Covering indexes provide significant read performance benefits, but they aren't free. Understanding the trade-offs helps you make informed decisions.
Storage Overhead
Every column added to an index increases its size. A covering index with many columns can approach the size of the table itself:
```
Table: orders (10 million rows)
Columns: order_id (4B), customer_id (4B), order_date (8B), status (4B),
         total (8B), shipping_address (100B avg), notes (500B avg)

Total row size: ~628 bytes
Table size: ~6 GB

Index 1: idx_customer (customer_id)
Key size: 4 bytes + 4 bytes (row pointer) = 8 bytes
Index size: ~80 MB (1.3% of table)

Index 2: idx_customer_covering (customer_id, order_date, status, total)
Key size: 4 + 8 + 4 + 8 + 4 = 28 bytes
Index size: ~280 MB (4.7% of table)

Index 3: idx_customer_full_cover (all columns except notes)
Key size: ~128 bytes
Index size: ~1.3 GB (21% of table!)

Trade-off: Index 3 covers more queries but consumes significant storage
and increases maintenance overhead for writes.
```
Write Performance Impact
Every INSERT, UPDATE, and DELETE must maintain all indexes. Wider indexes mean more bytes written per modification, more frequent page splits, and more write-ahead log volume:
| Operation | Covered Query Benefit | Index Maintenance Cost |
|---|---|---|
| SELECT (covered) | 10-100x faster read | No cost |
| SELECT (not covered) | Same as narrow index | Storage overhead only |
| INSERT | No benefit | Must update all index columns |
| UPDATE (non-indexed column) | No benefit | No index update needed |
| UPDATE (indexed column) | No benefit | Must update index entry |
| DELETE | No benefit | Must remove from index |
For tables with high write rates (logs, events, real-time metrics), covering indexes may do more harm than good. The write amplification from maintaining wide indexes can become a bottleneck. Favor narrow indexes for write-heavy workloads and add covering indexes selectively for read-critical paths.
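The amplification is easy to make visible. In the sqlite3 sketch below (the `events` table and its columns are hypothetical), the same 20,000 inserts must also write each payload into the wide "covering" index, so the database ends up with noticeably more pages:

```python
import sqlite3

def pages_with_index(index_sql):
    """Bulk-load the same data under a given index and report total pages."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY, kind INT, payload TEXT)")
    conn.execute(index_sql)
    # Every insert maintains the index as well as the table.
    conn.executemany(
        "INSERT INTO events(kind, payload) VALUES (?, ?)",
        [(i % 10, "x" * 200) for i in range(20_000)])
    conn.commit()
    return conn.execute("PRAGMA page_count").fetchone()[0]

narrow = pages_with_index("CREATE INDEX idx_kind ON events(kind)")
wide = pages_with_index("CREATE INDEX idx_kind_wide ON events(kind, payload)")
print(f"narrow index: {narrow} pages, wide covering index: {wide} pages")
```

The wide index duplicates the 200-byte payload for every row, so both the storage footprint and the bytes written per insert grow accordingly.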
Memory Consumption
Indexes compete for buffer pool space. Large covering indexes may evict hot table pages and other indexes from the cache, lowering hit ratios and increasing I/O for the rest of the workload.
This is especially important in memory-constrained environments or when multiple covering indexes exist on the same table.
Different databases implement index-only scans with varying capabilities and limitations:
PostgreSQL Index-Only Scans
PostgreSQL's MVCC implementation adds an interesting consideration: the visibility map. For an index-only scan to avoid heap (table) access, PostgreSQL must confirm row visibility. If a page isn't marked all-visible in the visibility map, PostgreSQL must fetch the heap page anyway.
```sql
-- Check table health (dead tuples shrink visibility-map coverage)
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'orders';

-- Heap block I/O counters live in pg_statio_user_tables
SELECT relname, heap_blks_read, heap_blks_hit
FROM pg_statio_user_tables
WHERE relname = 'orders';

-- Monitor index-only scan effectiveness
EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, order_date FROM orders WHERE customer_id = 12345;

-- Look for "Heap Fetches" in output:
-- Index Only Scan using idx_customer_covering on orders
--   Heap Fetches: 0   <- Perfect! All from index
--   Heap Fetches: 50  <- Had to check heap for visibility

-- Improve visibility map coverage by running:
VACUUM orders;  -- Updates visibility map for all-visible pages
```
MySQL Covering Index Considerations
MySQL has strong covering index support through InnoDB's secondary indexes:
```sql
-- InnoDB secondary indexes include the primary key automatically
-- Table: orders with PRIMARY KEY (order_id)
-- Index: idx_customer (customer_id)
-- Effectively contains: (customer_id, order_id)

-- So this query is automatically covered:
SELECT customer_id, order_id FROM orders WHERE customer_id = 12345;
-- Extra: Using index (covering!)

-- But this needs table lookup:
SELECT customer_id, order_date FROM orders WHERE customer_id = 12345;
-- Extra: (nothing) - needs table access

-- Covering index for the second query:
CREATE INDEX idx_customer_date ON orders(customer_id, order_date);

-- Check coverage in EXPLAIN:
EXPLAIN SELECT customer_id, order_date FROM orders WHERE customer_id = 12345;
-- Look for "Using index" in Extra column
```
SQL Server Index-Only Access
SQL Server shows excellent covering index support, with execution plans clearly showing when Key Lookups are eliminated:
```sql
-- Create covering index with INCLUDE
CREATE INDEX idx_customer_covering ON orders(customer_id)
INCLUDE (order_date, total, status);

-- Query using the covering index
SELECT customer_id, order_date, total, status
FROM orders
WHERE customer_id = 12345;

-- Execution plan will show:
-- Index Seek (Nonclustered) on idx_customer_covering
-- NO Key Lookup operator!

-- Without INCLUDE, you'd see:
-- Index Seek on idx_customer
-- Key Lookup (Clustered) on orders  <- This is the table access!
-- Nested Loops to combine them

-- The Key Lookup can be the most expensive part of the plan
```
Beyond basic coverage, several advanced patterns leverage covering indexes:
Pattern 1: Covering Indexes for Aggregations
Covering indexes enable highly efficient aggregation:
```sql
-- Query: Monthly sales totals by category
SELECT
    category_id,
    DATE_TRUNC('month', order_date) AS month,
    SUM(total) AS monthly_total
FROM orders
WHERE status = 'completed'
GROUP BY category_id, DATE_TRUNC('month', order_date);

-- Covering index for this aggregation:
CREATE INDEX idx_orders_agg ON orders(status, category_id, order_date)
INCLUDE (total);

-- Execution:
-- 1. Seek to status = 'completed'
-- 2. Scan index in (category_id, order_date) order
-- 3. Aggregate totals directly from index
-- 4. No table access whatsoever!
```
Pattern 2: Covering Indexes for Pagination
Pagination with OFFSET/LIMIT benefits greatly from covering indexes:
```sql
-- Query: Paginated order list
SELECT order_id, customer_id, order_date, total
FROM orders
WHERE status = 'pending'
ORDER BY order_date DESC
LIMIT 20 OFFSET 100;

-- Covering index:
CREATE INDEX idx_pending_orders ON orders(status, order_date DESC)
INCLUDE (customer_id, total);

-- Without covering:
-- Skip 100 rows, but each might need table lookup
-- Even rows we skip might require table access for ORDER BY

-- With covering:
-- Skip entirely within index, no table access
-- Only access leaf pages, which are smaller and cached
-- Dramatically faster for deep pagination
```
Pattern 3: Covering Joins
Covering indexes can also accelerate join operations by providing all needed columns without accessing the base table:
```sql
-- Query: Customer orders with totals
SELECT c.customer_name, o.order_date, o.total
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE c.region = 'WEST';

-- Covering index on orders:
CREATE INDEX idx_orders_join ON orders(customer_id)
INCLUDE (order_date, total);

-- Execution:
-- 1. Access customers where region = 'WEST'
-- 2. For each customer, seek orders index by customer_id
-- 3. Read order_date and total directly from index
-- 4. NO orders table access at all!

-- This is especially powerful when orders table is very large
-- and customers table is small (many orders per customer)
```
For complex queries, you might need a covering index that combines filtering, sorting, AND joining requirements. Analyze the full query pattern to design an index that covers the entire execution path. The payoff can be enormous: queries that took seconds completing in milliseconds.
Covering indexes represent one of the most impactful optimization techniques in database performance tuning. The key principles to carry forward: a covering index contains every column a query touches, which eliminates table lookups entirely; key columns should serve filtering and sorting, while included columns (where supported) handle output-only coverage; and every added column increases storage and write-maintenance cost, so design covering indexes for your most expensive read paths rather than for every query, and verify coverage in the execution plan.
What's Next
While covering indexes maximize index benefits, indexes also have important limitations that must be understood. The next page examines Index Limitations—situations where indexes can't help, might hurt performance, or require special consideration for optimal use.
You now understand covering indexes in depth—from the table lookup problem they solve, through design strategies and INCLUDE columns, to advanced patterns for aggregations, pagination, and joins. Apply these techniques to your highest-impact queries for dramatic performance improvements.