In traditional index usage, the database follows a two-step process: first, navigate the index to find matching row locations, then fetch the actual rows from the table. This second step—the table lookup or bookmark lookup—often dominates query execution time, especially when many rows match or the table is poorly clustered.
A covering index eliminates this second step entirely. By including all columns needed by the query within the index itself, the database can satisfy the query using only the index—never touching the main table. The result can be 10x, 100x, or even 1000x performance improvements for suitable queries.
By the end of this page, you will understand what makes an index 'covering', how to design covering indexes effectively, the trade-offs involved, included columns (non-key columns in indexes), and techniques for analyzing whether your queries can benefit from covering indexes.
To appreciate covering indexes, we must first understand the problem they solve. When a query uses a non-clustered index, the typical execution flow is:
Step 1: Index Navigation
The database traverses the B-tree index to find entries matching the WHERE clause. This is highly efficient—O(log n) for the initial lookup, then sequential reads for range scans.
Step 2: Table Lookup (The Problem)
For each matching index entry, the database must fetch the full row from the table to retrieve the columns the query needs that are not stored in the index, evaluate any remaining predicates, and (in MVCC databases) check row visibility.
This step is called table lookup, bookmark lookup, key lookup, or heap fetch depending on the database.
```
Query: SELECT customer_id, name, email FROM customers WHERE country = 'USA';
Index: idx_country (country)   -- Only contains country + row pointer

Execution with table lookup:
============================
1. Navigate index to country = 'USA'
   Read: 3 index pages (root → internal → leaf)
2. Index entries found: 50,000 matching rows
3. For EACH matching row:
   - Read row pointer from index
   - Fetch actual row from table (RANDOM I/O!)
   - Extract customer_id, name, email
4. Total I/O:
   - Index: ~3 pages initially + ~500 leaf pages for 50K entries
   - Table: Up to 50,000 random page reads (if rows scattered)!

The table lookups dominate the query cost.
50,000 random I/Os at 0.1ms each = 5 seconds of I/O time!
```
Why Table Lookups Are Expensive
Table lookups suffer from multiple performance issues:
1. Random I/O Pattern Unless the table happens to be ordered by the index key, each lookup fetches a different page. Random I/O is 10-100x slower than sequential I/O.
2. Read Amplification Each row lookup reads an entire page (typically 8KB) even though you only need one row. If you need 1000 scattered rows, you read up to 1000 pages.
3. Buffer Pool Thrashing Massive random reads may evict useful cached pages, hurting concurrent queries.
4. Multiplied by Result Size The cost scales linearly with the number of matching rows—10x more matches means 10x more lookups.
The table lookup overhead explains why optimizers sometimes prefer full table scans over index scans. Beyond a certain selectivity threshold (often 5-20% of rows), the random I/O from table lookups exceeds the cost of sequentially scanning the entire table. Covering indexes eliminate this tipping point.
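This tipping point can be sketched with a toy cost model. In the Python sketch below, every constant is an illustrative assumption rather than a measurement; the 4.0 random-to-sequential ratio mirrors PostgreSQL's default `random_page_cost` relative to `seq_page_cost`.

```python
# Toy cost model: full table scan vs. index scan with table lookups
# vs. covering-index scan. All constants are illustrative assumptions.

ROWS = 10_000_000        # rows in the table
ROWS_PER_PAGE = 100      # rows per 8 KB table page
SEQ_PAGE_COST = 1.0      # relative cost of one sequential page read
RANDOM_PAGE_COST = 4.0   # relative cost of one random page read

def full_scan_cost():
    # A full scan reads every table page sequentially.
    return (ROWS / ROWS_PER_PAGE) * SEQ_PAGE_COST

def index_scan_cost(selectivity):
    # Worst case: one random table-page read per matching row.
    return ROWS * selectivity * RANDOM_PAGE_COST

def covering_scan_cost(selectivity, entries_per_leaf=400):
    # A covering index touches only index leaf pages, sequentially.
    return (ROWS * selectivity / entries_per_leaf) * SEQ_PAGE_COST

for sel in (0.001, 0.01, 0.05, 0.25):
    print(f"selectivity {sel:>5}: full={full_scan_cost():>10,.0f}  "
          f"index+lookup={index_scan_cost(sel):>11,.0f}  "
          f"covering={covering_scan_cost(sel):>8,.0f}")
```

Under these assumptions the plain index scan already loses to the full scan below 1% selectivity; real systems cross over higher thanks to caching and prefetching, but the shape is the same. The covering scan stays far cheaper at every selectivity, which is what "eliminates this tipping point" means in practice.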
An index is covering for a query if it contains all columns needed to execute that query: every column in the SELECT list, every column referenced in WHERE predicates, and any columns used in ORDER BY, GROUP BY, or join conditions.
When all required columns exist in the index, the query can be satisfied with an Index Only Scan (PostgreSQL), a covering index scan (MySQL's "Using index"), or a plan with no Key Lookups at all (SQL Server).
```sql
-- Table: orders
-- Columns: order_id, customer_id, order_date, status, total, shipping_address, notes

-- Query pattern to optimize:
SELECT customer_id, order_date, total
FROM orders
WHERE customer_id = 12345
ORDER BY order_date DESC;

-- Non-covering index (requires table lookup):
CREATE INDEX idx_customer ON orders(customer_id);
-- Index contains: customer_id, order_id (row pointer)
-- Missing: order_date, total → table lookup required

-- Covering index (no table lookup needed):
CREATE INDEX idx_customer_covering ON orders(customer_id, order_date DESC, total);
-- Index contains all columns: customer_id, order_date, total
-- Query satisfied entirely from index!

-- Execution comparison:
-- Non-covering: 4,000 index entries → 4,000 table lookups → ~2 seconds
-- Covering: 4,000 index entries → 0 table lookups → ~50 milliseconds
```
Identifying Covering Index Opportunities
Examine the execution plan to determine if table lookups are occurring:
```sql
-- PostgreSQL: Look for "Index Only Scan" vs "Index Scan"
EXPLAIN SELECT customer_id, order_date FROM orders WHERE customer_id = 12345;

-- Index Only Scan (covering):
-- "Index Only Scan using idx_customer_date on orders (cost=0.43..45.50 rows=100...)"

-- Index Scan (not covering, has heap fetches):
-- "Index Scan using idx_customer on orders (cost=0.43..450.00 rows=100...)"
-- "Heap Fetches: 100"  <- These are the table lookups!

-- MySQL: Check "Extra" column for "Using index"
EXPLAIN SELECT customer_id, order_date FROM orders WHERE customer_id = 12345;
-- Extra: "Using index" means covering (index only)
-- No "Using index" means table lookups needed

-- SQL Server: Look for "Key Lookup" operator in execution plan
-- If you see Key Lookup / RID Lookup → not covering
-- If you only see Index Seek/Scan → covering
```
Remember that in storage engines that cluster rows by primary key (InnoDB, SQL Server tables with a clustered index), the primary key columns are implicitly included in every non-clustered index, because they serve as the row pointer. So if your query only needs the indexed columns plus the primary key, the index is effectively covering without explicitly including the PK.
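The same check is easy to run locally without a server. The sqlite3 sketch below (table and index names are made up for illustration) uses SQLite's EXPLAIN QUERY PLAN, which reports USING COVERING INDEX when a query is satisfied from the index alone:

```python
import sqlite3

# Demonstrate covering vs. non-covering index access in SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY, customer_id INT, order_date TEXT)""")

def plan(sql):
    # Each plan row is (id, parent, notused, detail); detail holds the text.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)

query = "SELECT customer_id, order_date FROM orders WHERE customer_id = 12345"

# Narrow index: seek on customer_id, then fetch the row for order_date.
conn.execute("CREATE INDEX idx_customer ON orders(customer_id)")
non_covering_plan = plan(query)

# Wide index: both columns present, satisfied from the index alone.
conn.execute("CREATE INDEX idx_customer_date ON orders(customer_id, order_date)")
covering_plan = plan(query)

print(non_covering_plan)  # e.g. SEARCH orders USING INDEX idx_customer ...
print(covering_plan)      # e.g. SEARCH orders USING COVERING INDEX idx_customer_date ...
```

The phrase "USING COVERING INDEX" in the second plan plays the same role as PostgreSQL's Index Only Scan or MySQL's "Using index".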
Creating covering indexes requires careful thought about column selection and ordering. The key principle: balance coverage with index size and maintenance cost.
Column Ordering Strategy
The order of columns in a covering index matters significantly:
1. Equality predicates (`=` in WHERE) first
2. Range predicates (`>`, `<`, `BETWEEN`) next
3. Sort columns (ORDER BY / GROUP BY)
4. Select-only columns last
```sql
-- Query to optimize:
SELECT product_name, price, stock_quantity
FROM products
WHERE category_id = 5
  AND price BETWEEN 10 AND 100
ORDER BY stock_quantity DESC
LIMIT 20;

-- Covering index design:
CREATE INDEX idx_products_covering ON products(
    category_id,          -- 1. Equality predicate: seek straight to category 5
    price,                -- 2. Range predicate: scan 10-100 within the category
    stock_quantity DESC,  -- 3. Sort column
    product_name          -- 4. Select-only column, needed only for output
);

-- The query can:
-- ✓ Seek to category_id = 5
-- ✓ Range scan price 10-100
-- ✓ Get stock_quantity and product_name from the index (no table lookup!)

-- Caveat: because price is a RANGE predicate, entries are ordered by
-- stock_quantity only within each price value, so most engines still
-- sort the matching rows for the ORDER BY. If avoiding that sort matters
-- more than the tight range scan (e.g., to stop early at LIMIT 20),
-- order the key as (category_id, stock_quantity DESC, price, product_name)
-- and let the engine filter price while reading in sorted order.
```
Coverage vs. Selectivity Trade-off
A common mistake is making indexes too wide in an attempt to cover every query at once.
Identify the top 5-10 most expensive or most frequent queries. Design covering indexes for those specific patterns. Don't try to cover every possible query—the maintenance overhead outweighs the benefits for rarely-executed queries.
Modern databases offer a powerful feature for covering indexes: included columns (SQL Server) or INCLUDE clause (PostgreSQL). These are columns stored in the index leaf pages but NOT in the index key structure.
Why Included Columns Matter
When you add a column to the index key, it affects the size of every level of the B-tree (key values are copied into internal nodes), the sort order of the index, and the cost of every key comparison during navigation.
Included columns are only stored in leaf pages. This keeps internal nodes small, so the tree stays shallow and each page holds more keys, while the leaf level still provides full coverage.
```sql
-- SQL Server: INCLUDE clause
CREATE INDEX idx_orders_customer ON orders(customer_id, order_date)
INCLUDE (total, status, shipping_address);

-- Key columns: customer_id, order_date
--   Used for seeking/scanning, determines order
-- Included columns: total, status, shipping_address
--   Only stored in leaf pages for coverage

-- PostgreSQL: INCLUDE clause (PostgreSQL 11+)
CREATE INDEX idx_orders_customer ON orders(customer_id, order_date)
INCLUDE (total, status, shipping_address);

-- Query that benefits:
SELECT customer_id, order_date, total, status, shipping_address
FROM orders
WHERE customer_id = 12345
ORDER BY order_date DESC;

-- Without INCLUDE: Would need 5 columns in index key
-- With INCLUDE: Only 2 columns in key, 3 in leaf pages
-- Result: Smaller index, better performance, full coverage
```
Key Columns vs. Included Columns: Decision Guide
| Use As Key Column When... | Use As Included Column When... |
|---|---|
| Column appears in WHERE with equality/range | Column only appears in SELECT list |
| Column appears in ORDER BY | Column only needed for coverage |
| Column appears in GROUP BY | Column is wide (text, large varchar) |
| Column used for index seeks | You want to minimize index key width |
| Column determines useful sort order | Column doesn't affect filtering or sorting |
```sql
-- Example: Search products, return description
SELECT product_id, name, description
FROM products
WHERE category_id = 5 AND is_active = 1;

-- Option A: All columns in key
CREATE INDEX idx_opt_a ON products(category_id, is_active, name, description);
-- Problems:
-- - description might be 1000+ characters
-- - Makes internal nodes very large
-- - Wastes space duplicating large text in B-tree levels

-- Option B: Filtering columns in key, output in INCLUDE
CREATE INDEX idx_opt_b ON products(category_id, is_active)
INCLUDE (name, description);
-- Benefits:
-- - Small, efficient B-tree structure
-- - Fast seeks on category_id and is_active
-- - Description stored only in leaves
-- - Full coverage, no table lookup

-- Option B is typically 3-5x smaller and faster!
```
MySQL does not support INCLUDE columns as of MySQL 8.0. To create covering indexes in MySQL, you must add all columns to the index key, which makes the width trade-offs above even more important to weigh.
Covering indexes provide significant read performance benefits, but they aren't free. Understanding the trade-offs helps you make informed decisions.
Storage Overhead
Every column added to an index increases its size. A covering index with many columns can approach the size of the table itself:
```
Table: orders (10 million rows)
Columns: order_id (4B), customer_id (4B), order_date (8B), status (4B),
         total (8B), shipping_address (100B avg), notes (500B avg)

Total row size: ~628 bytes
Table size: ~6 GB

Index 1: idx_customer (customer_id)
Key size: 4 bytes + 4 bytes (row pointer) = 8 bytes
Index size: ~80 MB (1.3% of table)

Index 2: idx_customer_covering (customer_id, order_date, status, total)
Key size: 4 + 8 + 4 + 8 + 4 = 28 bytes
Index size: ~280 MB (4.7% of table)

Index 3: idx_customer_full_cover (all columns except notes)
Key size: ~128 bytes
Index size: ~1.3 GB (21% of table!)

Trade-off: Index 3 covers more queries but consumes significant storage
and increases maintenance overhead for writes.
```
Write Performance Impact
Every INSERT, UPDATE, and DELETE must maintain all indexes. Wider indexes mean more bytes written per modification, more frequent page splits, and more write-ahead log volume:
| Operation | Covered Query Benefit | Index Maintenance Cost |
|---|---|---|
| SELECT (covered) | 10-100x faster read | No cost |
| SELECT (not covered) | Same as narrow index | Storage overhead only |
| INSERT | No benefit | Must update all index columns |
| UPDATE (non-indexed column) | No benefit | No index update needed |
| UPDATE (indexed column) | No benefit | Must update index entry |
| DELETE | No benefit | Must remove from index |
For tables with high write rates (logs, events, real-time metrics), covering indexes may do more harm than good. The write amplification from maintaining wide indexes can become a bottleneck. Favor narrow indexes for write-heavy workloads and add covering indexes selectively for read-critical paths.
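The amplification is easy to make visible. In the sqlite3 sketch below (the `events` table and its columns are hypothetical), the same 20,000 inserts must also write each payload into the wide "covering" index, so the database ends up with noticeably more pages:

```python
import sqlite3

def pages_with_index(index_sql):
    """Bulk-load the same data under a given index and report total pages."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY, kind INT, payload TEXT)")
    conn.execute(index_sql)
    # Every insert maintains the index as well as the table.
    conn.executemany(
        "INSERT INTO events(kind, payload) VALUES (?, ?)",
        [(i % 10, "x" * 200) for i in range(20_000)])
    conn.commit()
    return conn.execute("PRAGMA page_count").fetchone()[0]

narrow = pages_with_index("CREATE INDEX idx_kind ON events(kind)")
wide = pages_with_index("CREATE INDEX idx_kind_wide ON events(kind, payload)")
print(f"narrow index: {narrow} pages, wide covering index: {wide} pages")
```

The wide index duplicates the 200-byte payload for every row, so both the storage footprint and the bytes written per insert grow accordingly.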
Memory Consumption
Indexes compete for buffer pool space. Large covering indexes may evict hot table pages and other indexes from the cache, lowering hit ratios and increasing I/O for the rest of the workload.
This is especially important in memory-constrained environments or when multiple covering indexes exist on the same table.
Different databases implement index-only scans with varying capabilities and limitations:
PostgreSQL Index-Only Scans
PostgreSQL's MVCC implementation adds an interesting consideration: the visibility map. For an index-only scan to avoid heap (table) access, PostgreSQL must confirm row visibility. If a page isn't marked all-visible in the visibility map, PostgreSQL must fetch the heap page anyway.
```sql
-- Check table health (dead tuples shrink visibility-map coverage)
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'orders';

-- Heap block I/O counters live in pg_statio_user_tables
SELECT relname, heap_blks_read, heap_blks_hit
FROM pg_statio_user_tables
WHERE relname = 'orders';

-- Monitor index-only scan effectiveness
EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, order_date FROM orders WHERE customer_id = 12345;

-- Look for "Heap Fetches" in output:
-- Index Only Scan using idx_customer_covering on orders
--   Heap Fetches: 0   <- Perfect! All from index
--   Heap Fetches: 50  <- Had to check heap for visibility

-- Improve visibility map coverage by running:
VACUUM orders;  -- Updates visibility map for all-visible pages
```
MySQL Covering Index Considerations
MySQL has strong covering index support through InnoDB's secondary indexes:
```sql
-- InnoDB secondary indexes include the primary key automatically
-- Table: orders with PRIMARY KEY (order_id)
-- Index: idx_customer (customer_id)
-- Effectively contains: (customer_id, order_id)

-- So this query is automatically covered:
SELECT customer_id, order_id FROM orders WHERE customer_id = 12345;
-- Extra: Using index (covering!)

-- But this needs table lookup:
SELECT customer_id, order_date FROM orders WHERE customer_id = 12345;
-- Extra: (nothing) - needs table access

-- Covering index for the second query:
CREATE INDEX idx_customer_date ON orders(customer_id, order_date);

-- Check coverage in EXPLAIN:
EXPLAIN SELECT customer_id, order_date FROM orders WHERE customer_id = 12345;
-- Look for "Using index" in Extra column
```
SQL Server Index-Only Access
SQL Server shows excellent covering index support, with execution plans clearly showing when Key Lookups are eliminated:
```sql
-- Create covering index with INCLUDE
CREATE INDEX idx_customer_covering ON orders(customer_id)
INCLUDE (order_date, total, status);

-- Query using the covering index
SELECT customer_id, order_date, total, status
FROM orders
WHERE customer_id = 12345;

-- Execution plan will show:
-- Index Seek (Nonclustered) on idx_customer_covering
-- NO Key Lookup operator!

-- Without INCLUDE, you'd see:
-- Index Seek on idx_customer
-- Key Lookup (Clustered) on orders  <- This is the table access!
-- Nested Loops to combine them

-- The Key Lookup can be the most expensive part of the plan
```
Beyond basic coverage, several advanced patterns leverage covering indexes:
Pattern 1: Covering Indexes for Aggregations
Covering indexes enable highly efficient aggregation:
```sql
-- Query: Monthly sales totals by category
SELECT
    category_id,
    DATE_TRUNC('month', order_date) AS month,
    SUM(total) AS monthly_total
FROM orders
WHERE status = 'completed'
GROUP BY category_id, DATE_TRUNC('month', order_date);

-- Covering index for this aggregation:
CREATE INDEX idx_orders_agg ON orders(status, category_id, order_date)
INCLUDE (total);

-- Execution:
-- 1. Seek to status = 'completed'
-- 2. Scan index in (category_id, order_date) order
-- 3. Aggregate totals directly from index
-- 4. No table access whatsoever!
```
Pattern 2: Covering Indexes for Pagination
Pagination with OFFSET/LIMIT benefits greatly from covering indexes:
```sql
-- Query: Paginated order list
SELECT order_id, customer_id, order_date, total
FROM orders
WHERE status = 'pending'
ORDER BY order_date DESC
LIMIT 20 OFFSET 100;

-- Covering index:
CREATE INDEX idx_pending_orders ON orders(status, order_date DESC)
INCLUDE (customer_id, total);

-- Without covering:
-- Skip 100 rows, but each might need table lookup
-- Even rows we skip might require table access for ORDER BY

-- With covering:
-- Skip entirely within index, no table access
-- Only access leaf pages, which are smaller and cached
-- Dramatically faster for deep pagination
```
Pattern 3: Covering Joins
Covering indexes can also accelerate join operations by providing all needed columns without accessing the base table:
```sql
-- Query: Customer orders with totals
SELECT c.customer_name, o.order_date, o.total
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE c.region = 'WEST';

-- Covering index on orders:
CREATE INDEX idx_orders_join ON orders(customer_id)
INCLUDE (order_date, total);

-- Execution:
-- 1. Access customers where region = 'WEST'
-- 2. For each customer, seek orders index by customer_id
-- 3. Read order_date and total directly from index
-- 4. NO orders table access at all!

-- This is especially powerful when orders table is very large
-- and customers table is small (many orders per customer)
```
For complex queries, you might need a covering index that combines filtering, sorting, AND joining requirements. Analyze the full query pattern to design an index that covers the entire execution path. The payoff can be enormous: queries that took seconds completing in milliseconds.
Covering indexes represent one of the most impactful optimization techniques in database performance tuning. The key principles to carry forward: a covering index contains every column a query touches, which eliminates table lookups entirely; key columns should serve filtering and sorting, while included columns (where supported) handle output-only coverage; and every added column increases storage and write-maintenance cost, so design covering indexes for your most expensive read paths rather than for every query, and verify coverage in the execution plan.
What's Next
While covering indexes maximize index benefits, indexes also have important limitations that must be understood. The next page examines Index Limitations—situations where indexes can't help, might hurt performance, or require special consideration for optimal use.
You now understand covering indexes in depth—from the table lookup problem they solve, through design strategies and INCLUDE columns, to advanced patterns for aggregations, pagination, and joins. Apply these techniques to your highest-impact queries for dramatic performance improvements.