If indexing were simply about making queries fast, every database would have an index on every column. The reason they don't reveals the fundamental truth of database engineering: every index is a trade-off.
Indexes accelerate reads but slow writes. They consume storage but save CPU. They benefit some queries while being useless for others. The art of indexing lies not in knowing how to create indexes, but in knowing when to create them, which columns to include, and when the costs outweigh the benefits.
This page synthesizes everything we've learned about indexes into a comprehensive framework for making indexing decisions in production systems. You'll learn to reason systematically about trade-offs, avoid common pitfalls, and design indexing strategies that serve your application's unique needs.
By the end of this page, you'll understand the read/write trade-off quantitatively, know how to evaluate storage costs, master index maintenance strategies, design for different workload profiles, and develop a systematic approach to indexing decisions that balances performance, cost, and complexity.
The core trade-off in indexing is simple: indexes make reads faster but writes slower. Every index on a table creates additional work during INSERT, UPDATE, and DELETE operations.
Write Amplification Explained:
When you insert a row into a table with 5 indexes:
- 1 write to the table itself (the heap)
- 5 writes, one to each index
Total: 6 write operations for 1 logical insert. This is write amplification.
For updates, it's even worse if the updated column is indexed:
| # of Indexes | INSERT Cost | UPDATE Cost* | DELETE Cost |
|---|---|---|---|
| 0 | 1 write | 1 write | 1 write |
| 1 | 2 writes | 1-3 writes | 2 writes |
| 3 | 4 writes | 1-7 writes | 4 writes |
| 5 | 6 writes | 1-11 writes | 6 writes |
| 10 | 11 writes | 1-21 writes | 11 writes |
*UPDATE cost depends on which columns are modified. Updating an unindexed column may only affect the heap.
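The table's arithmetic can be sketched directly. This is a simplified model, not any database's actual cost function: the worst-case UPDATE assumes every indexed column changed, forcing a delete plus an insert in each index.

```python
def insert_cost(num_indexes: int) -> int:
    """One heap write plus one entry write per index."""
    return 1 + num_indexes

def update_cost(indexed_cols_changed: int) -> int:
    """One heap write; each index on a changed column needs a delete + insert."""
    return 1 + 2 * indexed_cols_changed

def delete_cost(num_indexes: int) -> int:
    """One heap delete plus one entry removal per index."""
    return 1 + num_indexes

# Reproduce the table; the UPDATE range runs from "no indexed columns
# touched" to "every indexed column touched"
for n in (0, 1, 3, 5, 10):
    print(f"{n} indexes: INSERT={insert_cost(n)}, "
          f"UPDATE={update_cost(0)}-{update_cost(n)}, DELETE={delete_cost(n)}")
```

For 5 indexes this yields INSERT=6, UPDATE=1-11, DELETE=6, matching the row above.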
Quantifying the Read Benefit:
To justify an index, the read performance benefit must outweigh the write performance cost:
Benefit = (queries_per_second) × (time_saved_per_query)
Cost = (writes_per_second) × (time_added_per_write)
Index is worthwhile when: Benefit > Cost
Example Calculation:
Query without index: 100ms (full table scan)
Query with index: 1ms (index lookup)
Time saved per query: 99ms
Query frequency: 1000 queries/second
Read benefit: 99 seconds of CPU time saved per second
Write without index: 1ms
Write with index: 2ms
Time added per write: 1ms
Write frequency: 100 writes/second
Write cost: 0.1 seconds of CPU time added per second
Net benefit: 98.9 seconds saved per second → Index is clearly worthwhile
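The worked example above can be reproduced in a few lines; all figures are the ones from the text.

```python
# Figures from the worked example above
query_saving_s = 0.100 - 0.001   # 100ms full scan -> 1ms index lookup
queries_per_s = 1000
write_penalty_s = 0.002 - 0.001  # 1ms write -> 2ms write
writes_per_s = 100

benefit = queries_per_s * query_saving_s   # CPU-seconds saved per second
cost = writes_per_s * write_penalty_s      # CPU-seconds added per second
net = benefit - cost

print(f"benefit={benefit:.1f}s, cost={cost:.1f}s, net={net:.1f}s per second")
```

With a net saving of 98.9 CPU-seconds per second, the index is clearly worthwhile; the same arithmetic with your own query and write rates gives a first-pass answer for any candidate index.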
As a rough guideline: If your read/write ratio is 10:1 or higher, indexes almost always help. At 1:1 ratio, carefully evaluate each index. Below 1:1 (write-heavy), be very selective about indexes—each one significantly impacts throughput.
Indexes consume storage proportional to the data they index. Understanding storage costs helps you make informed decisions about index design.
Index Size Calculation:
For a B-tree index, approximate size:
index_size ≈ num_rows × (key_size + pointer_size) × overhead_factor
Where:
- num_rows — number of rows in the table
- key_size — bytes per indexed key value (e.g. 4 for INT, 8 for BIGINT, 16 for UUID)
- pointer_size — bytes for the pointer back to the heap row (typically 6-8)
- overhead_factor — page and structure overhead, commonly around 1.3-1.5
Example sizes for a 100-million-row table:
| Index Type | Key Size | Size at 100M Rows | Notes |
|---|---|---|---|
| Single (INT) | 4 bytes | ~1.5 GB | Most compact |
| Single (BIGINT) | 8 bytes | ~2 GB | Common for IDs |
| Single (UUID) | 16 bytes | ~3 GB | Random distribution |
| Single (VARCHAR 100) | up to 100 bytes | ~10-15 GB | Variable length |
| Composite (INT, INT) | 8 bytes | ~2 GB | Efficient |
| Composite (INT, TIMESTAMP) | 12 bytes | ~2.5 GB | Common pattern |
| Covering (5 columns) | varies | ~5-20 GB | Duplicates table data |
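The table's estimates follow from the formula. A minimal sketch, assuming a 6-byte row pointer and a 1.4 overhead factor (both rough constants chosen to match the table, not values from any specific database):

```python
def btree_size_bytes(num_rows: int, key_size: int,
                     pointer_size: int = 6, overhead: float = 1.4) -> float:
    """Rough B-tree size: rows x (key + pointer) x overhead."""
    return num_rows * (key_size + pointer_size) * overhead

rows = 100_000_000
for label, key in [("INT", 4), ("BIGINT", 8), ("UUID", 16), ("(INT, INT)", 8)]:
    gb = btree_size_bytes(rows, key) / 1e9
    print(f"{label:>10}: ~{gb:.1f} GB")
```

This reproduces the ballpark figures above (~1.4 GB for INT, ~2 GB for BIGINT, ~3.1 GB for UUID); actual sizes depend on fill factor, page size, and bloat.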
Memory Pressure:
Indexes don't just consume disk space—they compete for memory in the buffer pool:
The Working Set Problem:
Your database's working set is the data actively accessed by queries. Ideally, the entire working set fits in memory:
working_set = hot_table_data + hot_index_data + temp_space
buffer_pool_size >= working_set → mostly memory operations
buffer_pool_size < working_set → disk I/O, slower queries
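The working-set check above reduces to a simple comparison; the sizes here are illustrative, not measurements.

```python
def fits_in_memory(hot_table_gb: float, hot_index_gb: float,
                   temp_gb: float, buffer_pool_gb: float) -> bool:
    """True when the working set fits the buffer pool (mostly memory operations)."""
    working_set = hot_table_gb + hot_index_gb + temp_gb
    return buffer_pool_gb >= working_set

# Illustrative: a 16 GB buffer pool before and after adding a 5 GB index
print(fits_in_memory(8, 5, 1, 16))      # working set 14 GB fits
print(fits_in_memory(8, 5 + 5, 1, 16))  # new index pushes it to 19 GB
```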
Practical Impact:
Adding a 5 GB index to a database with 16 GB buffer pool might push other hot data out of memory, degrading overall performance even for queries that don't use the new index.
Monitoring Storage Impact:
-- PostgreSQL: Index sizes for a table
SELECT
    indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'orders'
ORDER BY pg_relation_size(indexrelid) DESC;

-- PostgreSQL: Relation (table + indexes) total size
SELECT
    relname,
    pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
    pg_size_pretty(pg_relation_size(relid)) AS table_size,
    pg_size_pretty(pg_indexes_size(relid)) AS indexes_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC;

-- PostgreSQL: Buffer pool usage by relation
SELECT
    c.relname,
    count(*) AS buffers,
    pg_size_pretty(count(*) * 8192) AS size_in_buffer
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
GROUP BY c.relname
ORDER BY count(*) DESC
LIMIT 20;

Indexes can bloat over time due to fragmentation and dead tuples. A heavily updated table might have indexes 2-3x larger than necessary. Regular REINDEX or VACUUM operations compact indexes back to their optimal size. Monitor index bloat as part of database maintenance.
Beyond read/write trade-offs, indexes create ongoing maintenance overhead that affects database operations.
Transaction Logging:
In databases with write-ahead logging (WAL), every index modification generates log entries:
Vacuum and Maintenance Operations:
Indexes must be maintained during vacuum/analyze operations:
Lock Contention:
Index operations can create lock contention:
Concurrent Index Operations:
Modern databases support concurrent index operations to minimize disruption:
-- PostgreSQL: Create index without locking writes
CREATE INDEX CONCURRENTLY idx_orders_user
ON orders(user_id);
-- Takes longer but doesn't block other operations
-- PostgreSQL: Reindex without heavy locking
REINDEX INDEX CONCURRENTLY idx_orders_user;
However, concurrent operations:
Schedule index maintenance during low-traffic periods. For critical systems, use concurrent operations and monitor progress. Consider creating new indexes concurrently, then dropping old redundant ones, rather than modifying indexes in place.
Different application workloads demand different indexing approaches. Understanding your workload profile guides index design.
| Workload | Read/Write Ratio | Typical Indexes | Strategy |
|---|---|---|---|
| OLTP (transactions) | 10:1 to 100:1 | 3-5 per table | Index hot query paths, avoid over-indexing |
| OLAP (analytics) | 1000:1+ | Many, including wide covering | Comprehensive indexes, prioritize query speed |
| Write-heavy (logging) | 1:10 to 1:100 | 0-2 per table | Minimal indexes, bulk-load patterns |
| Hybrid (mixed) | varies | Selective | Partition by workload if possible |
| Time-series | Write-heavy, read varies | Time-based partitioning | Partition indexes, retention policies |
OLTP (Transactional) Systems:
Typical characteristics:
Index strategy:
OLAP (Analytical) Systems:
Typical characteristics:
Index strategy:
Use read replicas to experiment with different indexing strategies. Add indexes to a replica, benchmark query performance, then promote the strategy to production if it helps. This avoids impacting production writes during experimentation.
Partial indexes (PostgreSQL) or filtered indexes (SQL Server) index only a subset of rows matching a specified condition. They offer significant trade-off advantages for certain query patterns.
Why Partial Indexes?
Consider orders table with 100 million rows:
Queries almost always filter for active orders. A full index on status wastes 98% of its space on rows that are rarely queried.
Partial Index Solution:
-- Full index (indexes all 100M rows)
CREATE INDEX idx_orders_status ON orders(status);
-- Size: ~2 GB

-- Partial index (indexes only active orders)
CREATE INDEX idx_orders_active ON orders(status)
WHERE status IN ('pending', 'processing', 'shipped');
-- Size: ~40 MB (98% smaller!)

-- Even more focused: just one status
CREATE INDEX idx_orders_pending ON orders(created_at)
WHERE status = 'pending';
-- Tiny index, only pending orders, sorted by creation time

-- Query that uses partial index
SELECT * FROM orders WHERE status = 'pending' ORDER BY created_at;
-- Uses idx_orders_pending perfectly

-- Soft-delete pattern
CREATE INDEX idx_active_users ON users(email)
WHERE deleted_at IS NULL;
-- Index only non-deleted users (much smaller than full index)

-- Unique constraint on subset
CREATE UNIQUE INDEX idx_unique_active_email ON users(email)
WHERE deleted_at IS NULL;
-- Allows re-registration of deleted emails while ensuring
-- only one active user per email

Benefits of Partial Indexes:
Limitations:
Common Partial Index Patterns:
Partial indexes are one of the most powerful yet underused indexing features. If your queries consistently filter to a small subset of data, partial indexes can provide 10-100x storage savings while maintaining identical query performance. Always consider them for status-based, soft-delete, or time-windowed query patterns.
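The storage arithmetic behind the 98% figure is easy to check. A sketch assuming ~2% of rows are active and a rough 20 bytes per index entry (both assumed constants consistent with the example's ~2 GB / ~40 MB sizes):

```python
rows_total = 100_000_000
active_fraction = 0.02   # assumed: queries target ~2% of rows
entry_bytes = 20         # assumed: rough average bytes per index entry

full_index_gb = rows_total * entry_bytes / 1e9
partial_index_mb = rows_total * active_fraction * entry_bytes / 1e6
savings = 1 - active_fraction

print(f"full: ~{full_index_gb:.0f} GB, partial: ~{partial_index_mb:.0f} MB")
print(f"savings: {savings:.0%}")
```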
Recognizing common indexing mistakes helps you avoid them in your own systems.
The 'More Indexes is Better' Fallacy:
A common belief is that adding indexes always improves performance. Reality:
The Pareto Principle of Indexes:
80% of query patterns can be served by 20% of possible indexes. Identify your critical query paths and index those. Accept that rare queries may be slower—the trade-off is usually worth it.
Watch for: Write latency increasing over time. INSERT throughput declining. Vacuum operations taking hours. Index size exceeding table size by 3x+. Queries using unexpected indexes (optimizer confusion). These all suggest over-indexing.
Effective indexing requires ongoing monitoring to identify both missing indexes and unused ones.
-- PostgreSQL: Find unused indexes
SELECT
    schemaname || '.' || relname AS table,
    indexrelname AS index,
    pg_size_pretty(pg_relation_size(i.indexrelid)) AS size,
    idx_scan AS scans
FROM pg_stat_user_indexes ui
JOIN pg_index i ON ui.indexrelid = i.indexrelid
WHERE NOT i.indisunique  -- Exclude primary keys and unique constraints
  AND idx_scan < 50      -- Rarely used
  AND pg_relation_size(i.indexrelid) > 5000000  -- > 5MB
ORDER BY pg_relation_size(i.indexrelid) DESC;

-- PostgreSQL: Tables missing primary key indexes
SELECT relname
FROM pg_class
WHERE relkind = 'r'
  AND relname NOT IN (
    SELECT indrelid::regclass::text
    FROM pg_index
    WHERE indisprimary
  );

-- PostgreSQL: Find slow queries potentially needing indexes
-- (Requires pg_stat_statements extension)
SELECT
    query,
    calls,
    mean_time::numeric(10,2) AS avg_ms,
    total_time::numeric(10,2) AS total_ms
FROM pg_stat_statements
WHERE mean_time > 100  -- Queries averaging over 100ms
ORDER BY total_time DESC
LIMIT 20;

-- PostgreSQL: Identify sequential scans on large tables
SELECT
    relname AS table,
    seq_scan,
    seq_tup_read,
    idx_scan,
    pg_size_pretty(pg_relation_size(relid)) AS size
FROM pg_stat_user_tables
WHERE seq_scan > 100                      -- Many sequential scans
  AND pg_relation_size(relid) > 10000000  -- > 10MB
  AND idx_scan < seq_scan * 0.1           -- Very low index usage
ORDER BY seq_tup_read DESC;

Key Metrics to Monitor:
| Metric | What It Indicates | Action |
|---|---|---|
| idx_scan = 0 | Index never used | Consider dropping |
| seq_scan high on large tables | Missing index | Analyze query patterns |
| idx_tup_read >> idx_tup_fetch | Low selectivity | Index may not help |
| Index size >> expected | Bloat or fragmentation | REINDEX |
| Buffer cache hit ratio low | Not enough memory | Increase buffer pool or reduce indexes |
EXPLAIN ANALYZE for Index Investigation:
When optimizing specific queries, always use EXPLAIN ANALYZE to understand execution:
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders
WHERE user_id = 123 AND created_at > '2024-01-01';
Look for:
Schedule quarterly index reviews: Identify and drop unused indexes. Analyze slow query logs for indexing opportunities. Check index bloat and reindex if needed. Verify index sizes are proportional to their value. Document why each index exists.
When faced with an indexing decision, follow this systematic framework to make informed choices.
The Decision Matrix:
| Query Frequency | Write Frequency | Recommendation |
|---|---|---|
| High (100+/sec) | Low (< 10/sec) | ✅ Index aggressively, consider covering indexes |
| High (100+/sec) | High (100+/sec) | ⚠️ Index carefully, monitor write latency |
| Medium (10-100/sec) | Low (< 10/sec) | ✅ Index if query is slow without it |
| Medium (10-100/sec) | High (100+/sec) | ⚠️ Index only critical paths |
| Low (< 10/sec) | Low (< 10/sec) | ⚠️ Index if query is unacceptably slow |
| Low (< 10/sec) | High (100+/sec) | ❌ Usually don't index, accept slower reads |
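The decision matrix can be encoded as a small helper for triaging index candidates; the rate thresholds and recommendations are the table's own, with a fallback for the mixed cells the table leaves out.

```python
def index_recommendation(reads_per_s: float, writes_per_s: float) -> str:
    """Map query and write frequency to the decision matrix's recommendation."""
    def bucket(rate: float) -> str:
        if rate >= 100:
            return "high"
        if rate >= 10:
            return "medium"
        return "low"

    matrix = {
        ("high", "low"):    "index aggressively, consider covering indexes",
        ("high", "high"):   "index carefully, monitor write latency",
        ("medium", "low"):  "index if query is slow without it",
        ("medium", "high"): "index only critical paths",
        ("low", "low"):     "index if query is unacceptably slow",
        ("low", "high"):    "usually don't index, accept slower reads",
    }
    key = (bucket(reads_per_s), bucket(writes_per_s))
    return matrix.get(key, "evaluate case by case")

print(index_recommendation(500, 5))   # hot read path, cold writes
print(index_recommendation(5, 200))   # write-heavy, rarely read
```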
Production Deployment Best Practices:
Expert database engineers don't add indexes reflexively. They analyze query patterns, quantify trade-offs, consider alternatives, validate with data, and monitor continuously. This systematic approach, not intuition, is what separates great database performance from the mediocre.
We've explored the full spectrum of index trade-offs. Let's consolidate the key principles:
Module Complete:
You've now completed a comprehensive exploration of database indexes and query performance:
With this knowledge, you can design indexing strategies that optimize query performance while managing costs and complexity—the mark of a true database expert.
You've mastered the art and science of database indexing. From B-tree internals to composite index column ordering to trade-off analysis, you possess the comprehensive knowledge needed to design high-performance database systems. Apply these principles systematically, monitor continuously, and your databases will serve applications reliably at any scale.