There is no free lunch in computer science, and indexing is no exception. Every index that accelerates reads imposes costs elsewhere—storage space, write overhead, memory pressure, and maintenance complexity. The goal is not to create as many indexes as possible but to create the right indexes that maximize benefit while minimizing cost.
Many production database problems stem from misunderstanding these trade-offs: over-indexed tables that grind to a halt on writes, under-indexed tables with queries that time out, or poorly designed indexes that consume resources without providing value. Mastering index trade-offs separates database administrators from database experts.
By the end of this page, you will understand the full cost structure of indexes including storage, write overhead, and operational complexity. You will learn to quantify these costs, recognize common indexing mistakes, and apply principles for balanced indexing strategies.
Every index occupies disk space. For small tables, this is inconsequential. For large tables with multiple indexes, storage overhead can exceed the table itself.
Index Size Calculation:
The size of a B+-tree index depends on three factors: the number of entries (N), the size of each entry (key plus row pointer), and the page fill factor, with additional overhead for internal nodes.
Approximate Formula:
Index Size ≈ (N × Entry Size / Fill Factor) × 1.2 (the 1.2 multiplier accounts for internal nodes)
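A quick sketch of this estimate in code; the 8-byte row pointer, 0.7 fill factor, and 1.2 internal-node multiplier are illustrative assumptions, not fixed constants:

```python
def estimate_index_size(n_rows, key_bytes, pointer_bytes=8,
                        fill_factor=0.7, internal_overhead=1.2):
    """Rough B+-tree index size from the formula above.

    n_rows            -- number of entries (N)
    key_bytes         -- size of the indexed key
    pointer_bytes     -- row pointer stored per entry (assumed 8 bytes)
    fill_factor       -- fraction of each leaf page actually filled
    internal_overhead -- multiplier covering non-leaf nodes
    """
    entry_size = key_bytes + pointer_bytes
    return n_rows * entry_size / fill_factor * internal_overhead

# A secondary index on a 4-byte INT column over 100 million rows:
size_bytes = estimate_index_size(100_000_000, key_bytes=4)
print(f"{size_bytes / 1024**3:.1f} GiB")  # ~1.9 GiB under these assumptions
```

Plugging in your own row counts and key sizes turns the rule of thumb into a capacity-planning number before you ever create the index.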
Example sizes by index type:
| Index Type | Key Column Type | Typical Size vs. Table | Notes |
|---|---|---|---|
| Primary (clustered) | INT | ~100% (IS the table) | Data stored in index leaves |
| Secondary on INT | INT | 5-15% | 4-byte key + 8-byte pointer |
| Secondary on VARCHAR(50) | VARCHAR | 15-40% | Variable key size dominates |
| Composite (3 INTs) | INT, INT, INT | 10-20% | Larger entries, higher fanout impact |
| Covering index | INT + 4 columns | 30-80% | Extra columns inflate leaf size |
It's common to see tables where combined index storage is 2-5× the table storage. A 100GB table might have 300GB of indexes. This isn't necessarily wrong—if those indexes are all actively used for critical queries—but it represents real infrastructure cost that should be intentionally incurred.
Storage Implications:
Index storage multiplies more than primary disk usage: backups grow, replication ships every index write, and restore times lengthen accordingly.
Storage Optimization Strategies:
Drop indexes that usage monitoring shows are idle, prefer prefix indexes on long string columns, consolidate overlapping indexes into composites, and use index compression where the database supports it.
Every data modification must update all relevant indexes. This is the most significant trade-off of indexing and the primary reason not to over-index.
What Happens on INSERT:
The row is written to the table (or clustered index), and then a new entry must be inserted into every secondary index on that table, each insert potentially triggering a page split.
The Multiplication Factor:
A table with 5 indexes means each INSERT does essentially 6 separate inserts (1 table + 5 indexes). The overhead is not merely additive: each index insert typically lands on a different page (random I/O), may trigger page splits, and generates its own log records and buffer-pool traffic.
UPDATE Overhead:
Updates involving indexed columns are particularly expensive: the old entry must be deleted from the index and a new entry inserted, often on a different page; that is two operations per affected index.
A column that changes frequently should rarely be indexed.
DELETE Overhead:
The row's entry must be located and removed (or marked deleted) in every index; many engines leave these ghost entries in place until a background cleanup reclaims them.
| Operation | No Indexes | 3 Indexes | 10 Indexes | Relative Overhead |
|---|---|---|---|---|
| INSERT | 1 table write | 4 writes | 11 writes | 4-11× |
| UPDATE (indexed col) | 1 table write | 1 write + 3 deletes + 3 inserts | 1 write + 10 deletes + 10 inserts | 7-21× |
| UPDATE (non-indexed) | 1 table write | 1 write | 1 write | 1× |
| DELETE | 1 table operation | 4 operations | 11 operations | 4-11× |
Before adding an index to a write-heavy table, calculate: How many reads does this index accelerate per day? How many writes does it slow down? A typical rule: An index is worthwhile if read improvement × read frequency > write overhead × write frequency. For tables with 100+ writes per second, every index must earn its place.
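That rule of thumb can be sketched as a simple comparison; all figures below are hypothetical placeholders to be replaced with measurements from your own workload:

```python
def index_is_worthwhile(read_saving_ms, reads_per_day,
                        write_cost_ms, writes_per_day):
    """Apply the rule: read improvement x read frequency
    must exceed write overhead x write frequency."""
    benefit = read_saving_ms * reads_per_day
    cost = write_cost_ms * writes_per_day
    return benefit > cost

# Hypothetical index: saves 40 ms on 50,000 reads/day, but adds
# 0.5 ms to each of 8,640,000 writes/day (100 writes per second).
print(index_is_worthwhile(40, 50_000, 0.5, 8_640_000))  # False: write cost dominates
```

On a write-heavy table, even a small per-write penalty can swamp a large per-read saving, which is exactly why every index on such tables must earn its place.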
Indexes must be loaded into memory (the buffer pool) to be traversed. Index pages compete with data pages for limited memory. Too many indexes can degrade overall performance by evicting useful data or other index pages.
The Buffer Pool Economy:
A database buffer pool is a fixed-size cache holding frequently-accessed pages. When full, loading a new page evicts an old one. Consider a scenario: a table carries four indexes, A through D, where C and D total 6 GB but serve only occasional reporting queries.
Without indexes C and D (6 GB), more table data stays cached, improving both indexed and non-indexed queries.
Index Working Set:
Not all index pages are equally important. The working set is the portion actively used: the root and upper internal levels (touched by every traversal) plus the leaf pages for recently accessed key ranges. Cold leaf pages can stay on disk without hurting typical queries.
For a well-designed index, the upper levels fit entirely in memory. Only leaf page access causes I/O. This is why fanout matters—higher fanout means shallower trees, more levels cached.
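A quick calculation shows why fanout drives depth; the entry count and fanout values below are illustrative:

```python
def btree_levels(n_entries, fanout):
    """Levels needed for a B+-tree holding n_entries at the given fanout."""
    levels, capacity = 1, fanout
    while capacity < n_entries:
        capacity *= fanout  # each extra level multiplies capacity by the fanout
        levels += 1
    return levels

# One billion entries: higher fanout means a much shallower tree,
# so more of the upper levels fit in cache.
for fanout in (10, 100, 500):
    print(fanout, btree_levels(1_000_000_000, fanout))
```

Going from a fanout of 10 to 500 drops a billion-entry tree from 9 levels to 4, shrinking the set of upper-level pages that must stay resident.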
Memory Pressure Symptoms:
Falling buffer-pool hit ratios, rising physical reads, and previously fast queries turning I/O-bound all signal that indexes and data are fighting for cache.
An index larger than available memory will never be fully cached. Each query may require physical I/O. For very large tables (TB scale), this is unavoidable—the strategy shifts to ensuring the working set (upper levels + frequently-accessed leaves) fits in memory while accepting I/O for infrequent accesses.
Indexes require ongoing maintenance beyond their creation. This operational overhead is often underestimated but can consume significant DBA time and system resources.
Fragmentation:
Over time, insert and delete operations fragment indexes: page splits leave pages half-empty (internal fragmentation), logical key order drifts away from physical page order (external fragmentation), and deleted entries leave gaps that are not immediately reclaimed.
Impact of Fragmentation:
Range scans read more pages than necessary, each cached page carries less useful data, and disk space is wasted on empty slots.
Rebuilding and Reorganizing:
To combat fragmentation, indexes need periodic maintenance:
REBUILD (Offline or Online):
Recreates the index from scratch, eliminating all fragmentation and restoring the target fill factor. Resource-intensive; offline rebuilds block access to the underlying table.
REORGANIZE (Online):
Compacts and reorders leaf pages in place without rebuilding the structure. Much lighter and fully online, but less thorough than a rebuild.
Statistics Updates:
Query optimizers depend on accurate statistics about indexes: cardinality, value distribution histograms, and density. Stale statistics cause the optimizer to misestimate row counts and pick poor plans, so they must be refreshed as data changes.
| Operation | Purpose | Impact | Typical Frequency |
|---|---|---|---|
| REBUILD INDEX | Eliminate fragmentation | Resource-intensive, may lock | Monthly-Yearly |
| REORGANIZE INDEX | Compact leaf pages | Lower impact, online | Weekly-Monthly |
| UPDATE STATISTICS | Refresh optimizer info | Low impact, quick | Daily-Weekly |
| VALIDATE INDEX | Check for corruption | Read-intensive scan | Monthly or after issues |
| MONITOR USAGE | Track index utilization | Minimal, query catalog | Weekly audit |
Every index adds to maintenance burden. A table with 15 indexes requires 15 rebuild cycles. If your maintenance window is 2 hours and each rebuild takes 10 minutes, you've consumed 2.5 hours—exceeding your window. This is a real constraint in 24/7 systems and a reason to minimize index count.
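The window arithmetic above is easy to sanity-check in code, using the hypothetical figures from the text:

```python
index_count = 15           # indexes on the table
rebuild_minutes_each = 10  # time per index rebuild
window_minutes = 2 * 60    # 2-hour maintenance window

total_minutes = index_count * rebuild_minutes_each
print(f"{total_minutes} min of rebuilds vs a {window_minutes} min window")
# 150 minutes of rebuilds cannot fit a 120-minute window
print("fits" if total_minutes <= window_minutes else "exceeds the window")
```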
Indexes introduce additional lockable resources. Each index page, and potentially each index entry, can be locked. With more indexes, there are more locks, and therefore more opportunities for contention.
Index Lock Types:
Page-Level Locks: Entire index page locked during modification
Key-Level Locks: Individual index entries locked
Gap Locks: Lock ranges between keys (in some databases)
Hot Spot Problem:
Sequential indexes (auto-increment IDs, timestamps) create hot spots: every concurrent insert targets the same rightmost leaf page, so writers serialize on that page's latch and page splits concentrate at one point in the tree.
Mitigation Strategies:
Use non-sequential key strategies where insert throughput matters, partition the index so inserts spread across multiple subtrees, or simply carry fewer sequential indexes on hot tables.
Deadlocks:
Multiple indexes increase deadlock risk: a single statement locks the table row plus entries in every affected index, and two transactions reaching the same rows through different indexes can acquire those locks in opposite orders, forming a cycle.
Databases detect and resolve deadlocks, but at the cost of rolling back transactions.
Use database tools to identify contention: Wait statistics (SQL Server), Lock wait events (PostgreSQL), innodb_row_lock_waits (MySQL). If index page waits are high, consider reducing index count, reorganizing to reduce page splits, or using non-sequential key strategies.
More indexes give the query optimizer more choices—which is both good and bad. Good because more paths may exist to answer queries efficiently. Bad because plan compilation takes longer, the optimizer may pick the wrong index, and plans become less stable as data shifts.
The Combinatorial Explosion:
For a query touching 3 tables with 5 indexes each, the optimizer considers 6 access paths per table (5 indexes plus a full scan), giving 6³ = 216 access-path combinations before join orders and join algorithms multiply the space further.
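The growth of the plan space can be sketched numerically; the three-way join-algorithm choice per join is an illustrative assumption, and real optimizers prune most of this space with dynamic programming and heuristics:

```python
import math

def plan_space(tables, indexes_per_table, join_algorithms=3):
    """Rough upper bound on candidate plans: access paths per table
    (indexes + full scan), times join orders (n!), times a
    join-algorithm choice for each of the n-1 joins."""
    access_paths = (indexes_per_table + 1) ** tables
    join_orders = math.factorial(tables)
    algo_choices = join_algorithms ** (tables - 1)
    return access_paths * join_orders * algo_choices

# 216 access-path combos x 6 join orders x 9 algorithm choices:
print(plan_space(3, 5))  # 11664
```

Even this toy model shows why adding one more index per table makes the optimizer's search measurably harder.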
Plan Regression Risk:
The optimizer chooses plans based on statistics. With many indexes, the chosen plan is more sensitive to small shifts: a statistics refresh, a change in data distribution, or a different parameter value can each tip the choice from one index to another.
A query that performed well for months suddenly slows down—not because anything changed, but because the optimizer now chooses a different index due to subtle statistical shifts.
Managing Optimizer Complexity:
Keep statistics current, remove redundant indexes so the optimizer faces fewer near-equivalent choices, and reserve plan-stability features (hints, plan guides, baselines) for the few queries that truly need them.
Effective indexing requires balancing competing concerns. There is no formula—each system has unique workload characteristics. But there are principles that guide good decisions.
Principle 1: Workload-Driven Design
Indexes should reflect actual query patterns, not theoretical completeness: capture the real workload from query logs and slow-query reports, index the predicates and joins that dominate it, and resist indexing columns merely because they might be queried someday.
Principle 2: Consolidation Over Proliferation
One well-designed composite index often serves multiple query patterns: an index on (customer_id, order_date, status) supports filtering by customer, by customer and date range, and by all three, potentially replacing three separate single-column indexes.
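One way to reason about consolidation is the leftmost-prefix rule: a composite index on (a, b, c) can serve queries that filter on a, on a and b, or on all three. A minimal sketch of that check, simplified to equality predicates only (the column names are hypothetical):

```python
def index_serves(index_columns, query_columns):
    """True if the query's equality columns form a leftmost prefix
    of the composite index (order within the query doesn't matter)."""
    needed = set(query_columns)
    prefix = []
    for col in index_columns:
        if col not in needed:
            break  # a gap breaks the usable prefix
        prefix.append(col)
    return needed == set(prefix)

idx = ["customer_id", "order_date", "status"]
print(index_serves(idx, ["customer_id"]))                # True
print(index_serves(idx, ["order_date", "customer_id"]))  # True
print(index_serves(idx, ["status"]))                     # False: not a leftmost prefix
```

Running candidate queries through a check like this makes it obvious which single-column indexes a well-ordered composite can replace.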
Principle 3: Read/Write Ratio Awareness
| Table Characteristic | Indexing Strategy |
|---|---|
| Read-heavy (OLAP) | Index generously for query patterns |
| Write-heavy (OLTP) | Minimize indexes; only critical paths |
| Mixed workload | Careful balance; monitor both impacts |
| Batch loading | Drop indexes during load; rebuild after |
Principle 4: Lifecycle Management
Indexes should not be permanent fixtures: review usage statistics on a schedule, drop indexes that no longer earn their cost, and re-evaluate the whole set whenever application query patterns change.
Index trade-offs are not obstacles to avoid but realities to manage. Expert database professionals don't minimize or maximize indexes—they optimize, balancing costs against benefits for their specific workload. The key concepts: every index costs storage, write throughput, buffer-pool memory, maintenance time, and optimizer simplicity, and each one must justify those costs with measurable read benefit.
Module Complete:
With this page, we have completed Module 1: Index Concept. You now have a comprehensive understanding of what indexes are, how they are organized, how they accelerate queries, and what trade-offs they entail. This foundation prepares you for deeper exploration of specific index types, operations, and optimization techniques in subsequent modules.
Congratulations! You have mastered the fundamental concepts of database indexing: definition, search keys, index entries, lookup acceleration, and trade-offs. You are now equipped to reason about indexes at a professional level and ready to explore specific index types and advanced indexing strategies in the modules ahead.