There is no free lunch in computer science, and indexing is no exception. Every index that accelerates reads imposes costs elsewhere—storage space, write overhead, memory pressure, and maintenance complexity. The goal is not to create as many indexes as possible but to create the right indexes that maximize benefit while minimizing cost.
Many production database problems stem from misunderstanding these trade-offs: over-indexed tables that grind to a halt on writes, under-indexed tables with queries that time out, or poorly designed indexes that consume resources without providing value. Mastering index trade-offs separates database administrators from database experts.
By the end of this page, you will understand the full cost structure of indexes including storage, write overhead, and operational complexity. You will learn to quantify these costs, recognize common indexing mistakes, and apply principles for balanced indexing strategies.
Every index occupies disk space. For small tables, this is inconsequential. For large tables with multiple indexes, storage overhead can exceed the table itself.
Index Size Calculation:
The size of a B+-tree index depends on three factors: the number of entries (N), the size of each entry (key plus row pointer), and the page fill factor, with additional overhead for internal nodes.
Approximate Formula:
Index Size ≈ (N × Entry Size / Fill Factor) × 1.2 (the 1.2 multiplier accounts for internal nodes)
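A quick sketch of this estimate in code; the 8-byte row pointer, 0.7 fill factor, and 1.2 internal-node multiplier are illustrative assumptions, not fixed constants:

```python
def estimate_index_size(n_rows, key_bytes, pointer_bytes=8,
                        fill_factor=0.7, internal_overhead=1.2):
    """Rough B+-tree index size from the formula above.

    n_rows            -- number of entries (N)
    key_bytes         -- size of the indexed key
    pointer_bytes     -- row pointer stored per entry (assumed 8 bytes)
    fill_factor       -- fraction of each leaf page actually filled
    internal_overhead -- multiplier covering non-leaf nodes
    """
    entry_size = key_bytes + pointer_bytes
    return n_rows * entry_size / fill_factor * internal_overhead

# A secondary index on a 4-byte INT column over 100 million rows:
size_bytes = estimate_index_size(100_000_000, key_bytes=4)
print(f"{size_bytes / 1024**3:.1f} GiB")  # ~1.9 GiB under these assumptions
```

Plugging in your own row counts and key sizes turns the rule of thumb into a capacity-planning number before you ever create the index.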
Example sizes by index type:
| Index Type | Key Column Type | Typical Size vs. Table | Notes |
|---|---|---|---|
| Primary (clustered) | INT | ~100% (IS the table) | Data stored in index leaves |
| Secondary on INT | INT | 5-15% | 4-byte key + 8-byte pointer |
| Secondary on VARCHAR(50) | VARCHAR | 15-40% | Variable key size dominates |
| Composite (3 INTs) | INT, INT, INT | 10-20% | Larger entries, higher fanout impact |
| Covering index | INT + 4 columns | 30-80% | Extra columns inflate leaf size |
It's common to see tables where combined index storage is 2-5× the table storage. A 100GB table might have 300GB of indexes. This isn't necessarily wrong—if those indexes are all actively used for critical queries—but it represents real infrastructure cost that should be intentionally incurred.
Storage Implications:
Index storage multiplies more than primary disk usage: backups grow, replication ships every index write, and restore times lengthen accordingly.
Storage Optimization Strategies:
Drop indexes that usage monitoring shows are idle, prefer prefix indexes on long string columns, consolidate overlapping indexes into composites, and use index compression where the database supports it.
Every data modification must update all relevant indexes. This is the most significant trade-off of indexing and the primary reason not to over-index.
What Happens on INSERT:
The row is written to the table (or clustered index), and then a new entry must be inserted into every secondary index on that table, each insert potentially triggering a page split.
The Multiplication Factor:
A table with 5 indexes means each INSERT does essentially 6 separate inserts (1 table + 5 indexes). The overhead is not merely additive: each index insert typically lands on a different page (random I/O), may trigger page splits, and generates its own log records and buffer-pool traffic.
UPDATE Overhead:
Updates involving indexed columns are particularly expensive: the old entry must be deleted from the index and a new entry inserted, often on a different page; that is two operations per affected index.
A column that changes frequently should rarely be indexed.
DELETE Overhead:
The row's entry must be located and removed (or marked deleted) in every index; many engines leave these ghost entries in place until a background cleanup reclaims them.
| Operation | No Indexes | 3 Indexes | 10 Indexes | Relative Overhead |
|---|---|---|---|---|
| INSERT | 1 table write | 4 writes | 11 writes | 4-11× |
| UPDATE (indexed col) | 1 table write | 1 write + 3 deletes + 3 inserts | 1 write + 10 deletes + 10 inserts | 7-21× |
| UPDATE (non-indexed) | 1 table write | 1 write | 1 write | 1× |
| DELETE | 1 table operation | 4 operations | 11 operations | 4-11× |
Before adding an index to a write-heavy table, calculate: How many reads does this index accelerate per day? How many writes does it slow down? A typical rule: An index is worthwhile if read improvement × read frequency > write overhead × write frequency. For tables with 100+ writes per second, every index must earn its place.
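That rule of thumb can be sketched as a simple comparison; all figures below are hypothetical placeholders to be replaced with measurements from your own workload:

```python
def index_is_worthwhile(read_saving_ms, reads_per_day,
                        write_cost_ms, writes_per_day):
    """Apply the rule: read improvement x read frequency
    must exceed write overhead x write frequency."""
    benefit = read_saving_ms * reads_per_day
    cost = write_cost_ms * writes_per_day
    return benefit > cost

# Hypothetical index: saves 40 ms on 50,000 reads/day, but adds
# 0.5 ms to each of 8,640,000 writes/day (100 writes per second).
print(index_is_worthwhile(40, 50_000, 0.5, 8_640_000))  # False: write cost dominates
```

On a write-heavy table, even a small per-write penalty can swamp a large per-read saving, which is exactly why every index on such tables must earn its place.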
Indexes must be loaded into memory (the buffer pool) to be traversed. Index pages compete with data pages for limited memory. Too many indexes can degrade overall performance by evicting useful data or other index pages.
The Buffer Pool Economy:
A database buffer pool is a fixed-size cache holding frequently-accessed pages. When full, loading a new page evicts an old one. Consider a scenario: a table carries four indexes, A through D, where C and D total 6 GB but serve only occasional reporting queries.
Without indexes C and D (6 GB), more table data stays cached, improving both indexed and non-indexed queries.
Index Working Set:
Not all index pages are equally important. The working set is the portion actively used: the root and upper internal levels (touched by every traversal) plus the leaf pages for recently accessed key ranges. Cold leaf pages can stay on disk without hurting typical queries.
For a well-designed index, the upper levels fit entirely in memory. Only leaf page access causes I/O. This is why fanout matters—higher fanout means shallower trees, more levels cached.
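A quick calculation shows why fanout drives depth; the entry count and fanout values below are illustrative:

```python
def btree_levels(n_entries, fanout):
    """Levels needed for a B+-tree holding n_entries at the given fanout."""
    levels, capacity = 1, fanout
    while capacity < n_entries:
        capacity *= fanout  # each extra level multiplies capacity by the fanout
        levels += 1
    return levels

# One billion entries: higher fanout means a much shallower tree,
# so more of the upper levels fit in cache.
for fanout in (10, 100, 500):
    print(fanout, btree_levels(1_000_000_000, fanout))
```

Going from a fanout of 10 to 500 drops a billion-entry tree from 9 levels to 4, shrinking the set of upper-level pages that must stay resident.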
Memory Pressure Symptoms:
Falling buffer-pool hit ratios, rising physical reads, and previously fast queries turning I/O-bound all signal that indexes and data are fighting for cache.
An index larger than available memory will never be fully cached. Each query may require physical I/O. For very large tables (TB scale), this is unavoidable—the strategy shifts to ensuring the working set (upper levels + frequently-accessed leaves) fits in memory while accepting I/O for infrequent accesses.
Indexes require ongoing maintenance beyond their creation. This operational overhead is often underestimated but can consume significant DBA time and system resources.
Fragmentation:
Over time, insert and delete operations fragment indexes: page splits leave pages half-empty (internal fragmentation), logical key order drifts away from physical page order (external fragmentation), and deleted entries leave gaps that are not immediately reclaimed.
Impact of Fragmentation:
Range scans read more pages than necessary, each cached page carries less useful data, and disk space is wasted on empty slots.
Rebuilding and Reorganizing:
To combat fragmentation, indexes need periodic maintenance:
REBUILD (Offline or Online):
Recreates the index from scratch, eliminating all fragmentation and restoring the target fill factor. Resource-intensive; offline rebuilds block access to the underlying table.
REORGANIZE (Online):
Compacts and reorders leaf pages in place without rebuilding the structure. Much lighter and fully online, but less thorough than a rebuild.
Statistics Updates:
Query optimizers depend on accurate statistics about indexes: cardinality, value distribution histograms, and density. Stale statistics cause the optimizer to misestimate row counts and pick poor plans, so they must be refreshed as data changes.
| Operation | Purpose | Impact | Typical Frequency |
|---|---|---|---|
| REBUILD INDEX | Eliminate fragmentation | Resource-intensive, may lock | Monthly-Yearly |
| REORGANIZE INDEX | Compact leaf pages | Lower impact, online | Weekly-Monthly |
| UPDATE STATISTICS | Refresh optimizer info | Low impact, quick | Daily-Weekly |
| VALIDATE INDEX | Check for corruption | Read-intensive scan | Monthly or after issues |
| MONITOR USAGE | Track index utilization | Minimal, query catalog | Weekly audit |
Every index adds to maintenance burden. A table with 15 indexes requires 15 rebuild cycles. If your maintenance window is 2 hours and each rebuild takes 10 minutes, you've consumed 2.5 hours—exceeding your window. This is a real constraint in 24/7 systems and a reason to minimize index count.
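The window arithmetic above is easy to sanity-check in code, using the hypothetical figures from the text:

```python
index_count = 15           # indexes on the table
rebuild_minutes_each = 10  # time per index rebuild
window_minutes = 2 * 60    # 2-hour maintenance window

total_minutes = index_count * rebuild_minutes_each
print(f"{total_minutes} min of rebuilds vs a {window_minutes} min window")
# 150 minutes of rebuilds cannot fit a 120-minute window
print("fits" if total_minutes <= window_minutes else "exceeds the window")
```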
Indexes introduce additional lockable resources. Each index page, and potentially each index entry, can be locked. With more indexes, there are more locks, and therefore more opportunities for contention.
Index Lock Types:
Page-Level Locks: Entire index page locked during modification
Key-Level Locks: Individual index entries locked
Gap Locks: Lock ranges between keys (in some databases)
Hot Spot Problem:
Sequential indexes (auto-increment IDs, timestamps) create hot spots: every concurrent insert targets the same rightmost leaf page, so writers serialize on that page's latch and page splits concentrate at one point in the tree.
Mitigation Strategies:
Use non-sequential key strategies where insert throughput matters, partition the index so inserts spread across multiple subtrees, or simply carry fewer sequential indexes on hot tables.
Deadlocks:
Multiple indexes increase deadlock risk: a single statement locks the table row plus entries in every affected index, and two transactions reaching the same rows through different indexes can acquire those locks in opposite orders, forming a cycle.
Databases detect and resolve deadlocks, but at the cost of rolling back transactions.
Use database tools to identify contention: Wait statistics (SQL Server), Lock wait events (PostgreSQL), innodb_row_lock_waits (MySQL). If index page waits are high, consider reducing index count, reorganizing to reduce page splits, or using non-sequential key strategies.
More indexes give the query optimizer more choices—which is both good and bad. Good because more paths may exist to answer queries efficiently. Bad because plan compilation takes longer, the optimizer may pick the wrong index, and plans become less stable as data shifts.
The Combinatorial Explosion:
For a query touching 3 tables with 5 indexes each, the optimizer considers 6 access paths per table (5 indexes plus a full scan), giving 6³ = 216 access-path combinations before join orders and join algorithms multiply the space further.
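The growth of the plan space can be sketched numerically; the three-way join-algorithm choice per join is an illustrative assumption, and real optimizers prune most of this space with dynamic programming and heuristics:

```python
import math

def plan_space(tables, indexes_per_table, join_algorithms=3):
    """Rough upper bound on candidate plans: access paths per table
    (indexes + full scan), times join orders (n!), times a
    join-algorithm choice for each of the n-1 joins."""
    access_paths = (indexes_per_table + 1) ** tables
    join_orders = math.factorial(tables)
    algo_choices = join_algorithms ** (tables - 1)
    return access_paths * join_orders * algo_choices

# 216 access-path combos x 6 join orders x 9 algorithm choices:
print(plan_space(3, 5))  # 11664
```

Even this toy model shows why adding one more index per table makes the optimizer's search measurably harder.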
Plan Regression Risk:
The optimizer chooses plans based on statistics. With many indexes, the chosen plan is more sensitive to small shifts: a statistics refresh, a change in data distribution, or a different parameter value can each tip the choice from one index to another.
A query that performed well for months suddenly slows down—not because anything changed, but because the optimizer now chooses a different index due to subtle statistical shifts.
Managing Optimizer Complexity:
Keep statistics current, remove redundant indexes so the optimizer faces fewer near-equivalent choices, and reserve plan-stability features (hints, plan guides, baselines) for the few queries that truly need them.
Effective indexing requires balancing competing concerns. There is no formula—each system has unique workload characteristics. But there are principles that guide good decisions.
Principle 1: Workload-Driven Design
Indexes should reflect actual query patterns, not theoretical completeness: capture the real workload from query logs and slow-query reports, index the predicates and joins that dominate it, and resist indexing columns merely because they might be queried someday.
Principle 2: Consolidation Over Proliferation
One well-designed composite index often serves multiple query patterns: an index on (customer_id, order_date, status) supports filtering by customer, by customer and date range, and by all three, potentially replacing three separate single-column indexes.
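One way to reason about consolidation is the leftmost-prefix rule: a composite index on (a, b, c) can serve queries that filter on a, on a and b, or on all three. A minimal sketch of that check, simplified to equality predicates only (the column names are hypothetical):

```python
def index_serves(index_columns, query_columns):
    """True if the query's equality columns form a leftmost prefix
    of the composite index (order within the query doesn't matter)."""
    needed = set(query_columns)
    prefix = []
    for col in index_columns:
        if col not in needed:
            break  # a gap breaks the usable prefix
        prefix.append(col)
    return needed == set(prefix)

idx = ["customer_id", "order_date", "status"]
print(index_serves(idx, ["customer_id"]))                # True
print(index_serves(idx, ["order_date", "customer_id"]))  # True
print(index_serves(idx, ["status"]))                     # False: not a leftmost prefix
```

Running candidate queries through a check like this makes it obvious which single-column indexes a well-ordered composite can replace.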
Principle 3: Read/Write Ratio Awareness
| Table Characteristic | Indexing Strategy |
|---|---|
| Read-heavy (OLAP) | Index generously for query patterns |
| Write-heavy (OLTP) | Minimize indexes; only critical paths |
| Mixed workload | Careful balance; monitor both impacts |
| Batch loading | Drop indexes during load; rebuild after |
Principle 4: Lifecycle Management
Indexes should not be permanent fixtures: review usage statistics on a schedule, drop indexes that no longer earn their cost, and re-evaluate the whole set whenever application query patterns change.
Index trade-offs are not obstacles to avoid but realities to manage. Expert database professionals don't minimize or maximize indexes—they optimize, balancing costs against benefits for their specific workload. The key concepts: every index costs storage, write throughput, buffer-pool memory, maintenance time, and optimizer simplicity, and each one must justify those costs with measurable read benefit.
Module Complete:
With this page, we have completed Module 1: Index Concept. You now have a comprehensive understanding of what indexes are, how they are organized, how they accelerate queries, and what trade-offs they entail. This foundation prepares you for deeper exploration of specific index types, operations, and optimization techniques in subsequent modules.
Congratulations! You have mastered the fundamental concepts of database indexing: definition, search keys, index entries, lookup acceleration, and trade-offs. You are now equipped to reason about indexes at a professional level and ready to explore specific index types and advanced indexing strategies in the modules ahead.