Indexes are among the most powerful tools in a database administrator's arsenal—yet they are also among the most misused. The ability to reduce a query from seconds to milliseconds is seductive, leading many to believe that more indexes are always better. This assumption is not just wrong; it can be catastrophically wrong.
The core paradox of indexing: Indexes accelerate reads but decelerate writes. Every index you create is a trade-off, a balance between the benefit of faster lookups and the cost of maintaining additional data structures. The art of indexing lies not in creating indexes, but in knowing when to create them—and more importantly, when not to.
By the end of this page, you will understand the fundamental decision framework for index creation, the key factors that influence indexing decisions, common anti-patterns to avoid, and the structured methodology used by database architects to make strategic indexing choices.
Before creating any index, you must answer one fundamental question: Will this index provide more benefit than cost?
This question seems simple, but answering it correctly requires understanding the complex interplay between read patterns, write patterns, data distribution, query types, and system constraints. Let's establish the foundational principles that govern this decision.
The most common mistake is to create an index for every column referenced in a WHERE clause. This naive approach ignores selectivity, query frequency, write patterns, and system constraints. It leads to over-indexed databases that are slow to write, expensive to maintain, and often no faster to read.
Professional database architects use a structured decision framework to evaluate potential indexes. This framework considers multiple factors and applies them systematically to determine whether an index should be created.
The Index Decision Matrix:
Every proposed index should be evaluated against several key criteria. The matrix below provides a structured approach to this evaluation:
| Factor | Favorable for Indexing (✓) | Unfavorable for Indexing (✗) |
|---|---|---|
| Query Frequency | Query executes thousands of times per hour | Query runs once per day in batch job |
| Base Table Size | Table contains millions of rows | Table has fewer than 1,000 rows |
| Selectivity | Predicate returns < 5% of rows | Predicate returns > 30% of rows |
| Write Frequency | Table is updated rarely (< 10% write ratio) | Table has high write frequency (> 50% write ratio) |
| Column Cardinality | Column has high cardinality (many distinct values) | Column has low cardinality (few distinct values) |
| Query Performance Gap | Current query takes seconds; target is milliseconds | Query already meets SLA requirements |
| Available Storage | Disk space is abundant | Storage is constrained and expensive |
| Memory Availability | Buffer pool can accommodate index | Memory is already over-committed |
Applying the Framework:
For each proposed index, score it against each factor. A strong candidate should have favorable conditions for most factors. However, some factors carry more weight than others: selectivity and write frequency are typically decisive, while storage and memory act more as constraints than as drivers.
Think of this framework as your first-pass filter. Indexes that fail multiple criteria are unlikely to provide net benefit.
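As a rough illustration of this first-pass filter, the matrix can be turned into a simple favorable-factor count. This is a hypothetical sketch, not part of any database engine: the thresholds mirror the table above, the equal weighting of factors and the 10% cardinality cutoff are simplifying assumptions.

```python
# Hypothetical first-pass scorer for an index candidate, mirroring the
# decision matrix above. Equal weighting is a simplification; in practice
# selectivity and write frequency should dominate.

def score_index_candidate(queries_per_hour, table_rows, selectivity,
                          write_ratio, distinct_values):
    """Return (favorable_count, total_factors) for a proposed index."""
    checks = [
        queries_per_hour >= 1000,           # Query frequency: thousands per hour
        table_rows >= 1_000_000,            # Base table size: millions of rows
        selectivity < 0.05,                 # Predicate returns < 5% of rows
        write_ratio < 0.10,                 # Table is rarely updated
        distinct_values / table_rows > 0.1  # High cardinality (assumed cutoff)
    ]
    return sum(checks), len(checks)

# A frequently queried, highly selective column on a large, read-mostly table:
favorable, total = score_index_candidate(
    queries_per_hour=12_500, table_rows=5_000_000,
    selectivity=0.001, write_ratio=0.05, distinct_values=4_800_000)
print(f"{favorable}/{total} factors favorable")  # 5/5 factors favorable
```

A candidate scoring well here still deserves the ROI analysis later on this page; a candidate failing multiple checks can usually be rejected without further work.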
In most systems, 80% of database activity comes from 20% of queries. Focus your indexing efforts on this critical 20%. A perfectly indexed rare query is worth less than a moderately indexed frequent query.
Selectivity measures what fraction of rows a predicate returns. It is calculated as:
$$\text{Selectivity} = \frac{\text{Number of rows matching predicate}}{\text{Total number of rows}}$$
Selectivity determines whether an index can provide meaningful benefit. Understanding this concept is essential for index decision-making.
| Selectivity Range | Interpretation | Index Recommendation |
|---|---|---|
| < 1% | Highly selective | Strong candidate — Index almost always beneficial |
| 1-5% | Selective | Good candidate — Index likely beneficial for most workloads |
| 5-15% | Moderately selective | Evaluate carefully — Benefits depend on other factors |
| 15-30% | Low selectivity | Unlikely beneficial — Full scan often more efficient |
| > 30% | Non-selective | Avoid indexing — Index adds overhead without benefit |
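As a minimal sketch, the selectivity formula and the recommendation bands in the table above translate directly into code (the function names here are illustrative, not from any database API):

```python
def selectivity(matching_rows, total_rows):
    """Selectivity = rows matching the predicate / total rows."""
    return matching_rows / total_rows

def recommendation(s):
    """Map a selectivity value to the guidance bands in the table above."""
    if s < 0.01:
        return "Strong candidate"
    elif s < 0.05:
        return "Good candidate"
    elif s < 0.15:
        return "Evaluate carefully"
    elif s < 0.30:
        return "Unlikely beneficial"
    return "Avoid indexing"

print(recommendation(selectivity(1_000, 1_000_000)))    # Strong candidate
print(recommendation(selectivity(500_000, 1_000_000)))  # Avoid indexing
```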
Why Selectivity Matters:
Consider a table with 1,000,000 rows. An index lookup typically involves traversing the B+-tree to locate the first matching key, then fetching each matching row from the table—roughly one random I/O per row.
If your predicate returns 50% of rows (500,000 rows), the index lookup could require 500,000+ I/O operations, while a sequential table scan would require only the number of pages containing the table (perhaps 10,000 operations for a typical table). The index is dramatically worse.
Conversely, if your predicate returns 0.1% of rows (1,000 rows), the index lookup requires roughly 1,000 operations plus B+-tree traversal, while the full scan still requires 10,000 operations. The index wins decisively.
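The two scenarios above can be captured in a back-of-the-envelope cost model. The constants (10,000 pages, a B+-tree height of 3) are the illustrative assumptions from the worked example, not measured values:

```python
# Rough I/O comparison: an index lookup costs ~1 random I/O per matching
# row plus a short B+-tree traversal; a sequential scan reads every table
# page exactly once.

TABLE_PAGES = 10_000    # table from the example: 1M rows, ~100 rows/page
TREE_TRAVERSAL = 3      # assumed B+-tree height

def index_cost(matching_rows):
    return TREE_TRAVERSAL + matching_rows  # one row fetch per match

def scan_cost():
    return TABLE_PAGES

print(index_cost(500_000) > scan_cost())  # True: at 50% selectivity the scan wins
print(index_cost(1_000) < scan_cost())    # True: at 0.1% selectivity the index wins
```

Real optimizers use far richer cost models (caching, clustering, prefetch), but the crossover behavior is exactly what this sketch shows.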
Selective predicates (strong index candidates):

- `WHERE user_id = 12345` — Unique or near-unique lookups
- `WHERE order_date = '2024-01-15'` — Specific date in a large range
- `WHERE email = 'user@example.com'` — Unique constraint column
- `WHERE transaction_id = 'TXN-789012'` — Primary key lookup
- `WHERE ssn = '123-45-6789'` — Natural unique identifier

Non-selective predicates (poor index candidates):

- `WHERE status = 'active'` — 90% of users are active
- `WHERE gender = 'M'` — Binary/low-cardinality column
- `WHERE is_deleted = 0` — Boolean flag
- `WHERE country = 'USA'` — Dominant value in a skewed distribution
- `WHERE year >= 2020` — Range covering most of the data

The same column can have different selectivity depending on the query. An index on `country` might be excellent for `WHERE country = 'Vatican City'` (highly selective) but useless for `WHERE country = 'USA'` (non-selective). The query optimizer uses statistics to make these determinations at runtime.
Write amplification refers to the phenomenon where a single logical write operation in your application results in multiple physical write operations in the database. Indexes are a primary source of write amplification.
Consider what happens when you execute a simple INSERT:
```sql
INSERT INTO orders (order_id, customer_id, order_date, status, total)
VALUES (12345, 789, '2024-01-15', 'pending', 299.99);
```
If this table has 8 indexes, the database must write the new row to the table and then insert a corresponding entry into each of the 8 index structures—updating B+-tree pages, splitting nodes when they fill, and logging every change.
The Amplification Factor:
A single logical INSERT becomes 9+ physical operations. For UPDATE operations affecting indexed columns, the cost is even higher—the old key must be removed and the new key inserted, effectively doubling the index maintenance work.
Quantifying the Impact:
Let's calculate the write overhead for a realistic scenario:
| Scenario | Base Write Cost | With 5 Indexes | With 10 Indexes | With 20 Indexes |
|---|---|---|---|---|
| INSERT (1 row) | 1 operation | 6 operations (6×) | 11 operations (11×) | 21 operations (21×) |
| UPDATE (1 indexed column) | 1 operation | 3 operations (3×) | 3 operations (3×) | 3 operations (3×) |
| UPDATE (all indexed columns) | 1 operation | 11 operations (11×) | 21 operations (21×) | 41 operations (41×) |
| DELETE (1 row) | 1 operation | 6 operations (6×) | 11 operations (11×) | 21 operations (21×) |
| Bulk INSERT (10,000 rows) | 10K operations | 60K operations | 110K operations | 210K operations |
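The amplification table above follows from a simple counting rule, sketched here under the same assumptions: each secondary index costs one entry write per INSERT or DELETE, and an UPDATE of an indexed column costs two index writes (remove the old key, insert the new one).

```python
# Write-amplification counting, reproducing the table above.

def insert_ops(num_indexes, rows=1):
    """One table write plus one index-entry write per index, per row."""
    return rows * (1 + num_indexes)

def update_ops(num_indexes_touched):
    """One table write plus delete-old-key + insert-new-key per touched index."""
    return 1 + 2 * num_indexes_touched

def delete_ops(num_indexes):
    """One table write plus one index-entry removal per index."""
    return 1 + num_indexes

print(insert_ops(5))                # 6   (INSERT with 5 indexes)
print(update_ops(1))                # 3   (UPDATE touching 1 indexed column)
print(update_ops(10))               # 21  (UPDATE touching all 10 indexed columns)
print(insert_ops(20, rows=10_000))  # 210000 (bulk INSERT with 20 indexes)
```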
A production system processing 10,000 writes per second with 15 indexes is actually performing 150,000+ index operations per second. This overhead directly impacts transaction latency, lock contention, and system throughput. Over-indexing is a common cause of unexplained write performance degradation.
The read/write ratio of your workload is a critical factor in indexing decisions. This ratio determines whether the read acceleration from an index outweighs its write penalties.
Analyzing Your Workload:
For any given table, you should know its read/write ratio, which queries dominate the workload, and what latency each must meet:
| Workload Type | Read/Write Ratio | Index Strategy |
|---|---|---|
| Read-Heavy (OLAP) | 95% reads | Index liberally. Read optimization is paramount. Write overhead is acceptable. |
| Read-Dominant | 80-95% reads | Index strategically. Focus on high-impact queries. Monitor write latency. |
| Balanced | 50-80% reads | Index conservatively. Justify each index with measurable read benefit. |
| Write-Dominant | 20-50% reads | Index minimally. Only index for critical queries. Consider materialized views. |
| Write-Heavy (OLTP) | < 20% reads | Index sparingly. Prefer covering indexes over wide indexes. Minimize index count. |
Calculating Index ROI:
For a proposed index, calculate the return on investment:
$$\text{Index ROI} = \frac{\text{Read Benefit} \times \text{Read Frequency}}{\text{Write Cost} \times \text{Write Frequency}}$$
Where:
- **Read Benefit** — latency saved per query by the index (e.g., in milliseconds)
- **Read Frequency** — how often the query runs per period
- **Write Cost** — latency added per write by index maintenance
- **Write Frequency** — writes to the table per period
An index with ROI > 1 provides net benefit; ROI < 1 indicates the index costs more than it saves.
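The formula is trivial to compute; here it is with the illustrative numbers from the worked example that follows (450 ms → 15 ms per query, 12,500 reads/hour, 0.3 ms of write overhead, 850 writes/hour):

```python
# Index ROI = (Read Benefit x Read Frequency) / (Write Cost x Write Frequency)

def index_roi(read_benefit_ms, reads_per_hour, write_cost_ms, writes_per_hour):
    return (read_benefit_ms * reads_per_hour) / (write_cost_ms * writes_per_hour)

roi = index_roi(read_benefit_ms=450 - 15,  # 435 ms saved per query
                reads_per_hour=12_500,
                write_cost_ms=0.3,
                writes_per_hour=850)
print(round(roi))  # 21324 -> far above 1, a strong candidate
```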
```sql
-- Step 1: Analyze current query performance
-- Average response time without index: 450ms

-- Step 2: Estimate performance with index (from test environment)
-- Average response time with index: 15ms
-- Read Benefit = 450ms - 15ms = 435ms

-- Step 3: Count read frequency
SELECT COUNT(*) AS query_count
FROM query_log
WHERE query_text LIKE '%customer_id = %'
  AND timestamp > NOW() - INTERVAL '1 hour';
-- Result: 12,500 queries per hour

-- Step 4: Estimate write overhead
-- Benchmark shows index adds 0.3ms per INSERT
-- Write Cost = 0.3ms

-- Step 5: Count write frequency
SELECT COUNT(*) AS write_count
FROM audit_log
WHERE table_name = 'customers'
  AND operation IN ('INSERT', 'UPDATE', 'DELETE')
  AND timestamp > NOW() - INTERVAL '1 hour';
-- Result: 850 writes per hour

-- Step 6: Calculate ROI
-- Read Benefit × Read Frequency = 435ms × 12,500 = 5,437,500ms saved
-- Write Cost × Write Frequency = 0.3ms × 850 = 255ms added
-- ROI = 5,437,500 / 255 = 21,324

-- Interpretation: This index returns 21,000× its cost!
-- Strong candidate for creation.
```

The ROI calculation above uses estimates. Before creating indexes in production, always validate these assumptions in a staging environment with production-like data. Selectivity, caching behavior, and lock contention can significantly affect actual performance.
While the general framework and selectivity guidelines cover most scenarios, there are important special cases where standard indexing rules require modification.
Foreign key columns should almost always be indexed. Even if the column has low selectivity for SELECT queries, the index is critical for maintaining referential integrity.
Why:
When you UPDATE or DELETE a row in the parent table, the database must verify no child rows reference it. Without an index on the foreign key column, this check requires a full table scan of the child table—for every parent modification.
Example:
Consider orders.customer_id → customers.id. If you delete a customer, the database must verify no orders reference that customer. Without an index on orders.customer_id, that check is a full scan of the orders table—repeated for every parent modification. This can make parent table modifications excruciatingly slow or even cause timeouts.
Unlike some databases, PostgreSQL does NOT automatically create indexes on foreign key columns. You must create them explicitly. This is a common oversight that causes mysterious performance problems in production.
Learning what not to do is as important as learning what to do. These anti-patterns appear frequently in production databases and cause predictable problems.
- **Redundant indexes** — Creating both `INDEX(a, b)` and `INDEX(a)`. The composite index already supports prefix queries; the standalone index wastes space and maintenance. Audit regularly for redundant indexes.
- **Low-cardinality indexes** — Indexing columns like `is_active`, `gender`, or `status` with only 2-5 distinct values. These indexes are almost never used because selectivity is too low. Exception: partial indexes for rare values.
- **Unusable composite indexes** — Creating `INDEX(a, b, c, d, e)` when you only query on columns b, c, or d individually. The index only helps queries that start with column 'a'. Understand composite index usage rules.
- **Function-wrapped predicates** — Creating `INDEX(column)` to help with `WHERE UPPER(column) = 'VALUE'`. Functions on the column prevent index usage. Create expression indexes when needed.

Anti-patterns compound over time. A system with 50 poorly chosen indexes requires massive maintenance overhead, fragments storage, competes for buffer pool space, and makes every write operation slower. The technical debt accumulates invisibly until the system becomes untenable.
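The first anti-pattern, redundant prefix indexes, is mechanical enough to detect automatically. This is an illustrative sketch (the function and index names are hypothetical), using the rule that an index is redundant when its column list is a leading prefix of another index's columns:

```python
# Detect redundant prefix indexes: INDEX(a) is redundant next to INDEX(a, b)
# because the composite index already serves queries on its leading column.

def redundant_indexes(indexes):
    """indexes: dict of index name -> tuple of columns. Returns redundant names."""
    redundant = set()
    for name_a, cols_a in indexes.items():
        for name_b, cols_b in indexes.items():
            if (name_a != name_b
                    and len(cols_a) < len(cols_b)
                    and cols_b[:len(cols_a)] == cols_a):
                redundant.add(name_a)
    return redundant

print(redundant_indexes({
    "idx_a":   ("a",),       # prefix of idx_a_b -> redundant
    "idx_a_b": ("a", "b"),
    "idx_b":   ("b",),       # not a leading prefix of anything -> kept
}))  # {'idx_a'}
```

Note the asymmetry: `idx_b` is not redundant even though column `b` appears in `idx_a_b`, because an index only helps queries that start with its leading column.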
Before creating any index, walk through this checklist. A 'No' answer to any critical question should give you pause:

- Does the target query run frequently enough to matter?
- Is the predicate selective (ideally returning less than 5% of rows)?
- Is the table large enough that a full scan is genuinely expensive?
- Can the workload absorb the extra write cost of maintaining the index?
- Is there no existing index (or index prefix) that already covers the query?
- Have you validated the expected gain in a staging environment with production-like data?
You now understand the fundamental decision framework for index creation. You know when to index, when not to, and how to evaluate the trade-offs. Next, we'll explore the costs of index maintenance in detail—understanding not just that indexes have overhead, but precisely what that overhead entails.