Database Management SystemsMulti-Column Indexes

Multi-Column Indexes

LevelIntermediate

Duration60 mins

TopicMulti-Column Indexes

5 / 5

Design Guidelines

From Theory to Production

You've now learned the theoretical foundations of multi-column indexes: composite index structure, column ordering rules, prefix matching behavior, and index intersection mechanics. But theory alone doesn't create efficient production systems.

This final page bridges the gap from theory to practice, providing actionable design guidelines distilled from decades of database engineering experience. These principles help you make sound indexing decisions under real-world constraints: imperfect information, conflicting requirements, and evolving workloads.

Think of this as a checklist and decision framework for the multi-column index design problems you'll encounter throughout your career.

What You Will Learn

By the end of this page, you will have a practical framework for multi-column index design, including a systematic design methodology, common patterns and anti-patterns, maintenance strategies, and a decision checklist for production indexing decisions.

The Multi-Column Index Design Methodology

Effective index design follows a systematic process. Skipping steps leads to suboptimal indexes, wasted resources, and frustrating performance issues down the line.

The 6-Step Index Design Process:

Systematic Index Design Steps

•Identify Critical Queries — Extract the most important and frequent queries through query logs, application code review, and stakeholder input. Focus on the top 20% of queries that account for 80% of load.
•Analyze Query Predicates — For each critical query, document the WHERE clause columns, their predicate types (equality, range, IN, LIKE), and any ORDER BY columns.
•Determine Selectivity — Gather statistics on column cardinality and data distribution. Identify which columns most effectively narrow the search space.
•Apply Column Ordering Rules — Order columns following: equality before range, higher selectivity before lower (among equalities), match frequent query patterns, include sort columns.
•Consolidate with Prefix Matching — Design indexes whose prefixes serve multiple query patterns. Eliminate redundant indexes subsumed by composite prefixes.
•Validate and Monitor — Test with EXPLAIN ANALYZE, deploy, monitor performance metrics, and iterate based on production behavior.

design_process_example.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
-- STEP 1: Identify Critical Queries
-- Query log analysis reveals these top queries for 'orders' table:
-- Q1 (35%): SELECT * FROM orders WHERE customer_id = ? ORDER BY order_date DESC LIMIT 10
-- Q2 (25%): SELECT * FROM orders WHERE customer_id = ? AND status = ?
-- Q3 (20%): SELECT * FROM orders WHERE order_date BETWEEN ? AND ?
-- Q4 (15%): SELECT * FROM orders WHERE customer_id = ? AND order_date BETWEEN ? AND ?
-- Q5 (5%):  SELECT SUM(total) FROM orders WHERE product_id = ?
 
-- STEP 2: Analyze Query Predicates
-- Q1: customer_id (equality), order_date (sort only)
-- Q2: customer_id (equality), status (equality)
-- Q3: order_date (range)
-- Q4: customer_id (equality), order_date (range)
-- Q5: product_id (equality)
 
-- STEP 3: Determine Selectivity (from statistics)
-- customer_id: 50,000 distinct (~0.002% per value) - HIGH selectivity
-- status: 5 distinct (20% per value) - LOW selectivity
-- order_date: 1,000 distinct (0.1% per value) - MEDIUM selectivity
-- product_id: 10,000 distinct (0.01% per value) - MEDIUM-HIGH selectivity
 
-- STEP 4 & 5: Design Composite Indexes
-- Primary index (serving Q1, Q2, Q4):
CREATE INDEX idx_orders_customer ON orders (
    customer_id,     -- Equality, highest selectivity, in all queries
    status,          -- Equality (Q2), before date range
    order_date DESC  -- Range (Q4), sort (Q1)
);
 
-- This single index serves:
-- Q1: Uses (customer_id) prefix + order_date sort ✅
-- Q2: Uses (customer_id, status) prefix ✅
-- Q4: Uses (customer_id) prefix + order_date range ✅
 
-- Separate index for Q3 (different access pattern):
CREATE INDEX idx_orders_date ON orders (order_date);
 
-- Separate index for Q5 (different access pattern):
CREATE INDEX idx_orders_product ON orders (product_id);
 
-- STEP 6: Validate
EXPLAIN ANALYZE SELECT * FROM orders 
WHERE customer_id = 12345 ORDER BY order_date DESC LIMIT 10;
-- Verify: Uses idx_orders_customer, no filesort

Core Design Principles

These principles guide multi-column index design decisions:

Principle 1: Equality Predicates Before Range Predicates

This is the most important rule. Range predicates (>, <, BETWEEN, LIKE 'abc%') stop the index from being used for subsequent columns.

Why it matters:

Equality predicates define exact positions in the B+-tree
Range predicates define intervals that may span many branches
Columns after a range cannot use the sorted property effectively

1
2
3
4
5
6
7
8
9
10
11
-- Query: WHERE status = 'active' AND created_at > '2023-01-01' AND priority = 1
 
-- ❌ WRONG: Range in the middle
CREATE INDEX idx_bad ON tasks (status, created_at, priority);
-- Only uses: status, created_at
-- priority check is a filter, not index scan
 
-- ✅ CORRECT: All equalities first
CREATE INDEX idx_good ON tasks (status, priority, created_at);
-- Uses: status, priority (both equality), then created_at (range)
-- All columns participate in index navigation

Common Design Patterns

Certain index design patterns appear repeatedly across different applications. Recognizing these patterns accelerates design decisions.

Multi-Column Index Design Patterns
Pattern	Use Case	Index Design
Tenant + Entity	Multi-tenant SaaS	(tenant_id, entity_type, entity_id)
Entity + Timeline	Activity feeds, histories	(entity_id, created_at DESC)
Category + Sort	Product listings, search results	(category_id, sort_column, id)
Status + Queue	Job queues, task processing	(status, priority DESC, created_at)
Geo + Time	Location-based services	(region_id, timestamp DESC)
Composite Unique	Business rule enforcement	UNIQUE (org_id, email)

design_patterns.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
-- PATTERN 1: Tenant + Entity (Multi-tenant)
-- Every query is scoped to a tenant; tenant_id is always first
CREATE INDEX idx_tenant_users ON users (tenant_id, user_type, last_active DESC);
CREATE INDEX idx_tenant_orders ON orders (tenant_id, status, created_at DESC);
CREATE INDEX idx_tenant_products ON products (tenant_id, category_id, name);
 
-- PATTERN 2: Entity + Timeline (Activity/History)
-- Retrieve recent activity for an entity
CREATE INDEX idx_user_activity ON activity_log (
    user_id, 
    created_at DESC
) INCLUDE (action, metadata);
 
-- PATTERN 3: Category + Sort (Listings)
-- Paginated, sorted results within categories
CREATE INDEX idx_product_listing ON products (
    category_id,      -- Filter
    price ASC,        -- Sort for "low to high"
    product_id        -- Tie-breaker for stable pagination
);
 
CREATE INDEX idx_product_listing_desc ON products (
    category_id,
    price DESC,       -- Sort for "high to low"
    product_id
);
 
-- PATTERN 4: Status + Queue (Job Processing)
-- Workers poll for pending jobs
CREATE INDEX idx_job_queue ON jobs (
    status,           -- Filter: 'pending'
    priority DESC,    -- High priority first
    created_at ASC    -- FIFO within same priority
) WHERE status = 'pending';  -- Partial index
 
-- PATTERN 5: Compound Unique Constraint
-- Business rule: one email per organization
CREATE UNIQUE INDEX idx_unique_email ON users (organization_id, email);
 
-- PATTERN 6: Soft Delete Support
-- Queries always filter on deleted = false
CREATE INDEX idx_active_users ON users (
    organization_id,
    role,
    created_at DESC
) WHERE deleted_at IS NULL;  -- Partial index for active records only

Anti-Patterns to Avoid

Recognizing common mistakes helps avoid them. These anti-patterns represent recurring errors in multi-column index design:

Multi-Column Index Anti-Patterns

•The Kitchen Sink Index — Including every frequently-queried column in one massive index. Results in bloated indexes that slow writes and may not help most queries.
•Schema-Order Indexing — Creating indexes like (col1, col2, col3) because that's the column order in CREATE TABLE, without analyzing query patterns.
•Duplicate Prefix Indexes — Having (A) and (A, B) as separate indexes. The composite subsumes the single-column index.
•Low-Selectivity Leading Column — Placing columns with few distinct values (e.g., boolean flags, status) as the leading column when they're not used in all queries.
•Ignoring Range Predicate Position — Placing range predicate columns before equality columns, wasting index potential.
•Index for Every Query — Creating a dedicated composite index for each query, leading to index explosion and write performance degradation.

anti_patterns.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
-- ANTI-PATTERN 1: Kitchen Sink Index
-- ❌ WRONG: Include everything
CREATE INDEX idx_orders_everything ON orders (
    customer_id, order_date, status, product_id, 
    quantity, total, shipping_address, created_at
);
-- Size: Massive (possibly larger than table!)
-- Maintenance: Extremely slow writes
-- Benefit: Marginal (most queries don't use all columns)
 
-- ✅ BETTER: Focused indexes for specific patterns
CREATE INDEX idx_orders_customer ON orders (customer_id, order_date);
CREATE INDEX idx_orders_status ON orders (status, created_at);
 
 
-- ANTI-PATTERN 2: Duplicate Prefixes
-- ❌ REDUNDANT:
CREATE INDEX idx_a ON orders (customer_id);
CREATE INDEX idx_ab ON orders (customer_id, order_date);
CREATE INDEX idx_abc ON orders (customer_id, order_date, status);
-- idx_a and idx_ab are completely redundant!
 
-- ✅ EFFICIENT: Single composite serves all prefixes
CREATE INDEX idx_abc ON orders (customer_id, order_date, status);
 
 
-- ANTI-PATTERN 3: Low-Selectivity Leading Column
-- Query: WHERE is_active = true AND category_id = ?
-- ❌ WRONG: Boolean first
CREATE INDEX idx_bad ON products (is_active, category_id);
-- is_active has only 2 values; category_id narrowing is suboptimal
 
-- ✅ BETTER: Higher-selectivity column first
CREATE INDEX idx_good ON products (category_id, is_active);
-- Or use partial index:
CREATE INDEX idx_active ON products (category_id) WHERE is_active = true;
 
 
-- ANTI-PATTERN 4: Index Per Query
-- ❌ EXCESSIVE:
CREATE INDEX idx_q1 ON orders (customer_id);
CREATE INDEX idx_q2 ON orders (customer_id, status);
CREATE INDEX idx_q3 ON orders (customer_id, order_date);
CREATE INDEX idx_q4 ON orders (customer_id, status, order_date);
-- 4 overlapping indexes!
 
-- ✅ CONSOLIDATED:
CREATE INDEX idx_orders_main ON orders (customer_id, status, order_date);
-- Serves Q1, Q2, Q3 (partial), Q4 via prefix matching
-- Only need one additional index if Q3's order_date range needs optimization

Index Column Count Guidelines

How many columns should a composite index include? There's no universal answer, but these guidelines help:

The Sweet Spot: 2-4 Columns

Most effective composite indexes have 2-4 key columns. Beyond this:

Each additional column provides diminishing selectivity returns
Index size increases, slowing writes and scans
Query patterns rarely use more than 3-4 predicates together

Column Count Impact Analysis
Columns	Typical Use	Considerations
2	Most common patterns	Simple, efficient, easy to maintain
3	Common for complex queries	Good balance of coverage and efficiency
4	Frequent in OLTP	May be optimal for high-frequency queries
5+	Covering indexes only	Include columns as INCLUDE, not key columns
7+	Almost never appropriate	Diminishing returns; consider query redesign

column_count_strategy.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
-- Key columns vs INCLUDE columns
 
-- Query: SELECT name, email FROM users 
--        WHERE tenant_id = ? AND status = 'active' AND role = 'admin'
--        ORDER BY created_at DESC
 
-- Option 1: All columns in key (5 key columns)
CREATE INDEX idx_v1 ON users (
    tenant_id, status, role, created_at DESC, name, email
);
-- ❌ Email changes require full index entry repositioning
 
-- Option 2: Filter + Sort in key, output in INCLUDE (3 key columns)
CREATE INDEX idx_v2 ON users (tenant_id, status, role, created_at DESC)
INCLUDE (name, email);
-- ✅ Email changes only update the INCLUDE portion
-- ✅ Key portion is more compact, faster to traverse
 
-- Option 3: Most selective in key, rest as filters (2 key columns)
CREATE INDEX idx_v3 ON users (tenant_id, created_at DESC)
INCLUDE (status, role, name, email);
-- ⚠️ Status and role become filters, not seeks
-- May be acceptable if most users are 'active' admins
 
 
-- Guidelines for choosing:
-- 1. Key columns: Those used in WHERE = or ORDER BY (up to 3-4)
-- 2. INCLUDE columns: Those only needed in SELECT output
-- 3. Filter columns: Low-selectivity predicates (may go in INCLUDE)

The INCLUDE Optimization

Modern databases (PostgreSQL, SQL Server) support INCLUDE clauses that add columns to leaf nodes without affecting B+-tree ordering. Use INCLUDE for covering index columns that aren't used in predicates or sorting. This reduces key size and minimizes index maintenance when those columns change.

Maintenance and Monitoring

Index design is not a one-time activity. Workloads evolve, data grows, and indexes that were optimal may become suboptimal. Implement ongoing monitoring and maintenance:

Index Maintenance Activities

•Monitor index usage — Track which indexes are used, how often, and for which operations. Remove unused indexes.
•Analyze slow queries — Regularly review slow query logs. Indexes may need adjustment as query patterns change.
•Update statistics — Stale statistics lead to poor optimizer decisions. Schedule regular ANALYZE/UPDATE STATISTICS.
•Rebuild fragmented indexes — Heavy write activity causes fragmentation. Periodic rebuilds maintain efficiency.
•Review after schema changes — New columns, changed queries, or altered data patterns may require index redesign.

index_monitoring.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
-- PostgreSQL: Find unused indexes
SELECT 
    schemaname || '.' || relname AS table,
    indexrelname AS index,
    pg_size_pretty(pg_relation_size(i.indexrelid)) AS size,
    idx_scan AS index_scans
FROM pg_stat_user_indexes ui
JOIN pg_index i ON ui.indexrelid = i.indexrelid
WHERE NOT indisunique  -- Exclude unique constraints
  AND idx_scan < 50     -- Rarely used
  AND pg_relation_size(i.indexrelid) > 1024*1024  -- > 1MB
ORDER BY pg_relation_size(i.indexrelid) DESC;
 
 
-- MySQL: Index usage statistics
SELECT 
    object_schema,
    object_name,
    index_name,
    count_star AS total_accesses,
    count_read AS reads,
    count_write AS writes
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE object_schema = 'your_database'
ORDER BY count_star DESC;
 
 
-- SQL Server: Index usage and missing index suggestions
-- Usage stats
SELECT 
    OBJECT_NAME(s.object_id) AS TableName,
    i.name AS IndexName,
    user_seeks + user_scans + user_lookups AS Reads,
    user_updates AS Writes,
    CASE WHEN user_seeks + user_scans + user_lookups = 0 
         THEN 'UNUSED' ELSE 'USED' END AS Status
FROM sys.dm_db_index_usage_stats s
JOIN sys.indexes i ON s.object_id = i.object_id AND s.index_id = i.index_id
WHERE database_id = DB_ID()
ORDER BY Reads DESC;
 
-- Missing index suggestions
SELECT TOP 20
    CONVERT(DECIMAL(18,2), migs.avg_total_user_cost * migs.avg_user_impact * 
            (migs.user_seeks + migs.user_scans)) AS improvement_measure,
    mid.statement AS table_name,
    mid.equality_columns,
    mid.inequality_columns,
    mid.included_columns
FROM sys.dm_db_missing_index_groups mig
JOIN sys.dm_db_missing_index_group_stats migs 
    ON mig.index_group_handle = migs.group_handle
JOIN sys.dm_db_missing_index_details mid 
    ON mig.index_handle = mid.index_handle
ORDER BY improvement_measure DESC;
 
 
-- Index fragmentation check (SQL Server)
SELECT 
    OBJECT_NAME(ips.object_id) AS TableName,
    i.name AS IndexName,
    ips.avg_fragmentation_in_percent,
    ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') ips
JOIN sys.indexes i ON ips.object_id = i.object_id AND ips.index_id = i.index_id
WHERE ips.avg_fragmentation_in_percent > 30
  AND ips.page_count > 1000
ORDER BY ips.avg_fragmentation_in_percent DESC;

Multi-Column Index Decision Checklist

Use this checklist when designing or reviewing multi-column indexes:

Pre-Creation Checklist

•☐ Have I identified the specific queries this index will serve?
•☐ Are equality predicate columns ordered before range predicate columns?
•☐ For equality columns, are higher-selectivity columns placed first?
•☐ Does the column order match the most common query pattern's filter order?
•☐ If ORDER BY is needed, do the sort directions match the index?
•☐ Can this index's prefixes serve other queries (consolidation opportunity)?
•☐ Does an existing index already cover these columns as a prefix?
•☐ Have I considered a covering index with INCLUDE columns?
•☐ Is the column count reasonable (typically ≤4 key columns)?
•☐ Have I validated with EXPLAIN that the query uses the index as expected?

Post-Creation Checklist

•☐ Is the index being used? (Check usage statistics)
•☐ Has query performance improved as expected?
•☐ Are write operations still acceptably fast?
•☐ Is the index size reasonable?
•☐ Have I removed any redundant indexes this new index subsumes?
•☐ Are statistics up to date for accurate cost estimation?
•☐ Have I documented why this index was created and for which queries?

Document Your Indexes

Always document the rationale for each index: which queries it serves, why the columns are ordered that way, and any trade-offs made. Future maintainers (including yourself) will thank you when revisiting indexing decisions months later.

Summary: Multi-Column Index Mastery

You've now completed the comprehensive study of multi-column indexes. Here's the synthesized knowledge:

Module Key Takeaways

•Composite indexes create lexicographic ordering across multiple columns, enabling efficient multi-predicate queries.
•Column order is critical — Equality before range, higher selectivity first, match common query patterns.
•Prefix matching allows one index to serve multiple query patterns, reducing index proliferation.
•Index intersection combines multiple indexes using set operations, providing flexibility for diverse queries.
•Design systematically — Analyze queries, selectivity, and access patterns before creating indexes.
•Avoid anti-patterns — Don't create kitchen-sink indexes, duplicate prefixes, or index every query separately.
•Monitor and maintain — Track usage, remove unused indexes, keep statistics current, and iterate as workloads evolve.

Quick Reference: Core Rules
Rule	Application
Leftmost prefix	Queries must use contiguous prefix from left
Equality before range	= columns precede <, >, BETWEEN columns
Range stops index	Columns after range predicates aren't used for seeks
Sort direction matters	ORDER BY must match or fully invert index order
Prefix subsumption	Index (A,B,C) makes separate (A) or (A,B) redundant
2-4 key columns	Sweet spot for most composite indexes

Module Complete

Congratulations! You've mastered multi-column index design for database systems. You understand not just the mechanics, but the strategic thinking required to design indexes that maximize query performance while minimizing overhead. This knowledge is foundational for database performance engineering at any scale.

5 / 5

Loading learning content...

Database Management SystemsMulti-Column Indexes

Multi-Column Indexes

LevelIntermediate

Duration60 mins

TopicMulti-Column Indexes

5 / 5

Design Guidelines

From Theory to Production

Think of this as a checklist and decision framework for the multi-column index design problems you'll encounter throughout your career.

What You Will Learn

The Multi-Column Index Design Methodology

Effective index design follows a systematic process. Skipping steps leads to suboptimal indexes, wasted resources, and frustrating performance issues down the line.

The 6-Step Index Design Process:

Systematic Index Design Steps

•Identify Critical Queries — Extract the most important and frequent queries through query logs, application code review, and stakeholder input. Focus on the top 20% of queries that account for 80% of load.
•Analyze Query Predicates — For each critical query, document the WHERE clause columns, their predicate types (equality, range, IN, LIKE), and any ORDER BY columns.
•Determine Selectivity — Gather statistics on column cardinality and data distribution. Identify which columns most effectively narrow the search space.
•Apply Column Ordering Rules — Order columns following: equality before range, higher selectivity before lower (among equalities), match frequent query patterns, include sort columns.
•Consolidate with Prefix Matching — Design indexes whose prefixes serve multiple query patterns. Eliminate redundant indexes subsumed by composite prefixes.
•Validate and Monitor — Test with EXPLAIN ANALYZE, deploy, monitor performance metrics, and iterate based on production behavior.

design_process_example.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
-- STEP 1: Identify Critical Queries
-- Query log analysis reveals these top queries for 'orders' table:
-- Q1 (35%): SELECT * FROM orders WHERE customer_id = ? ORDER BY order_date DESC LIMIT 10
-- Q2 (25%): SELECT * FROM orders WHERE customer_id = ? AND status = ?
-- Q3 (20%): SELECT * FROM orders WHERE order_date BETWEEN ? AND ?
-- Q4 (15%): SELECT * FROM orders WHERE customer_id = ? AND order_date BETWEEN ? AND ?
-- Q5 (5%):  SELECT SUM(total) FROM orders WHERE product_id = ?
 
-- STEP 2: Analyze Query Predicates
-- Q1: customer_id (equality), order_date (sort only)
-- Q2: customer_id (equality), status (equality)
-- Q3: order_date (range)
-- Q4: customer_id (equality), order_date (range)
-- Q5: product_id (equality)
 
-- STEP 3: Determine Selectivity (from statistics)
-- customer_id: 50,000 distinct (~0.002% per value) - HIGH selectivity
-- status: 5 distinct (20% per value) - LOW selectivity
-- order_date: 1,000 distinct (0.1% per value) - MEDIUM selectivity
-- product_id: 10,000 distinct (0.01% per value) - MEDIUM-HIGH selectivity
 
-- STEP 4 & 5: Design Composite Indexes
-- Primary index (serving Q1, Q2, Q4):
CREATE INDEX idx_orders_customer ON orders (
    customer_id,     -- Equality, highest selectivity, in all queries
    status,          -- Equality (Q2), before date range
    order_date DESC  -- Range (Q4), sort (Q1)
);
 
-- This single index serves:
-- Q1: Uses (customer_id) prefix + order_date sort ✅
-- Q2: Uses (customer_id, status) prefix ✅
-- Q4: Uses (customer_id) prefix + order_date range ✅
 
-- Separate index for Q3 (different access pattern):
CREATE INDEX idx_orders_date ON orders (order_date);
 
-- Separate index for Q5 (different access pattern):
CREATE INDEX idx_orders_product ON orders (product_id);
 
-- STEP 6: Validate
EXPLAIN ANALYZE SELECT * FROM orders 
WHERE customer_id = 12345 ORDER BY order_date DESC LIMIT 10;
-- Verify: Uses idx_orders_customer, no filesort

Core Design Principles

These principles guide multi-column index design decisions:

Principle 1: Equality Predicates Before Range Predicates

This is the most important rule. Range predicates (>, <, BETWEEN, LIKE 'abc%') stop the index from being used for subsequent columns.

Why it matters:

Equality predicates define exact positions in the B+-tree
Range predicates define intervals that may span many branches
Columns after a range cannot use the sorted property effectively

1
2
3
4
5
6
7
8
9
10
11
-- Query: WHERE status = 'active' AND created_at > '2023-01-01' AND priority = 1
 
-- ❌ WRONG: Range in the middle
CREATE INDEX idx_bad ON tasks (status, created_at, priority);
-- Only uses: status, created_at
-- priority check is a filter, not index scan
 
-- ✅ CORRECT: All equalities first
CREATE INDEX idx_good ON tasks (status, priority, created_at);
-- Uses: status, priority (both equality), then created_at (range)
-- All columns participate in index navigation

Common Design Patterns

Certain index design patterns appear repeatedly across different applications. Recognizing these patterns accelerates design decisions.

Multi-Column Index Design Patterns
Pattern	Use Case	Index Design
Tenant + Entity	Multi-tenant SaaS	(tenant_id, entity_type, entity_id)
Entity + Timeline	Activity feeds, histories	(entity_id, created_at DESC)
Category + Sort	Product listings, search results	(category_id, sort_column, id)
Status + Queue	Job queues, task processing	(status, priority DESC, created_at)
Geo + Time	Location-based services	(region_id, timestamp DESC)
Composite Unique	Business rule enforcement	UNIQUE (org_id, email)

design_patterns.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
-- PATTERN 1: Tenant + Entity (Multi-tenant)
-- Every query is scoped to a tenant; tenant_id is always first
CREATE INDEX idx_tenant_users ON users (tenant_id, user_type, last_active DESC);
CREATE INDEX idx_tenant_orders ON orders (tenant_id, status, created_at DESC);
CREATE INDEX idx_tenant_products ON products (tenant_id, category_id, name);
 
-- PATTERN 2: Entity + Timeline (Activity/History)
-- Retrieve recent activity for an entity
CREATE INDEX idx_user_activity ON activity_log (
    user_id, 
    created_at DESC
) INCLUDE (action, metadata);
 
-- PATTERN 3: Category + Sort (Listings)
-- Paginated, sorted results within categories
CREATE INDEX idx_product_listing ON products (
    category_id,      -- Filter
    price ASC,        -- Sort for "low to high"
    product_id        -- Tie-breaker for stable pagination
);
 
CREATE INDEX idx_product_listing_desc ON products (
    category_id,
    price DESC,       -- Sort for "high to low"
    product_id
);
 
-- PATTERN 4: Status + Queue (Job Processing)
-- Workers poll for pending jobs
CREATE INDEX idx_job_queue ON jobs (
    status,           -- Filter: 'pending'
    priority DESC,    -- High priority first
    created_at ASC    -- FIFO within same priority
) WHERE status = 'pending';  -- Partial index
 
-- PATTERN 5: Compound Unique Constraint
-- Business rule: one email per organization
CREATE UNIQUE INDEX idx_unique_email ON users (organization_id, email);
 
-- PATTERN 6: Soft Delete Support
-- Queries always filter on deleted = false
CREATE INDEX idx_active_users ON users (
    organization_id,
    role,
    created_at DESC
) WHERE deleted_at IS NULL;  -- Partial index for active records only

Anti-Patterns to Avoid

Recognizing common mistakes helps avoid them. These anti-patterns represent recurring errors in multi-column index design:

Multi-Column Index Anti-Patterns

•The Kitchen Sink Index — Including every frequently-queried column in one massive index. Results in bloated indexes that slow writes and may not help most queries.
•Schema-Order Indexing — Creating indexes like (col1, col2, col3) because that's the column order in CREATE TABLE, without analyzing query patterns.
•Duplicate Prefix Indexes — Having (A) and (A, B) as separate indexes. The composite subsumes the single-column index.
•Low-Selectivity Leading Column — Placing columns with few distinct values (e.g., boolean flags, status) as the leading column when they're not used in all queries.
•Ignoring Range Predicate Position — Placing range predicate columns before equality columns, wasting index potential.
•Index for Every Query — Creating a dedicated composite index for each query, leading to index explosion and write performance degradation.

anti_patterns.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
-- ANTI-PATTERN 1: Kitchen Sink Index
-- ❌ WRONG: Include everything
CREATE INDEX idx_orders_everything ON orders (
    customer_id, order_date, status, product_id, 
    quantity, total, shipping_address, created_at
);
-- Size: Massive (possibly larger than table!)
-- Maintenance: Extremely slow writes
-- Benefit: Marginal (most queries don't use all columns)
 
-- ✅ BETTER: Focused indexes for specific patterns
CREATE INDEX idx_orders_customer ON orders (customer_id, order_date);
CREATE INDEX idx_orders_status ON orders (status, created_at);
 
 
-- ANTI-PATTERN 2: Duplicate Prefixes
-- ❌ REDUNDANT:
CREATE INDEX idx_a ON orders (customer_id);
CREATE INDEX idx_ab ON orders (customer_id, order_date);
CREATE INDEX idx_abc ON orders (customer_id, order_date, status);
-- idx_a and idx_ab are completely redundant!
 
-- ✅ EFFICIENT: Single composite serves all prefixes
CREATE INDEX idx_abc ON orders (customer_id, order_date, status);
 
 
-- ANTI-PATTERN 3: Low-Selectivity Leading Column
-- Query: WHERE is_active = true AND category_id = ?
-- ❌ WRONG: Boolean first
CREATE INDEX idx_bad ON products (is_active, category_id);
-- is_active has only 2 values; category_id narrowing is suboptimal
 
-- ✅ BETTER: Higher-selectivity column first
CREATE INDEX idx_good ON products (category_id, is_active);
-- Or use partial index:
CREATE INDEX idx_active ON products (category_id) WHERE is_active = true;
 
 
-- ANTI-PATTERN 4: Index Per Query
-- ❌ EXCESSIVE:
CREATE INDEX idx_q1 ON orders (customer_id);
CREATE INDEX idx_q2 ON orders (customer_id, status);
CREATE INDEX idx_q3 ON orders (customer_id, order_date);
CREATE INDEX idx_q4 ON orders (customer_id, status, order_date);
-- 4 overlapping indexes!
 
-- ✅ CONSOLIDATED:
CREATE INDEX idx_orders_main ON orders (customer_id, status, order_date);
-- Serves Q1, Q2, Q3 (partial), Q4 via prefix matching
-- Only need one additional index if Q3's order_date range needs optimization

Index Column Count Guidelines

How many columns should a composite index include? There's no universal answer, but these guidelines help:

The Sweet Spot: 2-4 Columns

Most effective composite indexes have 2-4 key columns. Beyond this:

Each additional column provides diminishing selectivity returns
Index size increases, slowing writes and scans
Query patterns rarely use more than 3-4 predicates together

Column Count Impact Analysis
Columns	Typical Use	Considerations
2	Most common patterns	Simple, efficient, easy to maintain
3	Common for complex queries	Good balance of coverage and efficiency
4	Frequent in OLTP	May be optimal for high-frequency queries
5+	Covering indexes only	Include columns as INCLUDE, not key columns
7+	Almost never appropriate	Diminishing returns; consider query redesign

column_count_strategy.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
-- Key columns vs INCLUDE columns
 
-- Query: SELECT name, email FROM users 
--        WHERE tenant_id = ? AND status = 'active' AND role = 'admin'
--        ORDER BY created_at DESC
 
-- Option 1: All columns in key (5 key columns)
CREATE INDEX idx_v1 ON users (
    tenant_id, status, role, created_at DESC, name, email
);
-- ❌ Email changes require full index entry repositioning
 
-- Option 2: Filter + Sort in key, output in INCLUDE (3 key columns)
CREATE INDEX idx_v2 ON users (tenant_id, status, role, created_at DESC)
INCLUDE (name, email);
-- ✅ Email changes only update the INCLUDE portion
-- ✅ Key portion is more compact, faster to traverse
 
-- Option 3: Most selective in key, rest as filters (2 key columns)
CREATE INDEX idx_v3 ON users (tenant_id, created_at DESC)
INCLUDE (status, role, name, email);
-- ⚠️ Status and role become filters, not seeks
-- May be acceptable if most users are 'active' admins
 
 
-- Guidelines for choosing:
-- 1. Key columns: Those used in WHERE = or ORDER BY (up to 3-4)
-- 2. INCLUDE columns: Those only needed in SELECT output
-- 3. Filter columns: Low-selectivity predicates (may go in INCLUDE)

The INCLUDE Optimization

Maintenance and Monitoring

Index design is not a one-time activity. Workloads evolve, data grows, and indexes that were optimal may become suboptimal. Implement ongoing monitoring and maintenance:

Index Maintenance Activities

•Monitor index usage — Track which indexes are used, how often, and for which operations. Remove unused indexes.
•Analyze slow queries — Regularly review slow query logs. Indexes may need adjustment as query patterns change.
•Update statistics — Stale statistics lead to poor optimizer decisions. Schedule regular ANALYZE/UPDATE STATISTICS.
•Rebuild fragmented indexes — Heavy write activity causes fragmentation. Periodic rebuilds maintain efficiency.
•Review after schema changes — New columns, changed queries, or altered data patterns may require index redesign.

index_monitoring.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
-- PostgreSQL: Find unused indexes
SELECT 
    schemaname || '.' || relname AS table,
    indexrelname AS index,
    pg_size_pretty(pg_relation_size(i.indexrelid)) AS size,
    idx_scan AS index_scans
FROM pg_stat_user_indexes ui
JOIN pg_index i ON ui.indexrelid = i.indexrelid
WHERE NOT indisunique  -- Exclude unique constraints
  AND idx_scan < 50     -- Rarely used
  AND pg_relation_size(i.indexrelid) > 1024*1024  -- > 1MB
ORDER BY pg_relation_size(i.indexrelid) DESC;
 
 
-- MySQL: Index usage statistics
SELECT 
    object_schema,
    object_name,
    index_name,
    count_star AS total_accesses,
    count_read AS reads,
    count_write AS writes
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE object_schema = 'your_database'
ORDER BY count_star DESC;
 
 
-- SQL Server: Index usage and missing index suggestions
-- Usage stats
SELECT 
    OBJECT_NAME(s.object_id) AS TableName,
    i.name AS IndexName,
    user_seeks + user_scans + user_lookups AS Reads,
    user_updates AS Writes,
    CASE WHEN user_seeks + user_scans + user_lookups = 0 
         THEN 'UNUSED' ELSE 'USED' END AS Status
FROM sys.dm_db_index_usage_stats s
JOIN sys.indexes i ON s.object_id = i.object_id AND s.index_id = i.index_id
WHERE database_id = DB_ID()
ORDER BY Reads DESC;
 
-- Missing index suggestions
SELECT TOP 20
    CONVERT(DECIMAL(18,2), migs.avg_total_user_cost * migs.avg_user_impact * 
            (migs.user_seeks + migs.user_scans)) AS improvement_measure,
    mid.statement AS table_name,
    mid.equality_columns,
    mid.inequality_columns,
    mid.included_columns
FROM sys.dm_db_missing_index_groups mig
JOIN sys.dm_db_missing_index_group_stats migs 
    ON mig.index_group_handle = migs.group_handle
JOIN sys.dm_db_missing_index_details mid 
    ON mig.index_handle = mid.index_handle
ORDER BY improvement_measure DESC;
 
 
-- Index fragmentation check (SQL Server)
SELECT 
    OBJECT_NAME(ips.object_id) AS TableName,
    i.name AS IndexName,
    ips.avg_fragmentation_in_percent,
    ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') ips
JOIN sys.indexes i ON ips.object_id = i.object_id AND ips.index_id = i.index_id
WHERE ips.avg_fragmentation_in_percent > 30
  AND ips.page_count > 1000
ORDER BY ips.avg_fragmentation_in_percent DESC;

Multi-Column Index Decision Checklist

Use this checklist when designing or reviewing multi-column indexes:

Pre-Creation Checklist

•☐ Have I identified the specific queries this index will serve?
•☐ Are equality predicate columns ordered before range predicate columns?
•☐ For equality columns, are higher-selectivity columns placed first?
•☐ Does the column order match the most common query pattern's filter order?
•☐ If ORDER BY is needed, do the sort directions match the index?
•☐ Can this index's prefixes serve other queries (consolidation opportunity)?
•☐ Does an existing index already cover these columns as a prefix?
•☐ Have I considered a covering index with INCLUDE columns?
•☐ Is the column count reasonable (typically ≤4 key columns)?
•☐ Have I validated with EXPLAIN that the query uses the index as expected?

Post-Creation Checklist

•☐ Is the index being used? (Check usage statistics)
•☐ Has query performance improved as expected?
•☐ Are write operations still acceptably fast?
•☐ Is the index size reasonable?
•☐ Have I removed any redundant indexes this new index subsumes?
•☐ Are statistics up to date for accurate cost estimation?
•☐ Have I documented why this index was created and for which queries?

Document Your Indexes

Summary: Multi-Column Index Mastery

You've now completed the comprehensive study of multi-column indexes. Here's the synthesized knowledge:

Module Key Takeaways

•Composite indexes create lexicographic ordering across multiple columns, enabling efficient multi-predicate queries.
•Column order is critical — Equality before range, higher selectivity first, match common query patterns.
•Prefix matching allows one index to serve multiple query patterns, reducing index proliferation.
•Index intersection combines multiple indexes using set operations, providing flexibility for diverse queries.
•Design systematically — Analyze queries, selectivity, and access patterns before creating indexes.
•Avoid anti-patterns — Don't create kitchen-sink indexes, duplicate prefixes, or index every query separately.
•Monitor and maintain — Track usage, remove unused indexes, keep statistics current, and iterate as workloads evolve.

Quick Reference: Core Rules
Rule	Application
Leftmost prefix	Queries must use contiguous prefix from left
Equality before range	= columns precede <, >, BETWEEN columns
Range stops index	Columns after range predicates aren't used for seeks
Sort direction matters	ORDER BY must match or fully invert index order
Prefix subsumption	Index (A,B,C) makes separate (A) or (A,B) redundant
2-4 key columns	Sweet spot for most composite indexes

Module Complete

5 / 5