Loading learning content...
You've now learned the theoretical foundations of multi-column indexes: composite index structure, column ordering rules, prefix matching behavior, and index intersection mechanics. But theory alone doesn't create efficient production systems.
This final page bridges the gap from theory to practice, providing actionable design guidelines distilled from decades of database engineering experience. These principles help you make sound indexing decisions under real-world constraints: imperfect information, conflicting requirements, and evolving workloads.
Think of this as a checklist and decision framework for the multi-column index design problems you'll encounter throughout your career.
By the end of this page, you will have a practical framework for multi-column index design, including a systematic design methodology, common patterns and anti-patterns, maintenance strategies, and a decision checklist for production indexing decisions.
Effective index design follows a systematic process. Skipping steps leads to suboptimal indexes, wasted resources, and frustrating performance issues down the line.
The 6-Step Index Design Process:
1234567891011121314151617181920212223242526272829303132333435363738394041424344
-- STEP 1: Identify Critical Queries-- Query log analysis reveals these top queries for 'orders' table:-- Q1 (35%): SELECT * FROM orders WHERE customer_id = ? ORDER BY order_date DESC LIMIT 10-- Q2 (25%): SELECT * FROM orders WHERE customer_id = ? AND status = ?-- Q3 (20%): SELECT * FROM orders WHERE order_date BETWEEN ? AND ?-- Q4 (15%): SELECT * FROM orders WHERE customer_id = ? AND order_date BETWEEN ? AND ?-- Q5 (5%): SELECT SUM(total) FROM orders WHERE product_id = ? -- STEP 2: Analyze Query Predicates-- Q1: customer_id (equality), order_date (sort only)-- Q2: customer_id (equality), status (equality)-- Q3: order_date (range)-- Q4: customer_id (equality), order_date (range)-- Q5: product_id (equality) -- STEP 3: Determine Selectivity (from statistics)-- customer_id: 50,000 distinct (~0.002% per value) - HIGH selectivity-- status: 5 distinct (20% per value) - LOW selectivity-- order_date: 1,000 distinct (0.1% per value) - MEDIUM selectivity-- product_id: 10,000 distinct (0.01% per value) - MEDIUM-HIGH selectivity -- STEP 4 & 5: Design Composite Indexes-- Primary index (serving Q1, Q2, Q4):CREATE INDEX idx_orders_customer ON orders ( customer_id, -- Equality, highest selectivity, in all queries status, -- Equality (Q2), before date range order_date DESC -- Range (Q4), sort (Q1)); -- This single index serves:-- Q1: Uses (customer_id) prefix + order_date sort ✅-- Q2: Uses (customer_id, status) prefix ✅-- Q4: Uses (customer_id) prefix + order_date range ✅ -- Separate index for Q3 (different access pattern):CREATE INDEX idx_orders_date ON orders (order_date); -- Separate index for Q5 (different access pattern):CREATE INDEX idx_orders_product ON orders (product_id); -- STEP 6: ValidateEXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 12345 ORDER BY order_date DESC LIMIT 10;-- Verify: Uses idx_orders_customer, no filesortThese principles guide multi-column index design decisions:
Principle 1: Equality Predicates Before Range Predicates
This is the most important rule. Range predicates (>, <, BETWEEN, LIKE 'abc%') stop the index from being used for subsequent columns.
Why it matters:
1234567891011
-- Query: WHERE status = 'active' AND created_at > '2023-01-01' AND priority = 1 -- ❌ WRONG: Range in the middleCREATE INDEX idx_bad ON tasks (status, created_at, priority);-- Only uses: status, created_at-- priority check is a filter, not index scan -- ✅ CORRECT: All equalities firstCREATE INDEX idx_good ON tasks (status, priority, created_at);-- Uses: status, priority (both equality), then created_at (range)-- All columns participate in index navigationCertain index design patterns appear repeatedly across different applications. Recognizing these patterns accelerates design decisions.
| Pattern | Use Case | Index Design |
|---|---|---|
| Tenant + Entity | Multi-tenant SaaS | (tenant_id, entity_type, entity_id) |
| Entity + Timeline | Activity feeds, histories | (entity_id, created_at DESC) |
| Category + Sort | Product listings, search results | (category_id, sort_column, id) |
| Status + Queue | Job queues, task processing | (status, priority DESC, created_at) |
| Geo + Time | Location-based services | (region_id, timestamp DESC) |
| Composite Unique | Business rule enforcement | UNIQUE (org_id, email) |
12345678910111213141516171819202122232425262728293031323334353637383940414243444546
-- PATTERN 1: Tenant + Entity (Multi-tenant)-- Every query is scoped to a tenant; tenant_id is always firstCREATE INDEX idx_tenant_users ON users (tenant_id, user_type, last_active DESC);CREATE INDEX idx_tenant_orders ON orders (tenant_id, status, created_at DESC);CREATE INDEX idx_tenant_products ON products (tenant_id, category_id, name); -- PATTERN 2: Entity + Timeline (Activity/History)-- Retrieve recent activity for an entityCREATE INDEX idx_user_activity ON activity_log ( user_id, created_at DESC) INCLUDE (action, metadata); -- PATTERN 3: Category + Sort (Listings)-- Paginated, sorted results within categoriesCREATE INDEX idx_product_listing ON products ( category_id, -- Filter price ASC, -- Sort for "low to high" product_id -- Tie-breaker for stable pagination); CREATE INDEX idx_product_listing_desc ON products ( category_id, price DESC, -- Sort for "high to low" product_id); -- PATTERN 4: Status + Queue (Job Processing)-- Workers poll for pending jobsCREATE INDEX idx_job_queue ON jobs ( status, -- Filter: 'pending' priority DESC, -- High priority first created_at ASC -- FIFO within same priority) WHERE status = 'pending'; -- Partial index -- PATTERN 5: Compound Unique Constraint-- Business rule: one email per organizationCREATE UNIQUE INDEX idx_unique_email ON users (organization_id, email); -- PATTERN 6: Soft Delete Support-- Queries always filter on deleted = falseCREATE INDEX idx_active_users ON users ( organization_id, role, created_at DESC) WHERE deleted_at IS NULL; -- Partial index for active records onlyRecognizing common mistakes helps avoid them. These anti-patterns represent recurring errors in multi-column index design:
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950
-- ANTI-PATTERN 1: Kitchen Sink Index-- ❌ WRONG: Include everythingCREATE INDEX idx_orders_everything ON orders ( customer_id, order_date, status, product_id, quantity, total, shipping_address, created_at);-- Size: Massive (possibly larger than table!)-- Maintenance: Extremely slow writes-- Benefit: Marginal (most queries don't use all columns) -- ✅ BETTER: Focused indexes for specific patternsCREATE INDEX idx_orders_customer ON orders (customer_id, order_date);CREATE INDEX idx_orders_status ON orders (status, created_at); -- ANTI-PATTERN 2: Duplicate Prefixes-- ❌ REDUNDANT:CREATE INDEX idx_a ON orders (customer_id);CREATE INDEX idx_ab ON orders (customer_id, order_date);CREATE INDEX idx_abc ON orders (customer_id, order_date, status);-- idx_a and idx_ab are completely redundant! -- ✅ EFFICIENT: Single composite serves all prefixesCREATE INDEX idx_abc ON orders (customer_id, order_date, status); -- ANTI-PATTERN 3: Low-Selectivity Leading Column-- Query: WHERE is_active = true AND category_id = ?-- ❌ WRONG: Boolean firstCREATE INDEX idx_bad ON products (is_active, category_id);-- is_active has only 2 values; category_id narrowing is suboptimal -- ✅ BETTER: Higher-selectivity column firstCREATE INDEX idx_good ON products (category_id, is_active);-- Or use partial index:CREATE INDEX idx_active ON products (category_id) WHERE is_active = true; -- ANTI-PATTERN 4: Index Per Query-- ❌ EXCESSIVE:CREATE INDEX idx_q1 ON orders (customer_id);CREATE INDEX idx_q2 ON orders (customer_id, status);CREATE INDEX idx_q3 ON orders (customer_id, order_date);CREATE INDEX idx_q4 ON orders (customer_id, status, order_date);-- 4 overlapping indexes! -- ✅ CONSOLIDATED:CREATE INDEX idx_orders_main ON orders (customer_id, status, order_date);-- Serves Q1, Q2, Q3 (partial), Q4 via prefix matching-- Only need one additional index if Q3's order_date range needs optimizationHow many columns should a composite index include? There's no universal answer, but these guidelines help:
The Sweet Spot: 2-4 Columns
Most effective composite indexes have 2-4 key columns. Beyond this:
| Columns | Typical Use | Considerations |
|---|---|---|
| 2 | Most common patterns | Simple, efficient, easy to maintain |
| 3 | Common for complex queries | Good balance of coverage and efficiency |
| 4 | Frequent in OLTP | May be optimal for high-frequency queries |
| 5+ | Covering indexes only | Include columns as INCLUDE, not key columns |
| 7+ | Almost never appropriate | Diminishing returns; consider query redesign |
1234567891011121314151617181920212223242526272829
-- Key columns vs INCLUDE columns -- Query: SELECT name, email FROM users -- WHERE tenant_id = ? AND status = 'active' AND role = 'admin'-- ORDER BY created_at DESC -- Option 1: All columns in key (5 key columns)CREATE INDEX idx_v1 ON users ( tenant_id, status, role, created_at DESC, name, email);-- ❌ Email changes require full index entry repositioning -- Option 2: Filter + Sort in key, output in INCLUDE (3 key columns)CREATE INDEX idx_v2 ON users (tenant_id, status, role, created_at DESC)INCLUDE (name, email);-- ✅ Email changes only update the INCLUDE portion-- ✅ Key portion is more compact, faster to traverse -- Option 3: Most selective in key, rest as filters (2 key columns)CREATE INDEX idx_v3 ON users (tenant_id, created_at DESC)INCLUDE (status, role, name, email);-- ⚠️ Status and role become filters, not seeks-- May be acceptable if most users are 'active' admins -- Guidelines for choosing:-- 1. Key columns: Those used in WHERE = or ORDER BY (up to 3-4)-- 2. INCLUDE columns: Those only needed in SELECT output-- 3. Filter columns: Low-selectivity predicates (may go in INCLUDE)Modern databases (PostgreSQL, SQL Server) support INCLUDE clauses that add columns to leaf nodes without affecting B+-tree ordering. Use INCLUDE for covering index columns that aren't used in predicates or sorting. This reduces key size and minimizes index maintenance when those columns change.
Index design is not a one-time activity. Workloads evolve, data grows, and indexes that were optimal may become suboptimal. Implement ongoing monitoring and maintenance:
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768
-- PostgreSQL: Find unused indexesSELECT schemaname || '.' || relname AS table, indexrelname AS index, pg_size_pretty(pg_relation_size(i.indexrelid)) AS size, idx_scan AS index_scansFROM pg_stat_user_indexes uiJOIN pg_index i ON ui.indexrelid = i.indexrelidWHERE NOT indisunique -- Exclude unique constraints AND idx_scan < 50 -- Rarely used AND pg_relation_size(i.indexrelid) > 1024*1024 -- > 1MBORDER BY pg_relation_size(i.indexrelid) DESC; -- MySQL: Index usage statisticsSELECT object_schema, object_name, index_name, count_star AS total_accesses, count_read AS reads, count_write AS writesFROM performance_schema.table_io_waits_summary_by_index_usageWHERE object_schema = 'your_database'ORDER BY count_star DESC; -- SQL Server: Index usage and missing index suggestions-- Usage statsSELECT OBJECT_NAME(s.object_id) AS TableName, i.name AS IndexName, user_seeks + user_scans + user_lookups AS Reads, user_updates AS Writes, CASE WHEN user_seeks + user_scans + user_lookups = 0 THEN 'UNUSED' ELSE 'USED' END AS StatusFROM sys.dm_db_index_usage_stats sJOIN sys.indexes i ON s.object_id = i.object_id AND s.index_id = i.index_idWHERE database_id = DB_ID()ORDER BY Reads DESC; -- Missing index suggestionsSELECT TOP 20 CONVERT(DECIMAL(18,2), migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans)) AS improvement_measure, mid.statement AS table_name, mid.equality_columns, mid.inequality_columns, mid.included_columnsFROM sys.dm_db_missing_index_groups migJOIN sys.dm_db_missing_index_group_stats migs ON mig.index_group_handle = migs.group_handleJOIN sys.dm_db_missing_index_details mid ON mig.index_handle = mid.index_handleORDER BY improvement_measure DESC; -- Index fragmentation check (SQL Server)SELECT OBJECT_NAME(ips.object_id) AS TableName, i.name AS IndexName, ips.avg_fragmentation_in_percent, ips.page_countFROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') ipsJOIN sys.indexes i ON ips.object_id = i.object_id AND ips.index_id = i.index_idWHERE ips.avg_fragmentation_in_percent > 30 AND ips.page_count > 1000ORDER BY ips.avg_fragmentation_in_percent DESC;Use this checklist when designing or reviewing multi-column indexes:
Always document the rationale for each index: which queries it serves, why the columns are ordered that way, and any trade-offs made. Future maintainers (including yourself) will thank you when revisiting indexing decisions months later.
You've now completed the comprehensive study of multi-column indexes. Here's the synthesized knowledge:
| Rule | Application |
|---|---|
| Leftmost prefix | Queries must use contiguous prefix from left |
| Equality before range | = columns precede <, >, BETWEEN columns |
| Range stops index | Columns after range predicates aren't used for seeks |
| Sort direction matters | ORDER BY must match or fully invert index order |
| Prefix subsumption | Index (A,B,C) makes separate (A) or (A,B) redundant |
| 2-4 key columns | Sweet spot for most composite indexes |
Congratulations! You've mastered multi-column index design for database systems. You understand not just the mechanics, but the strategic thinking required to design indexes that maximize query performance while minimizing overhead. This knowledge is foundational for database performance engineering at any scale.