Loading learning content...
Real-world queries rarely filter on a single column. Searches for orders involve customer ID and date ranges. Product lookups combine category with price filters. User authentication checks email and password hash. To optimize these multi-predicate queries, we need composite indexes—indexes built on multiple columns simultaneously.
A composite index (also called multi-column index, concatenated index, or compound index) stores a single B+-tree sorted by a combination of column values. The column order within this index is absolutely critical—it determines which queries can leverage the index effectively. Mastering composite index design is perhaps the single most impactful skill for database performance tuning.
By the end of this page, you will understand how composite indexes organize multi-column data, the critical leftmost prefix rule that governs index usability, strategies for optimal column ordering, how indexes interact with ORDER BY and GROUP BY, the concept of covering indexes for index-only scans, and advanced techniques for serving multiple query patterns.
A composite index is a B+-tree index where each entry is a tuple of values from multiple columns. The entries are sorted lexicographically—like words in a dictionary—first by the first column, then by the second within ties, then by the third, and so on.
Lexicographic ordering:
Consider an index on (last_name, first_name, birth_date):
This ordering has profound implications for which queries can use the index.
12345678910111213141516171819202122232425262728293031
Composite Index on (last_name, first_name, birth_date) B+-tree Index Structure: ┌─────────────────────────────────┐ │ Root Node │ │ (Baker, Alice, ...) │ └───────────────┬─────────────────┘ │ ┌───────────────────────┴───────────────────────┐ ▼ ▼ ┌───────────────────────────────┐ ┌───────────────────────────────┐ │ Left Subtree │ │ Right Subtree │ │ All names < (Baker, Alice) │ │ All names ≥ (Baker, Alice) │ └───────────────────────────────┘ └───────────────────────────────┘ Leaf Level (sorted entries):┌─────────────────────────────────────────────────────────────────────────────────┐│ (Adams, Alice, 1985-03-15) → Row1 ││ (Adams, Alice, 1990-07-22) → Row7 ← Same name, different dates ││ (Adams, Bob, 1988-11-30) → Row3 ││ (Adams, Carol, 1982-01-05) → Row5 ││ (Baker, Alice, 1995-09-12) → Row2 ││ (Baker, David, 1978-04-18) → Row4 ││ (Chen, Alice, 1992-06-28) → Row6 ││ ... │└─────────────────────────────────────────────────────────────────────────────────┘ Sort Order:1. Primary sort: last_name (alphabetically)2. Secondary sort: first_name (within same last_name)3. Tertiary sort: birth_date (within same last_name + first_name)The composite key is treated as a single sortable value. Just as strings 'abc' < 'abd' < 'xyz', tuples compare element-by-element: (1, 5) < (1, 7) < (2, 3). This lexicographic comparison enables binary search on the combined key.
The leftmost prefix rule is the most important concept in composite index usage. It states:
A composite index on (A, B, C) can efficiently serve queries that filter on:
- (A) alone
- (A, B) together
- (A, B, C) together
But NOT queries filtering only on:
- (B) alone
- (C) alone
- (B, C) together
Why this matters:
The index is sorted first by column A. To find entries where B = 5, you'd need to scan the entire index—entries with B = 5 are scattered across all values of A. But to find A = 'Smith', you can binary search directly to that section.
Think of it like a phone book sorted by (last_name, first_name). You can quickly find all "Smiths", or "Smith, John" specifically. But finding all "Johns" (without knowing the last name) requires reading every page.
| Query Filter | Uses Index? | Why |
|---|---|---|
| WHERE A = 1 | ✅ Yes | A is the first column, binary search works |
| WHERE A = 1 AND B = 2 | ✅ Yes | Leftmost prefix (A, B) |
| WHERE A = 1 AND B = 2 AND C = 3 | ✅ Yes | Complete key match |
| WHERE B = 2 | ❌ No* | B is not leftmost, cannot narrow search |
| WHERE C = 3 | ❌ No* | C is not leftmost |
| WHERE B = 2 AND C = 3 | ❌ No* | Neither B nor C is leftmost |
| WHERE A = 1 AND C = 3 | ⚠️ Partial | Uses index for A=1, then scans for C=3 |
1234567891011121314151617181920212223242526272829
-- Index definition:CREATE INDEX idx_orders ON orders (customer_id, order_date, status); -- Query 1: Uses index perfectly (complete prefix)SELECT * FROM ordersWHERE customer_id = 100 AND order_date = '2024-01-15' AND status = 'shipped';-- Access path: Index seek on (100, '2024-01-15', 'shipped') -- Query 2: Uses index (partial prefix)SELECT * FROM ordersWHERE customer_id = 100 AND order_date BETWEEN '2024-01-01' AND '2024-01-31';-- Access path: Index range scan for customer_id=100 within date range -- Query 3: Uses index (leftmost column only)SELECT * FROM ordersWHERE customer_id = 100;-- Access path: Index seek on customer_id=100, scan all dates/statuses -- Query 4: CANNOT use index efficientlySELECT * FROM ordersWHERE order_date = '2024-01-15';-- Access path: Full table scan (or index scan checking every entry)-- order_date is not the leftmost column! -- Query 5: Partial index useSELECT * FROM ordersWHERE customer_id = 100 AND status = 'shipped';-- Access path: Index seek on customer_id=100, filter status (skips date)-- Index helps with customer_id, but not status (due to "gap")In an index (A, B, C), a query on A and C (skipping B) can only use the index for the A filter. Once you skip a column in the prefix, subsequent columns cannot leverage the index's sort order for seeking—though they may still be used for filtering after the seek.
Given the leftmost prefix rule, the order of columns in a composite index is a critical design decision. Here are strategic principles for optimal column ordering:
Principle 1: Equality Predicates First
Columns used in equality comparisons (=) should come before columns used in range comparisons (<, >, BETWEEN, LIKE 'prefix%').
Why: After an equality match, the index can still use the sorted order of subsequent columns. But after a range scan starts, subsequent columns are scrambled within the range.
Principle 2: High Selectivity First (Sometimes)
A common guideline is to place more selective columns (columns that filter out more rows) first. However, this is secondary to the equality-before-range principle.
1234567891011121314151617181920212223242526
-- Common query pattern:SELECT * FROM ordersWHERE customer_id = 100 -- Equality AND order_date BETWEEN '2024-01-01' -- Range AND '2024-03-31' AND status = 'shipped'; -- Equality (but after range!) -- Index Option 1: (customer_id, order_date, status)-- customer_id = 100 → Seek to customer 100's section-- order_date BETWEEN → Range scan within Q1 2024-- status = 'shipped' → Filter applied (index not helping status)-- Result: Good, but status filter is post-range -- Index Option 2: (customer_id, status, order_date)-- customer_id = 100 → Seek to customer 100's section-- status = 'shipped' → Further narrow to 'shipped' orders-- order_date BETWEEN → Range scan within shipped orders-- Result: BETTER! All three columns contribute to seek/scan -- Column order matters:-- (customer_id, order_date, status) - 2 columns fully used-- (customer_id, status, order_date) - 3 columns fully used -- General pattern: Equality columns, then Range column lastCREATE INDEX idx_orders_optimized ON orders (customer_id, status, order_date);There's no universally correct column order—it depends entirely on your query patterns. Collect slow query logs, identify the most frequent and expensive queries, and design indexes that serve them. A single carefully designed composite index often outperforms multiple single-column indexes.
Composite indexes can eliminate the need for expensive sorting operations when queries include ORDER BY or GROUP BY. The key is ensuring the index order matches the requested output order.
ORDER BY optimization:
If the index is on (A, B, C) and the query is:
ORDER BY A, B → Index can provide sorted output directlyORDER BY A, B, C → Index can provide sorted output directlyORDER BY B, C → Index CANNOT help (leftmost prefix rule)ORDER BY A DESC, B ASC → Depends on database (some support mixed sort)1234567891011121314151617181920212223242526272829303132
-- Index: (customer_id, order_date)CREATE INDEX idx_customer_orders ON orders (customer_id, order_date); -- Query 1: Perfect match - no sort neededSELECT * FROM ordersWHERE customer_id = 100ORDER BY order_date;-- Index provides orders for customer 100, already sorted by date!-- EXPLAIN shows: "Using index" with no "Using filesort" -- Query 2: Still works - both columns in orderSELECT * FROM ordersORDER BY customer_id, order_date;-- Index is sorted exactly this way - no sort operation needed -- Query 3: Cannot use index for ORDER BYSELECT * FROM ordersWHERE customer_id = 100ORDER BY status; -- status not in this index-- Index used for WHERE, but filesort needed for ORDER BY -- Query 4: Partial optimizationSELECT * FROM ordersWHERE customer_id = 100ORDER BY order_date DESC;-- Index scan in reverse order - no filesort (if DB supports backward scan) -- GROUP BY works similarly:SELECT customer_id, COUNT(*) FROM ordersGROUP BY customer_id;-- Index provides rows pre-grouped by customer_id (if leading column)| ORDER BY Clause | Sort Eliminated? | Condition |
|---|---|---|
| A | ✅ Yes | Matches leftmost prefix |
| A, B | ✅ Yes | Matches prefix |
| A, B, C | ✅ Yes | Complete match |
| A DESC, B DESC, C DESC | ✅ Yes* | *If DB supports backward scan |
| B, C | ❌ No | Doesn't start with A |
| A, C | ❌ No | Gap in middle (skips B) |
| A ASC, B DESC | ⚠️ Maybe | Mixed direction requires special index |
ORDER BY optimization is especially powerful with LIMIT. For 'SELECT ... ORDER BY date DESC LIMIT 10', the database can use an index to return the top 10 directly without sorting all rows. Without an index, it must sort every matching row, then return 10—potentially millions of times slower.
A covering index is not a special type of index—it's a usage pattern where a query can be satisfied entirely from the index without accessing the base table. This occurs when all columns needed by the query (in SELECT, WHERE, ORDER BY, GROUP BY) are present in the index.
Why covering indexes are powerful:
The performance improvement can be 10-100x for queries on large tables.
1234567891011121314151617181920212223242526272829
-- Consider this query:SELECT customer_id, order_date, total_amountFROM ordersWHERE customer_id = 100 AND order_date >= '2024-01-01'; -- Index Option 1: (customer_id, order_date)-- Can use index for WHERE, but must fetch total_amount from table-- Access path: Index seek → table lookup (row by row) → return results -- Index Option 2: (customer_id, order_date, total_amount)-- Now index COVERS the query - all needed columns in index!-- Access path: Index seek → return results directly from index CREATE INDEX idx_orders_covering ON orders (customer_id, order_date, total_amount); -- EXPLAIN shows "Using index" or "Index Only Scan"-- No table access required! -- INCLUDE syntax (SQL Server, PostgreSQL 11+):-- When additional columns are only for covering, not filtering:CREATE INDEX idx_orders_coveringON orders (customer_id, order_date)INCLUDE (total_amount, status, shipping_date); -- Benefits of INCLUDE:-- - Included columns don't affect sort order-- - Smaller tree (included cols only in leaf nodes)-- - More flexible for covering multiple queriesWith Table Access:
Index Seek
│
▼
Row ID from index
│
▼
Table Lookup (random I/O)
│
▼
Fetch full row
│
▼
Extract needed columns
Cost: Index traversal + N random table reads
Covering Index (No Table Access):
Index Seek
│
▼
All columns in index
│
▼
Return directly
Cost: Index traversal only
Savings: Often 10-100x faster
Adding columns to make an index covering increases index size. A bloated index consumes more memory, has higher write overhead, and may degrade other queries. Balance covering benefits against storage and maintenance costs. INCLUDE columns help by keeping non-key columns out of internal nodes.
When a query has multiple filter conditions and no single index covers them all, some databases can use index intersection—combining results from multiple indexes using set operations.
How index intersection works:
This allows the optimizer to leverage existing single-column indexes for multi-column queries.
123456789101112131415161718192021222324252627282930
-- Existing indexes:CREATE INDEX idx_customer ON orders (customer_id);CREATE INDEX idx_status ON orders (status); -- Query with two conditions:SELECT * FROM ordersWHERE customer_id = 100 AND status = 'shipped'; -- Without composite index, optimizer has choices: -- Option 1: Use idx_customer, filter status-- Scan all orders for customer 100, filter by status in memory -- Option 2: Use idx_status, filter customer-- Scan all 'shipped' orders, filter by customer in memory -- Option 3: Index intersection (if supported)-- Step 1: Get all row IDs where customer_id = 100 from idx_customer-- Step 2: Get all row IDs where status = 'shipped' from idx_status-- Step 3: Intersect the two sets of row IDs-- Step 4: Fetch only the intersecting rows from table -- Index intersection is typically chosen when:-- - Both predicates are selective-- - Neither index alone reduces result set enough-- - A composite index doesn't exist -- However, composite index usually beats intersection:CREATE INDEX idx_customer_status ON orders (customer_id, status);-- Single index seek, no intersection overhead| Aspect | Composite Index | Index Intersection |
|---|---|---|
| Number of indexes | One multi-column | Multiple single-column |
| Query efficiency | Single index seek | Multiple seeks + merge |
| Storage overhead | One larger index | Multiple smaller indexes |
| Flexibility | Best for specific query | Covers varied query patterns |
| Typical performance | Faster (10x or more) | Slower but still better than scan |
| Database support | All databases | MySQL, PostgreSQL, SQL Server (varies) |
Index intersection shines when you can't predict query patterns in advance. An OLAP system with ad-hoc queries might benefit from single-column indexes that can be combined, rather than trying to create composite indexes for every possible filter combination.
In practice, you need to design indexes that serve multiple queries efficiently while minimizing index count. Here's a systematic approach:
Step 1: Catalog your important queries
Identify the most frequent, performance-critical, or SLA-bound queries.
Step 2: Identify common column prefixes
Look for columns that appear together across queries. These are candidates for composite indexes.
Step 3: Apply the prefix-sharing principle
A single index can serve multiple queries if their needed columns are prefixes of each other.
12345678910111213141516171819202122232425262728293031323334
-- Query workload analysis:-- Q1: WHERE customer_id = ?-- Q2: WHERE customer_id = ? AND status = ?-- Q3: WHERE customer_id = ? AND status = ? AND order_date = ?-- Q4: WHERE customer_id = ? ORDER BY order_date DESC LIMIT 10 -- Observation: customer_id is common to all queries -- Option A: Four separate indexes (wasteful!)CREATE INDEX idx1 ON orders (customer_id);CREATE INDEX idx2 ON orders (customer_id, status);CREATE INDEX idx3 ON orders (customer_id, status, order_date);CREATE INDEX idx4 ON orders (customer_id, order_date); -- Option B: Strategic consolidation (better!)-- Index 1: (customer_id, status, order_date) - serves Q1, Q2, Q3-- Index 2: (customer_id, order_date) - serves Q1, Q4 -- Even better: Can we serve all with fewer indexes?-- Q4 needs customer_id + order_date in specific order-- Q3 needs customer_id + status + order_date -- Insight: We can't fully consolidate because:-- ORDER BY order_date needs order_date to immediately follow customer_id -- Final design:CREATE INDEX idx_cust_status_date ON orders (customer_id, status, order_date);CREATE INDEX idx_cust_date ON orders (customer_id, order_date); -- 2 indexes serve all 4 query patterns efficiently-- idx1 is a prefix of idx_cust_status_date (Q1 served)-- idx2 is a prefix of idx_cust_status_date (Q2 served)-- idx3 is exactly idx_cust_status_date (Q3 served)-- idx4 is served by idx_cust_date (Q4 served)Every index adds write overhead. A table with 10 indexes means every INSERT updates 11 structures (table + 10 indexes). For write-heavy workloads, fewer well-designed composite indexes outperform many single-column indexes. Target 3-5 indexes per table as a starting guideline.
We've conducted a deep exploration of composite indexes—the multi-column structures that are essential for optimizing real-world queries that filter, sort, and aggregate across multiple attributes.
What's next:
With our understanding of primary, secondary, unique, and composite indexes complete, we'll tie everything together with an index classification framework—a systematic approach to categorizing and selecting the right index type for any given scenario.
You now have deep knowledge of composite index design: from the fundamental lexicographic ordering and leftmost prefix rule, through column ordering strategy and ORDER BY integration, to advanced techniques like covering indexes and multi-query optimization. This knowledge enables you to design high-performance index strategies.