Loading content...
In the real world of database design, queries rarely filter on just one column. Consider an e-commerce application: customers search for products by category AND price range, orders are retrieved by customer_id AND order_date, and shipping records are filtered by warehouse_id AND status AND shipment_date. Single-column indexes, while valuable, often fall short when facing these multi-predicate queries.
Composite indexes—also known as multi-column indexes, concatenated indexes, or compound indexes—solve this fundamental problem by indexing multiple columns together as a single, unified structure. Understanding how to design and leverage these indexes effectively separates routine database administrators from true database performance engineers.
By the end of this page, you will understand the fundamental structure of composite indexes, how they organize data across multiple columns, why they exist, and the key principles governing their behavior. You'll gain the foundational knowledge necessary for mastering column ordering, prefix matching, and advanced design strategies in subsequent pages.
A composite index (or multi-column index) is an index built on two or more columns of a table. Unlike single-column indexes that organize data by the values of one attribute, composite indexes create a hierarchical ordering based on the combined values of multiple columns.
Formal Definition:
Given a table T with columns (C₁, C₂, ..., Cₙ), a composite index I on columns (C₁, C₂, ..., Cₖ) where k ≥ 2 creates an ordering where:
This creates a lexicographic ordering—similar to how words are ordered in a dictionary, where the first letter is primary, the second is secondary, and so forth.
| Characteristic | Single-Column Index | Composite Index |
|---|---|---|
| Key Structure | One column value | Concatenated values of multiple columns |
| Sort Order | Simple ascending/descending | Lexicographic (hierarchical) |
| Query Coverage | One predicate efficiently | Multiple predicates on indexed columns |
| Space Overhead | Moderate | Higher (more data per entry) |
| Maintenance Cost | Lower | Higher (more columns to maintain) |
| Design Complexity | Simple | Requires strategic column ordering |
Think of a composite index like a phonebook organized by (LastName, FirstName). All 'Smiths' are grouped together (sorted by LastName), and within the Smiths, entries are sorted by FirstName (Aaron Smith before Bob Smith). This hierarchical ordering is the essence of how composite indexes work.
Most database systems implement composite indexes using B+-trees, the dominant index structure in modern RDBMS. Understanding how multi-column keys are stored in a B+-tree is crucial for designing effective composite indexes.
Key Composition in B+-Trees:
In a B+-tree for a composite index on columns (A, B, C), each index entry contains:
The tree maintains its sorted invariant based on the composite key comparison, which works lexicographically:
123456789101112131415161718192021222324252627
// Lexicographic comparison for composite keys (A, B, C)function compare(key1, key2): // Compare first column A if key1.A < key2.A: return LESS_THAN if key1.A > key2.A: return GREATER_THAN // A values are equal, compare second column B if key1.B < key2.B: return LESS_THAN if key1.B > key2.B: return GREATER_THAN // A and B values are equal, compare third column C if key1.C < key2.C: return LESS_THAN if key1.C > key2.C: return GREATER_THAN // All columns are equal return EQUAL // Example comparisons:// (1, 5, 10) < (1, 5, 20) -- C is the tiebreaker// (1, 5, 10) < (1, 6, 1) -- B is the tiebreaker// (1, 5, 10) < (2, 1, 1) -- A is the tiebreakerPhysical Layout in Leaf Nodes:
The leaf nodes of a composite index B+-tree contain entries sorted by the composite key:
Leaf Node 1: [(1,A,100), (1,A,200), (1,B,100), (1,B,150), (1,C,100)] -> next
Leaf Node 2: [(2,A,50), (2,A,100), (2,B,75), (2,C,200)] -> next
Leaf Node 3: [(3,A,25), (3,B,150), (3,C,100), (3,C,200)] -> null
Notice how entries are sorted first by the first column (1, 2, 3), then by the second column (A, B, C within each first-column value), and finally by the third column (numeric values within each two-column combination).
Internal Node Keys:
Internal nodes contain separator keys that guide searches down the tree. These separator keys are also composite, containing enough columns to correctly route queries:
The fundamental motivation for composite indexes arises from the limitations of combining multiple single-column indexes. Consider a query with two predicates:
SELECT * FROM employees
WHERE department_id = 50 AND hire_date = '2023-01-15';
With two separate single-column indexes:
This intersection operation has overhead, and neither index alone can efficiently pinpoint the exact rows.
With a composite index on (department_id, hire_date):
Composite indexes are not free. They consume more storage space, increase INSERT/UPDATE/DELETE overhead, and require careful design. A poorly designed composite index may not benefit your queries at all while still incurring maintenance costs. Understanding the principles covered in this module is essential to avoid these pitfalls.
Selectivity measures how effectively an index narrows down the search space. For a single column, selectivity is the reciprocal of the number of distinct values (assuming uniform distribution):
Selectivity(column) = 1 / NDV(column)
Where NDV is the Number of Distinct Values.
For composite indexes, selectivity combines across columns:
Selectivity(A, B) ≈ Selectivity(A) × Selectivity(B)
= 1 / (NDV(A) × NDV(B))
This multiplicative effect is why composite indexes can be dramatically more selective than single-column indexes.
The Independence Assumption:
The multiplicative selectivity formula assumes column values are statistically independent. In practice, columns are often correlated:
When columns are correlated, the composite selectivity is worse than the formula predicts because the combinations are not uniformly distributed. Database query optimizers use histograms and multi-column statistics to account for these correlations.
| Index Type | Selectivity | Rows Examined (100K table) | I/O Operations |
|---|---|---|---|
| Full Table Scan | 100% | 100,000 | High (all pages) |
| Single-Column (low selectivity) | 10% | 10,000 | Moderate |
| Single-Column (high selectivity) | 1% | 1,000 | Low-Moderate |
| Composite (two columns) | 0.1% | 100 | Low |
| Composite (three columns) | 0.01% | 10 | Very Low |
The SQL syntax for creating composite indexes is straightforward—list multiple columns in the CREATE INDEX statement. However, the order of columns in the definition is critically important and determines which queries the index can serve effectively.
Standard SQL Syntax:
CREATE INDEX index_name ON table_name (column1, column2, column3, ...);
The column order directly maps to the lexicographic ordering in the B+-tree: column1 is the primary sort key, column2 is secondary, and so on.
12345678910111213141516171819202122232425
-- Basic composite indexCREATE INDEX idx_emp_dept_date ON employees (department_id, hire_date); -- Composite index with sort order (MySQL 8.0+)CREATE INDEX idx_emp_dept_date_desc ON employees ( department_id ASC, hire_date DESC); -- Composite unique indexCREATE UNIQUE INDEX idx_unique_emp ON employees (email, department_id); -- Show index structureSHOW INDEX FROM employees; -- Analyze index usageEXPLAIN SELECT * FROM employees WHERE department_id = 50 AND hire_date = '2023-01-15'; -- Composite index with prefix length (for VARCHAR columns)CREATE INDEX idx_emp_name_dept ON employees ( last_name(20), -- First 20 characters only first_name(15), department_id);Once a composite index is created, you cannot change the column order without dropping and recreating the index. This is why understanding column ordering principles (covered in the next page) is essential before creating composite indexes in production systems.
Composite indexes excel in specific query patterns. Understanding these patterns helps identify opportunities to improve performance through strategic multi-column indexing.
WHERE customer_id = 100 AND order_date = '2023-01-15'WHERE user_id = 50 AND created_at BETWEEN '2023-01-01' AND '2023-12-31'ORDER BY last_name, first_nameComposite indexes come with measurable costs that must be weighed against their query performance benefits.
Storage Requirements:
The storage size of a composite index depends on:
Approximate formula:
Index Size ≈ Rows × (Sum of column sizes + Pointer size) × (1 / Fill factor) × Overhead factor
| Index Type | Column(s) | Column Sizes | Approx. Size |
|---|---|---|---|
| Single-column | department_id (INT) | 4 bytes | ~12 MB |
| Single-column | hire_date (DATE) | 3 bytes | ~10 MB |
| Composite (2 col) | dept_id, hire_date | 4 + 3 = 7 bytes | ~18 MB |
| Composite (3 col) | dept_id, hire_date, status | 4 + 3 + 4 = 11 bytes | ~24 MB |
| Composite (wide) | 5 columns (various) | ~40 bytes total | ~60 MB |
Maintenance Overhead:
Every INSERT, UPDATE (on indexed columns), and DELETE operation must update the composite index:
The maintenance cost increases with:
A common anti-pattern is creating many composite indexes to cover every possible query combination. With N columns, there are N! possible column orderings. Creating indexes for all combinations leads to severe storage bloat and write performance degradation. Strategic index design, not brute-force coverage, is the solution.
We've established the foundational understanding of composite indexes. Here are the key concepts to internalize:
What's Next:
Now that you understand what composite indexes are and how they work, the next critical question is: In what order should columns be placed? The next page dives deep into column ordering strategies—arguably the most important decision when designing composite indexes.
You now understand the fundamental structure and purpose of composite indexes. You know how multi-column keys are organized in B+-trees, why composite indexes outperform multiple single-column indexes for multi-predicate queries, and the trade-offs involved. Next, we'll explore the critical topic of column ordering.