Multi Column Indexes - Learning Module

Loading content...

0/241

Composite Index

Beyond Single-Column Indexing

In the real world of database design, queries rarely filter on just one column. Consider an e-commerce application: customers search for products by category AND price range, orders are retrieved by customer_id AND order_date, and shipping records are filtered by warehouse_id AND status AND shipment_date. Single-column indexes, while valuable, often fall short when facing these multi-predicate queries.

Composite indexes—also known as multi-column indexes, concatenated indexes, or compound indexes—solve this fundamental problem by indexing multiple columns together as a single, unified structure. Understanding how to design and leverage these indexes effectively separates routine database administrators from true database performance engineers.

What You Will Learn

By the end of this page, you will understand the fundamental structure of composite indexes, how they organize data across multiple columns, why they exist, and the key principles governing their behavior. You'll gain the foundational knowledge necessary for mastering column ordering, prefix matching, and advanced design strategies in subsequent pages.

What is a Composite Index?

A composite index (or multi-column index) is an index built on two or more columns of a table. Unlike single-column indexes that organize data by the values of one attribute, composite indexes create a hierarchical ordering based on the combined values of multiple columns.

Formal Definition:

Given a table T with columns (C₁, C₂, ..., Cₙ), a composite index I on columns (C₁, C₂, ..., Cₖ) where k ≥ 2 creates an ordering where:

Records are first sorted by C₁
Within each C₁ value, records are sorted by C₂
Within each (C₁, C₂) combination, records are sorted by C₃
And so on, through all k columns

This creates a lexicographic ordering—similar to how words are ordered in a dictionary, where the first letter is primary, the second is secondary, and so forth.

Composite Index vs Single-Column Index Comparison
Characteristic	Single-Column Index	Composite Index
Key Structure	One column value	Concatenated values of multiple columns
Sort Order	Simple ascending/descending	Lexicographic (hierarchical)
Query Coverage	One predicate efficiently	Multiple predicates on indexed columns
Space Overhead	Moderate	Higher (more data per entry)
Maintenance Cost	Lower	Higher (more columns to maintain)
Design Complexity	Simple	Requires strategic column ordering

The Dictionary Analogy

Think of a composite index like a phonebook organized by (LastName, FirstName). All 'Smiths' are grouped together (sorted by LastName), and within the Smiths, entries are sorted by FirstName (Aaron Smith before Bob Smith). This hierarchical ordering is the essence of how composite indexes work.

Physical Structure of Composite Indexes

Most database systems implement composite indexes using B+-trees, the dominant index structure in modern RDBMS. Understanding how multi-column keys are stored in a B+-tree is crucial for designing effective composite indexes.

Key Composition in B+-Trees:

In a B+-tree for a composite index on columns (A, B, C), each index entry contains:

Composite Key: The concatenated values (A, B, C) stored as a single searchable unit
Pointer: Either a record pointer (in clustered indexes) or a row identifier (in non-clustered indexes)

The tree maintains its sorted invariant based on the composite key comparison, which works lexicographically:

composite_key_comparison.pseudo
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
// Lexicographic comparison for composite keys (A, B, C)
function compare(key1, key2):
    // Compare first column A
    if key1.A < key2.A:
        return LESS_THAN
    if key1.A > key2.A:
        return GREATER_THAN
    
    // A values are equal, compare second column B
    if key1.B < key2.B:
        return LESS_THAN
    if key1.B > key2.B:
        return GREATER_THAN
    
    // A and B values are equal, compare third column C
    if key1.C < key2.C:
        return LESS_THAN
    if key1.C > key2.C:
        return GREATER_THAN
    
    // All columns are equal
    return EQUAL
 
// Example comparisons:
// (1, 5, 10) < (1, 5, 20)  -- C is the tiebreaker
// (1, 5, 10) < (1, 6, 1)   -- B is the tiebreaker
// (1, 5, 10) < (2, 1, 1)   -- A is the tiebreaker

Physical Layout in Leaf Nodes:

The leaf nodes of a composite index B+-tree contain entries sorted by the composite key:

Leaf Node 1: [(1,A,100), (1,A,200), (1,B,100), (1,B,150), (1,C,100)] -> next
Leaf Node 2: [(2,A,50), (2,A,100), (2,B,75), (2,C,200)] -> next
Leaf Node 3: [(3,A,25), (3,B,150), (3,C,100), (3,C,200)] -> null

Notice how entries are sorted first by the first column (1, 2, 3), then by the second column (A, B, C within each first-column value), and finally by the third column (numeric values within each two-column combination).

Internal Node Keys:

Internal nodes contain separator keys that guide searches down the tree. These separator keys are also composite, containing enough columns to correctly route queries:

Converting Mermaid diagram...

Why Use Composite Indexes?

The fundamental motivation for composite indexes arises from the limitations of combining multiple single-column indexes. Consider a query with two predicates:

SELECT * FROM employees
WHERE department_id = 50 AND hire_date = '2023-01-15';

With two separate single-column indexes:

Index on department_id finds all employees in department 50 (say, 500 rows)
Index on hire_date finds all employees hired on 2023-01-15 (say, 20 rows)
The database must intersect these result sets to find matching rows

This intersection operation has overhead, and neither index alone can efficiently pinpoint the exact rows.

With a composite index on (department_id, hire_date):

The index directly navigates to entries where department_id = 50 AND hire_date = '2023-01-15'
The query finds the exact rows in a single index traversal
No post-processing intersection needed

Key Benefits of Composite Indexes

•Efficient Multi-Predicate Queries — A single index lookup satisfies multiple WHERE conditions, eliminating the need for index intersection or multiple index scans.
•Improved Selectivity — The combined selectivity of multiple columns is often much higher than any single column, leading to fewer rows examined.
•Covering Index Potential — Composite indexes that include all columns needed by a query can serve as covering indexes, eliminating table access entirely.
•Sort Optimization — Composite indexes can satisfy ORDER BY clauses involving multiple columns, avoiding costly filesort operations.
•Range Query Support — When designed correctly, composite indexes efficiently handle queries mixing equality and range predicates.

Composite Index Trade-offs

Composite indexes are not free. They consume more storage space, increase INSERT/UPDATE/DELETE overhead, and require careful design. A poorly designed composite index may not benefit your queries at all while still incurring maintenance costs. Understanding the principles covered in this module is essential to avoid these pitfalls.

Composite Index Selectivity

Selectivity measures how effectively an index narrows down the search space. For a single column, selectivity is the reciprocal of the number of distinct values (assuming uniform distribution):

Selectivity(column) = 1 / NDV(column)

Where NDV is the Number of Distinct Values.

For composite indexes, selectivity combines across columns:

Selectivity(A, B) ≈ Selectivity(A) × Selectivity(B)
             = 1 / (NDV(A) × NDV(B))

This multiplicative effect is why composite indexes can be dramatically more selective than single-column indexes.

Selectivity Comparison ExampleConsider an employees table with 100,000 rows:

Input

Output

The Independence Assumption:

The multiplicative selectivity formula assumes column values are statistically independent. In practice, columns are often correlated:

Job titles may cluster by department (all 'Developer' titles in Engineering)
Certain product categories may dominate specific warehouses
Geographic regions may correlate with customer demographics

When columns are correlated, the composite selectivity is worse than the formula predicts because the combinations are not uniformly distributed. Database query optimizers use histograms and multi-column statistics to account for these correlations.

Selectivity Impact on Query Performance
Index Type	Selectivity	Rows Examined (100K table)	I/O Operations
Full Table Scan	100%	100,000	High (all pages)
Single-Column (low selectivity)	10%	10,000	Moderate
Single-Column (high selectivity)	1%	1,000	Low-Moderate
Composite (two columns)	0.1%	100	Low
Composite (three columns)	0.01%	10	Very Low

Creating Composite Indexes

The SQL syntax for creating composite indexes is straightforward—list multiple columns in the CREATE INDEX statement. However, the order of columns in the definition is critically important and determines which queries the index can serve effectively.

Standard SQL Syntax:

CREATE INDEX index_name ON table_name (column1, column2, column3, ...);

The column order directly maps to the lexicographic ordering in the B+-tree: column1 is the primary sort key, column2 is secondary, and so on.

mysql_composite_indexes.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
-- Basic composite index
CREATE INDEX idx_emp_dept_date ON employees (department_id, hire_date);
 
-- Composite index with sort order (MySQL 8.0+)
CREATE INDEX idx_emp_dept_date_desc ON employees (
    department_id ASC, 
    hire_date DESC
);
 
-- Composite unique index
CREATE UNIQUE INDEX idx_unique_emp ON employees (email, department_id);
 
-- Show index structure
SHOW INDEX FROM employees;
 
-- Analyze index usage
EXPLAIN SELECT * FROM employees 
WHERE department_id = 50 AND hire_date = '2023-01-15';
 
-- Composite index with prefix length (for VARCHAR columns)
CREATE INDEX idx_emp_name_dept ON employees (
    last_name(20),  -- First 20 characters only
    first_name(15),
    department_id
);

Column Order is Immutable

Once a composite index is created, you cannot change the column order without dropping and recreating the index. This is why understanding column ordering principles (covered in the next page) is essential before creating composite indexes in production systems.

Common Use Cases for Composite Indexes

Composite indexes excel in specific query patterns. Understanding these patterns helps identify opportunities to improve performance through strategic multi-column indexing.

Primary Use Cases

•Multi-Predicate Filtering — Queries with multiple AND conditions on different columns. Example: WHERE customer_id = 100 AND order_date = '2023-01-15'
•Foreign Key + Timestamp Patterns — Very common in transactional systems where you retrieve records for a specific entity within a time range. Example: WHERE user_id = 50 AND created_at BETWEEN '2023-01-01' AND '2023-12-31'
•Compound Uniqueness — Enforcing uniqueness across column combinations. Example: Unique constraint on (email, organization_id) in multi-tenant systems.
•Multi-Column Sorting — Queries requiring ORDER BY on multiple columns. Example: ORDER BY last_name, first_name
•Covering Index Scenarios — Including all columns needed by a query to avoid table lookups entirely.
•Partitioning Support — Composite indexes on partition key + other columns for efficient partition pruning.

Good Candidates

•High-frequency queries with multiple predicates
•Queries on columns that appear together
•Join conditions on multiple columns
•Queries needing ORDER BY optimization
•Covering index opportunities

Poor Candidates

•Low-frequency or ad-hoc queries
•Tables with heavy write operations
•Columns with low combined selectivity
•Queries that skip leading columns
•Very wide indexes (5+ columns usually)

Storage and Maintenance Overhead

Composite indexes come with measurable costs that must be weighed against their query performance benefits.

Storage Requirements:

The storage size of a composite index depends on:

Number of rows in the table
Combined size of all indexed columns
Index structure overhead (B+-tree internal nodes, pointers)
Fill factor settings

Approximate formula:

Index Size ≈ Rows × (Sum of column sizes + Pointer size) × (1 / Fill factor) × Overhead factor

Example Index Size Comparison (1 million rows)
Index Type	Column(s)	Column Sizes	Approx. Size
Single-column	department_id (INT)	4 bytes	~12 MB
Single-column	hire_date (DATE)	3 bytes	~10 MB
Composite (2 col)	dept_id, hire_date	4 + 3 = 7 bytes	~18 MB
Composite (3 col)	dept_id, hire_date, status	4 + 3 + 4 = 11 bytes	~24 MB
Composite (wide)	5 columns (various)	~40 bytes total	~60 MB

Maintenance Overhead:

Every INSERT, UPDATE (on indexed columns), and DELETE operation must update the composite index:

INSERT: New entry must be added at the correct sorted position, potentially causing page splits
UPDATE: If any indexed column changes, the entry must be repositioned (effectively delete + insert)
DELETE: Entry must be removed, potentially causing page merges

The maintenance cost increases with:

Number of composite indexes on the table
Number of columns in each composite index
Frequency of DML operations
Amount of data being modified

The Index Explosion Problem

A common anti-pattern is creating many composite indexes to cover every possible query combination. With N columns, there are N! possible column orderings. Creating indexes for all combinations leads to severe storage bloat and write performance degradation. Strategic index design, not brute-force coverage, is the solution.

Summary: Composite Index Foundations

We've established the foundational understanding of composite indexes. Here are the key concepts to internalize:

Key Takeaways

•Composite indexes index multiple columns together using lexicographic ordering—the first column is primary, subsequent columns break ties.
•They are implemented as B+-trees with concatenated keys, enabling efficient navigation to specific value combinations.
•Composite indexes provide multiplicative selectivity, dramatically reducing rows examined when filtering on multiple columns.
•They eliminate the need for index intersection operations, improving query efficiency for multi-predicate filters.
•Creation syntax lists columns in order of importance—this order determines which queries can use the index effectively.
•Storage and maintenance costs increase with the number and size of indexed columns, requiring strategic design decisions.

What's Next:

Now that you understand what composite indexes are and how they work, the next critical question is: In what order should columns be placed? The next page dives deep into column ordering strategies—arguably the most important decision when designing composite indexes.

Page Complete

You now understand the fundamental structure and purpose of composite indexes. You know how multi-column keys are organized in B+-trees, why composite indexes outperform multiple single-column indexes for multi-predicate queries, and the trade-offs involved. Next, we'll explore the critical topic of column ordering.