Loading learning content...
Real-world database queries rarely filter on a single column. Consider a typical query: WHERE user_id = 123 AND created_at > '2024-01-01' AND status = 'active'. With only single-column indexes, the database has limited options—use one index and scan for the rest, or perform a bitmap intersection of multiple index scans.
Composite indexes (also called multi-column, compound, or concatenated indexes) solve this problem by indexing multiple columns together as a single unit. A well-designed composite index can transform a query that touches thousands of rows into one that retrieves exactly what's needed.
However, composite indexes come with subtleties that trip up even experienced engineers. Column order matters enormously, and the same columns in different orders create indexes with vastly different capabilities. Understanding these nuances is essential for effective index design.
By the end of this page, you'll understand how composite indexes are structured internally, master the leftmost prefix rule, know how to choose optimal column order, design covering indexes, handle various query patterns, and avoid common composite index pitfalls.
A composite index on columns (A, B, C) creates a single B-tree where the sorting key is the concatenation of all column values. This is fundamentally different from having three separate indexes.
How Composite Keys are Ordered:
Rows are sorted by A first, then by B within each A value, then by C within each (A, B) combination. This is exactly like alphabetizing names: sort by last name first, then by first name for matching last names, then by middle name for matching first and last names.
Composite key (country, city, street) sorted:
(France, Lyon, Rue A)
(France, Lyon, Rue B)
(France, Paris, Avenue C)
(France, Paris, Boulevard D)
(Germany, Berlin, Straße E)
(Germany, Munich, Weg F)
(USA, Boston, Street G)
(USA, New York, Avenue H)
Notice how:
Composite B-tree Index on (category, status, created_at) ┌─────────────────────────────────────────┐ │ ROOT NODE │ │ (Books, active, 2024-01-15) │ │ (Electronics, pending, 2024-02-01) │ └─────────────────────────────────────────┘ / | \ / | \ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐ │ Books section │ │ Electronics │ │ Food section │ │ (all statuses) │ │ (active/pending│ │ (all statuses) │ └────────────────┘ └────────────────┘ └────────────────┘ | | | ▼ ▼ ▼ ┌──────────────────────────────────────────────────────────────────┐ │ LEAF NODES (sorted entries) │ ├──────────────────────────────────────────────────────────────────┤ │ (Books, active, 2024-01-01, row_ptr_17) │ │ (Books, active, 2024-01-15, row_ptr_42) │ │ (Books, active, 2024-02-28, row_ptr_8) │ │ (Books, pending, 2024-01-05, row_ptr_33) │ │ (Books, pending, 2024-03-10, row_ptr_51) │ │ (Electronics, active, 2024-01-20, row_ptr_12) │ │ ... │ └──────────────────────────────────────────────────────────────────┘ Query: WHERE category = 'Books' AND status = 'active' → Navigate to (Books, active, *) range → All matching rows are contiguous! → Efficient range scan within the indexKey Insight: Composite Indexes Create Hierarchical Sorting
Think of a composite index as creating nested sort orders:
This hierarchical structure is what makes composite indexes powerful—and also what creates their constraints. The hierarchy must be traversed in order; you can't skip levels.
Each entry in the composite index stores the concatenated values of all indexed columns plus the row pointer. An index on (category VARCHAR(50), status VARCHAR(20), created_at TIMESTAMP) stores approximately 50 + 20 + 8 + pointer_size bytes per entry. Larger entries mean fewer entries per page and potentially taller trees.
The leftmost prefix rule is the most important concept in composite index design. It determines which queries can use a composite index and which cannot.
The Rule:
A composite index on (A, B, C) can be used for queries that filter on:
It cannot efficiently support queries that filter on:
Why?
Remember the hierarchical sorting. The index is sorted by A first. To find entries where B = 'some_value' (with no A constraint), you'd need to check every A value because B values for different A's are scattered throughout the index.
| Query Condition | Uses Index? | Explanation |
|---|---|---|
| WHERE A = 1 | ✅ Yes | Leftmost column, efficient seek |
| WHERE A = 1 AND B = 2 | ✅ Yes | Left two columns, efficient seek |
| WHERE A = 1 AND B = 2 AND C = 3 | ✅ Yes | All columns, most efficient |
| WHERE B = 2 | ❌ No | Skips A, cannot use index efficiently |
| WHERE C = 3 | ❌ No | Skips A and B, cannot use index |
| WHERE B = 2 AND C = 3 | ❌ No | Skips A, cannot use index |
| WHERE A = 1 AND C = 3 | ⚠️ Partial | Uses A, but C can't use index position |
| WHERE A > 1 | ✅ Yes | Range on leftmost, efficient range scan |
| WHERE A = 1 AND B > 2 | ✅ Yes | Equality on A, range on B |
| WHERE A > 1 AND B = 2 | ⚠️ Partial | Range on A stops B from using index |
123456789101112131415161718192021222324252627282930
-- Index on (user_id, status, created_at)CREATE INDEX idx_orders_composite ON orders(user_id, status, created_at); -- ✅ GOOD: Uses full indexSELECT * FROM orders WHERE user_id = 123 AND status = 'completed' AND created_at > '2024-01-01'; -- ✅ GOOD: Uses first two columnsSELECT * FROM orders WHERE user_id = 123 AND status = 'completed'; -- ✅ GOOD: Uses first columnSELECT * FROM orders WHERE user_id = 123; -- ❌ BAD: Skips user_id, cannot use indexSELECT * FROM orders WHERE status = 'completed';-- Needs separate index on (status) or (status, created_at) -- ⚠️ PARTIAL: Uses user_id only, created_at after rangeSELECT * FROM orders WHERE user_id > 100 -- Range condition AND status = 'completed';-- The status = 'completed' cannot use index efficiently-- because user_id is a RANGE, not an equalityOnce a range condition (>, <, >=, <=, BETWEEN, LIKE 'prefix%') is encountered, subsequent columns cannot use index positioning. For WHERE A > 1 AND B = 2, the index can efficiently find 'A > 1', but within that range, B values are scattered. This is why column order matters so much.
Choosing the correct column order is the most important decision in composite index design. The same columns in different orders create fundamentally different indexes.
General Ordering Principles:
Example: Designing Index Order
Consider an orders table with queries:
-- Query 1: Most common
SELECT * FROM orders
WHERE user_id = ? AND status = 'active' AND created_at > ?;
-- Query 2: Common
SELECT * FROM orders
WHERE user_id = ? AND status IN ('active', 'pending');
-- Query 3: Less common
SELECT * FROM orders
WHERE status = 'processing' AND created_at > ?;
Analysis:
user_id: Equality in Q1, Q2. High cardinality (many users)status: Equality in Q1, Q2, Q3. Low cardinality (few statuses)created_at: Range in Q1, Q3Optimal Index: (user_id, status, created_at)
For Q3, consider an additional index: (status, created_at)
You often can't optimize for all query patterns with one index. Identify your most critical queries (by frequency and latency sensitivity) and optimize for those. Less common queries may need separate indexes or can tolerate slower execution.
A covering index (also called an index-only scan index) contains all the columns needed to satisfy a query, eliminating the need to access the main table (heap).
Why Covering Indexes Are Fast:
Normal indexed query:
Covering index query:
The heap access step often involves random I/O—each row pointer might point to a different disk location. Eliminating this step can provide 2-10x performance improvement for read-heavy workloads.
123456789101112131415161718192021222324252627282930
-- Query we want to optimizeSELECT order_id, status, total_amount FROM orders WHERE user_id = 123 AND created_at > '2024-01-01'; -- Non-covering index (requires heap access)CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);-- Index has: user_id, created_at, row_pointer-- Must access heap to get: order_id, status, total_amount -- Covering index (no heap access needed)CREATE INDEX idx_orders_covering ON orders(user_id, created_at, order_id, status, total_amount);-- Index has ALL columns needed by the query!-- Result returned directly from index -- PostgreSQL: Check if index-only scan is usedEXPLAIN (ANALYZE, BUFFERS) SELECT order_id, status, total_amount FROM orders WHERE user_id = 123 AND created_at > '2024-01-01';-- Look for "Index Only Scan" in the output -- PostgreSQL 11+: Use INCLUDE for covering columns-- These columns are stored in leaf nodes but not indexedCREATE INDEX idx_orders_include ON orders(user_id, created_at) INCLUDE (order_id, status, total_amount);-- Better: smaller index, same covering benefitINCLUDE Clause (PostgreSQL 11+, SQL Server):
The INCLUDE clause adds columns to leaf nodes without including them in the index key:
When to Use Covering Indexes:
Covering indexes duplicate data from the table into the index. Each additional covering column increases index size, write overhead, and memory consumption. Only add covering columns for critical, high-frequency queries where the read benefit outweighs the write cost.
When no single index can satisfy a query, databases may combine multiple indexes through index intersection or index union operations.
Index Intersection (AND conditions):
For a query like:
SELECT * FROM products
WHERE category = 'Electronics' AND brand = 'Apple';
With separate indexes on category and brand, the database might:
category, collect row IDs for 'Electronics'brand, collect row IDs for 'Apple'Index Union (OR conditions):
For a query like:
SELECT * FROM products
WHERE category = 'Electronics' OR brand = 'Apple';
The database might:
category, collect row IDsbrand, collect row IDs| Approach | Storage | AND Query | OR Query | Single-Column Query |
|---|---|---|---|---|
| Composite (A, B) | 1 index | ✅ Optimal | ❌ Cannot use | ✅ A only; ❌ B only |
| Separate A, B | 2 indexes | ⚠️ Intersection | ⚠️ Union | ✅ Either column |
| Both approaches | 3 indexes | ✅ Optimal | ⚠️ Union | ✅ Any column |
When Index Intersection is Expensive:
Index intersection sounds clever, but it has costs:
For many combinations, a composite index outperforms intersection:
-- With composite index (category, brand)
SELECT * FROM products
WHERE category = 'Electronics' AND brand = 'Apple';
-- One index seek, direct to matching rows
-- With separate indexes (requires intersection)
-- Two index scans + intersection + heap fetches
-- Often 2-5x slower than composite
When Intersection is Acceptable:
For frequently executed queries with multi-column conditions, a composite index almost always outperforms index intersection. Reserve intersection for ad-hoc queries or cases where you truly can't predict the column combinations users will filter on.
Composite indexes can eliminate the need for sorting (filesort) in ORDER BY queries—but only when the index order matches the required sort order.
The Sorting Rule:
A composite index can satisfy ORDER BY if:
Example Analysis:
123456789101112131415161718192021222324252627282930313233343536
-- Index on (user_id, status, created_at)CREATE INDEX idx_orders ON orders(user_id, status, created_at); -- ✅ GOOD: Order by prefix of unused index columnsSELECT * FROM orders WHERE user_id = 123 AND status = 'active' ORDER BY created_at;-- Index order: [..., 123, active, <sorted by created_at>]-- No sort needed! -- ✅ GOOD: Equality columns + ORDER BY continuationSELECT * FROM orders WHERE user_id = 123 ORDER BY status, created_at;-- Index order after user_id = 123: sorted by (status, created_at)-- No sort needed! -- ❌ BAD: ORDER BY skips statusSELECT * FROM orders WHERE user_id = 123 ORDER BY created_at;-- Index order after user_id = 123: (status, created_at)-- created_at is NOT sorted within user_id alone!-- Requires filesort -- ❌ BAD: Mixed sort directionsSELECT * FROM orders WHERE user_id = 123 ORDER BY status ASC, created_at DESC;-- Index stores (status, created_at) both ascending-- Cannot satisfy mixed directions with standard index -- ✅ SOLUTION: Create index with matching directionsCREATE INDEX idx_orders_mixed ON orders(user_id, status ASC, created_at DESC);-- Now the mixed ORDER BY can use the indexUnderstanding Sort Direction in Indexes:
By default, indexes store values in ascending order. For descending sorts, the database can read the index backward—but this works only if all columns need the same direction flip.
| Index Columns | ORDER BY | Can Use Index? |
|---|---|---|
| (A, B) | A ASC, B ASC | ✅ Forward scan |
| (A, B) | A DESC, B DESC | ✅ Backward scan |
| (A, B) | A ASC, B DESC | ❌ Mixed directions |
| (A ASC, B DESC) | A ASC, B DESC | ✅ Forward scan |
| (A ASC, B DESC) | A DESC, B ASC | ✅ Backward scan |
Practical Implications:
The most common mixed-direction pattern is:
ORDER BY date DESC, id ASC -- Latest first, stable sort by ID
To avoid sort operations, create:
CREATE INDEX idx_timeline ON table(date DESC, id ASC);
When ORDER BY uses an index and LIMIT is specified, the database can stop reading after finding enough rows. For 'SELECT ... ORDER BY created_at DESC LIMIT 10', only 10 index entries are read. Without an index, all matching rows must be sorted, then truncated. This is why index-supported ORDER BY + LIMIT is dramatically faster.
Composite indexes play a crucial role in optimizing JOIN operations, especially when join conditions involve multiple columns or when filtering is applied to joined tables.
Foreign Key Indexes:
The most basic join optimization is indexing foreign keys:
-- Parent table
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(id),
created_at TIMESTAMP
);
-- Foreign key index on orders.user_id
CREATE INDEX idx_orders_user ON orders(user_id);
-- Join uses the index
SELECT * FROM users u
JOIN orders o ON o.user_id = u.id
WHERE u.id = 123;
Without the foreign key index, the database must scan the entire orders table for each user.
Compound Join Keys:
When joins involve multiple columns, composite indexes become essential:
-- Join on two columns
SELECT * FROM order_items oi
JOIN products p ON p.category = oi.category
AND p.product_id = oi.product_id;
-- Composite index for the join
CREATE INDEX idx_products_join
ON products(category, product_id);
-- Without this index, every order_item row causes a full scan
Filter + Join Optimization:
Often, joins are accompanied by filters. The optimal index must consider both:
-- Common pattern: filter on outer table, join to inner
SELECT o.*, oi.*
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
WHERE o.user_id = 123 AND o.status = 'completed';
-- Index strategy:
-- 1. Index on orders(user_id, status) for the filter
-- 2. Index on order_items(order_id) for the join
-- Even better if order_items often filters:
SELECT o.*, oi.*
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
WHERE o.user_id = 123 AND oi.status = 'shipped';
-- Now we need:
-- orders(user_id)
-- order_items(order_id, status) or (status, order_id)
For frequently joined tables: (1) Always index foreign keys, (2) For compound join keys, use composite indexes, (3) If filters are applied to the joined table, extend the foreign key index to include filter columns, (4) Consider covering indexes if SELECT pulls specific columns from the joined table.
Certain composite index patterns recur across many applications. Recognizing these patterns accelerates index design.
(tenant_id, entity_id) — Multi-tenant systems where every query filters by tenant. Tenant_id first ensures tenant isolation uses the index.(user_id, created_at) — User-specific timelines, activity feeds, history. User_id equality, created_at for recent items.(status, updated_at) — Processing queues, job systems. Find oldest pending items: WHERE status = 'pending' ORDER BY updated_at.(type, reference_id) — Polymorphic associations. Find all 'comment' type referencing entity 42.(region, created_at) — Regional data with time filtering. Region equality, time range.(category, subcategory, product_id) — Product catalogs with browsing. Category equality, subcategory equality or empty, product for detail.1234567891011121314151617181920212223242526
-- Pattern: Tenant + Entity (SaaS applications)CREATE INDEX idx_tenant_entity ON records(tenant_id, entity_id);-- Every query includes tenant_id, making this highly effective -- Pattern: User Activity (Social, E-commerce)CREATE INDEX idx_user_activity ON events(user_id, created_at DESC);-- Efficient: "Show my recent activity"SELECT * FROM events WHERE user_id = ? ORDER BY created_at DESC LIMIT 20; -- Pattern: Status + Time (Job queues, Processing)CREATE INDEX idx_pending_oldest ON jobs(status, scheduled_at);-- Efficient: "Get oldest pending jobs to process"SELECT * FROM jobs WHERE status = 'pending' ORDER BY scheduled_at LIMIT 100; -- Pattern: Soft Delete with TimeCREATE INDEX idx_active_recent ON posts(deleted_at, created_at DESC)WHERE deleted_at IS NULL; -- Partial index (PostgreSQL)-- Efficient: "Latest non-deleted posts" -- Pattern: Uniqueness with Soft DeleteCREATE UNIQUE INDEX idx_unique_active_email ON users(email) WHERE deleted_at IS NULL;-- Allow same email to be "deleted" and re-registeredOnce you recognize that a query follows a known pattern (like tenant + entity or status + time), you can immediately apply the corresponding index pattern. This pattern-based thinking is what separates experienced DBAs from those who reinvent solutions for each query.
Even experienced engineers fall into composite index traps. Recognizing these anti-patterns helps avoid costly mistakes.
Periodically audit your indexes: Remove unused indexes (check pg_stat_user_indexes or equivalent). Identify duplicate/redundant indexes. Verify indexes align with slow query patterns. Each unnecessary index wastes storage and slows writes.
Composite indexes are essential for optimizing multi-column queries. Let's consolidate the key principles:
What's Next:
With comprehensive understanding of composite indexes, we move to the final topic in this module: Index trade-offs. You'll learn to balance read performance against write overhead, storage costs, and maintenance complexity—completing your indexing expertise.
You now possess the deep understanding of composite indexes that separates index designers from index guessers. Apply column ordering principles, consider covering indexes for critical queries, and always validate index designs against actual query patterns using EXPLAIN.