System Design (HLD)Indexes and Query Performance

Indexes and Query Performance

LevelIntermediate

Duration90 mins

TopicIndexes and Query Performance

4 / 5

Composite Indexes

Beyond Single-Column Indexes

Real-world database queries rarely filter on a single column. Consider a typical query: WHERE user_id = 123 AND created_at > '2024-01-01' AND status = 'active'. With only single-column indexes, the database has limited options—use one index and scan for the rest, or perform a bitmap intersection of multiple index scans.

Composite indexes (also called multi-column, compound, or concatenated indexes) solve this problem by indexing multiple columns together as a single unit. A well-designed composite index can transform a query that touches thousands of rows into one that retrieves exactly what's needed.

However, composite indexes come with subtleties that trip up even experienced engineers. Column order matters enormously, and the same columns in different orders create indexes with vastly different capabilities. Understanding these nuances is essential for effective index design.

What You Will Master

By the end of this page, you'll understand how composite indexes are structured internally, master the leftmost prefix rule, know how to choose optimal column order, design covering indexes, handle various query patterns, and avoid common composite index pitfalls.

Composite Index Structure

A composite index on columns (A, B, C) creates a single B-tree where the sorting key is the concatenation of all column values. This is fundamentally different from having three separate indexes.

How Composite Keys are Ordered:

Rows are sorted by A first, then by B within each A value, then by C within each (A, B) combination. This is exactly like alphabetizing names: sort by last name first, then by first name for matching last names, then by middle name for matching first and last names.

Composite key (country, city, street) sorted:
(France, Lyon, Rue A)
(France, Lyon, Rue B)
(France, Paris, Avenue C)
(France, Paris, Boulevard D)
(Germany, Berlin, Straße E)
(Germany, Munich, Weg F)
(USA, Boston, Street G)
(USA, New York, Avenue H)

Notice how:

All entries for 'France' are contiguous
Within 'France', entries for 'Lyon' are contiguous
Within 'France, Lyon', entries are sorted by street

composite-btree-structure.txt
Composite B-tree Index on (category, status, created_at)
 
                    ┌─────────────────────────────────────────┐
                    │           ROOT NODE                      │
                    │  (Books, active, 2024-01-15)            │
                    │  (Electronics, pending, 2024-02-01)     │
                    └─────────────────────────────────────────┘
                           /              |              \
                          /               |               \
            ┌────────────────┐  ┌────────────────┐  ┌────────────────┐
            │ Books section  │  │ Electronics    │  │ Food section   │
            │ (all statuses) │  │ (active/pending│  │ (all statuses) │
            └────────────────┘  └────────────────┘  └────────────────┘
                   |                    |                    |
                   ▼                    ▼                    ▼
    ┌──────────────────────────────────────────────────────────────────┐
    │                    LEAF NODES (sorted entries)                    │
    ├──────────────────────────────────────────────────────────────────┤
    │ (Books, active, 2024-01-01, row_ptr_17)                          │
    │ (Books, active, 2024-01-15, row_ptr_42)                          │
    │ (Books, active, 2024-02-28, row_ptr_8)                           │
    │ (Books, pending, 2024-01-05, row_ptr_33)                         │
    │ (Books, pending, 2024-03-10, row_ptr_51)                         │
    │ (Electronics, active, 2024-01-20, row_ptr_12)                    │
    │ ...                                                              │
    └──────────────────────────────────────────────────────────────────┘
 
Query: WHERE category = 'Books' AND status = 'active'
  → Navigate to (Books, active, *) range
  → All matching rows are contiguous!
  → Efficient range scan within the index

Key Insight: Composite Indexes Create Hierarchical Sorting

Think of a composite index as creating nested sort orders:

First level: Sort by column A
Second level: Within each A value, sort by B
Third level: Within each (A, B) pair, sort by C

This hierarchical structure is what makes composite indexes powerful—and also what creates their constraints. The hierarchy must be traversed in order; you can't skip levels.

Physical Storage

Each entry in the composite index stores the concatenated values of all indexed columns plus the row pointer. An index on (category VARCHAR(50), status VARCHAR(20), created_at TIMESTAMP) stores approximately 50 + 20 + 8 + pointer_size bytes per entry. Larger entries mean fewer entries per page and potentially taller trees.

The Leftmost Prefix Rule

The leftmost prefix rule is the most important concept in composite index design. It determines which queries can use a composite index and which cannot.

The Rule:

A composite index on (A, B, C) can be used for queries that filter on:

A
A, B
A, B, C

It cannot efficiently support queries that filter on:

B alone
C alone
B, C (without A)

Why?

Remember the hierarchical sorting. The index is sorted by A first. To find entries where B = 'some_value' (with no A constraint), you'd need to check every A value because B values for different A's are scattered throughout the index.

Query Pattern Support for Index (A, B, C)
Query Condition	Uses Index?	Explanation
WHERE A = 1	✅ Yes	Leftmost column, efficient seek
WHERE A = 1 AND B = 2	✅ Yes	Left two columns, efficient seek
WHERE A = 1 AND B = 2 AND C = 3	✅ Yes	All columns, most efficient
WHERE B = 2	❌ No	Skips A, cannot use index efficiently
WHERE C = 3	❌ No	Skips A and B, cannot use index
WHERE B = 2 AND C = 3	❌ No	Skips A, cannot use index
WHERE A = 1 AND C = 3	⚠️ Partial	Uses A, but C can't use index position
WHERE A > 1	✅ Yes	Range on leftmost, efficient range scan
WHERE A = 1 AND B > 2	✅ Yes	Equality on A, range on B
WHERE A > 1 AND B = 2	⚠️ Partial	Range on A stops B from using index

leftmost-prefix-examples.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
-- Index on (user_id, status, created_at)
CREATE INDEX idx_orders_composite 
    ON orders(user_id, status, created_at);
 
-- ✅ GOOD: Uses full index
SELECT * FROM orders 
WHERE user_id = 123 
  AND status = 'completed' 
  AND created_at > '2024-01-01';
 
-- ✅ GOOD: Uses first two columns
SELECT * FROM orders 
WHERE user_id = 123 
  AND status = 'completed';
 
-- ✅ GOOD: Uses first column
SELECT * FROM orders 
WHERE user_id = 123;
 
-- ❌ BAD: Skips user_id, cannot use index
SELECT * FROM orders 
WHERE status = 'completed';
-- Needs separate index on (status) or (status, created_at)
 
-- ⚠️ PARTIAL: Uses user_id only, created_at after range
SELECT * FROM orders 
WHERE user_id > 100  -- Range condition
  AND status = 'completed';
-- The status = 'completed' cannot use index efficiently
-- because user_id is a RANGE, not an equality

The Range Stops Using the Index

Once a range condition (>, <, >=, <=, BETWEEN, LIKE 'prefix%') is encountered, subsequent columns cannot use index positioning. For WHERE A > 1 AND B = 2, the index can efficiently find 'A > 1', but within that range, B values are scattered. This is why column order matters so much.

Column Order Strategy

Choosing the correct column order is the most important decision in composite index design. The same columns in different orders create fundamentally different indexes.

General Ordering Principles:

Column Ordering Rules

•Equality conditions before range conditions: Columns used with = should come before columns used with range operators (>, <, BETWEEN). This maximizes the portion of the index that can be used.
•Higher cardinality first (within equality columns): Among columns with equality conditions, put higher cardinality columns first. This narrows down the search faster.
•Most frequently filtered columns first: If query patterns vary, prioritize columns that appear in the most queries as filtering criteria.
•Consider covering index needs: If you're building a covering index, add columns needed for SELECT after the filter columns.

Example: Designing Index Order

Consider an orders table with queries:

-- Query 1: Most common
SELECT * FROM orders 
WHERE user_id = ? AND status = 'active' AND created_at > ?;

-- Query 2: Common
SELECT * FROM orders 
WHERE user_id = ? AND status IN ('active', 'pending');

-- Query 3: Less common
SELECT * FROM orders 
WHERE status = 'processing' AND created_at > ?;

Analysis:

user_id: Equality in Q1, Q2. High cardinality (many users)
status: Equality in Q1, Q2, Q3. Low cardinality (few statuses)
created_at: Range in Q1, Q3

Optimal Index: (user_id, status, created_at)

Q1: Uses all three columns (equality, equality, range)
Q2: Uses first two columns perfectly
Q3: Cannot use this index well (starts with status)

For Q3, consider an additional index: (status, created_at)

The Trade-off Decision

You often can't optimize for all query patterns with one index. Identify your most critical queries (by frequency and latency sensitivity) and optimize for those. Less common queries may need separate indexes or can tolerate slower execution.

Covering Indexes

A covering index (also called an index-only scan index) contains all the columns needed to satisfy a query, eliminating the need to access the main table (heap).

Why Covering Indexes Are Fast:

Normal indexed query:

Traverse index to find matching entries
Extract row pointers
Access heap to read full rows (random I/O!)
Return data

Covering index query:

Traverse index to find matching entries
Return data directly from index
No heap access!

The heap access step often involves random I/O—each row pointer might point to a different disk location. Eliminating this step can provide 2-10x performance improvement for read-heavy workloads.

covering-index-examples.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
-- Query we want to optimize
SELECT order_id, status, total_amount 
FROM orders 
WHERE user_id = 123 AND created_at > '2024-01-01';
 
-- Non-covering index (requires heap access)
CREATE INDEX idx_orders_user_date 
    ON orders(user_id, created_at);
-- Index has: user_id, created_at, row_pointer
-- Must access heap to get: order_id, status, total_amount
 
-- Covering index (no heap access needed)
CREATE INDEX idx_orders_covering 
    ON orders(user_id, created_at, order_id, status, total_amount);
-- Index has ALL columns needed by the query!
-- Result returned directly from index
 
-- PostgreSQL: Check if index-only scan is used
EXPLAIN (ANALYZE, BUFFERS) 
SELECT order_id, status, total_amount 
FROM orders 
WHERE user_id = 123 AND created_at > '2024-01-01';
-- Look for "Index Only Scan" in the output
 
-- PostgreSQL 11+: Use INCLUDE for covering columns
-- These columns are stored in leaf nodes but not indexed
CREATE INDEX idx_orders_include 
    ON orders(user_id, created_at) 
    INCLUDE (order_id, status, total_amount);
-- Better: smaller index, same covering benefit

INCLUDE Clause (PostgreSQL 11+, SQL Server):

The INCLUDE clause adds columns to leaf nodes without including them in the index key:

Included columns: Stored in leaf nodes, available for covering
Not part of sort key: Cannot be used for filtering or ordering
Smaller index: Less overhead than adding to the composite key
No leftmost prefix constraint: Doesn't affect which queries can use the index

When to Use Covering Indexes:

Good Candidates

•High-frequency queries with consistent SELECT clauses
•Queries selecting few additional columns
•Read-heavy workloads where write overhead is acceptable
•OLAP/reporting queries scanning many rows
•Queries returning only a few columns from wide tables

Poor Candidates

•Queries with varying SELECT clauses
•SELECT * patterns (need all columns anyway)
•Write-heavy workloads (index maintenance cost)
•Large BLOB/TEXT columns (too big for index)
•Tables with frequent schema changes

The Covering Index Cost

Covering indexes duplicate data from the table into the index. Each additional covering column increases index size, write overhead, and memory consumption. Only add covering columns for critical, high-frequency queries where the read benefit outweighs the write cost.

Index Merging and Intersection

When no single index can satisfy a query, databases may combine multiple indexes through index intersection or index union operations.

Index Intersection (AND conditions):

For a query like:

SELECT * FROM products 
WHERE category = 'Electronics' AND brand = 'Apple';

With separate indexes on category and brand, the database might:

Scan index on category, collect row IDs for 'Electronics'
Scan index on brand, collect row IDs for 'Apple'
Intersect the two sets of row IDs
Fetch only rows in the intersection

Index Union (OR conditions):

For a query like:

SELECT * FROM products 
WHERE category = 'Electronics' OR brand = 'Apple';

The database might:

Scan index on category, collect row IDs
Scan index on brand, collect row IDs
Union the sets (removing duplicates)
Fetch all rows in the union

Composite Index vs Index Intersection
Approach	Storage	AND Query	OR Query	Single-Column Query
Composite (A, B)	1 index	✅ Optimal	❌ Cannot use	✅ A only; ❌ B only
Separate A, B	2 indexes	⚠️ Intersection	⚠️ Union	✅ Either column
Both approaches	3 indexes	✅ Optimal	⚠️ Union	✅ Any column

When Index Intersection is Expensive:

Index intersection sounds clever, but it has costs:

Two index scans instead of one
Bitmap creation for row ID sets (memory intensive)
Set intersection computation (CPU intensive)
Random I/O to fetch final rows (still needed)

For many combinations, a composite index outperforms intersection:

-- With composite index (category, brand)
SELECT * FROM products 
WHERE category = 'Electronics' AND brand = 'Apple';
-- One index seek, direct to matching rows

-- With separate indexes (requires intersection)
-- Two index scans + intersection + heap fetches
-- Often 2-5x slower than composite

When Intersection is Acceptable:

Queries are infrequent (not worth a dedicated composite index)
Each condition is highly selective (small sets to intersect)
Adding more indexes would be expensive (write-heavy tables)
Query patterns vary widely (can't predict column combinations)

Prefer Composite Over Intersection

For frequently executed queries with multi-column conditions, a composite index almost always outperforms index intersection. Reserve intersection for ad-hoc queries or cases where you truly can't predict the column combinations users will filter on.

Composite Indexes and Sorting (ORDER BY)

Composite indexes can eliminate the need for sorting (filesort) in ORDER BY queries—but only when the index order matches the required sort order.

The Sorting Rule:

A composite index can satisfy ORDER BY if:

The ORDER BY columns match a prefix of the index columns
The sort direction matches for all columns (or is reversed for all)
Any WHERE equality conditions come before the ORDER BY columns in the index

Example Analysis:

composite-sorting.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
-- Index on (user_id, status, created_at)
CREATE INDEX idx_orders ON orders(user_id, status, created_at);
 
-- ✅ GOOD: Order by prefix of unused index columns
SELECT * FROM orders 
WHERE user_id = 123 AND status = 'active' 
ORDER BY created_at;
-- Index order: [..., 123, active, <sorted by created_at>]
-- No sort needed!
 
-- ✅ GOOD: Equality columns + ORDER BY continuation
SELECT * FROM orders 
WHERE user_id = 123 
ORDER BY status, created_at;
-- Index order after user_id = 123: sorted by (status, created_at)
-- No sort needed!
 
-- ❌ BAD: ORDER BY skips status
SELECT * FROM orders 
WHERE user_id = 123 
ORDER BY created_at;
-- Index order after user_id = 123: (status, created_at)
-- created_at is NOT sorted within user_id alone!
-- Requires filesort
 
-- ❌ BAD: Mixed sort directions
SELECT * FROM orders 
WHERE user_id = 123 
ORDER BY status ASC, created_at DESC;
-- Index stores (status, created_at) both ascending
-- Cannot satisfy mixed directions with standard index
 
-- ✅ SOLUTION: Create index with matching directions
CREATE INDEX idx_orders_mixed 
    ON orders(user_id, status ASC, created_at DESC);
-- Now the mixed ORDER BY can use the index

Understanding Sort Direction in Indexes:

By default, indexes store values in ascending order. For descending sorts, the database can read the index backward—but this works only if all columns need the same direction flip.

Index Columns	ORDER BY	Can Use Index?
(A, B)	A ASC, B ASC	✅ Forward scan
(A, B)	A DESC, B DESC	✅ Backward scan
(A, B)	A ASC, B DESC	❌ Mixed directions
(A ASC, B DESC)	A ASC, B DESC	✅ Forward scan
(A ASC, B DESC)	A DESC, B ASC	✅ Backward scan

Practical Implications:

The most common mixed-direction pattern is:

ORDER BY date DESC, id ASC  -- Latest first, stable sort by ID

To avoid sort operations, create:

CREATE INDEX idx_timeline ON table(date DESC, id ASC);

LIMIT Optimization

When ORDER BY uses an index and LIMIT is specified, the database can stop reading after finding enough rows. For 'SELECT ... ORDER BY created_at DESC LIMIT 10', only 10 index entries are read. Without an index, all matching rows must be sorted, then truncated. This is why index-supported ORDER BY + LIMIT is dramatically faster.

Multi-Table Joins and Composite Indexes

Composite indexes play a crucial role in optimizing JOIN operations, especially when join conditions involve multiple columns or when filtering is applied to joined tables.

Foreign Key Indexes:

The most basic join optimization is indexing foreign keys:

-- Parent table
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    created_at TIMESTAMP
);

-- Foreign key index on orders.user_id
CREATE INDEX idx_orders_user ON orders(user_id);

-- Join uses the index
SELECT * FROM users u
JOIN orders o ON o.user_id = u.id
WHERE u.id = 123;

Without the foreign key index, the database must scan the entire orders table for each user.

Compound Join Keys:

When joins involve multiple columns, composite indexes become essential:

-- Join on two columns
SELECT * FROM order_items oi
JOIN products p ON p.category = oi.category 
                AND p.product_id = oi.product_id;

-- Composite index for the join
CREATE INDEX idx_products_join 
    ON products(category, product_id);

-- Without this index, every order_item row causes a full scan

Filter + Join Optimization:

Often, joins are accompanied by filters. The optimal index must consider both:

-- Common pattern: filter on outer table, join to inner
SELECT o.*, oi.* 
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
WHERE o.user_id = 123 AND o.status = 'completed';

-- Index strategy:
-- 1. Index on orders(user_id, status) for the filter
-- 2. Index on order_items(order_id) for the join

-- Even better if order_items often filters:
SELECT o.*, oi.* 
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
WHERE o.user_id = 123 AND oi.status = 'shipped';

-- Now we need:
-- orders(user_id)
-- order_items(order_id, status) or (status, order_id)

Join Index Strategy

For frequently joined tables: (1) Always index foreign keys, (2) For compound join keys, use composite indexes, (3) If filters are applied to the joined table, extend the foreign key index to include filter columns, (4) Consider covering indexes if SELECT pulls specific columns from the joined table.

Common Composite Index Patterns

Certain composite index patterns recur across many applications. Recognizing these patterns accelerates index design.

Essential Composite Index Patterns

•Tenant + Entity Pattern: (tenant_id, entity_id) — Multi-tenant systems where every query filters by tenant. Tenant_id first ensures tenant isolation uses the index.
•User Activity Pattern: (user_id, created_at) — User-specific timelines, activity feeds, history. User_id equality, created_at for recent items.
•Status + Timestamp Pattern: (status, updated_at) — Processing queues, job systems. Find oldest pending items: WHERE status = 'pending' ORDER BY updated_at.
•Type + Reference Pattern: (type, reference_id) — Polymorphic associations. Find all 'comment' type referencing entity 42.
•Geo + Time Pattern: (region, created_at) — Regional data with time filtering. Region equality, time range.
•Category Hierarchy Pattern: (category, subcategory, product_id) — Product catalogs with browsing. Category equality, subcategory equality or empty, product for detail.

common-patterns.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- Pattern: Tenant + Entity (SaaS applications)
CREATE INDEX idx_tenant_entity ON records(tenant_id, entity_id);
-- Every query includes tenant_id, making this highly effective
 
-- Pattern: User Activity (Social, E-commerce)
CREATE INDEX idx_user_activity ON events(user_id, created_at DESC);
-- Efficient: "Show my recent activity"
SELECT * FROM events 
WHERE user_id = ? ORDER BY created_at DESC LIMIT 20;
 
-- Pattern: Status + Time (Job queues, Processing)
CREATE INDEX idx_pending_oldest ON jobs(status, scheduled_at);
-- Efficient: "Get oldest pending jobs to process"
SELECT * FROM jobs 
WHERE status = 'pending' 
ORDER BY scheduled_at LIMIT 100;
 
-- Pattern: Soft Delete with Time
CREATE INDEX idx_active_recent ON posts(deleted_at, created_at DESC)
WHERE deleted_at IS NULL;  -- Partial index (PostgreSQL)
-- Efficient: "Latest non-deleted posts"
 
-- Pattern: Uniqueness with Soft Delete
CREATE UNIQUE INDEX idx_unique_active_email 
    ON users(email) WHERE deleted_at IS NULL;
-- Allow same email to be "deleted" and re-registered

Pattern Recognition Accelerates Design

Once you recognize that a query follows a known pattern (like tenant + entity or status + time), you can immediately apply the corresponding index pattern. This pattern-based thinking is what separates experienced DBAs from those who reinvent solutions for each query.

Composite Index Pitfalls and Anti-Patterns

Even experienced engineers fall into composite index traps. Recognizing these anti-patterns helps avoid costly mistakes.

Common Composite Index Mistakes

•Wrong column order: Putting range columns before equality columns. WHERE status = 'active' AND created_at > X needs (status, created_at), not (created_at, status).
•Too many columns: Index on (a, b, c, d, e, f) is expensive to maintain and rarely used fully. Keep composite indexes focused.
•Ignoring query patterns: Creating (a, b, c) when most queries filter only on (b, c). The index is useless for those queries.
•Duplicate indexes: Having both (a) and (a, b). The composite index covers queries that use only 'a', so (a) alone is redundant.
•Low-cardinality leading columns: Index (boolean_flag, user_id) – the boolean splits data into only 2 groups, providing minimal benefit.
•Assuming AND order matters in queries: WHERE a = 1 AND b = 2 and WHERE b = 2 AND a = 1 are equivalent to the optimizer. Only index column order matters.

Anti-Pattern Examples

•Index (created_at, user_id) for WHERE user_id = ? AND created_at > ?
•Index (is_active, email) when is_active is boolean
•Index (a, b, c) when you already have (a, b)
•Index (a, b) plus separate (a) index
•16-column composite index 'just in case'

Correct Approaches

•Index (user_id, created_at) — equality first, range second
•Index (email) WHERE is_active = true — partial index
•Remove (a, b) if (a, b, c) covers all (a, b) queries
•Remove (a) — (a, b) handles (a) queries via prefix
•Focused indexes based on actual query patterns

The Index Audit Practice

Periodically audit your indexes: Remove unused indexes (check pg_stat_user_indexes or equivalent). Identify duplicate/redundant indexes. Verify indexes align with slow query patterns. Each unnecessary index wastes storage and slows writes.

Summary: Mastering Composite Indexes

Composite indexes are essential for optimizing multi-column queries. Let's consolidate the key principles:

Key Takeaways

•Composite indexes create hierarchical sorting — (A, B, C) sorts by A, then B within A, then C within (A, B). This structure determines which queries can use the index.
•The leftmost prefix rule governs usage — Queries must match a prefix of index columns (A, or A+B, or A+B+C). Skipping leads columns prevents efficient index use.
•Column order is critical — Equality columns before range columns maximizes index utilization. High-cardinality columns first narrows search faster.
•Covering indexes eliminate heap access — Including SELECT columns in the index enables index-only scans. Use INCLUDE clause when available.
•Range conditions stop index column usage — After A > X, subsequent columns (B, C) can't benefit from index ordering. Design with this in mind.
•Composite indexes support ORDER BY — But only when WHERE equality columns + ORDER BY columns match an index prefix.
•Pattern recognition accelerates design — Tenant+entity, user+time, status+time patterns recur across systems.

What's Next:

With comprehensive understanding of composite indexes, we move to the final topic in this module: Index trade-offs. You'll learn to balance read performance against write overhead, storage costs, and maintenance complexity—completing your indexing expertise.

Composite Index Mastery Achieved

You now possess the deep understanding of composite indexes that separates index designers from index guessers. Apply column ordering principles, consider covering indexes for critical queries, and always validate index designs against actual query patterns using EXPLAIN.

4 / 5

Loading learning content...

System Design (HLD)Indexes and Query Performance

Indexes and Query Performance

LevelIntermediate

Duration90 mins

TopicIndexes and Query Performance

4 / 5

Composite Indexes

Beyond Single-Column Indexes

What You Will Master

Composite Index Structure

A composite index on columns (A, B, C) creates a single B-tree where the sorting key is the concatenation of all column values. This is fundamentally different from having three separate indexes.

How Composite Keys are Ordered:

Composite key (country, city, street) sorted:
(France, Lyon, Rue A)
(France, Lyon, Rue B)
(France, Paris, Avenue C)
(France, Paris, Boulevard D)
(Germany, Berlin, Straße E)
(Germany, Munich, Weg F)
(USA, Boston, Street G)
(USA, New York, Avenue H)

Notice how:

All entries for 'France' are contiguous
Within 'France', entries for 'Lyon' are contiguous
Within 'France, Lyon', entries are sorted by street

composite-btree-structure.txt
Composite B-tree Index on (category, status, created_at)
 
                    ┌─────────────────────────────────────────┐
                    │           ROOT NODE                      │
                    │  (Books, active, 2024-01-15)            │
                    │  (Electronics, pending, 2024-02-01)     │
                    └─────────────────────────────────────────┘
                           /              |              \
                          /               |               \
            ┌────────────────┐  ┌────────────────┐  ┌────────────────┐
            │ Books section  │  │ Electronics    │  │ Food section   │
            │ (all statuses) │  │ (active/pending│  │ (all statuses) │
            └────────────────┘  └────────────────┘  └────────────────┘
                   |                    |                    |
                   ▼                    ▼                    ▼
    ┌──────────────────────────────────────────────────────────────────┐
    │                    LEAF NODES (sorted entries)                    │
    ├──────────────────────────────────────────────────────────────────┤
    │ (Books, active, 2024-01-01, row_ptr_17)                          │
    │ (Books, active, 2024-01-15, row_ptr_42)                          │
    │ (Books, active, 2024-02-28, row_ptr_8)                           │
    │ (Books, pending, 2024-01-05, row_ptr_33)                         │
    │ (Books, pending, 2024-03-10, row_ptr_51)                         │
    │ (Electronics, active, 2024-01-20, row_ptr_12)                    │
    │ ...                                                              │
    └──────────────────────────────────────────────────────────────────┘
 
Query: WHERE category = 'Books' AND status = 'active'
  → Navigate to (Books, active, *) range
  → All matching rows are contiguous!
  → Efficient range scan within the index

Key Insight: Composite Indexes Create Hierarchical Sorting

Think of a composite index as creating nested sort orders:

First level: Sort by column A
Second level: Within each A value, sort by B
Third level: Within each (A, B) pair, sort by C

This hierarchical structure is what makes composite indexes powerful—and also what creates their constraints. The hierarchy must be traversed in order; you can't skip levels.

Physical Storage

The Leftmost Prefix Rule

The leftmost prefix rule is the most important concept in composite index design. It determines which queries can use a composite index and which cannot.

The Rule:

A composite index on (A, B, C) can be used for queries that filter on:

A
A, B
A, B, C

It cannot efficiently support queries that filter on:

B alone
C alone
B, C (without A)

Why?

Query Pattern Support for Index (A, B, C)
Query Condition	Uses Index?	Explanation
WHERE A = 1	✅ Yes	Leftmost column, efficient seek
WHERE A = 1 AND B = 2	✅ Yes	Left two columns, efficient seek
WHERE A = 1 AND B = 2 AND C = 3	✅ Yes	All columns, most efficient
WHERE B = 2	❌ No	Skips A, cannot use index efficiently
WHERE C = 3	❌ No	Skips A and B, cannot use index
WHERE B = 2 AND C = 3	❌ No	Skips A, cannot use index
WHERE A = 1 AND C = 3	⚠️ Partial	Uses A, but C can't use index position
WHERE A > 1	✅ Yes	Range on leftmost, efficient range scan
WHERE A = 1 AND B > 2	✅ Yes	Equality on A, range on B
WHERE A > 1 AND B = 2	⚠️ Partial	Range on A stops B from using index

leftmost-prefix-examples.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
-- Index on (user_id, status, created_at)
CREATE INDEX idx_orders_composite 
    ON orders(user_id, status, created_at);
 
-- ✅ GOOD: Uses full index
SELECT * FROM orders 
WHERE user_id = 123 
  AND status = 'completed' 
  AND created_at > '2024-01-01';
 
-- ✅ GOOD: Uses first two columns
SELECT * FROM orders 
WHERE user_id = 123 
  AND status = 'completed';
 
-- ✅ GOOD: Uses first column
SELECT * FROM orders 
WHERE user_id = 123;
 
-- ❌ BAD: Skips user_id, cannot use index
SELECT * FROM orders 
WHERE status = 'completed';
-- Needs separate index on (status) or (status, created_at)
 
-- ⚠️ PARTIAL: Uses user_id only, created_at after range
SELECT * FROM orders 
WHERE user_id > 100  -- Range condition
  AND status = 'completed';
-- The status = 'completed' cannot use index efficiently
-- because user_id is a RANGE, not an equality

The Range Stops Using the Index

Column Order Strategy

Choosing the correct column order is the most important decision in composite index design. The same columns in different orders create fundamentally different indexes.

General Ordering Principles:

Column Ordering Rules

•Equality conditions before range conditions: Columns used with = should come before columns used with range operators (>, <, BETWEEN). This maximizes the portion of the index that can be used.
•Higher cardinality first (within equality columns): Among columns with equality conditions, put higher cardinality columns first. This narrows down the search faster.
•Most frequently filtered columns first: If query patterns vary, prioritize columns that appear in the most queries as filtering criteria.
•Consider covering index needs: If you're building a covering index, add columns needed for SELECT after the filter columns.

Example: Designing Index Order

Consider an orders table with queries:

-- Query 1: Most common
SELECT * FROM orders 
WHERE user_id = ? AND status = 'active' AND created_at > ?;

-- Query 2: Common
SELECT * FROM orders 
WHERE user_id = ? AND status IN ('active', 'pending');

-- Query 3: Less common
SELECT * FROM orders 
WHERE status = 'processing' AND created_at > ?;

Analysis:

user_id: Equality in Q1, Q2. High cardinality (many users)
status: Equality in Q1, Q2, Q3. Low cardinality (few statuses)
created_at: Range in Q1, Q3

Optimal Index: (user_id, status, created_at)

Q1: Uses all three columns (equality, equality, range)
Q2: Uses first two columns perfectly
Q3: Cannot use this index well (starts with status)

For Q3, consider an additional index: (status, created_at)

The Trade-off Decision

Covering Indexes

A covering index (also called an index-only scan index) contains all the columns needed to satisfy a query, eliminating the need to access the main table (heap).

Why Covering Indexes Are Fast:

Normal indexed query:

Traverse index to find matching entries
Extract row pointers
Access heap to read full rows (random I/O!)
Return data

Covering index query:

Traverse index to find matching entries
Return data directly from index
No heap access!

covering-index-examples.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
-- Query we want to optimize
SELECT order_id, status, total_amount 
FROM orders 
WHERE user_id = 123 AND created_at > '2024-01-01';
 
-- Non-covering index (requires heap access)
CREATE INDEX idx_orders_user_date 
    ON orders(user_id, created_at);
-- Index has: user_id, created_at, row_pointer
-- Must access heap to get: order_id, status, total_amount
 
-- Covering index (no heap access needed)
CREATE INDEX idx_orders_covering 
    ON orders(user_id, created_at, order_id, status, total_amount);
-- Index has ALL columns needed by the query!
-- Result returned directly from index
 
-- PostgreSQL: Check if index-only scan is used
EXPLAIN (ANALYZE, BUFFERS) 
SELECT order_id, status, total_amount 
FROM orders 
WHERE user_id = 123 AND created_at > '2024-01-01';
-- Look for "Index Only Scan" in the output
 
-- PostgreSQL 11+: Use INCLUDE for covering columns
-- These columns are stored in leaf nodes but not indexed
CREATE INDEX idx_orders_include 
    ON orders(user_id, created_at) 
    INCLUDE (order_id, status, total_amount);
-- Better: smaller index, same covering benefit

INCLUDE Clause (PostgreSQL 11+, SQL Server):

The INCLUDE clause adds columns to leaf nodes without including them in the index key:

Included columns: Stored in leaf nodes, available for covering
Not part of sort key: Cannot be used for filtering or ordering
Smaller index: Less overhead than adding to the composite key
No leftmost prefix constraint: Doesn't affect which queries can use the index

When to Use Covering Indexes:

Good Candidates

•High-frequency queries with consistent SELECT clauses
•Queries selecting few additional columns
•Read-heavy workloads where write overhead is acceptable
•OLAP/reporting queries scanning many rows
•Queries returning only a few columns from wide tables

Poor Candidates

•Queries with varying SELECT clauses
•SELECT * patterns (need all columns anyway)
•Write-heavy workloads (index maintenance cost)
•Large BLOB/TEXT columns (too big for index)
•Tables with frequent schema changes

The Covering Index Cost

Index Merging and Intersection

When no single index can satisfy a query, databases may combine multiple indexes through index intersection or index union operations.

Index Intersection (AND conditions):

For a query like:

SELECT * FROM products 
WHERE category = 'Electronics' AND brand = 'Apple';

With separate indexes on category and brand, the database might:

Scan index on category, collect row IDs for 'Electronics'
Scan index on brand, collect row IDs for 'Apple'
Intersect the two sets of row IDs
Fetch only rows in the intersection

Index Union (OR conditions):

For a query like:

SELECT * FROM products 
WHERE category = 'Electronics' OR brand = 'Apple';

The database might:

Scan index on category, collect row IDs
Scan index on brand, collect row IDs
Union the sets (removing duplicates)
Fetch all rows in the union

Composite Index vs Index Intersection
Approach	Storage	AND Query	OR Query	Single-Column Query
Composite (A, B)	1 index	✅ Optimal	❌ Cannot use	✅ A only; ❌ B only
Separate A, B	2 indexes	⚠️ Intersection	⚠️ Union	✅ Either column
Both approaches	3 indexes	✅ Optimal	⚠️ Union	✅ Any column

When Index Intersection is Expensive:

Index intersection sounds clever, but it has costs:

Two index scans instead of one
Bitmap creation for row ID sets (memory intensive)
Set intersection computation (CPU intensive)
Random I/O to fetch final rows (still needed)

For many combinations, a composite index outperforms intersection:

-- With composite index (category, brand)
SELECT * FROM products 
WHERE category = 'Electronics' AND brand = 'Apple';
-- One index seek, direct to matching rows

-- With separate indexes (requires intersection)
-- Two index scans + intersection + heap fetches
-- Often 2-5x slower than composite

When Intersection is Acceptable:

Queries are infrequent (not worth a dedicated composite index)
Each condition is highly selective (small sets to intersect)
Adding more indexes would be expensive (write-heavy tables)
Query patterns vary widely (can't predict column combinations)

Prefer Composite Over Intersection

Composite Indexes and Sorting (ORDER BY)

Composite indexes can eliminate the need for sorting (filesort) in ORDER BY queries—but only when the index order matches the required sort order.

The Sorting Rule:

A composite index can satisfy ORDER BY if:

The ORDER BY columns match a prefix of the index columns
The sort direction matches for all columns (or is reversed for all)
Any WHERE equality conditions come before the ORDER BY columns in the index

Example Analysis:

composite-sorting.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
-- Index on (user_id, status, created_at)
CREATE INDEX idx_orders ON orders(user_id, status, created_at);
 
-- ✅ GOOD: Order by prefix of unused index columns
SELECT * FROM orders 
WHERE user_id = 123 AND status = 'active' 
ORDER BY created_at;
-- Index order: [..., 123, active, <sorted by created_at>]
-- No sort needed!
 
-- ✅ GOOD: Equality columns + ORDER BY continuation
SELECT * FROM orders 
WHERE user_id = 123 
ORDER BY status, created_at;
-- Index order after user_id = 123: sorted by (status, created_at)
-- No sort needed!
 
-- ❌ BAD: ORDER BY skips status
SELECT * FROM orders 
WHERE user_id = 123 
ORDER BY created_at;
-- Index order after user_id = 123: (status, created_at)
-- created_at is NOT sorted within user_id alone!
-- Requires filesort
 
-- ❌ BAD: Mixed sort directions
SELECT * FROM orders 
WHERE user_id = 123 
ORDER BY status ASC, created_at DESC;
-- Index stores (status, created_at) both ascending
-- Cannot satisfy mixed directions with standard index
 
-- ✅ SOLUTION: Create index with matching directions
CREATE INDEX idx_orders_mixed 
    ON orders(user_id, status ASC, created_at DESC);
-- Now the mixed ORDER BY can use the index

Understanding Sort Direction in Indexes:

By default, indexes store values in ascending order. For descending sorts, the database can read the index backward—but this works only if all columns need the same direction flip.

Index Columns	ORDER BY	Can Use Index?
(A, B)	A ASC, B ASC	✅ Forward scan
(A, B)	A DESC, B DESC	✅ Backward scan
(A, B)	A ASC, B DESC	❌ Mixed directions
(A ASC, B DESC)	A ASC, B DESC	✅ Forward scan
(A ASC, B DESC)	A DESC, B ASC	✅ Backward scan

Practical Implications:

The most common mixed-direction pattern is:

ORDER BY date DESC, id ASC  -- Latest first, stable sort by ID

To avoid sort operations, create:

CREATE INDEX idx_timeline ON table(date DESC, id ASC);

LIMIT Optimization

Multi-Table Joins and Composite Indexes

Composite indexes play a crucial role in optimizing JOIN operations, especially when join conditions involve multiple columns or when filtering is applied to joined tables.

Foreign Key Indexes:

The most basic join optimization is indexing foreign keys:

-- Parent table
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    created_at TIMESTAMP
);

-- Foreign key index on orders.user_id
CREATE INDEX idx_orders_user ON orders(user_id);

-- Join uses the index
SELECT * FROM users u
JOIN orders o ON o.user_id = u.id
WHERE u.id = 123;

Without the foreign key index, the database must scan the entire orders table for each user.

Compound Join Keys:

When joins involve multiple columns, composite indexes become essential:

-- Join on two columns
SELECT * FROM order_items oi
JOIN products p ON p.category = oi.category 
                AND p.product_id = oi.product_id;

-- Composite index for the join
CREATE INDEX idx_products_join 
    ON products(category, product_id);

-- Without this index, every order_item row causes a full scan

Filter + Join Optimization:

Often, joins are accompanied by filters. The optimal index must consider both:

-- Common pattern: filter on outer table, join to inner
SELECT o.*, oi.* 
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
WHERE o.user_id = 123 AND o.status = 'completed';

-- Index strategy:
-- 1. Index on orders(user_id, status) for the filter
-- 2. Index on order_items(order_id) for the join

-- Even better if order_items often filters:
SELECT o.*, oi.* 
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
WHERE o.user_id = 123 AND oi.status = 'shipped';

-- Now we need:
-- orders(user_id)
-- order_items(order_id, status) or (status, order_id)

Join Index Strategy

Common Composite Index Patterns

Certain composite index patterns recur across many applications. Recognizing these patterns accelerates index design.

Essential Composite Index Patterns

•Tenant + Entity Pattern: (tenant_id, entity_id) — Multi-tenant systems where every query filters by tenant. Tenant_id first ensures tenant isolation uses the index.
•User Activity Pattern: (user_id, created_at) — User-specific timelines, activity feeds, history. User_id equality, created_at for recent items.
•Status + Timestamp Pattern: (status, updated_at) — Processing queues, job systems. Find oldest pending items: WHERE status = 'pending' ORDER BY updated_at.
•Type + Reference Pattern: (type, reference_id) — Polymorphic associations. Find all 'comment' type referencing entity 42.
•Geo + Time Pattern: (region, created_at) — Regional data with time filtering. Region equality, time range.
•Category Hierarchy Pattern: (category, subcategory, product_id) — Product catalogs with browsing. Category equality, subcategory equality or empty, product for detail.

common-patterns.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- Pattern: Tenant + Entity (SaaS applications)
CREATE INDEX idx_tenant_entity ON records(tenant_id, entity_id);
-- Every query includes tenant_id, making this highly effective
 
-- Pattern: User Activity (Social, E-commerce)
CREATE INDEX idx_user_activity ON events(user_id, created_at DESC);
-- Efficient: "Show my recent activity"
SELECT * FROM events 
WHERE user_id = ? ORDER BY created_at DESC LIMIT 20;
 
-- Pattern: Status + Time (Job queues, Processing)
CREATE INDEX idx_pending_oldest ON jobs(status, scheduled_at);
-- Efficient: "Get oldest pending jobs to process"
SELECT * FROM jobs 
WHERE status = 'pending' 
ORDER BY scheduled_at LIMIT 100;
 
-- Pattern: Soft Delete with Time
CREATE INDEX idx_active_recent ON posts(deleted_at, created_at DESC)
WHERE deleted_at IS NULL;  -- Partial index (PostgreSQL)
-- Efficient: "Latest non-deleted posts"
 
-- Pattern: Uniqueness with Soft Delete
CREATE UNIQUE INDEX idx_unique_active_email 
    ON users(email) WHERE deleted_at IS NULL;
-- Allow same email to be "deleted" and re-registered

Pattern Recognition Accelerates Design

Composite Index Pitfalls and Anti-Patterns

Even experienced engineers fall into composite index traps. Recognizing these anti-patterns helps avoid costly mistakes.

Common Composite Index Mistakes

•Wrong column order: Putting range columns before equality columns. WHERE status = 'active' AND created_at > X needs (status, created_at), not (created_at, status).
•Too many columns: Index on (a, b, c, d, e, f) is expensive to maintain and rarely used fully. Keep composite indexes focused.
•Ignoring query patterns: Creating (a, b, c) when most queries filter only on (b, c). The index is useless for those queries.
•Duplicate indexes: Having both (a) and (a, b). The composite index covers queries that use only 'a', so (a) alone is redundant.
•Low-cardinality leading columns: Index (boolean_flag, user_id) – the boolean splits data into only 2 groups, providing minimal benefit.
•Assuming AND order matters in queries: WHERE a = 1 AND b = 2 and WHERE b = 2 AND a = 1 are equivalent to the optimizer. Only index column order matters.

Anti-Pattern Examples

•Index (created_at, user_id) for WHERE user_id = ? AND created_at > ?
•Index (is_active, email) when is_active is boolean
•Index (a, b, c) when you already have (a, b)
•Index (a, b) plus separate (a) index
•16-column composite index 'just in case'

Correct Approaches

•Index (user_id, created_at) — equality first, range second
•Index (email) WHERE is_active = true — partial index
•Remove (a, b) if (a, b, c) covers all (a, b) queries
•Remove (a) — (a, b) handles (a) queries via prefix
•Focused indexes based on actual query patterns

The Index Audit Practice

Summary: Mastering Composite Indexes

Composite indexes are essential for optimizing multi-column queries. Let's consolidate the key principles:

Key Takeaways

•Composite indexes create hierarchical sorting — (A, B, C) sorts by A, then B within A, then C within (A, B). This structure determines which queries can use the index.
•The leftmost prefix rule governs usage — Queries must match a prefix of index columns (A, or A+B, or A+B+C). Skipping leads columns prevents efficient index use.
•Column order is critical — Equality columns before range columns maximizes index utilization. High-cardinality columns first narrows search faster.
•Covering indexes eliminate heap access — Including SELECT columns in the index enables index-only scans. Use INCLUDE clause when available.
•Range conditions stop index column usage — After A > X, subsequent columns (B, C) can't benefit from index ordering. Design with this in mind.
•Composite indexes support ORDER BY — But only when WHERE equality columns + ORDER BY columns match an index prefix.
•Pattern recognition accelerates design — Tenant+entity, user+time, status+time patterns recur across systems.

What's Next:

Composite Index Mastery Achieved

4 / 5