Database Management SystemsIndex Types

Index Types: Primary, Secondary, Unique, and Composite Indexes

LevelIntermediate

Duration60 mins

TopicIndex Types

4 / 5

Composite Index: Multi-Column Query Optimization

Beyond Single-Column Indexes

Real-world queries rarely filter on a single column. Searches for orders involve customer ID and date ranges. Product lookups combine category with price filters. User authentication checks email and password hash. To optimize these multi-predicate queries, we need composite indexes—indexes built on multiple columns simultaneously.

A composite index (also called multi-column index, concatenated index, or compound index) stores a single B+-tree sorted by a combination of column values. The column order within this index is absolutely critical—it determines which queries can leverage the index effectively. Mastering composite index design is perhaps the single most impactful skill for database performance tuning.

What You Will Learn

By the end of this page, you will understand how composite indexes organize multi-column data, the critical leftmost prefix rule that governs index usability, strategies for optimal column ordering, how indexes interact with ORDER BY and GROUP BY, the concept of covering indexes for index-only scans, and advanced techniques for serving multiple query patterns.

Composite Index Structure

A composite index is a B+-tree index where each entry is a tuple of values from multiple columns. The entries are sorted lexicographically—like words in a dictionary—first by the first column, then by the second within ties, then by the third, and so on.

Lexicographic ordering:

Consider an index on (last_name, first_name, birth_date):

All 'Adams' come before all 'Baker'
Within 'Adams', ordered by first_name: 'Adams, Alice' before 'Adams, Bob'
Within 'Adams, Alice', ordered by birth_date

This ordering has profound implications for which queries can use the index.

composite_index_structure.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
Composite Index on (last_name, first_name, birth_date)
 
B+-tree Index Structure:
                            ┌─────────────────────────────────┐
                            │            Root Node            │
                            │  (Baker, Alice, ...)            │
                            └───────────────┬─────────────────┘
                                            │
                    ┌───────────────────────┴───────────────────────┐
                    ▼                                               ▼
    ┌───────────────────────────────┐           ┌───────────────────────────────┐
    │         Left Subtree          │           │        Right Subtree          │
    │ All names < (Baker, Alice)    │           │ All names ≥ (Baker, Alice)    │
    └───────────────────────────────┘           └───────────────────────────────┘
 
Leaf Level (sorted entries):
┌─────────────────────────────────────────────────────────────────────────────────┐
│ (Adams, Alice, 1985-03-15) → Row1                                              │
│ (Adams, Alice, 1990-07-22) → Row7   ← Same name, different dates               │
│ (Adams, Bob, 1988-11-30) → Row3                                                 │
│ (Adams, Carol, 1982-01-05) → Row5                                               │
│ (Baker, Alice, 1995-09-12) → Row2                                              │
│ (Baker, David, 1978-04-18) → Row4                                              │
│ (Chen, Alice, 1992-06-28) → Row6                                               │
│ ...                                                                             │
└─────────────────────────────────────────────────────────────────────────────────┘
 
Sort Order:
1. Primary sort: last_name (alphabetically)
2. Secondary sort: first_name (within same last_name)
3. Tertiary sort: birth_date (within same last_name + first_name)

Index Key Composition

The composite key is treated as a single sortable value. Just as strings 'abc' < 'abd' < 'xyz', tuples compare element-by-element: (1, 5) < (1, 7) < (2, 3). This lexicographic comparison enables binary search on the combined key.

The Leftmost Prefix Rule

The leftmost prefix rule is the most important concept in composite index usage. It states:

A composite index on (A, B, C) can efficiently serve queries that filter on:

(A) alone

(A, B) together

(A, B, C) together

But NOT queries filtering only on:

(B) alone

(C) alone

(B, C) together

Why this matters:

The index is sorted first by column A. To find entries where B = 5, you'd need to scan the entire index—entries with B = 5 are scattered across all values of A. But to find A = 'Smith', you can binary search directly to that section.

Think of it like a phone book sorted by (last_name, first_name). You can quickly find all "Smiths", or "Smith, John" specifically. But finding all "Johns" (without knowing the last name) requires reading every page.

Leftmost Prefix Rule for Index (A, B, C)
Query Filter	Uses Index?	Why
WHERE A = 1	✅ Yes	A is the first column, binary search works
WHERE A = 1 AND B = 2	✅ Yes	Leftmost prefix (A, B)
WHERE A = 1 AND B = 2 AND C = 3	✅ Yes	Complete key match
WHERE B = 2	❌ No*	B is not leftmost, cannot narrow search
WHERE C = 3	❌ No*	C is not leftmost
WHERE B = 2 AND C = 3	❌ No*	Neither B nor C is leftmost
WHERE A = 1 AND C = 3	⚠️ Partial	Uses index for A=1, then scans for C=3

leftmost_examples.sql
Examples
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
-- Index definition:
CREATE INDEX idx_orders ON orders (customer_id, order_date, status);
 
-- Query 1: Uses index perfectly (complete prefix)
SELECT * FROM orders
WHERE customer_id = 100 AND order_date = '2024-01-15' AND status = 'shipped';
-- Access path: Index seek on (100, '2024-01-15', 'shipped')
 
-- Query 2: Uses index (partial prefix)
SELECT * FROM orders
WHERE customer_id = 100 AND order_date BETWEEN '2024-01-01' AND '2024-01-31';
-- Access path: Index range scan for customer_id=100 within date range
 
-- Query 3: Uses index (leftmost column only)
SELECT * FROM orders
WHERE customer_id = 100;
-- Access path: Index seek on customer_id=100, scan all dates/statuses
 
-- Query 4: CANNOT use index efficiently
SELECT * FROM orders
WHERE order_date = '2024-01-15';
-- Access path: Full table scan (or index scan checking every entry)
-- order_date is not the leftmost column!
 
-- Query 5: Partial index use
SELECT * FROM orders
WHERE customer_id = 100 AND status = 'shipped';
-- Access path: Index seek on customer_id=100, filter status (skips date)
-- Index helps with customer_id, but not status (due to "gap")

Column Gaps Break Index

In an index (A, B, C), a query on A and C (skipping B) can only use the index for the A filter. Once you skip a column in the prefix, subsequent columns cannot leverage the index's sort order for seeking—though they may still be used for filtering after the seek.

Column Order Strategy

Given the leftmost prefix rule, the order of columns in a composite index is a critical design decision. Here are strategic principles for optimal column ordering:

Principle 1: Equality Predicates First

Columns used in equality comparisons (=) should come before columns used in range comparisons (<, >, BETWEEN, LIKE 'prefix%').

Why: After an equality match, the index can still use the sorted order of subsequent columns. But after a range scan starts, subsequent columns are scrambled within the range.

Principle 2: High Selectivity First (Sometimes)

A common guideline is to place more selective columns (columns that filter out more rows) first. However, this is secondary to the equality-before-range principle.

column_order_example.sql
Analysis
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- Common query pattern:
SELECT * FROM orders
WHERE customer_id = 100                    -- Equality
  AND order_date BETWEEN '2024-01-01'      -- Range
                  AND '2024-03-31'
  AND status = 'shipped';                  -- Equality (but after range!)
 
-- Index Option 1: (customer_id, order_date, status)
-- customer_id = 100      → Seek to customer 100's section
-- order_date BETWEEN     → Range scan within Q1 2024
-- status = 'shipped'     → Filter applied (index not helping status)
-- Result: Good, but status filter is post-range
 
-- Index Option 2: (customer_id, status, order_date)
-- customer_id = 100      → Seek to customer 100's section
-- status = 'shipped'     → Further narrow to 'shipped' orders
-- order_date BETWEEN     → Range scan within shipped orders
-- Result: BETTER! All three columns contribute to seek/scan
 
-- Column order matters:
-- (customer_id, order_date, status) - 2 columns fully used
-- (customer_id, status, order_date) - 3 columns fully used
 
-- General pattern: Equality columns, then Range column last
CREATE INDEX idx_orders_optimized 
ON orders (customer_id, status, order_date);

Column Ordering Principles

•Equality before range: Columns with = conditions first, columns with <, >, BETWEEN last
•Most queries first: The column that appears in most queries should be leftmost (maximizes prefix reuse)
•Consider cardinality: Among equality columns, higher cardinality (more distinct values) slightly preferred first
•ORDER BY integration: If queries include ORDER BY, consider placing those columns after WHERE columns
•Write patterns matter: If a column is updated frequently, placing it later reduces index churn

Analyze Your Workload

There's no universally correct column order—it depends entirely on your query patterns. Collect slow query logs, identify the most frequent and expensive queries, and design indexes that serve them. A single carefully designed composite index often outperforms multiple single-column indexes.

Composite Indexes and ORDER BY / GROUP BY

Composite indexes can eliminate the need for expensive sorting operations when queries include ORDER BY or GROUP BY. The key is ensuring the index order matches the requested output order.

ORDER BY optimization:

If the index is on (A, B, C) and the query is:

ORDER BY A, B → Index can provide sorted output directly
ORDER BY A, B, C → Index can provide sorted output directly
ORDER BY B, C → Index CANNOT help (leftmost prefix rule)
ORDER BY A DESC, B ASC → Depends on database (some support mixed sort)

order_by_optimization.sql
Examples
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
-- Index: (customer_id, order_date)
CREATE INDEX idx_customer_orders ON orders (customer_id, order_date);
 
-- Query 1: Perfect match - no sort needed
SELECT * FROM orders
WHERE customer_id = 100
ORDER BY order_date;
-- Index provides orders for customer 100, already sorted by date!
-- EXPLAIN shows: "Using index" with no "Using filesort"
 
-- Query 2: Still works - both columns in order
SELECT * FROM orders
ORDER BY customer_id, order_date;
-- Index is sorted exactly this way - no sort operation needed
 
-- Query 3: Cannot use index for ORDER BY
SELECT * FROM orders
WHERE customer_id = 100
ORDER BY status;          -- status not in this index
-- Index used for WHERE, but filesort needed for ORDER BY
 
-- Query 4: Partial optimization
SELECT * FROM orders
WHERE customer_id = 100
ORDER BY order_date DESC;
-- Index scan in reverse order - no filesort (if DB supports backward scan)
 
-- GROUP BY works similarly:
SELECT customer_id, COUNT(*) 
FROM orders
GROUP BY customer_id;
-- Index provides rows pre-grouped by customer_id (if leading column)

ORDER BY Optimization with Index (A, B, C)
ORDER BY Clause	Sort Eliminated?	Condition
A	✅ Yes	Matches leftmost prefix
A, B	✅ Yes	Matches prefix
A, B, C	✅ Yes	Complete match
A DESC, B DESC, C DESC	✅ Yes*	*If DB supports backward scan
B, C	❌ No	Doesn't start with A
A, C	❌ No	Gap in middle (skips B)
A ASC, B DESC	⚠️ Maybe	Mixed direction requires special index

LIMIT Optimization

ORDER BY optimization is especially powerful with LIMIT. For 'SELECT ... ORDER BY date DESC LIMIT 10', the database can use an index to return the top 10 directly without sorting all rows. Without an index, it must sort every matching row, then return 10—potentially millions of times slower.

Covering Indexes: Avoiding Table Access

A covering index is not a special type of index—it's a usage pattern where a query can be satisfied entirely from the index without accessing the base table. This occurs when all columns needed by the query (in SELECT, WHERE, ORDER BY, GROUP BY) are present in the index.

Why covering indexes are powerful:

No random I/O to table: Index entries are smaller than full rows, more fit in memory
Sequential access: Index leaves are linked, enabling efficient scans
Fewer pages read: Index is more compact than full table

The performance improvement can be 10-100x for queries on large tables.

covering_index_example.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
-- Consider this query:
SELECT customer_id, order_date, total_amount
FROM orders
WHERE customer_id = 100 AND order_date >= '2024-01-01';
 
-- Index Option 1: (customer_id, order_date)
-- Can use index for WHERE, but must fetch total_amount from table
-- Access path: Index seek → table lookup (row by row) → return results
 
-- Index Option 2: (customer_id, order_date, total_amount)
-- Now index COVERS the query - all needed columns in index!
-- Access path: Index seek → return results directly from index
 
CREATE INDEX idx_orders_covering 
ON orders (customer_id, order_date, total_amount);
 
-- EXPLAIN shows "Using index" or "Index Only Scan"
-- No table access required!
 
-- INCLUDE syntax (SQL Server, PostgreSQL 11+):
-- When additional columns are only for covering, not filtering:
CREATE INDEX idx_orders_covering
ON orders (customer_id, order_date)
INCLUDE (total_amount, status, shipping_date);
 
-- Benefits of INCLUDE:
-- - Included columns don't affect sort order
-- - Smaller tree (included cols only in leaf nodes)
-- - More flexible for covering multiple queries

With Table Access:

Index Seek
    │
    ▼
Row ID from index
    │
    ▼
Table Lookup (random I/O)
    │
    ▼
Fetch full row
    │
    ▼
Extract needed columns

Cost: Index traversal + N random table reads

Covering Index (No Table Access):

Index Seek
    │
    ▼
All columns in index
    │
    ▼
Return directly

Cost: Index traversal only

Savings: Often 10-100x faster

Index Size Tradeoff

Adding columns to make an index covering increases index size. A bloated index consumes more memory, has higher write overhead, and may degrade other queries. Balance covering benefits against storage and maintenance costs. INCLUDE columns help by keeping non-key columns out of internal nodes.

Index Intersection: Combining Multiple Indexes

When a query has multiple filter conditions and no single index covers them all, some databases can use index intersection—combining results from multiple indexes using set operations.

How index intersection works:

Execute each index separately, getting sets of row IDs
Intersect (AND) or union (OR) the row ID sets
Fetch the final set of rows from the table

This allows the optimizer to leverage existing single-column indexes for multi-column queries.

index_intersection.sql
Analysis
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
-- Existing indexes:
CREATE INDEX idx_customer ON orders (customer_id);
CREATE INDEX idx_status ON orders (status);
 
-- Query with two conditions:
SELECT * FROM orders
WHERE customer_id = 100 AND status = 'shipped';
 
-- Without composite index, optimizer has choices:
 
-- Option 1: Use idx_customer, filter status
-- Scan all orders for customer 100, filter by status in memory
 
-- Option 2: Use idx_status, filter customer
-- Scan all 'shipped' orders, filter by customer in memory
 
-- Option 3: Index intersection (if supported)
-- Step 1: Get all row IDs where customer_id = 100 from idx_customer
-- Step 2: Get all row IDs where status = 'shipped' from idx_status
-- Step 3: Intersect the two sets of row IDs
-- Step 4: Fetch only the intersecting rows from table
 
-- Index intersection is typically chosen when:
-- - Both predicates are selective
-- - Neither index alone reduces result set enough
-- - A composite index doesn't exist
 
-- However, composite index usually beats intersection:
CREATE INDEX idx_customer_status ON orders (customer_id, status);
-- Single index seek, no intersection overhead

Composite Index vs. Index Intersection
Aspect	Composite Index	Index Intersection
Number of indexes	One multi-column	Multiple single-column
Query efficiency	Single index seek	Multiple seeks + merge
Storage overhead	One larger index	Multiple smaller indexes
Flexibility	Best for specific query	Covers varied query patterns
Typical performance	Faster (10x or more)	Slower but still better than scan
Database support	All databases	MySQL, PostgreSQL, SQL Server (varies)

When Intersection Makes Sense

Index intersection shines when you can't predict query patterns in advance. An OLAP system with ad-hoc queries might benefit from single-column indexes that can be combined, rather than trying to create composite indexes for every possible filter combination.

Designing Indexes for Multiple Queries

In practice, you need to design indexes that serve multiple queries efficiently while minimizing index count. Here's a systematic approach:

Step 1: Catalog your important queries

Identify the most frequent, performance-critical, or SLA-bound queries.

Step 2: Identify common column prefixes

Look for columns that appear together across queries. These are candidates for composite indexes.

Step 3: Apply the prefix-sharing principle

A single index can serve multiple queries if their needed columns are prefixes of each other.

multi_query_index.sql
Strategy
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
-- Query workload analysis:
-- Q1: WHERE customer_id = ?
-- Q2: WHERE customer_id = ? AND status = ?
-- Q3: WHERE customer_id = ? AND status = ? AND order_date = ?
-- Q4: WHERE customer_id = ? ORDER BY order_date DESC LIMIT 10
 
-- Observation: customer_id is common to all queries
 
-- Option A: Four separate indexes (wasteful!)
CREATE INDEX idx1 ON orders (customer_id);
CREATE INDEX idx2 ON orders (customer_id, status);
CREATE INDEX idx3 ON orders (customer_id, status, order_date);
CREATE INDEX idx4 ON orders (customer_id, order_date);
 
-- Option B: Strategic consolidation (better!)
-- Index 1: (customer_id, status, order_date) - serves Q1, Q2, Q3
-- Index 2: (customer_id, order_date) - serves Q1, Q4
 
-- Even better: Can we serve all with fewer indexes?
-- Q4 needs customer_id + order_date in specific order
-- Q3 needs customer_id + status + order_date
 
-- Insight: We can't fully consolidate because:
-- ORDER BY order_date needs order_date to immediately follow customer_id
 
-- Final design:
CREATE INDEX idx_cust_status_date ON orders (customer_id, status, order_date);
CREATE INDEX idx_cust_date ON orders (customer_id, order_date);
 
-- 2 indexes serve all 4 query patterns efficiently
-- idx1 is a prefix of idx_cust_status_date (Q1 served)
-- idx2 is a prefix of idx_cust_status_date (Q2 served)
-- idx3 is exactly idx_cust_status_date (Q3 served)
-- idx4 is served by idx_cust_date (Q4 served)

Index Consolidation Strategies

•Prefix sharing: If Q1 needs (A) and Q2 needs (A, B), one index on (A, B) serves both
•Extend for covering: Add non-filter columns to avoid table lookup
•Consider ORDER BY independently: ORDER BY requirements may force separate indexes
•Analyze query frequency: A seldom-used query might not warrant its own index
•Monitor and adjust: Use slow query logs to identify gaps in index coverage

Index Count Tradeoff

Every index adds write overhead. A table with 10 indexes means every INSERT updates 11 structures (table + 10 indexes). For write-heavy workloads, fewer well-designed composite indexes outperform many single-column indexes. Target 3-5 indexes per table as a starting guideline.

Summary: Composite Index Mastery

We've conducted a deep exploration of composite indexes—the multi-column structures that are essential for optimizing real-world queries that filter, sort, and aggregate across multiple attributes.

Key Takeaways

•Structure: Composite indexes store multi-column tuples in lexicographic order—sorted first by column 1, then column 2 within ties, and so on
•Leftmost prefix rule: Only queries that filter on a prefix of the index columns (starting from the leftmost) can fully leverage the index
•Column order matters: Place equality columns before range columns; consider query frequency and selectivity
•ORDER BY integration: Indexes can eliminate sorting if the sort order matches the index order
•Covering indexes: Including all query columns in the index enables index-only scans, avoiding table access
•INCLUDE clause: Non-key columns can be included for covering without affecting sort order
•Index intersection: Multiple single-column indexes can be combined, but composite indexes are usually faster
•Multi-query design: Strategically design indexes to serve multiple query patterns through prefix sharing

What's next:

With our understanding of primary, secondary, unique, and composite indexes complete, we'll tie everything together with an index classification framework—a systematic approach to categorizing and selecting the right index type for any given scenario.

Page Complete

You now have deep knowledge of composite index design: from the fundamental lexicographic ordering and leftmost prefix rule, through column ordering strategy and ORDER BY integration, to advanced techniques like covering indexes and multi-query optimization. This knowledge enables you to design high-performance index strategies.

4 / 5

Loading learning content...

Database Management SystemsIndex Types

Index Types: Primary, Secondary, Unique, and Composite Indexes

LevelIntermediate

Duration60 mins

TopicIndex Types

4 / 5

Composite Index: Multi-Column Query Optimization

Beyond Single-Column Indexes

What You Will Learn

Composite Index Structure

Lexicographic ordering:

Consider an index on (last_name, first_name, birth_date):

All 'Adams' come before all 'Baker'
Within 'Adams', ordered by first_name: 'Adams, Alice' before 'Adams, Bob'
Within 'Adams, Alice', ordered by birth_date

This ordering has profound implications for which queries can use the index.

composite_index_structure.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
Composite Index on (last_name, first_name, birth_date)
 
B+-tree Index Structure:
                            ┌─────────────────────────────────┐
                            │            Root Node            │
                            │  (Baker, Alice, ...)            │
                            └───────────────┬─────────────────┘
                                            │
                    ┌───────────────────────┴───────────────────────┐
                    ▼                                               ▼
    ┌───────────────────────────────┐           ┌───────────────────────────────┐
    │         Left Subtree          │           │        Right Subtree          │
    │ All names < (Baker, Alice)    │           │ All names ≥ (Baker, Alice)    │
    └───────────────────────────────┘           └───────────────────────────────┘
 
Leaf Level (sorted entries):
┌─────────────────────────────────────────────────────────────────────────────────┐
│ (Adams, Alice, 1985-03-15) → Row1                                              │
│ (Adams, Alice, 1990-07-22) → Row7   ← Same name, different dates               │
│ (Adams, Bob, 1988-11-30) → Row3                                                 │
│ (Adams, Carol, 1982-01-05) → Row5                                               │
│ (Baker, Alice, 1995-09-12) → Row2                                              │
│ (Baker, David, 1978-04-18) → Row4                                              │
│ (Chen, Alice, 1992-06-28) → Row6                                               │
│ ...                                                                             │
└─────────────────────────────────────────────────────────────────────────────────┘
 
Sort Order:
1. Primary sort: last_name (alphabetically)
2. Secondary sort: first_name (within same last_name)
3. Tertiary sort: birth_date (within same last_name + first_name)

Index Key Composition

The Leftmost Prefix Rule

The leftmost prefix rule is the most important concept in composite index usage. It states:

A composite index on (A, B, C) can efficiently serve queries that filter on:

(A) alone

(A, B) together

(A, B, C) together

But NOT queries filtering only on:

(B) alone

(C) alone

(B, C) together

Why this matters:

Leftmost Prefix Rule for Index (A, B, C)
Query Filter	Uses Index?	Why
WHERE A = 1	✅ Yes	A is the first column, binary search works
WHERE A = 1 AND B = 2	✅ Yes	Leftmost prefix (A, B)
WHERE A = 1 AND B = 2 AND C = 3	✅ Yes	Complete key match
WHERE B = 2	❌ No*	B is not leftmost, cannot narrow search
WHERE C = 3	❌ No*	C is not leftmost
WHERE B = 2 AND C = 3	❌ No*	Neither B nor C is leftmost
WHERE A = 1 AND C = 3	⚠️ Partial	Uses index for A=1, then scans for C=3

leftmost_examples.sql
Examples
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
-- Index definition:
CREATE INDEX idx_orders ON orders (customer_id, order_date, status);
 
-- Query 1: Uses index perfectly (complete prefix)
SELECT * FROM orders
WHERE customer_id = 100 AND order_date = '2024-01-15' AND status = 'shipped';
-- Access path: Index seek on (100, '2024-01-15', 'shipped')
 
-- Query 2: Uses index (partial prefix)
SELECT * FROM orders
WHERE customer_id = 100 AND order_date BETWEEN '2024-01-01' AND '2024-01-31';
-- Access path: Index range scan for customer_id=100 within date range
 
-- Query 3: Uses index (leftmost column only)
SELECT * FROM orders
WHERE customer_id = 100;
-- Access path: Index seek on customer_id=100, scan all dates/statuses
 
-- Query 4: CANNOT use index efficiently
SELECT * FROM orders
WHERE order_date = '2024-01-15';
-- Access path: Full table scan (or index scan checking every entry)
-- order_date is not the leftmost column!
 
-- Query 5: Partial index use
SELECT * FROM orders
WHERE customer_id = 100 AND status = 'shipped';
-- Access path: Index seek on customer_id=100, filter status (skips date)
-- Index helps with customer_id, but not status (due to "gap")

Column Gaps Break Index

Column Order Strategy

Given the leftmost prefix rule, the order of columns in a composite index is a critical design decision. Here are strategic principles for optimal column ordering:

Principle 1: Equality Predicates First

Columns used in equality comparisons (=) should come before columns used in range comparisons (<, >, BETWEEN, LIKE 'prefix%').

Why: After an equality match, the index can still use the sorted order of subsequent columns. But after a range scan starts, subsequent columns are scrambled within the range.

Principle 2: High Selectivity First (Sometimes)

A common guideline is to place more selective columns (columns that filter out more rows) first. However, this is secondary to the equality-before-range principle.

column_order_example.sql
Analysis
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- Common query pattern:
SELECT * FROM orders
WHERE customer_id = 100                    -- Equality
  AND order_date BETWEEN '2024-01-01'      -- Range
                  AND '2024-03-31'
  AND status = 'shipped';                  -- Equality (but after range!)
 
-- Index Option 1: (customer_id, order_date, status)
-- customer_id = 100      → Seek to customer 100's section
-- order_date BETWEEN     → Range scan within Q1 2024
-- status = 'shipped'     → Filter applied (index not helping status)
-- Result: Good, but status filter is post-range
 
-- Index Option 2: (customer_id, status, order_date)
-- customer_id = 100      → Seek to customer 100's section
-- status = 'shipped'     → Further narrow to 'shipped' orders
-- order_date BETWEEN     → Range scan within shipped orders
-- Result: BETTER! All three columns contribute to seek/scan
 
-- Column order matters:
-- (customer_id, order_date, status) - 2 columns fully used
-- (customer_id, status, order_date) - 3 columns fully used
 
-- General pattern: Equality columns, then Range column last
CREATE INDEX idx_orders_optimized 
ON orders (customer_id, status, order_date);

Column Ordering Principles

•Equality before range: Columns with = conditions first, columns with <, >, BETWEEN last
•Most queries first: The column that appears in most queries should be leftmost (maximizes prefix reuse)
•Consider cardinality: Among equality columns, higher cardinality (more distinct values) slightly preferred first
•ORDER BY integration: If queries include ORDER BY, consider placing those columns after WHERE columns
•Write patterns matter: If a column is updated frequently, placing it later reduces index churn

Analyze Your Workload

Composite Indexes and ORDER BY / GROUP BY

Composite indexes can eliminate the need for expensive sorting operations when queries include ORDER BY or GROUP BY. The key is ensuring the index order matches the requested output order.

ORDER BY optimization:

If the index is on (A, B, C) and the query is:

ORDER BY A, B → Index can provide sorted output directly
ORDER BY A, B, C → Index can provide sorted output directly
ORDER BY B, C → Index CANNOT help (leftmost prefix rule)
ORDER BY A DESC, B ASC → Depends on database (some support mixed sort)

order_by_optimization.sql
Examples
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
-- Index: (customer_id, order_date)
CREATE INDEX idx_customer_orders ON orders (customer_id, order_date);
 
-- Query 1: Perfect match - no sort needed
SELECT * FROM orders
WHERE customer_id = 100
ORDER BY order_date;
-- Index provides orders for customer 100, already sorted by date!
-- EXPLAIN shows: "Using index" with no "Using filesort"
 
-- Query 2: Still works - both columns in order
SELECT * FROM orders
ORDER BY customer_id, order_date;
-- Index is sorted exactly this way - no sort operation needed
 
-- Query 3: Cannot use index for ORDER BY
SELECT * FROM orders
WHERE customer_id = 100
ORDER BY status;          -- status not in this index
-- Index used for WHERE, but filesort needed for ORDER BY
 
-- Query 4: Partial optimization
SELECT * FROM orders
WHERE customer_id = 100
ORDER BY order_date DESC;
-- Index scan in reverse order - no filesort (if DB supports backward scan)
 
-- GROUP BY works similarly:
SELECT customer_id, COUNT(*) 
FROM orders
GROUP BY customer_id;
-- Index provides rows pre-grouped by customer_id (if leading column)

ORDER BY Optimization with Index (A, B, C)
ORDER BY Clause	Sort Eliminated?	Condition
A	✅ Yes	Matches leftmost prefix
A, B	✅ Yes	Matches prefix
A, B, C	✅ Yes	Complete match
A DESC, B DESC, C DESC	✅ Yes*	*If DB supports backward scan
B, C	❌ No	Doesn't start with A
A, C	❌ No	Gap in middle (skips B)
A ASC, B DESC	⚠️ Maybe	Mixed direction requires special index

LIMIT Optimization

Covering Indexes: Avoiding Table Access

Why covering indexes are powerful:

No random I/O to table: Index entries are smaller than full rows, more fit in memory
Sequential access: Index leaves are linked, enabling efficient scans
Fewer pages read: Index is more compact than full table

The performance improvement can be 10-100x for queries on large tables.

covering_index_example.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
-- Consider this query:
SELECT customer_id, order_date, total_amount
FROM orders
WHERE customer_id = 100 AND order_date >= '2024-01-01';
 
-- Index Option 1: (customer_id, order_date)
-- Can use index for WHERE, but must fetch total_amount from table
-- Access path: Index seek → table lookup (row by row) → return results
 
-- Index Option 2: (customer_id, order_date, total_amount)
-- Now index COVERS the query - all needed columns in index!
-- Access path: Index seek → return results directly from index
 
CREATE INDEX idx_orders_covering 
ON orders (customer_id, order_date, total_amount);
 
-- EXPLAIN shows "Using index" or "Index Only Scan"
-- No table access required!
 
-- INCLUDE syntax (SQL Server, PostgreSQL 11+):
-- When additional columns are only for covering, not filtering:
CREATE INDEX idx_orders_covering
ON orders (customer_id, order_date)
INCLUDE (total_amount, status, shipping_date);
 
-- Benefits of INCLUDE:
-- - Included columns don't affect sort order
-- - Smaller tree (included cols only in leaf nodes)
-- - More flexible for covering multiple queries

With Table Access:

Index Seek
    │
    ▼
Row ID from index
    │
    ▼
Table Lookup (random I/O)
    │
    ▼
Fetch full row
    │
    ▼
Extract needed columns

Cost: Index traversal + N random table reads

Covering Index (No Table Access):

Index Seek
    │
    ▼
All columns in index
    │
    ▼
Return directly

Cost: Index traversal only

Savings: Often 10-100x faster

Index Size Tradeoff

Index Intersection: Combining Multiple Indexes

When a query has multiple filter conditions and no single index covers them all, some databases can use index intersection—combining results from multiple indexes using set operations.

How index intersection works:

Execute each index separately, getting sets of row IDs
Intersect (AND) or union (OR) the row ID sets
Fetch the final set of rows from the table

This allows the optimizer to leverage existing single-column indexes for multi-column queries.

index_intersection.sql
Analysis
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
-- Existing indexes:
CREATE INDEX idx_customer ON orders (customer_id);
CREATE INDEX idx_status ON orders (status);
 
-- Query with two conditions:
SELECT * FROM orders
WHERE customer_id = 100 AND status = 'shipped';
 
-- Without composite index, optimizer has choices:
 
-- Option 1: Use idx_customer, filter status
-- Scan all orders for customer 100, filter by status in memory
 
-- Option 2: Use idx_status, filter customer
-- Scan all 'shipped' orders, filter by customer in memory
 
-- Option 3: Index intersection (if supported)
-- Step 1: Get all row IDs where customer_id = 100 from idx_customer
-- Step 2: Get all row IDs where status = 'shipped' from idx_status
-- Step 3: Intersect the two sets of row IDs
-- Step 4: Fetch only the intersecting rows from table
 
-- Index intersection is typically chosen when:
-- - Both predicates are selective
-- - Neither index alone reduces result set enough
-- - A composite index doesn't exist
 
-- However, composite index usually beats intersection:
CREATE INDEX idx_customer_status ON orders (customer_id, status);
-- Single index seek, no intersection overhead

Composite Index vs. Index Intersection
Aspect	Composite Index	Index Intersection
Number of indexes	One multi-column	Multiple single-column
Query efficiency	Single index seek	Multiple seeks + merge
Storage overhead	One larger index	Multiple smaller indexes
Flexibility	Best for specific query	Covers varied query patterns
Typical performance	Faster (10x or more)	Slower but still better than scan
Database support	All databases	MySQL, PostgreSQL, SQL Server (varies)

When Intersection Makes Sense

Designing Indexes for Multiple Queries

In practice, you need to design indexes that serve multiple queries efficiently while minimizing index count. Here's a systematic approach:

Step 1: Catalog your important queries

Identify the most frequent, performance-critical, or SLA-bound queries.

Step 2: Identify common column prefixes

Look for columns that appear together across queries. These are candidates for composite indexes.

Step 3: Apply the prefix-sharing principle

A single index can serve multiple queries if their needed columns are prefixes of each other.

multi_query_index.sql
Strategy
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
-- Query workload analysis:
-- Q1: WHERE customer_id = ?
-- Q2: WHERE customer_id = ? AND status = ?
-- Q3: WHERE customer_id = ? AND status = ? AND order_date = ?
-- Q4: WHERE customer_id = ? ORDER BY order_date DESC LIMIT 10
 
-- Observation: customer_id is common to all queries
 
-- Option A: Four separate indexes (wasteful!)
CREATE INDEX idx1 ON orders (customer_id);
CREATE INDEX idx2 ON orders (customer_id, status);
CREATE INDEX idx3 ON orders (customer_id, status, order_date);
CREATE INDEX idx4 ON orders (customer_id, order_date);
 
-- Option B: Strategic consolidation (better!)
-- Index 1: (customer_id, status, order_date) - serves Q1, Q2, Q3
-- Index 2: (customer_id, order_date) - serves Q1, Q4
 
-- Even better: Can we serve all with fewer indexes?
-- Q4 needs customer_id + order_date in specific order
-- Q3 needs customer_id + status + order_date
 
-- Insight: We can't fully consolidate because:
-- ORDER BY order_date needs order_date to immediately follow customer_id
 
-- Final design:
CREATE INDEX idx_cust_status_date ON orders (customer_id, status, order_date);
CREATE INDEX idx_cust_date ON orders (customer_id, order_date);
 
-- 2 indexes serve all 4 query patterns efficiently
-- idx1 is a prefix of idx_cust_status_date (Q1 served)
-- idx2 is a prefix of idx_cust_status_date (Q2 served)
-- idx3 is exactly idx_cust_status_date (Q3 served)
-- idx4 is served by idx_cust_date (Q4 served)

Index Consolidation Strategies

•Prefix sharing: If Q1 needs (A) and Q2 needs (A, B), one index on (A, B) serves both
•Extend for covering: Add non-filter columns to avoid table lookup
•Consider ORDER BY independently: ORDER BY requirements may force separate indexes
•Analyze query frequency: A seldom-used query might not warrant its own index
•Monitor and adjust: Use slow query logs to identify gaps in index coverage

Index Count Tradeoff

Summary: Composite Index Mastery

We've conducted a deep exploration of composite indexes—the multi-column structures that are essential for optimizing real-world queries that filter, sort, and aggregate across multiple attributes.

Key Takeaways

•Structure: Composite indexes store multi-column tuples in lexicographic order—sorted first by column 1, then column 2 within ties, and so on
•Leftmost prefix rule: Only queries that filter on a prefix of the index columns (starting from the leftmost) can fully leverage the index
•Column order matters: Place equality columns before range columns; consider query frequency and selectivity
•ORDER BY integration: Indexes can eliminate sorting if the sort order matches the index order
•Covering indexes: Including all query columns in the index enables index-only scans, avoiding table access
•INCLUDE clause: Non-key columns can be included for covering without affecting sort order
•Index intersection: Multiple single-column indexes can be combined, but composite indexes are usually faster
•Multi-query design: Strategically design indexes to serve multiple query patterns through prefix sharing

What's next:

Page Complete

4 / 5