Database Management SystemsMulti-Column Indexes

Multi-Column Indexes

LevelIntermediate

Duration60 mins

TopicMulti-Column Indexes

2 / 5

Column Order Importance

The Order That Changes Everything

In composite index design, column order is not arbitrary—it is the single most critical decision that determines whether your index accelerates queries or sits unused, consuming storage and slowing writes.

Consider two seemingly equivalent indexes:

CREATE INDEX idx_a ON orders (customer_id, order_date);
CREATE INDEX idx_b ON orders (order_date, customer_id);

Both indexes contain the same columns. Both consume similar storage. Yet for many queries, one index will provide O(log n) efficiency while the other provides no benefit at all—forcing a full table scan.

This page explores the science and art of column ordering, transforming this critical decision from guesswork into deliberate engineering.

What You Will Learn

By the end of this page, you will understand the leftmost prefix rule, how to analyze query patterns to determine optimal column ordering, the interplay between equality and range predicates, and proven strategies for ordering columns in production composite indexes.

The Leftmost Prefix Rule

The leftmost prefix rule (also called the prefix matching rule) is the fundamental principle governing composite index usage:

A composite index can only be used efficiently if the query filters or sorts by a contiguous prefix of the index's columns, starting from the leftmost column.

This rule arises directly from the lexicographic ordering of composite index keys. Consider an index on (A, B, C):

The index is sorted primarily by A
Within each A value, entries are sorted by B
Within each (A, B) combination, entries are sorted by C

This hierarchical structure means that to leverage the index effectively, you must 'anchor' your search from the left side of the key.

Index Usage Based on Query Predicates - Index on (A, B, C)
Query Predicates	Columns Used	Can Use Index Efficiently?
WHERE A = ?	A	✅ Yes - Uses first column
WHERE A = ? AND B = ?	A, B	✅ Yes - Uses first two columns
WHERE A = ? AND B = ? AND C = ?	A, B, C	✅ Yes - Uses all three columns
WHERE B = ?	B only	❌ No - Skips leading column A
WHERE C = ?	C only	❌ No - Skips leading columns A, B
WHERE B = ? AND C = ?	B, C	❌ No - Skips leading column A
WHERE A = ? AND C = ?	A, C (gap)	⚠️ Partial - Uses A only, C not used efficiently

The Gap Problem

When there's a 'gap' in the column sequence (e.g., querying A and C but not B), only the columns up to the gap can be used for efficient index navigation. The predicate on C cannot leverage the sorted property because entries are not contiguously sorted by C unless A AND B are both specified.

Why does this rule exist?

Think of the phonebook analogy again. A phonebook sorted by (LastName, FirstName) lets you:

Find all 'Smiths' quickly (leftmost column)
Find 'John Smith' quickly (both columns in order)

But it does not let you:

Find all 'Johns' quickly (skips LastName)
Find all people with FirstName = 'John' and LastName = 'Smith' if you search by FirstName first

To find all 'Johns', you'd have to scan the entire phonebook because people named John are scattered throughout (John Adams, John Brown, John Carter, ...)—they are not contiguous.

Converting Mermaid diagram...

Equality vs Range Predicates

A crucial distinction in composite index design is between equality predicates and range predicates:

Equality predicates: =, IN
Range predicates: <, >, <=, >=, BETWEEN, LIKE 'abc%'

The Critical Rule:

Once a range predicate is encountered in the composite key, subsequent columns cannot be used for efficient index range scans.

This is because range predicates 'break' the sorted continuity for subsequent columns.

range_predicate_impact.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
-- Index: (department_id, hire_date, employee_id)
 
-- Query 1: All equality predicates
SELECT * FROM employees
WHERE department_id = 50
  AND hire_date = '2023-01-15'
  AND employee_id = 1001;
-- Result: Uses all three columns efficiently ✅
 
-- Query 2: Equality, then range
SELECT * FROM employees
WHERE department_id = 50
  AND hire_date BETWEEN '2023-01-01' AND '2023-06-30'
  AND employee_id = 1001;
-- Result: Uses department_id and hire_date only ⚠️
-- The employee_id predicate cannot use the index range scan
-- (It may still filter, but doesn't narrow the scan)
 
-- Query 3: Range on first column
SELECT * FROM employees
WHERE department_id BETWEEN 40 AND 60
  AND hire_date = '2023-01-15'
  AND employee_id = 1001;
-- Result: Uses department_id only for range scan ⚠️
-- hire_date and employee_id cannot leverage index ordering
 
-- Why? Because within the range [40, 60], hire_dates are NOT contiguous:
-- (40, 2020-01-01), (40, 2023-01-15), (41, 2021-06-01), (42, 2023-01-15), ...
-- There's no single "2023-01-15 range" to scan

The Range Stops the Index Rule

A useful mnemonic: "The range stops the index." Once you hit a range predicate in your composite key order, any columns after it cannot benefit from index ordering. This is why equality columns should generally come before range columns in composite index design.

Predicate Type Impact on Index Column Usage
Query Pattern	A Usage	B Usage	C Usage
A = ? AND B = ? AND C = ?	✅ Equality	✅ Equality	✅ Equality
A = ? AND B = ? AND C > ?	✅ Equality	✅ Equality	⚠️ Range scan
A = ? AND B > ? AND C = ?	✅ Equality	⚠️ Range scan	❌ Not used
A > ? AND B = ? AND C = ?	⚠️ Range scan	❌ Not used	❌ Not used
A = ? AND B > ? AND C > ?	✅ Equality	⚠️ Range scan	❌ Not used

Column Ordering Strategies

Given the leftmost prefix rule and the equality-vs-range distinction, how should you order columns in a composite index? Here are the primary strategies, ordered by priority:

Column Ordering Principles (In Priority Order)

•Equality predicates before range predicates — Columns with = or IN conditions should precede columns with <, >, BETWEEN conditions. This maximizes the number of columns that can fully participate in index navigation.
•Higher selectivity columns first (generally) — Among equality columns, placing higher-selectivity columns first reduces the number of index entries examined. However, this is secondary to the equality-before-range rule.
•Match the most common query patterns — If 90% of queries filter by column A first, A should be the leading column even if other columns have higher selectivity.
•Consider ORDER BY columns — If queries commonly sort by certain columns, including them in the index (in the same order and direction) can eliminate filesort operations.
•Include columns for covering — Consider adding columns after the filter columns to make the index cover queries completely (index-only scans).

Applying Ordering StrategiesOrders table with common queries:

Input

Output

ordering_examples.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
-- Scenario 1: E-commerce order lookup
-- Common query: Find orders for a customer in a date range
-- Pattern: customer_id = ? AND order_date BETWEEN ? AND ?
 
-- CORRECT ordering: Equality first, range second
CREATE INDEX idx_orders_optimal ON orders (customer_id, order_date);
 
-- INCORRECT ordering: Range column first
CREATE INDEX idx_orders_suboptimal ON orders (order_date, customer_id);
-- This would require scanning all dates to find the customer
 
 
-- Scenario 2: Employee filtering with multiple equalities
-- Common query: Find employees in a department with a specific job title
-- Pattern: department_id = ? AND job_title = ?
 
-- If department_id has higher selectivity (100 depts, 20 job titles):
CREATE INDEX idx_emp_v1 ON employees (department_id, job_title);
 
-- If job_title has higher selectivity (100 depts, 500 job titles):
CREATE INDEX idx_emp_v2 ON employees (job_title, department_id);
 
-- The higher-selectivity column first reduces rows examined
 
 
-- Scenario 3: Multi-predicate with ORDER BY
-- Common query: Recent orders for a customer
-- Pattern: customer_id = ? ORDER BY order_date DESC LIMIT 10
 
-- Optimal: Filter column + sort column
CREATE INDEX idx_orders_sorted ON orders (customer_id, order_date DESC);
-- Enables index scan in order, no filesort needed
 
 
-- Scenario 4: Covering index design
-- Common query: Customer order summary
-- Pattern: SELECT order_id, total, status 
--          WHERE customer_id = ? AND order_date BETWEEN ? AND ?
 
-- Covering index includes all needed columns
CREATE INDEX idx_orders_covering ON orders (
    customer_id,      -- Equality filter
    order_date,       -- Range filter
    order_id,         -- Output column
    total,            -- Output column
    status            -- Output column
);
-- Enables index-only scan

Selectivity Considerations

Selectivity (the fraction of rows returned after applying a predicate) influences column ordering for equality predicates. The general guidance is to place higher-selectivity columns first, but this is nuanced.

Case: High-Selectivity Column First

When column A has selectivity 0.001 (1 in 1000) and column B has selectivity 0.1 (1 in 10):

INDEX (A, B)
-- 1,000,000 rows → A=? → 1,000 rows
-- 1,000 rows → B=? → 100 rows

Navigating first to the highly selective A dramatically reduces the search space immediately.

Case: Lower-Selectivity Column First

Same columns but with the opposite order:

INDEX (B, A)
-- 1,000,000 rows → B=? → 100,000 rows
-- 100,000 rows → A=? → 100 rows

More index entries are traversed before the highly selective filter is applied.

When Query Frequency Trumps Selectivity

If 95% of queries filter only on column B (lower selectivity) and only 5% use both A and B, the index (B, A) may be preferable. The leading column should match the most common query patterns, even if not the most selective. A single index must serve multiple queries; optimize for the dominant use case.

Selectivity Analysis Process:

Gather column statistics: Determine NDV (number of distinct values) for each candidate column
Calculate selectivity: Selectivity = 1 / NDV (approximate for uniform distribution)
Analyze query patterns: Identify which columns appear together in WHERE clauses
Weight by query frequency: Consider how often each query pattern executes
Account for equality vs range: Equality predicates always come before range, regardless of selectivity
Consider data skew: Highly skewed distributions may invalidate simple selectivity calculations

selectivity_analysis.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
-- MySQL: Analyze column cardinality
SELECT 
    COLUMN_NAME,
    CARDINALITY
FROM INFORMATION_SCHEMA.STATISTICS
WHERE TABLE_NAME = 'orders'
ORDER BY CARDINALITY DESC;
 
-- PostgreSQL: Detailed statistics
SELECT 
    attname AS column_name,
    n_distinct,
    most_common_vals,
    most_common_freqs
FROM pg_stats
WHERE tablename = 'orders';
 
-- Calculate selectivity
-- For orders table (1,000,000 rows):
-- customer_id: 50,000 distinct → selectivity = 0.00002 (high)
-- status: 5 distinct → selectivity = 0.2 (low)
-- order_date: 1,000 distinct → selectivity = 0.001 (moderate)
 
-- Common query: customer_id = ? AND status = ? AND order_date BETWEEN ? AND ?
-- Optimal ordering:
-- 1. customer_id (equality, highest selectivity)
-- 2. status (equality, lower selectivity)
-- 3. order_date (range)
CREATE INDEX idx_orders_optimal ON orders (customer_id, status, order_date);

Sort Order Considerations

Each column in a composite index can be sorted in ascending (ASC) or descending (DESC) order. The sort direction affects whether the index can satisfy ORDER BY clauses without additional sorting.

The Sort Direction Rule:

An index can satisfy an ORDER BY clause if:

The ORDER BY columns are a prefix of the index columns
The sort directions either all match or are all opposite to the index

The 'all opposite' case works because databases can scan an index in reverse.

Sort Order Matching - Index (A ASC, B ASC, C ASC)
ORDER BY Clause	Can Use Index?	Reason
ORDER BY A ASC	✅ Yes	Matches index order
ORDER BY A DESC	✅ Yes	Reverse scan of index
ORDER BY A ASC, B ASC	✅ Yes	Matches index order
ORDER BY A DESC, B DESC	✅ Yes	All reversed, reverse scan
ORDER BY A ASC, B DESC	❌ No	Mixed directions
ORDER BY A DESC, B ASC	❌ No	Mixed directions
ORDER BY B ASC	❌ No	Skips leading column A
ORDER BY A ASC, C ASC	❌ No	Gap (skips B)

mixed_sort_orders.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
-- Scenario: Dashboard showing recent orders by category
-- Required: ORDER BY category_id ASC, order_date DESC
 
-- Standard index (both ASC) - CANNOT satisfy mixed order
CREATE INDEX idx_orders_v1 ON orders (category_id ASC, order_date ASC);
-- Result: Filesort required for DESC on order_date
 
-- Mixed sort order index - CAN satisfy the query
CREATE INDEX idx_orders_v2 ON orders (category_id ASC, order_date DESC);
-- Result: Index scan in natural order, no filesort
 
-- Query execution comparison:
 
-- With idx_orders_v1:
EXPLAIN SELECT * FROM orders
WHERE category_id = 5
ORDER BY category_id ASC, order_date DESC;
-- Extra: Using filesort (expensive for large results)
 
-- With idx_orders_v2:
EXPLAIN SELECT * FROM orders  
WHERE category_id = 5
ORDER BY category_id ASC, order_date DESC;
-- Extra: Using index (no filesort)
 
 
-- Modern databases support mixed sort orders:
-- MySQL 8.0+, PostgreSQL, SQL Server, Oracle all support this
CREATE INDEX idx_products ON products (
    category_id ASC,
    price DESC,
    product_name ASC
);

Analyze Your ORDER BY Patterns

Examine your top queries' ORDER BY clauses. If they use mixed sort directions, create indexes with matching mixed orders. The filesort elimination can be a significant performance win for large result sets.

Real-World Ordering Examples

Let's apply column ordering principles to common real-world scenarios:

Customer Order History

Queries:

WHERE customer_id = ? ORDER BY order_date DESC LIMIT 10
WHERE customer_id = ? AND status = 'shipped' ORDER BY order_date DESC
WHERE customer_id = ? AND order_date BETWEEN ? AND ?

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-- Optimal index for all three patterns:
CREATE INDEX idx_customer_orders ON orders (
    customer_id,        -- Equality (all queries)
    status,             -- Equality (some queries, adds selectivity)
    order_date DESC     -- Range + Sort (matches ORDER BY DESC)
);
 
-- Analysis:
-- customer_id: Always equality, most common filter
-- status: Sometimes equality, before date (range)
-- order_date: Range predicate, also sort column (DESC)
 
-- Alternative if status filtering is rare:
CREATE INDEX idx_customer_orders_simple ON orders (
    customer_id,
    order_date DESC
);

Common Column Ordering Mistakes

Even experienced developers make column ordering mistakes. Here are the most common pitfalls to avoid:

Column Ordering Anti-Patterns

•Range column before equality column — Placing date ranges or numeric ranges before equality predicates wastes index potential. Always: equality columns → range columns.
•Ordering by column definition order — Creating indexes like (col1, col2, col3) because that's how they appear in the table, without analyzing query patterns.
•Ignoring the leading column requirement — Creating (A, B, C) when most queries only filter on B or C, leaving the index unused.
•Forgetting sort direction — Creating (A ASC, B ASC) when queries use ORDER BY A ASC, B DESC, forcing filesoft.
•Over-relying on selectivity alone — Prioritizing the most selective column without considering if it's actually used in queries.
•Creating near-duplicate indexes — Having both (A, B) and (A, B, C), (A, C, B) when one well-designed index would suffice.

ordering_mistakes.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
-- MISTAKE 1: Range before equality
-- Query: WHERE status = 'active' AND created_at > '2023-01-01'
CREATE INDEX idx_bad ON items (created_at, status);  -- ❌ Wrong order
CREATE INDEX idx_good ON items (status, created_at); -- ✅ Correct
 
-- MISTAKE 2: Leading column not used
-- Query: WHERE department_id = 50
CREATE INDEX idx_unused ON employees (manager_id, department_id);
-- Index is useless for this query! manager_id is leading.
 
-- MISTAKE 3: Wrong sort direction
-- Query: ORDER BY last_name ASC, hire_date DESC
CREATE INDEX idx_wrong_sort ON employees (last_name ASC, hire_date ASC);
-- Requires filesort because of mixed directions
 
CREATE INDEX idx_right_sort ON employees (last_name ASC, hire_date DESC);
-- No filesort needed
 
-- MISTAKE 4: Duplicate coverage
-- These indexes are redundant:
CREATE INDEX idx1 ON orders (customer_id);
CREATE INDEX idx2 ON orders (customer_id, order_date);
-- idx2 can serve all queries that idx1 serves!
-- Solution: Keep only idx2

Validation Through EXPLAIN

Always validate index usage with EXPLAIN (or EXPLAIN ANALYZE in PostgreSQL). The query plan reveals whether your carefully designed column order is actually being utilized as expected.

Summary: Column Order Mastery

Column ordering is the art and science of composite index design. Here are the essential principles:

Key Takeaways

•Leftmost prefix rule — Queries must use a contiguous prefix of index columns starting from the left to leverage the index efficiently.
•Equality before range — The most important ordering rule. Place = and IN columns before <, >, BETWEEN columns.
•Range stops the index — Once a range predicate is encountered, subsequent columns cannot be used for index range scans.
•Selectivity matters for equality columns — Higher-selectivity equality columns should generally come before lower-selectivity ones.
•Query frequency matters more — The leading column should match the most common query filter patterns.
•Sort direction must match — ORDER BY clauses require matching (or all-inverted) sort directions in the index.
•Validate with EXPLAIN — Always verify that your column ordering produces the expected query plans.

What's Next:

Now that you understand how to order columns, the next page explores prefix matching in depth—how the database leverages partial prefixes of your composite index and how to design indexes that serve multiple query patterns through strategic prefix utilization.

Page Complete

You now understand the critical principles of column ordering in composite indexes. You know the leftmost prefix rule, the equality-before-range principle, how selectivity guides ordering decisions, and how to match sort directions. These concepts form the foundation for all advanced composite index design.

2 / 5

Loading learning content...

Database Management SystemsMulti-Column Indexes

Multi-Column Indexes

LevelIntermediate

Duration60 mins

TopicMulti-Column Indexes

2 / 5

Column Order Importance

The Order That Changes Everything

Consider two seemingly equivalent indexes:

CREATE INDEX idx_a ON orders (customer_id, order_date);
CREATE INDEX idx_b ON orders (order_date, customer_id);

This page explores the science and art of column ordering, transforming this critical decision from guesswork into deliberate engineering.

What You Will Learn

The Leftmost Prefix Rule

The leftmost prefix rule (also called the prefix matching rule) is the fundamental principle governing composite index usage:

A composite index can only be used efficiently if the query filters or sorts by a contiguous prefix of the index's columns, starting from the leftmost column.

This rule arises directly from the lexicographic ordering of composite index keys. Consider an index on (A, B, C):

The index is sorted primarily by A
Within each A value, entries are sorted by B
Within each (A, B) combination, entries are sorted by C

This hierarchical structure means that to leverage the index effectively, you must 'anchor' your search from the left side of the key.

Index Usage Based on Query Predicates - Index on (A, B, C)
Query Predicates	Columns Used	Can Use Index Efficiently?
WHERE A = ?	A	✅ Yes - Uses first column
WHERE A = ? AND B = ?	A, B	✅ Yes - Uses first two columns
WHERE A = ? AND B = ? AND C = ?	A, B, C	✅ Yes - Uses all three columns
WHERE B = ?	B only	❌ No - Skips leading column A
WHERE C = ?	C only	❌ No - Skips leading columns A, B
WHERE B = ? AND C = ?	B, C	❌ No - Skips leading column A
WHERE A = ? AND C = ?	A, C (gap)	⚠️ Partial - Uses A only, C not used efficiently

The Gap Problem

Why does this rule exist?

Think of the phonebook analogy again. A phonebook sorted by (LastName, FirstName) lets you:

Find all 'Smiths' quickly (leftmost column)
Find 'John Smith' quickly (both columns in order)

But it does not let you:

Find all 'Johns' quickly (skips LastName)
Find all people with FirstName = 'John' and LastName = 'Smith' if you search by FirstName first

To find all 'Johns', you'd have to scan the entire phonebook because people named John are scattered throughout (John Adams, John Brown, John Carter, ...)—they are not contiguous.

Converting Mermaid diagram...

Equality vs Range Predicates

A crucial distinction in composite index design is between equality predicates and range predicates:

Equality predicates: =, IN
Range predicates: <, >, <=, >=, BETWEEN, LIKE 'abc%'

The Critical Rule:

Once a range predicate is encountered in the composite key, subsequent columns cannot be used for efficient index range scans.

This is because range predicates 'break' the sorted continuity for subsequent columns.

range_predicate_impact.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
-- Index: (department_id, hire_date, employee_id)
 
-- Query 1: All equality predicates
SELECT * FROM employees
WHERE department_id = 50
  AND hire_date = '2023-01-15'
  AND employee_id = 1001;
-- Result: Uses all three columns efficiently ✅
 
-- Query 2: Equality, then range
SELECT * FROM employees
WHERE department_id = 50
  AND hire_date BETWEEN '2023-01-01' AND '2023-06-30'
  AND employee_id = 1001;
-- Result: Uses department_id and hire_date only ⚠️
-- The employee_id predicate cannot use the index range scan
-- (It may still filter, but doesn't narrow the scan)
 
-- Query 3: Range on first column
SELECT * FROM employees
WHERE department_id BETWEEN 40 AND 60
  AND hire_date = '2023-01-15'
  AND employee_id = 1001;
-- Result: Uses department_id only for range scan ⚠️
-- hire_date and employee_id cannot leverage index ordering
 
-- Why? Because within the range [40, 60], hire_dates are NOT contiguous:
-- (40, 2020-01-01), (40, 2023-01-15), (41, 2021-06-01), (42, 2023-01-15), ...
-- There's no single "2023-01-15 range" to scan

The Range Stops the Index Rule

Predicate Type Impact on Index Column Usage
Query Pattern	A Usage	B Usage	C Usage
A = ? AND B = ? AND C = ?	✅ Equality	✅ Equality	✅ Equality
A = ? AND B = ? AND C > ?	✅ Equality	✅ Equality	⚠️ Range scan
A = ? AND B > ? AND C = ?	✅ Equality	⚠️ Range scan	❌ Not used
A > ? AND B = ? AND C = ?	⚠️ Range scan	❌ Not used	❌ Not used
A = ? AND B > ? AND C > ?	✅ Equality	⚠️ Range scan	❌ Not used

Column Ordering Strategies

Given the leftmost prefix rule and the equality-vs-range distinction, how should you order columns in a composite index? Here are the primary strategies, ordered by priority:

Column Ordering Principles (In Priority Order)

•Equality predicates before range predicates — Columns with = or IN conditions should precede columns with <, >, BETWEEN conditions. This maximizes the number of columns that can fully participate in index navigation.
•Higher selectivity columns first (generally) — Among equality columns, placing higher-selectivity columns first reduces the number of index entries examined. However, this is secondary to the equality-before-range rule.
•Match the most common query patterns — If 90% of queries filter by column A first, A should be the leading column even if other columns have higher selectivity.
•Consider ORDER BY columns — If queries commonly sort by certain columns, including them in the index (in the same order and direction) can eliminate filesort operations.
•Include columns for covering — Consider adding columns after the filter columns to make the index cover queries completely (index-only scans).

Applying Ordering StrategiesOrders table with common queries:

Input

Output

ordering_examples.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
-- Scenario 1: E-commerce order lookup
-- Common query: Find orders for a customer in a date range
-- Pattern: customer_id = ? AND order_date BETWEEN ? AND ?
 
-- CORRECT ordering: Equality first, range second
CREATE INDEX idx_orders_optimal ON orders (customer_id, order_date);
 
-- INCORRECT ordering: Range column first
CREATE INDEX idx_orders_suboptimal ON orders (order_date, customer_id);
-- This would require scanning all dates to find the customer
 
 
-- Scenario 2: Employee filtering with multiple equalities
-- Common query: Find employees in a department with a specific job title
-- Pattern: department_id = ? AND job_title = ?
 
-- If department_id has higher selectivity (100 depts, 20 job titles):
CREATE INDEX idx_emp_v1 ON employees (department_id, job_title);
 
-- If job_title has higher selectivity (100 depts, 500 job titles):
CREATE INDEX idx_emp_v2 ON employees (job_title, department_id);
 
-- The higher-selectivity column first reduces rows examined
 
 
-- Scenario 3: Multi-predicate with ORDER BY
-- Common query: Recent orders for a customer
-- Pattern: customer_id = ? ORDER BY order_date DESC LIMIT 10
 
-- Optimal: Filter column + sort column
CREATE INDEX idx_orders_sorted ON orders (customer_id, order_date DESC);
-- Enables index scan in order, no filesort needed
 
 
-- Scenario 4: Covering index design
-- Common query: Customer order summary
-- Pattern: SELECT order_id, total, status 
--          WHERE customer_id = ? AND order_date BETWEEN ? AND ?
 
-- Covering index includes all needed columns
CREATE INDEX idx_orders_covering ON orders (
    customer_id,      -- Equality filter
    order_date,       -- Range filter
    order_id,         -- Output column
    total,            -- Output column
    status            -- Output column
);
-- Enables index-only scan

Selectivity Considerations

Case: High-Selectivity Column First

When column A has selectivity 0.001 (1 in 1000) and column B has selectivity 0.1 (1 in 10):

INDEX (A, B)
-- 1,000,000 rows → A=? → 1,000 rows
-- 1,000 rows → B=? → 100 rows

Navigating first to the highly selective A dramatically reduces the search space immediately.

Case: Lower-Selectivity Column First

Same columns but with the opposite order:

INDEX (B, A)
-- 1,000,000 rows → B=? → 100,000 rows
-- 100,000 rows → A=? → 100 rows

More index entries are traversed before the highly selective filter is applied.

When Query Frequency Trumps Selectivity

Selectivity Analysis Process:

Gather column statistics: Determine NDV (number of distinct values) for each candidate column
Calculate selectivity: Selectivity = 1 / NDV (approximate for uniform distribution)
Analyze query patterns: Identify which columns appear together in WHERE clauses
Weight by query frequency: Consider how often each query pattern executes
Account for equality vs range: Equality predicates always come before range, regardless of selectivity
Consider data skew: Highly skewed distributions may invalidate simple selectivity calculations

selectivity_analysis.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
-- MySQL: Analyze column cardinality
SELECT 
    COLUMN_NAME,
    CARDINALITY
FROM INFORMATION_SCHEMA.STATISTICS
WHERE TABLE_NAME = 'orders'
ORDER BY CARDINALITY DESC;
 
-- PostgreSQL: Detailed statistics
SELECT 
    attname AS column_name,
    n_distinct,
    most_common_vals,
    most_common_freqs
FROM pg_stats
WHERE tablename = 'orders';
 
-- Calculate selectivity
-- For orders table (1,000,000 rows):
-- customer_id: 50,000 distinct → selectivity = 0.00002 (high)
-- status: 5 distinct → selectivity = 0.2 (low)
-- order_date: 1,000 distinct → selectivity = 0.001 (moderate)
 
-- Common query: customer_id = ? AND status = ? AND order_date BETWEEN ? AND ?
-- Optimal ordering:
-- 1. customer_id (equality, highest selectivity)
-- 2. status (equality, lower selectivity)
-- 3. order_date (range)
CREATE INDEX idx_orders_optimal ON orders (customer_id, status, order_date);

Sort Order Considerations

The Sort Direction Rule:

An index can satisfy an ORDER BY clause if:

The ORDER BY columns are a prefix of the index columns
The sort directions either all match or are all opposite to the index

The 'all opposite' case works because databases can scan an index in reverse.

Sort Order Matching - Index (A ASC, B ASC, C ASC)
ORDER BY Clause	Can Use Index?	Reason
ORDER BY A ASC	✅ Yes	Matches index order
ORDER BY A DESC	✅ Yes	Reverse scan of index
ORDER BY A ASC, B ASC	✅ Yes	Matches index order
ORDER BY A DESC, B DESC	✅ Yes	All reversed, reverse scan
ORDER BY A ASC, B DESC	❌ No	Mixed directions
ORDER BY A DESC, B ASC	❌ No	Mixed directions
ORDER BY B ASC	❌ No	Skips leading column A
ORDER BY A ASC, C ASC	❌ No	Gap (skips B)

mixed_sort_orders.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
-- Scenario: Dashboard showing recent orders by category
-- Required: ORDER BY category_id ASC, order_date DESC
 
-- Standard index (both ASC) - CANNOT satisfy mixed order
CREATE INDEX idx_orders_v1 ON orders (category_id ASC, order_date ASC);
-- Result: Filesort required for DESC on order_date
 
-- Mixed sort order index - CAN satisfy the query
CREATE INDEX idx_orders_v2 ON orders (category_id ASC, order_date DESC);
-- Result: Index scan in natural order, no filesort
 
-- Query execution comparison:
 
-- With idx_orders_v1:
EXPLAIN SELECT * FROM orders
WHERE category_id = 5
ORDER BY category_id ASC, order_date DESC;
-- Extra: Using filesort (expensive for large results)
 
-- With idx_orders_v2:
EXPLAIN SELECT * FROM orders  
WHERE category_id = 5
ORDER BY category_id ASC, order_date DESC;
-- Extra: Using index (no filesort)
 
 
-- Modern databases support mixed sort orders:
-- MySQL 8.0+, PostgreSQL, SQL Server, Oracle all support this
CREATE INDEX idx_products ON products (
    category_id ASC,
    price DESC,
    product_name ASC
);

Analyze Your ORDER BY Patterns

Real-World Ordering Examples

Let's apply column ordering principles to common real-world scenarios:

Customer Order History

Queries:

WHERE customer_id = ? ORDER BY order_date DESC LIMIT 10
WHERE customer_id = ? AND status = 'shipped' ORDER BY order_date DESC
WHERE customer_id = ? AND order_date BETWEEN ? AND ?

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-- Optimal index for all three patterns:
CREATE INDEX idx_customer_orders ON orders (
    customer_id,        -- Equality (all queries)
    status,             -- Equality (some queries, adds selectivity)
    order_date DESC     -- Range + Sort (matches ORDER BY DESC)
);
 
-- Analysis:
-- customer_id: Always equality, most common filter
-- status: Sometimes equality, before date (range)
-- order_date: Range predicate, also sort column (DESC)
 
-- Alternative if status filtering is rare:
CREATE INDEX idx_customer_orders_simple ON orders (
    customer_id,
    order_date DESC
);

Common Column Ordering Mistakes

Even experienced developers make column ordering mistakes. Here are the most common pitfalls to avoid:

Column Ordering Anti-Patterns

•Range column before equality column — Placing date ranges or numeric ranges before equality predicates wastes index potential. Always: equality columns → range columns.
•Ordering by column definition order — Creating indexes like (col1, col2, col3) because that's how they appear in the table, without analyzing query patterns.
•Ignoring the leading column requirement — Creating (A, B, C) when most queries only filter on B or C, leaving the index unused.
•Forgetting sort direction — Creating (A ASC, B ASC) when queries use ORDER BY A ASC, B DESC, forcing filesoft.
•Over-relying on selectivity alone — Prioritizing the most selective column without considering if it's actually used in queries.
•Creating near-duplicate indexes — Having both (A, B) and (A, B, C), (A, C, B) when one well-designed index would suffice.

ordering_mistakes.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
-- MISTAKE 1: Range before equality
-- Query: WHERE status = 'active' AND created_at > '2023-01-01'
CREATE INDEX idx_bad ON items (created_at, status);  -- ❌ Wrong order
CREATE INDEX idx_good ON items (status, created_at); -- ✅ Correct
 
-- MISTAKE 2: Leading column not used
-- Query: WHERE department_id = 50
CREATE INDEX idx_unused ON employees (manager_id, department_id);
-- Index is useless for this query! manager_id is leading.
 
-- MISTAKE 3: Wrong sort direction
-- Query: ORDER BY last_name ASC, hire_date DESC
CREATE INDEX idx_wrong_sort ON employees (last_name ASC, hire_date ASC);
-- Requires filesort because of mixed directions
 
CREATE INDEX idx_right_sort ON employees (last_name ASC, hire_date DESC);
-- No filesort needed
 
-- MISTAKE 4: Duplicate coverage
-- These indexes are redundant:
CREATE INDEX idx1 ON orders (customer_id);
CREATE INDEX idx2 ON orders (customer_id, order_date);
-- idx2 can serve all queries that idx1 serves!
-- Solution: Keep only idx2

Validation Through EXPLAIN

Always validate index usage with EXPLAIN (or EXPLAIN ANALYZE in PostgreSQL). The query plan reveals whether your carefully designed column order is actually being utilized as expected.

Summary: Column Order Mastery

Column ordering is the art and science of composite index design. Here are the essential principles:

Key Takeaways

•Leftmost prefix rule — Queries must use a contiguous prefix of index columns starting from the left to leverage the index efficiently.
•Equality before range — The most important ordering rule. Place = and IN columns before <, >, BETWEEN columns.
•Range stops the index — Once a range predicate is encountered, subsequent columns cannot be used for index range scans.
•Selectivity matters for equality columns — Higher-selectivity equality columns should generally come before lower-selectivity ones.
•Query frequency matters more — The leading column should match the most common query filter patterns.
•Sort direction must match — ORDER BY clauses require matching (or all-inverted) sort directions in the index.
•Validate with EXPLAIN — Always verify that your column ordering produces the expected query plans.

What's Next:

Page Complete

2 / 5