Loading learning content...
In composite index design, column order is not arbitrary—it is the single most critical decision that determines whether your index accelerates queries or sits unused, consuming storage and slowing writes.
Consider two seemingly equivalent indexes:
CREATE INDEX idx_a ON orders (customer_id, order_date);
CREATE INDEX idx_b ON orders (order_date, customer_id);
Both indexes contain the same columns. Both consume similar storage. Yet for many queries, one index will provide O(log n) efficiency while the other provides no benefit at all—forcing a full table scan.
This page explores the science and art of column ordering, transforming this critical decision from guesswork into deliberate engineering.
By the end of this page, you will understand the leftmost prefix rule, how to analyze query patterns to determine optimal column ordering, the interplay between equality and range predicates, and proven strategies for ordering columns in production composite indexes.
The leftmost prefix rule (also called the prefix matching rule) is the fundamental principle governing composite index usage:
A composite index can only be used efficiently if the query filters or sorts by a contiguous prefix of the index's columns, starting from the leftmost column.
This rule arises directly from the lexicographic ordering of composite index keys. Consider an index on (A, B, C):
This hierarchical structure means that to leverage the index effectively, you must 'anchor' your search from the left side of the key.
| Query Predicates | Columns Used | Can Use Index Efficiently? |
|---|---|---|
| WHERE A = ? | A | ✅ Yes - Uses first column |
| WHERE A = ? AND B = ? | A, B | ✅ Yes - Uses first two columns |
| WHERE A = ? AND B = ? AND C = ? | A, B, C | ✅ Yes - Uses all three columns |
| WHERE B = ? | B only | ❌ No - Skips leading column A |
| WHERE C = ? | C only | ❌ No - Skips leading columns A, B |
| WHERE B = ? AND C = ? | B, C | ❌ No - Skips leading column A |
| WHERE A = ? AND C = ? | A, C (gap) | ⚠️ Partial - Uses A only, C not used efficiently |
When there's a 'gap' in the column sequence (e.g., querying A and C but not B), only the columns up to the gap can be used for efficient index navigation. The predicate on C cannot leverage the sorted property because entries are not contiguously sorted by C unless A AND B are both specified.
Why does this rule exist?
Think of the phonebook analogy again. A phonebook sorted by (LastName, FirstName) lets you:
But it does not let you:
To find all 'Johns', you'd have to scan the entire phonebook because people named John are scattered throughout (John Adams, John Brown, John Carter, ...)—they are not contiguous.
A crucial distinction in composite index design is between equality predicates and range predicates:
=, IN<, >, <=, >=, BETWEEN, LIKE 'abc%'The Critical Rule:
Once a range predicate is encountered in the composite key, subsequent columns cannot be used for efficient index range scans.
This is because range predicates 'break' the sorted continuity for subsequent columns.
1234567891011121314151617181920212223242526272829
-- Index: (department_id, hire_date, employee_id) -- Query 1: All equality predicatesSELECT * FROM employeesWHERE department_id = 50 AND hire_date = '2023-01-15' AND employee_id = 1001;-- Result: Uses all three columns efficiently ✅ -- Query 2: Equality, then rangeSELECT * FROM employeesWHERE department_id = 50 AND hire_date BETWEEN '2023-01-01' AND '2023-06-30' AND employee_id = 1001;-- Result: Uses department_id and hire_date only ⚠️-- The employee_id predicate cannot use the index range scan-- (It may still filter, but doesn't narrow the scan) -- Query 3: Range on first columnSELECT * FROM employeesWHERE department_id BETWEEN 40 AND 60 AND hire_date = '2023-01-15' AND employee_id = 1001;-- Result: Uses department_id only for range scan ⚠️-- hire_date and employee_id cannot leverage index ordering -- Why? Because within the range [40, 60], hire_dates are NOT contiguous:-- (40, 2020-01-01), (40, 2023-01-15), (41, 2021-06-01), (42, 2023-01-15), ...-- There's no single "2023-01-15 range" to scanA useful mnemonic: "The range stops the index." Once you hit a range predicate in your composite key order, any columns after it cannot benefit from index ordering. This is why equality columns should generally come before range columns in composite index design.
| Index (A, B, C) | Query Pattern | A Usage | B Usage | C Usage |
|---|---|---|---|---|
| A = ? AND B = ? AND C = ? | ✅ Equality | ✅ Equality | ✅ Equality | |
| A = ? AND B = ? AND C > ? | ✅ Equality | ✅ Equality | ⚠️ Range scan | |
| A = ? AND B > ? AND C = ? | ✅ Equality | ⚠️ Range scan | ❌ Not used | |
| A > ? AND B = ? AND C = ? | ⚠️ Range scan | ❌ Not used | ❌ Not used | |
| A = ? AND B > ? AND C > ? | ✅ Equality | ⚠️ Range scan | ❌ Not used |
Given the leftmost prefix rule and the equality-vs-range distinction, how should you order columns in a composite index? Here are the primary strategies, ordered by priority:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748
-- Scenario 1: E-commerce order lookup-- Common query: Find orders for a customer in a date range-- Pattern: customer_id = ? AND order_date BETWEEN ? AND ? -- CORRECT ordering: Equality first, range secondCREATE INDEX idx_orders_optimal ON orders (customer_id, order_date); -- INCORRECT ordering: Range column firstCREATE INDEX idx_orders_suboptimal ON orders (order_date, customer_id);-- This would require scanning all dates to find the customer -- Scenario 2: Employee filtering with multiple equalities-- Common query: Find employees in a department with a specific job title-- Pattern: department_id = ? AND job_title = ? -- If department_id has higher selectivity (100 depts, 20 job titles):CREATE INDEX idx_emp_v1 ON employees (department_id, job_title); -- If job_title has higher selectivity (100 depts, 500 job titles):CREATE INDEX idx_emp_v2 ON employees (job_title, department_id); -- The higher-selectivity column first reduces rows examined -- Scenario 3: Multi-predicate with ORDER BY-- Common query: Recent orders for a customer-- Pattern: customer_id = ? ORDER BY order_date DESC LIMIT 10 -- Optimal: Filter column + sort columnCREATE INDEX idx_orders_sorted ON orders (customer_id, order_date DESC);-- Enables index scan in order, no filesort needed -- Scenario 4: Covering index design-- Common query: Customer order summary-- Pattern: SELECT order_id, total, status -- WHERE customer_id = ? AND order_date BETWEEN ? AND ? -- Covering index includes all needed columnsCREATE INDEX idx_orders_covering ON orders ( customer_id, -- Equality filter order_date, -- Range filter order_id, -- Output column total, -- Output column status -- Output column);-- Enables index-only scanSelectivity (the fraction of rows returned after applying a predicate) influences column ordering for equality predicates. The general guidance is to place higher-selectivity columns first, but this is nuanced.
Case: High-Selectivity Column First
When column A has selectivity 0.001 (1 in 1000) and column B has selectivity 0.1 (1 in 10):
INDEX (A, B)
-- 1,000,000 rows → A=? → 1,000 rows
-- 1,000 rows → B=? → 100 rows
Navigating first to the highly selective A dramatically reduces the search space immediately.
Case: Lower-Selectivity Column First
Same columns but with the opposite order:
INDEX (B, A)
-- 1,000,000 rows → B=? → 100,000 rows
-- 100,000 rows → A=? → 100 rows
More index entries are traversed before the highly selective filter is applied.
If 95% of queries filter only on column B (lower selectivity) and only 5% use both A and B, the index (B, A) may be preferable. The leading column should match the most common query patterns, even if not the most selective. A single index must serve multiple queries; optimize for the dominant use case.
Selectivity Analysis Process:
1234567891011121314151617181920212223242526272829
-- MySQL: Analyze column cardinalitySELECT COLUMN_NAME, CARDINALITYFROM INFORMATION_SCHEMA.STATISTICSWHERE TABLE_NAME = 'orders'ORDER BY CARDINALITY DESC; -- PostgreSQL: Detailed statisticsSELECT attname AS column_name, n_distinct, most_common_vals, most_common_freqsFROM pg_statsWHERE tablename = 'orders'; -- Calculate selectivity-- For orders table (1,000,000 rows):-- customer_id: 50,000 distinct → selectivity = 0.00002 (high)-- status: 5 distinct → selectivity = 0.2 (low)-- order_date: 1,000 distinct → selectivity = 0.001 (moderate) -- Common query: customer_id = ? AND status = ? AND order_date BETWEEN ? AND ?-- Optimal ordering:-- 1. customer_id (equality, highest selectivity)-- 2. status (equality, lower selectivity)-- 3. order_date (range)CREATE INDEX idx_orders_optimal ON orders (customer_id, status, order_date);Each column in a composite index can be sorted in ascending (ASC) or descending (DESC) order. The sort direction affects whether the index can satisfy ORDER BY clauses without additional sorting.
The Sort Direction Rule:
An index can satisfy an ORDER BY clause if:
The 'all opposite' case works because databases can scan an index in reverse.
| ORDER BY Clause | Can Use Index? | Reason |
|---|---|---|
| ORDER BY A ASC | ✅ Yes | Matches index order |
| ORDER BY A DESC | ✅ Yes | Reverse scan of index |
| ORDER BY A ASC, B ASC | ✅ Yes | Matches index order |
| ORDER BY A DESC, B DESC | ✅ Yes | All reversed, reverse scan |
| ORDER BY A ASC, B DESC | ❌ No | Mixed directions |
| ORDER BY A DESC, B ASC | ❌ No | Mixed directions |
| ORDER BY B ASC | ❌ No | Skips leading column A |
| ORDER BY A ASC, C ASC | ❌ No | Gap (skips B) |
123456789101112131415161718192021222324252627282930313233
-- Scenario: Dashboard showing recent orders by category-- Required: ORDER BY category_id ASC, order_date DESC -- Standard index (both ASC) - CANNOT satisfy mixed orderCREATE INDEX idx_orders_v1 ON orders (category_id ASC, order_date ASC);-- Result: Filesort required for DESC on order_date -- Mixed sort order index - CAN satisfy the queryCREATE INDEX idx_orders_v2 ON orders (category_id ASC, order_date DESC);-- Result: Index scan in natural order, no filesort -- Query execution comparison: -- With idx_orders_v1:EXPLAIN SELECT * FROM ordersWHERE category_id = 5ORDER BY category_id ASC, order_date DESC;-- Extra: Using filesort (expensive for large results) -- With idx_orders_v2:EXPLAIN SELECT * FROM orders WHERE category_id = 5ORDER BY category_id ASC, order_date DESC;-- Extra: Using index (no filesort) -- Modern databases support mixed sort orders:-- MySQL 8.0+, PostgreSQL, SQL Server, Oracle all support thisCREATE INDEX idx_products ON products ( category_id ASC, price DESC, product_name ASC);Examine your top queries' ORDER BY clauses. If they use mixed sort directions, create indexes with matching mixed orders. The filesort elimination can be a significant performance win for large result sets.
Let's apply column ordering principles to common real-world scenarios:
Customer Order History
Queries:
WHERE customer_id = ? ORDER BY order_date DESC LIMIT 10WHERE customer_id = ? AND status = 'shipped' ORDER BY order_date DESCWHERE customer_id = ? AND order_date BETWEEN ? AND ?1234567891011121314151617
-- Optimal index for all three patterns:CREATE INDEX idx_customer_orders ON orders ( customer_id, -- Equality (all queries) status, -- Equality (some queries, adds selectivity) order_date DESC -- Range + Sort (matches ORDER BY DESC)); -- Analysis:-- customer_id: Always equality, most common filter-- status: Sometimes equality, before date (range)-- order_date: Range predicate, also sort column (DESC) -- Alternative if status filtering is rare:CREATE INDEX idx_customer_orders_simple ON orders ( customer_id, order_date DESC);Even experienced developers make column ordering mistakes. Here are the most common pitfalls to avoid:
(col1, col2, col3) because that's how they appear in the table, without analyzing query patterns.(A, B, C) when most queries only filter on B or C, leaving the index unused.(A ASC, B ASC) when queries use ORDER BY A ASC, B DESC, forcing filesoft.(A, B) and (A, B, C), (A, C, B) when one well-designed index would suffice.123456789101112131415161718192021222324
-- MISTAKE 1: Range before equality-- Query: WHERE status = 'active' AND created_at > '2023-01-01'CREATE INDEX idx_bad ON items (created_at, status); -- ❌ Wrong orderCREATE INDEX idx_good ON items (status, created_at); -- ✅ Correct -- MISTAKE 2: Leading column not used-- Query: WHERE department_id = 50CREATE INDEX idx_unused ON employees (manager_id, department_id);-- Index is useless for this query! manager_id is leading. -- MISTAKE 3: Wrong sort direction-- Query: ORDER BY last_name ASC, hire_date DESCCREATE INDEX idx_wrong_sort ON employees (last_name ASC, hire_date ASC);-- Requires filesort because of mixed directions CREATE INDEX idx_right_sort ON employees (last_name ASC, hire_date DESC);-- No filesort needed -- MISTAKE 4: Duplicate coverage-- These indexes are redundant:CREATE INDEX idx1 ON orders (customer_id);CREATE INDEX idx2 ON orders (customer_id, order_date);-- idx2 can serve all queries that idx1 serves!-- Solution: Keep only idx2Always validate index usage with EXPLAIN (or EXPLAIN ANALYZE in PostgreSQL). The query plan reveals whether your carefully designed column order is actually being utilized as expected.
Column ordering is the art and science of composite index design. Here are the essential principles:
What's Next:
Now that you understand how to order columns, the next page explores prefix matching in depth—how the database leverages partial prefixes of your composite index and how to design indexes that serve multiple query patterns through strategic prefix utilization.
You now understand the critical principles of column ordering in composite indexes. You know the leftmost prefix rule, the equality-before-range principle, how selectivity guides ordering decisions, and how to match sort directions. These concepts form the foundation for all advanced composite index design.