Loading content...
If an index is a map from values to record locations, then the search key defines what those values are. The search key is not merely a column you happen to index—it is the organizing principle of the entire index structure. Everything about how the index behaves, what queries it accelerates, and how much space it consumes derives from the search key choice.
Choosing search keys is one of the most impactful decisions a database professional makes. A well-chosen search key turns a 10-minute query into a 10-millisecond query. A poorly chosen search key wastes disk space, slows writes, and provides no query benefit. Understanding search keys deeply—not just what they are, but how they interact with data characteristics and query patterns—is essential for effective database design.
By the end of this page, you will understand the formal definition of a search key, distinguish between simple and composite search keys, analyze how data characteristics affect search key effectiveness, and recognize the relationship between search keys and query predicates.
A search key is the attribute or combination of attributes whose values are used to look up records in an index. More formally:
A search key K for an index I on relation R is an ordered list of attributes (A₁, A₂, ..., Aₙ) from R such that each entry in I contains a value from the domain of K paired with references to records in R having that value.
Critical Clarification: Search Key vs. Primary Key
This is a common source of confusion. The term "search key" has a specific technical meaning in indexing that is distinct from the concept of a primary key:
Primary Key: A constraint on the table that enforces uniqueness and not-null. A table has exactly one primary key.
Search Key: Any attribute(s) used to organize an index. A table can have many indexes with different search keys. The search key need not be unique—indexes can have duplicate key values.
In database literature and textbooks, "search key" and "key" (in index context) always refer to the indexed attribute(s), not to uniqueness. When we say 'the search key is department_id,' we mean department_id values are used to look up records—we make no claim about uniqueness. This differs from casual usage where 'key' often implies uniqueness.
| Aspect | Search Key | Primary Key |
|---|---|---|
| Definition | Attribute(s) used to organize an index | Attribute(s) that uniquely identify a row |
| Uniqueness | Not required | Required |
| Null Values | Often allowed | Never allowed |
| Count per Table | One per index (many possible) | Exactly one per table |
| Purpose | Accelerate lookups | Enforce entity identity |
| Constraint Type | Structure choice | Integrity constraint |
A simple search key consists of a single attribute. This is the most straightforward form of indexing and is appropriate when queries filter on a single column.
Example:
Consider an employees table with columns (emp_id, name, department, salary, hire_date). An index on department uses department as its search key. The index might look like:
Engineering → {rec_1, rec_4, rec_7, rec_12}
Finance → {rec_2, rec_8}
HR → {rec_3, rec_6, rec_11}
Marketing → {rec_5, rec_9, rec_10}
Each distinct department value maps to the set of records belonging to that department.
Characteristics of Simple Search Keys:
Natural Ordering: Most index structures (B+-trees) impose an ordering on key values. For simple keys, this is straightforward—numeric keys are ordered numerically, string keys are ordered lexicographically.
Selectivity: A simple key's selectivity—the average fraction of records matching a given key value—is determined by the attribute's cardinality (number of distinct values) and value distribution.
Storage Efficiency: Index entries are compact—each contains one value from the indexed column plus a pointer. For a 4-byte integer key with 8-byte pointers, each entry is 12 bytes.
Query Matching:
A simple search key index is useful for queries with predicates on that column:
-- Uses the department index effectively
SELECT * FROM employees WHERE department = 'Engineering';
-- Can use the department index for range queries (in B+-trees)
SELECT * FROM employees WHERE department BETWEEN 'A' AND 'F';
-- Cannot use the department index (predicate on different column)
SELECT * FROM employees WHERE salary > 100000;
A composite (compound) search key consists of multiple attributes in a defined order. This powerful technique allows a single index to efficiently serve queries filtering on combinations of columns.
Example:
An index on (department, hire_date) creates a two-level ordering:
(Engineering, 2020-01-15) → rec_7
(Engineering, 2021-03-22) → rec_1
(Engineering, 2022-08-10) → rec_12
(Finance, 2019-07-01) → rec_2
(Finance, 2022-02-28) → rec_8
(HR, 2018-11-30) → rec_3
...
This structure is ideal for queries like:
-- Uses both columns: very efficient
SELECT * FROM employees
WHERE department = 'Engineering' AND hire_date > '2021-01-01';
A composite index on (A, B, C) can be used for predicates on: • A alone • A and B together • A, B, and C together
But it CANNOT be efficiently used for: • B alone (not a prefix) • C alone (not a prefix) • B and C together (not a prefix)
The leftmost columns are the 'prefix' that must be constrained for the index to be effective.
Composite Key Ordering:
The order of attributes in a composite key is critical and cannot be changed after index creation. Consider an index on (A, B) versus (B, A):
| Query Pattern | Index (A, B) | Index (B, A) |
|---|---|---|
WHERE A = x | ✅ Efficient | ❌ Full scan |
WHERE B = y | ❌ Full scan | ✅ Efficient |
WHERE A = x AND B = y | ✅ Efficient | ✅ Efficient |
Both indexes store the same information but support different query patterns efficiently. This is why understanding workload characteristics is essential before creating composite indexes.
| Query Predicate | Index Usable? | Explanation |
|---|---|---|
WHERE department = ? | ✅ Yes | First column is prefix |
WHERE department = ? AND hire_date = ? | ✅ Yes (both) | First two columns are prefix |
WHERE department = ? AND hire_date > ? | ✅ Yes (range) | Prefix with range on second |
WHERE hire_date = ? | ❌ No | First column not constrained |
WHERE department = ? AND salary > ? | ⚠️ Partial | Skips hire_date—only first column used |
WHERE salary = ? | ❌ No | Not in prefix |
Guidelines for Composite Key Column Order:
Place equality-predicate columns first: Columns that appear in = conditions should come before columns that appear in >, <, or BETWEEN conditions.
Higher selectivity first: When multiple columns have equality predicates, placing more selective columns first can reduce the search space faster.
Match query patterns: Analyze your workload. If most queries filter on A and sometimes on A, B, but never on B alone, order (A, B) is correct.
Consider sorting needs: If queries often ORDER BY multiple columns, a composite index in that order can provide sorted results without a separate sort operation.
The data type of the search key profoundly affects index structure, storage requirements, and query performance. Each data type brings unique characteristics that must be understood for effective index design.
Numeric Types (INTEGER, BIGINT, DECIMAL):
Numeric keys are ideal for indexing:
String Types (VARCHAR, TEXT):
String keys introduce complexity:
Date/Time Types (DATE, TIMESTAMP, DATETIME):
Temporal keys have special characteristics:
UUIDs and GUIDs:
Random identifiers create specific challenges:
Random UUIDs (v4) are poor primary key choices for clustered indexes. Each insert goes to a random location, causing excessive page splits and fragmentation. Use sequential IDs, auto-increment integers, or time-ordered UUIDs (v7, ULID) for better index maintenance performance.
| Data Type | Size | Comparison Speed | Index Efficiency | Common Use Case |
|---|---|---|---|---|
| INT/BIGINT | 4-8 bytes | Fastest | Excellent | Primary keys, foreign keys |
| VARCHAR(50) | 1-52 bytes | Medium | Good | Names, codes, identifiers |
| TEXT | Variable | Slow | Poor (often truncated) | Full-text search only |
| TIMESTAMP | 8 bytes | Fast | Excellent | Time-series, audit trails |
| UUID v4 | 16 bytes | Medium | Poor (random) | Distributed systems (consider v7) |
| DECIMAL(10,2) | 5-9 bytes | Medium | Good | Financial amounts |
Two related but distinct concepts determine how effective a search key will be: cardinality (a property of the data) and selectivity (a property of queries against that data).
Cardinality:
Cardinality is the number of distinct values in the search key column(s). For a table with N rows:
Examples:
employee_id in employees table: Cardinality = N (unique per row)country in customers table: Cardinality = ~200 (number of countries)is_active boolean column: Cardinality = 2 (true/false)Selectivity:
Selectivity measures how many rows match a particular predicate value. It is typically expressed as a fraction:
Selectivity = (Number of rows matching the predicate) / (Total rows)
Low selectivity (bad for indexing): Many rows match
WHERE is_active = true on a table where 95% are active → selectivity = 0.95High selectivity (good for indexing): Few rows match
WHERE employee_id = 12345 → selectivity = 1/N ≈ 0The Relationship:
High cardinality often (but not always) implies high selectivity. A column with 1 million distinct values is likely to have high selectivity for equality predicates. However, skewed distributions can break this:
The query optimizer maintains histograms and statistics to estimate selectivity for specific predicate values, not just average selectivity. This is crucial: knowing that WHERE status = 'active' matches 90% of rows while WHERE status = 'suspended' matches 0.1% allows the optimizer to choose table scan for the former and index lookup for the latter.
In ordered indexes (B+-trees and similar structures), the search key defines the sort order of index entries. This ordering is not merely an implementation detail—it has profound implications for query processing.
The Ordering Guarantee:
An index on search key K guarantees that entries are stored in sorted order by K. This enables:
Example Impact:
-- With index on (department, hire_date):
-- This query gets sorted results 'for free'
SELECT * FROM employees
WHERE department = 'Engineering'
ORDER BY hire_date;
-- Execution: Index scan, no sorting needed
-- This query requires sorting
SELECT * FROM employees
WHERE department = 'Engineering'
ORDER BY salary;
-- Execution: Index scan + sort operation (salary not in index)
Ascending vs. Descending Order:
Most databases allow specifying sort direction for each column in a composite index:
CREATE INDEX idx_orders ON orders (customer_id ASC, order_date DESC);
This creates an index where:
ORDER BY customer_id ASC, order_date DESCORDER BY customer_id DESC, order_date DESC (mixed direction mismatch on customer_id)Sort Order Compatibility Matrix:
For index (A ASC, B DESC):
| Query ORDER BY | Compatible? | Notes |
|---|---|---|
| A ASC, B DESC | ✅ Yes | Exact match |
| A DESC, B ASC | ✅ Yes | Reverse scan |
| A ASC, B ASC | ❌ No | B direction mismatch |
| B DESC, A ASC | ❌ No | Column order mismatch |
Most B+-tree implementations support backward scanning—reading the index in reverse order. This means an ASC index can satisfy DESC queries (by scanning backward), but at a slight performance cost due to reduced cache efficiency and pre-fetch effectiveness.
Selecting the right search key(s) for an index is as much art as science. It requires understanding the data, the queries, and the trade-offs. Here is a systematic approach:
Step 1: Analyze Query Workload
Before creating any index, collect and analyze query patterns:
Step 2: Identify Candidate Keys
For each query pattern, identify which columns would form an effective search key:
Step 3: Evaluate Data Characteristics
For each candidate key, evaluate:
Step 4: Consider Write Impact
Each index adds write overhead. Consider:
Step 5: Prototype and Measure
Create the index and measure actual performance:
The search key is the foundation upon which an index is built. Understanding search keys deeply enables you to design effective indexes that accelerate the right queries without undue overhead. Let's consolidate the key concepts:
What's Next:
With a solid understanding of search keys, we now examine what the index actually stores: index entries. The next page explores the structure of index entries, the different ways they can reference data, and how these choices affect index size and lookup performance.
You now understand search keys at a professional level: what they are, how to choose them, and how their characteristics determine index effectiveness. This knowledge is essential for designing indexes that actually improve performance rather than just consuming resources.