Index Concept - Learning Module


Search Key

The Organizing Principle

If an index is a map from values to record locations, then the search key defines what those values are. The search key is not merely a column you happen to index—it is the organizing principle of the entire index structure. Everything about how the index behaves, what queries it accelerates, and how much space it consumes derives from the search key choice.

Choosing search keys is one of the most impactful decisions a database professional makes. A well-chosen search key can turn a 10-minute query into a 10-millisecond query. A poorly chosen search key wastes disk space, slows writes, and provides no query benefit. Understanding search keys deeply (not just what they are, but how they interact with data characteristics and query patterns) is essential for effective database design.

What You Will Learn

By the end of this page, you will understand the formal definition of a search key, distinguish between simple and composite search keys, analyze how data characteristics affect search key effectiveness, and recognize the relationship between search keys and query predicates.

Formal Definition of a Search Key

A search key is the attribute or combination of attributes whose values are used to look up records in an index. More formally:

A search key K for an index I on relation R is an ordered list of attributes (A₁, A₂, ..., Aₙ) from R such that each entry in I contains a value from the domain of K paired with references to records in R having that value.

Critical Clarification: Search Key vs. Primary Key

This is a common source of confusion. The term "search key" has a specific technical meaning in indexing that is distinct from the concept of a primary key:

Primary Key: A constraint on the table that enforces uniqueness and NOT NULL on its column(s). A table has at most one primary key.
Search Key: Any attribute(s) used to organize an index. A table can have many indexes with different search keys. The search key need not be unique—indexes can have duplicate key values.

Terminology Alert

In database literature and textbooks, "search key" and "key" (in index context) always refer to the indexed attribute(s), not to uniqueness. When we say 'the search key is department_id,' we mean department_id values are used to look up records—we make no claim about uniqueness. This differs from casual usage where 'key' often implies uniqueness.

Search Key vs. Primary Key Comparison
Aspect	Search Key	Primary Key
Definition	Attribute(s) used to organize an index	Attribute(s) that uniquely identify a row
Uniqueness	Not required	Required
Null Values	Often allowed	Never allowed
Count per Table	One per index (many possible)	At most one per table
Purpose	Accelerate lookups	Enforce entity identity
Constraint Type	Structure choice	Integrity constraint

Simple Search Keys: Single-Attribute Indexing

A simple search key consists of a single attribute. This is the most straightforward form of indexing and is appropriate when queries filter on a single column.

Example:

Consider an employees table with columns (emp_id, name, department, salary, hire_date). An index on department uses department as its search key. The index might look like:

Engineering → {rec_1, rec_4, rec_7, rec_12}
Finance → {rec_2, rec_8}
HR → {rec_3, rec_6, rec_11}
Marketing → {rec_5, rec_9, rec_10}

Each distinct department value maps to the set of records belonging to that department.
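To make the mapping concrete, here is a minimal in-memory sketch of the department index, assuming a Python dict can stand in for the on-disk structure. `build_index` and `lookup` are hypothetical helper names, and the record ids are the illustrative rec_N labels from the example. Note that one key value maps to several records, since a search key need not be unique.

```python
from collections import defaultdict

# Sample rows mirroring the example above: (record_id, department).
# The record ids are illustrative labels, not real row addresses.
ROWS = [
    ("rec_1", "Engineering"), ("rec_2", "Finance"), ("rec_3", "HR"),
    ("rec_4", "Engineering"), ("rec_5", "Marketing"), ("rec_6", "HR"),
    ("rec_7", "Engineering"), ("rec_8", "Finance"), ("rec_9", "Marketing"),
    ("rec_10", "Marketing"), ("rec_11", "HR"), ("rec_12", "Engineering"),
]

def build_index(rows):
    """Map each search-key value to the list of record ids holding it."""
    index = defaultdict(list)
    for rid, department in rows:
        index[department].append(rid)  # duplicate key values are allowed
    return dict(index)

def lookup(index, key):
    """Equality lookup: matching record ids, or an empty list if none."""
    return index.get(key, [])

dept_index = build_index(ROWS)
print(lookup(dept_index, "Finance"))  # ['rec_2', 'rec_8']
```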

Characteristics of Simple Search Keys:

Natural Ordering: Ordered index structures such as B+-trees keep key values in sorted order (hash indexes do not). For simple keys, this is straightforward: numeric keys are ordered numerically, and string keys are ordered by the column's collation, which for a binary collation is plain lexicographic order.
Selectivity: A simple key's selectivity—the average fraction of records matching a given key value—is determined by the attribute's cardinality (number of distinct values) and value distribution.
Storage Efficiency: Index entries are compact: each contains one value from the indexed column plus a pointer. For a 4-byte integer key with 8-byte pointers, each entry is about 12 bytes, before any per-entry header or alignment overhead the storage engine adds.
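The entry-size arithmetic extends to whole-index estimates. The sketch below assumes an 8 KiB page (PostgreSQL's default block size) and ignores page headers, per-entry overhead, and fill factor unless you pass them in, so treat its numbers as lower bounds rather than what a real engine would report. The helper names are hypothetical.

```python
def entries_per_page(key_bytes, pointer_bytes, page_bytes=8192, overhead_bytes=0):
    """How many fixed-size entries fit on one page (page header ignored)."""
    entry = key_bytes + pointer_bytes + overhead_bytes
    return page_bytes // entry

def leaf_level_bytes(n_entries, key_bytes, pointer_bytes, overhead_bytes=0):
    """Raw size of all entries: n * (key + pointer + per-entry overhead)."""
    return n_entries * (key_bytes + pointer_bytes + overhead_bytes)

# 4-byte INT key + 8-byte pointer = 12 bytes per entry
print(leaf_level_bytes(1_000_000, 4, 8))  # 12000000 bytes, about 11.4 MiB
print(entries_per_page(4, 8))             # 682 entries per 8 KiB page
print(entries_per_page(16, 8))            # 341 with a 16-byte UUID key
```

Widening the key from 4 to 16 bytes doubles the entry size, which halves the fan-out per page and can add a level to a large tree.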

Query Matching:

A simple search key index is useful for queries with predicates on that column:

-- Uses the department index effectively
SELECT * FROM employees WHERE department = 'Engineering';

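-- Can use the department index for range queries (in B+-trees).
-- A half-open range is used because BETWEEN 'A' AND 'F' would miss 'Finance', which sorts after 'F'.
SELECT * FROM employees WHERE department >= 'A' AND department < 'G';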

-- Cannot use the department index (predicate on different column)
SELECT * FROM employees WHERE salary > 100000;
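One way to see which of these queries can use the index is to ask a real planner. The sketch below uses SQLite through Python's built-in sqlite3 module purely because it needs no setup; the index name idx_emp_department is hypothetical, plan wording varies across SQLite versions, and other databases print plans differently. The range query uses a half-open range, which unlike BETWEEN 'A' AND 'F' also matches values such as 'Finance'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT,"
    " department TEXT, salary INTEGER, hire_date TEXT)"
)
# Hypothetical index name; any name works.
conn.execute("CREATE INDEX idx_emp_department ON employees (department)")

def plan(sql):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

eq_plan = plan("SELECT * FROM employees WHERE department = 'Engineering'")
range_plan = plan(
    "SELECT * FROM employees WHERE department >= 'A' AND department < 'G'"
)
salary_plan = plan("SELECT * FROM employees WHERE salary > 100000")

print(eq_plan)      # expected: a SEARCH step using idx_emp_department
print(range_plan)   # expected: a SEARCH step with a range on department
print(salary_plan)  # expected: a full SCAN of employees
```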

When to Use Simple Search Keys

•High cardinality columns: Employee IDs, order numbers, timestamps—many distinct values mean high selectivity
•Frequent single-column lookups: Common equality or range predicates on one attribute
•Foreign key columns: Fast joins on the referencing side
•Sort columns: When ORDER BY on this column is common

Composite Search Keys: Multi-Attribute Indexing

A composite (compound) search key consists of multiple attributes in a defined order. This powerful technique allows a single index to efficiently serve queries filtering on combinations of columns.

Example:

An index on (department, hire_date) creates a two-level ordering:

First, entries are sorted by department
Within each department, entries are sorted by hire_date

(Engineering, 2020-01-15) → rec_7
(Engineering, 2021-03-22) → rec_1
(Engineering, 2022-08-10) → rec_12
(Finance, 2019-07-01) → rec_2
(Finance, 2022-02-28) → rec_8
(HR, 2018-11-30) → rec_3
...

This structure is ideal for queries like:

-- Uses both columns: very efficient
SELECT * FROM employees 
WHERE department = 'Engineering' AND hire_date > '2021-01-01';
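The two-level ordering can be modeled with Python tuples, which compare element by element exactly as a composite key does. This sketch assumes hire dates stored as ISO-8601 strings (which sort chronologically) and a sorted list in place of B+-tree leaves; `dept_hired_after` is a hypothetical helper. It shows why the query is efficient: one binary search finds the first qualifying entry, and a sequential scan stops as soon as the department changes.

```python
from bisect import bisect_right

# Composite index on (department, hire_date): entries are (key_tuple, record_id).
# Tuple comparison yields "by department, then by hire_date within department".
ENTRIES = sorted([
    (("Engineering", "2021-03-22"), "rec_1"),
    (("Finance", "2019-07-01"), "rec_2"),
    (("HR", "2018-11-30"), "rec_3"),
    (("Engineering", "2020-01-15"), "rec_7"),
    (("Finance", "2022-02-28"), "rec_8"),
    (("Engineering", "2022-08-10"), "rec_12"),
])
KEYS = [key for key, _ in ENTRIES]

def dept_hired_after(dept, date):
    """WHERE department = dept AND hire_date > date."""
    # Seek: skip every (dept, d) with d <= date in O(log n).
    i = bisect_right(KEYS, (dept, date))
    result = []
    # Scan: matching entries are contiguous; stop when the department changes.
    while i < len(ENTRIES) and ENTRIES[i][0][0] == dept:
        result.append(ENTRIES[i][1])
        i += 1
    return result

print(dept_hired_after("Engineering", "2021-01-01"))  # ['rec_1', 'rec_12']
```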

The Prefix Rule

A composite index on (A, B, C) can be used for predicates on:
•A alone
•A and B together
•A, B, and C together

But it CANNOT be efficiently used for:
•B alone (not a prefix)
•C alone (not a prefix)
•B and C together (not a prefix)

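The leftmost columns are the 'prefix' that must be constrained for the index to be effective. For predicates on A and C together, only A narrows the search; C can at best be checked against the entries that A selects. Some engines (for example Oracle and MySQL 8.0) can sometimes use a skip scan without a predicate on A, but this only pays off when A has few distinct values.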

Composite Key Ordering:

The order of attributes in a composite key is critical and cannot be changed after index creation. Consider an index on (A, B) versus (B, A):

Query Pattern	Index (A, B)	Index (B, A)
`WHERE A = x`	✅ Efficient	❌ Full scan
`WHERE B = y`	❌ Full scan	✅ Efficient
`WHERE A = x AND B = y`	✅ Efficient	✅ Efficient

Both indexes store the same information but support different query patterns efficiently. This is why understanding workload characteristics is essential before creating composite indexes.

Composite Index (department, hire_date, salary) Query Usability
Query Predicate	Index Usable?	Explanation
`WHERE department = ?`	✅ Yes	First column is prefix
`WHERE department = ? AND hire_date = ?`	✅ Yes (both)	First two columns are prefix
`WHERE department = ? AND hire_date > ?`	✅ Yes (range)	Prefix with range on second
`WHERE hire_date = ?`	❌ No	First column not constrained
`WHERE department = ? AND salary > ?`	⚠️ Partial	Skips hire_date: only department narrows the seek; salary can at most be filtered on the entries read
`WHERE salary = ?`	❌ No	Not in prefix
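The table's verdicts follow from a simple left-to-right rule, sketched below as a hypothetical helper `seek_columns`. It models only how far a B+-tree seek can be narrowed; it deliberately ignores skip scans and the filtering of leftover predicates on index entries, which real optimizers may also apply.

```python
def seek_columns(index_cols, equality, ranges=()):
    """Return the leading index columns usable to bound a B+-tree seek.

    Walk the index columns left to right: an equality-constrained column
    extends the usable prefix; a range-constrained column is usable but
    ends the prefix; an unconstrained column ends it immediately.
    """
    used = []
    for col in index_cols:
        if col in equality:
            used.append(col)
        elif col in ranges:
            used.append(col)
            break
        else:
            break
    return used

IDX = ("department", "hire_date", "salary")
print(seek_columns(IDX, {"department"}, {"hire_date"}))  # ['department', 'hire_date']
print(seek_columns(IDX, {"department"}, {"salary"}))     # ['department']
print(seek_columns(IDX, {"hire_date"}))                  # []
```

The same rule explains "equality before range": once a range column is reached, no later column can narrow the seek.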

Guidelines for Composite Key Column Order:

Place equality-predicate columns first: Columns that appear in = conditions should come before columns that appear in >, <, or BETWEEN conditions.
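Selectivity as a tiebreaker: When every key column is compared with =, a seek on the full key reaches the same entries whatever the column order, so putting the most selective column first gains little. Prefer the order whose prefixes also serve your other queries; selectivity matters most for a leading column that is often queried on its own.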
Match query patterns: Analyze your workload. If queries filter on A alone or on A and B together, but never on B alone, the order (A, B) serves all of them.
Consider sorting needs: If queries often ORDER BY multiple columns, a composite index in that order can provide sorted results without a separate sort operation.

Search Key Data Type Considerations

The data type of the search key profoundly affects index structure, storage requirements, and query performance. Each data type brings unique characteristics that must be understood for effective index design.

Numeric Types (INTEGER, BIGINT, DECIMAL):

Numeric keys are ideal for indexing:

Compact storage: 4 bytes for INT, 8 for BIGINT
Fast comparison: Fixed-width integers compare in a single CPU instruction (DECIMAL comparisons cost somewhat more)
Predictable distribution: Often naturally sequential (auto-increment IDs)
No collation complexity: Universal ordering semantics

String Types (VARCHAR, TEXT):

String keys introduce complexity:

Variable length: Storage depends on actual values
Collation-dependent ordering: 'a' < 'B' in some collations, 'B' < 'a' in others
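Prefix complexity: 'apple' and 'application' share the prefix 'appl', so every comparison must scan past it; engines that use prefix compression store the shared part once and keep only the differing suffixes per entry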
Case sensitivity: May need separate indexes for case-sensitive and case-insensitive searches
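The collation point is easy to reproduce. Python's default string ordering compares code points, much like a binary collation, while sorting with `str.casefold` as the key approximates a case-insensitive collation. Real database collations apply richer, locale-aware rules, so this only illustrates why an index built under one ordering cannot serve range queries that assume another.

```python
words = ["banana", "Apple", "apple", "Banana"]

# Code-point order, like a binary collation:
# every uppercase ASCII letter sorts before every lowercase one.
binary_order = sorted(words)

# Case-insensitive order, approximated with str.casefold.
# The sort is stable, so equal folded keys keep their input order.
folded_order = sorted(words, key=str.casefold)

print(binary_order)  # ['Apple', 'Banana', 'apple', 'banana']
print(folded_order)  # ['Apple', 'apple', 'banana', 'Banana']
```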

Date/Time Types (DATE, TIMESTAMP, DATETIME):

Temporal keys have special characteristics:

Natural range queries: Time data is inherently ordered
Clustering opportunity: Recent data is often accessed together (temporal locality)
Precision matters: TIMESTAMP with milliseconds vs. DATE affects cardinality
Time zone complexity: Whether values are stored as absolute instants (e.g., UTC) or as local wall-clock times affects how comparisons and range boundaries behave

UUIDs and GUIDs:

Random identifiers create specific challenges:

No locality: Random UUIDs scatter insertions across the entire index
Large size: 16 bytes per key vs. 4 bytes for integer
Poor cache utilization: Random access patterns defeat caching strategies
Alternative: UUID v7 and ULID provide time-ordered UUIDs with better indexing behavior

The UUID Indexing Problem

Random UUIDs (v4) are poor primary key choices for clustered indexes. Each insert goes to a random location, causing excessive page splits and fragmentation. Use sequential IDs, auto-increment integers, or time-ordered UUIDs (v7, ULID) for better index maintenance performance.
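To see why time-ordered identifiers behave better, the sketch below builds UUIDs following the version 7 bit layout from RFC 9562: a 48-bit Unix millisecond timestamp in the most significant bits, then the version and variant fields, with the remaining bits random. `uuid7_from_ms` is a hypothetical helper; it does not enforce ordering between ids created within the same millisecond, which RFC 9562 addresses with optional counter schemes.

```python
import os
import uuid

def uuid7_from_ms(unix_ms: int) -> uuid.UUID:
    """Build a UUIDv7-style value for a given millisecond timestamp.

    Layout, most significant bits first:
      48 bits unix_ts_ms | 4 bits version (7) | 12 bits random
      2 bits variant (0b10) | 62 bits random
    """
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = rand >> 68                           # top 12 bits
    rand_b = rand & ((1 << 62) - 1)               # low 62 bits
    value = (
        ((unix_ms & ((1 << 48) - 1)) << 80)
        | (7 << 76)
        | (rand_a << 64)
        | (0b10 << 62)
        | rand_b
    )
    return uuid.UUID(int=value)

# Keys generated one millisecond apart arrive already sorted, so a B+-tree
# appends them at its right edge instead of splitting pages at random.
start = 1_700_000_000_000
v7_keys = [uuid7_from_ms(start + i) for i in range(1000)]
v4_keys = [uuid.uuid4() for _ in range(1000)]

print(v7_keys == sorted(v7_keys))  # True
print(v4_keys == sorted(v4_keys))  # almost certainly False
```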

Search Key Data Type Comparison
Data Type	Size	Comparison Speed	Index Efficiency	Common Use Case
INT/BIGINT	4-8 bytes	Fastest	Excellent	Primary keys, foreign keys
VARCHAR(50)	1-52 bytes	Medium	Good	Names, codes, identifiers
TEXT	Variable	Slow	Poor (often truncated)	Full-text search only
TIMESTAMP	4-8 bytes	Fast	Excellent	Time-series, audit trails
UUID v4	16 bytes	Medium	Poor (random)	Distributed systems (consider v7)
DECIMAL(10,2)	5-9 bytes	Medium	Good	Financial amounts

Cardinality and Selectivity: Key Effectiveness Metrics

Two related but distinct concepts determine how effective a search key will be: cardinality (a property of the data) and selectivity (a property of queries against that data).

Cardinality:

Cardinality is the number of distinct values in the search key column(s). For a table with N rows:

Maximum cardinality = N: Every row has a unique value (unique key)
Minimum cardinality = 1: Every row has the same value (useless for indexing)

Examples:

employee_id in employees table: Cardinality = N (unique per row)
country in customers table: Cardinality = ~200 (number of countries)
is_active boolean column: Cardinality = 2 (true/false)

Selectivity:

Selectivity measures what fraction of rows match a particular predicate value. It is typically expressed as:

Selectivity = (Number of rows matching the predicate) / (Total rows)

Note the inverted wording: the smaller this fraction, the more selective the predicate is said to be.

Low selectivity (bad for indexing): Many rows match

WHERE is_active = true on a table where 95% are active → selectivity = 0.95
Index provides little benefit over full scan

High selectivity (good for indexing): Few rows match

WHERE employee_id = 12345 → selectivity = 1/N ≈ 0
Index dramatically reduces I/O

The Relationship:

High cardinality often (but not always) implies high selectivity. A column with 1 million distinct values is likely to have high selectivity for equality predicates. However, skewed distributions can break this:

Column has 1 million distinct values (high cardinality)
But one value appears in 50% of rows
Queries for that common value have low selectivity despite high cardinality
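The arithmetic can be checked directly. This sketch computes per-value selectivity with a hypothetical helper `value_selectivity`, first for the is_active example and then for a skewed column scaled down to 20,000 rows so it runs instantly: half the rows share one value and the rest are all distinct.

```python
from collections import Counter

def value_selectivity(values):
    """Selectivity of `col = v` for every value v: matching rows / total rows."""
    total = len(values)
    return {v: n / total for v, n in Counter(values).items()}

# is_active: 95% of rows are true
is_active = [True] * 95 + [False] * 5
sel = value_selectivity(is_active)
print(sel[True], sel[False])  # 0.95 0.05

# Skewed column: high cardinality, yet one value covers half the table.
rows = 20_000
skewed = [0] * (rows // 2) + list(range(1, rows // 2 + 1))
skew_sel = value_selectivity(skewed)
cardinality = len(skew_sel)
print(cardinality, skew_sel[0], skew_sel[42])  # 10001 0.5 5e-05
```

Averaging over values (1 / 10001) would suggest every lookup is highly selective, which hides the one value that matches half the table.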

The Query Optimizer's View

The query optimizer maintains histograms and statistics to estimate selectivity for specific predicate values, not just average selectivity. This is crucial: knowing that WHERE status = 'active' matches 90% of rows while WHERE status = 'suspended' matches 0.1% allows the optimizer to choose table scan for the former and index lookup for the latter.
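The decision can be sketched as follows, assuming hypothetical per-value frequencies in the spirit of a most-common-values list (PostgreSQL, for example, exposes such statistics in its pg_stats view) and a hypothetical fixed threshold. Real optimizers compare full cost estimates rather than a single cut-off, so the 5% figure here is illustrative only.

```python
# Hypothetical per-value statistics for a status column.
STATUS_FREQ = {"active": 0.90, "inactive": 0.099, "suspended": 0.001}

# Hypothetical cut-over point; real optimizers weigh random vs. sequential
# I/O, caching, and physical correlation instead of one fixed threshold.
INDEX_THRESHOLD = 0.05

def choose_access_path(value, freq=STATUS_FREQ, default=0.01):
    """Pick an access path for WHERE status = value from its estimated selectivity."""
    est = freq.get(value, default)  # unlisted values get a default guess
    return "index scan" if est < INDEX_THRESHOLD else "table scan"

print(choose_access_path("active"))     # table scan
print(choose_access_path("suspended"))  # index scan
```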

Good Index Candidates

•Unique identifiers (IDs, codes)
•Timestamp columns with range queries
•Foreign keys for join operations
•Columns with uniform value distribution
•High-cardinality columns with equality predicates

Poor Index Candidates

•Boolean flags (true/false only)
•Status columns with few values
•Highly skewed distributions
•Rarely queried columns
•Columns modified in most UPDATE operations

Search Keys and Index Ordering

In ordered indexes (B+-trees and similar structures), the search key defines the sort order of index entries. This ordering is not merely an implementation detail—it has profound implications for query processing.

The Ordering Guarantee:

An index on search key K guarantees that entries are stored in sorted order by K. This enables:

Binary search for equality: O(log n) lookup time
Range queries: Find start point, scan sequentially
Sorted output: Index scan produces results in K order
MIN/MAX operations: Directly accessible at index extremities
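Each of these operations falls out of sorted storage. The sketch below uses a sorted Python list and the standard `bisect` module as a stand-in for B+-tree leaves; `SortedIndex` and its method names are hypothetical, and a real tree would reach the start point by descending through internal nodes rather than binary-searching one array.

```python
from bisect import bisect_left, bisect_right

class SortedIndex:
    """A sorted list of (key, record_id) pairs standing in for B+-tree leaves."""

    def __init__(self, pairs):
        self.entries = sorted(pairs)
        self.keys = [k for k, _ in self.entries]

    def equal(self, key):
        """Binary search locates the run of equal keys in O(log n)."""
        lo = bisect_left(self.keys, key)
        hi = bisect_right(self.keys, key)
        return [rid for _, rid in self.entries[lo:hi]]

    def scan_range(self, low, high):
        """low <= key < high: find the start point, then read sequentially."""
        lo = bisect_left(self.keys, low)
        hi = bisect_left(self.keys, high)
        return [rid for _, rid in self.entries[lo:hi]]

    def ordered(self):
        """A full index scan yields records already sorted by key."""
        return [rid for _, rid in self.entries]

    def min_key(self):
        return self.keys[0]   # leftmost entry

    def max_key(self):
        return self.keys[-1]  # rightmost entry

salary_idx = SortedIndex([
    (85000, "rec_1"), (62000, "rec_2"), (91000, "rec_3"),
    (62000, "rec_4"), (120000, "rec_5"),
])
print(salary_idx.equal(62000))                     # ['rec_2', 'rec_4']
print(salary_idx.scan_range(80000, 100000))        # ['rec_1', 'rec_3']
print(salary_idx.min_key(), salary_idx.max_key())  # 62000 120000
```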

Example Impact:

-- With index on (department, hire_date):

-- This query gets sorted results 'for free'
SELECT * FROM employees 
WHERE department = 'Engineering'
ORDER BY hire_date;
-- Execution: Index scan, no sorting needed

-- This query requires sorting
SELECT * FROM employees 
WHERE department = 'Engineering'
ORDER BY salary;
-- Execution: Index scan + sort operation (salary not in index)
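The difference between the two queries shows up in a real plan. This SQLite sketch (via Python's sqlite3 module, with a hypothetical index name idx_dept_hire) checks whether the planner adds an explicit sort step, which SQLite reports as 'USE TEMP B-TREE FOR ORDER BY'; other databases label the sort differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT,"
    " department TEXT, salary INTEGER, hire_date TEXT)"
)
conn.execute("CREATE INDEX idx_dept_hire ON employees (department, hire_date)")

def plan(sql):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

def needs_sort(steps):
    """SQLite reports an explicit sort as 'USE TEMP B-TREE FOR ORDER BY'."""
    return any("TEMP B-TREE FOR ORDER BY" in step for step in steps)

by_hire = plan(
    "SELECT * FROM employees WHERE department = 'Engineering' ORDER BY hire_date"
)
by_salary = plan(
    "SELECT * FROM employees WHERE department = 'Engineering' ORDER BY salary"
)
print(needs_sort(by_hire))    # expected False: index order already matches
print(needs_sort(by_salary))  # expected True: salary is not in the index
```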

Ascending vs. Descending Order:

Most databases allow specifying sort direction for each column in a composite index:

CREATE INDEX idx_orders ON orders (customer_id ASC, order_date DESC);

This creates an index where:

Within each customer, orders are sorted newest-first
Perfectly matches queries like: ORDER BY customer_id ASC, order_date DESC
Cannot efficiently satisfy: ORDER BY customer_id DESC, order_date DESC (customer_id's direction is flipped but order_date's is not, so neither a forward nor a backward scan produces this order)

Sort Order Compatibility Matrix:

For index (A ASC, B DESC):

Query ORDER BY	Compatible?	Notes
A ASC, B DESC	✅ Yes	Exact match
A DESC, B ASC	✅ Yes	Reverse scan
A ASC, B ASC	❌ No	B direction mismatch
B DESC, A ASC	❌ No	Column order mismatch

Backward Index Scans

Most B+-tree implementations support backward scanning, that is, reading the index in reverse order. This means an ASC index can satisfy DESC queries by scanning backward, though in some engines a backward scan is somewhat slower than a forward one, for example because read-ahead and prefetching favor forward access.

Search Key Selection Strategies

Selecting the right search key(s) for an index is as much art as science. It requires understanding the data, the queries, and the trade-offs. Here is a systematic approach:

Step 1: Analyze Query Workload

Before creating any index, collect and analyze query patterns:

What columns appear in WHERE clauses?
What columns appear in JOIN conditions?
What columns appear in ORDER BY clauses?
What columns appear in GROUP BY clauses?
What is the frequency of each query pattern?

Step 2: Identify Candidate Keys

For each query pattern, identify which columns would form an effective search key:

Equality predicates → Candidate for index key
Range predicates → Candidate for index key (but position matters)
Join columns → Candidate for index on join attribute
Sort columns → Candidate if sorting cost is high

Step 3: Evaluate Data Characteristics

For each candidate key, evaluate:

Cardinality: How many distinct values?
Distribution: Are values uniformly distributed or skewed?
Null percentage: How many nulls? (Affects index size and usability)
Correlation: Are candidate columns correlated? (Affects composite key design)
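These questions can be answered with a quick profile of a data sample. The hypothetical helper `profile_column` below reports cardinality, null fraction, and the share of the most common value as a simple skew indicator; it does not attempt the correlation check, which needs two columns at once.

```python
from collections import Counter

def profile_column(values):
    """Summarize a candidate search-key column: cardinality, nulls, skew."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    top_value, top_count = counts.most_common(1)[0] if counts else (None, 0)
    return {
        "rows": total,
        "cardinality": len(counts),  # distinct non-null values
        "null_fraction": (total - len(non_null)) / total if total else 0.0,
        "top_value": top_value,
        "top_fraction": top_count / total if total else 0.0,  # skew indicator
    }

country = ["US", "US", "DE", None, "US", "FR", "US", None, "DE", "US"]
print(profile_column(country))
```

A high top_fraction on an otherwise high-cardinality column is the skew pattern that makes average selectivity misleading.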

Step 4: Consider Write Impact

Each index adds write overhead. Consider:

How often is the indexed column updated?
What is the read/write ratio for this table?
Can we consolidate multiple indexes into one composite?

Step 5: Prototype and Measure

Create the index and measure actual performance:

Is the index being used? (Check EXPLAIN plans)
What is the improvement in query time?
What is the impact on write operations?
Does the optimizer's behavior match expectations?

Search Key Selection Principles

•Leftmost prefix rule: Design composite keys so the most frequently filtered column is first
•Equality before range: Place equality-predicate columns before range-predicate columns
•Covering index potential: Include additional columns if they allow index-only scans
•Update frequency inverse: Avoid indexing columns that change frequently
•Consolidation over proliferation: One composite index on (A, B) can replace a separate simple index on A, but not one on B
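The covering-index principle is visible in SQLite's plan output, which says 'USING COVERING INDEX' when a query can be answered from the index alone. This sketch (Python's sqlite3 module, hypothetical index name idx_dept_hire) compares a query that reads only indexed columns with one that also needs salary; plan wording may differ between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT,"
    " department TEXT, salary INTEGER, hire_date TEXT)"
)
conn.execute("CREATE INDEX idx_dept_hire ON employees (department, hire_date)")

def plan(sql):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Every column the query touches lives in the index: no table lookup needed.
covered = plan("SELECT hire_date FROM employees WHERE department = 'HR'")
# salary is not in the index, so each matching entry needs a table lookup.
not_covered = plan("SELECT salary FROM employees WHERE department = 'HR'")

print(covered)      # expected to mention COVERING INDEX idx_dept_hire
print(not_covered)  # expected to mention INDEX idx_dept_hire, not COVERING
```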

Summary: Mastering Search Keys

The search key is the foundation upon which an index is built. Understanding search keys deeply enables you to design effective indexes that accelerate the right queries without undue overhead. Let's consolidate the key concepts:

Key Takeaways

•Search key ≠ Primary key — Search key is the attribute(s) organizing an index; it need not be unique or identify rows.
•Simple keys are straightforward — Single-column indexes serve queries filtering on that column with equality or range predicates.
•Composite keys are powerful but ordered — Multi-column indexes are highly effective but only for queries matching their prefix.
•Data types affect performance — Small, comparable types (integers) outperform variable-length types (strings) for indexing.
•Cardinality and selectivity determine value — High cardinality usually means high selectivity, making the index worthwhile.
•Index ordering enables sorted output — The search key determines the index's sort order, which can eliminate explicit sorts.
•Selection requires systematic analysis — Effective key selection combines workload analysis, data profiling, and performance testing.

What's Next:

With a solid understanding of search keys, we now examine what the index actually stores: index entries. The next page explores the structure of index entries, the different ways they can reference data, and how these choices affect index size and lookup performance.

You now understand search keys at a professional level: what they are, how to choose them, and how their characteristics determine index effectiveness. This knowledge is essential for designing indexes that actually improve performance rather than just consuming resources.
