Everything we've discussed about indexes—their definition, search keys, and entries—serves a single purpose: accelerating data lookup. An index that cannot find data faster than a table scan is worthless. Understanding exactly how indexes achieve lookup acceleration is fundamental to leveraging them effectively.
Lookup acceleration isn't magic. It follows from well-understood computer science principles: reducing search space, exploiting ordering, and trading space for time. By the end of this page, you'll understand precisely why an index lookup completes in milliseconds while a table scan takes minutes—not as a mysterious database optimization, but as the predictable outcome of algorithmic design.
By the end of this page, you will understand the complexity theory behind index acceleration, trace the step-by-step process of an index lookup, analyze how different index types achieve their performance characteristics, and calculate expected I/O costs for various lookup operations.
To appreciate what indexes provide, we must first understand what happens without them. Consider a table with N rows stored as a heap file (unsorted, unindexed). To find a specific record:
SELECT * FROM employees WHERE employee_id = 12345;
Without an index, the database must:
1. Read the first page of the heap file into memory
2. Examine every record on the page, checking whether employee_id = 12345
3. Repeat for every page until the match is found (or every page has been read)

This is fundamentally O(N): the time grows linearly with data size.
The Scale Problem:
Linear time sounds acceptable until you consider real-world scale:
| Row Count | Pages | Scan Time | Index Lookup (tree levels) |
|---|---|---|---|
| 1,000 | 10 | 10ms | ~2ms (4 levels) |
| 100,000 | 1,000 | 1 second | ~3ms (5 levels) |
| 10,000,000 | 100,000 | 100 seconds | ~4ms (7 levels) |
| 1,000,000,000 | 10,000,000 | 2.8 hours | ~5ms (10 levels) |
Notice how the gap between scan and index lookup widens as data grows: scan cost rises linearly while lookup cost rises only logarithmically. At 1 billion rows, the table scan takes 2.8 hours while the index lookup takes 5 milliseconds, a factor of roughly 2 million. This isn't a small optimization; it's the difference between a usable system and an unusable one.
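The scaling in the table can be reproduced with a few lines of arithmetic. A minimal sketch, where the page I/O cost, rows per page, and fanout are illustrative assumptions (with fanout 100, the tree is even shallower than the conservative level counts shown above):

```python
import math

PAGE_READ_MS = 1.0     # assumed cost of one sequential page read
ROWS_PER_PAGE = 100    # assumed rows per page
FANOUT = 100           # assumed B+-tree branching factor

for rows in (1_000, 100_000, 10_000_000, 1_000_000_000):
    # Table scan: read every page, cost grows linearly with row count.
    scan_s = (rows // ROWS_PER_PAGE) * PAGE_READ_MS / 1000
    # Index lookup: one page per tree level, cost grows logarithmically.
    levels = max(1, math.ceil(math.log(rows) / math.log(FANOUT)))
    print(f"{rows:>13,} rows: scan ~{scan_s:,.0f}s, index descends {levels} level(s)")
```

At a billion rows the scan cost has grown a millionfold while the tree has only gained a few levels.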
The Impossibility of Linear Scale:
For interactive applications, queries must complete in tens or hundreds of milliseconds. At 10 million rows, a 100-second table scan is unacceptable. At 1 billion rows, a 2.8-hour table scan is absurd.
Moreover, a linear scan gains nothing from selectivity: whether you're looking for 1 row or 1 million rows, every page must still be examined. Parallel page reads can hide some of this cost, but the total work remains O(N).
Indexes break this linear dependency completely.
The most common index structure—the B+-tree—achieves O(log N) lookup time by organizing entries in a balanced tree. This is the fundamental acceleration mechanism that makes modern databases possible.
The Key Insight: Divide and Conquer
Instead of examining all N entries, a tree structure allows us to:
1. Compare the search key against the separator keys in the current node
2. Follow the single child pointer whose key range contains the search key
3. Repeat until a leaf page is reached (or the key is proven absent)
With each step, we eliminate a fraction of the search space. The number of steps needed is the tree's height, which is log(N) for a balanced tree.
B+-Tree Height Calculation:
For a B+-tree with:
- N entries to index
- Fanout F (child pointers per internal node)
The height H is approximately:
H = ⌈log_F(N)⌉
Example: with N = 100,000,000 entries and a fanout of F = 100, H = ⌈log₁₀₀(100,000,000)⌉ = 4.
This means 4 page reads to locate any record among 100 million. And because root and high-level internal nodes are frequently accessed, they're typically cached in memory, reducing I/O further.
The Fanout Advantage:
B+-trees achieve high fanout (branching factor) because:
- Each node occupies a full disk page (typically 8-16 KB)
- Each entry is small: a key plus a child pointer, often just tens of bytes
- A single page can therefore hold hundreds of separator keys and pointers
A binary tree with 100 million nodes is 27 levels deep. A B+-tree with 100 million entries is 4 levels deep. This difference—enabled by fanout—is why B+-trees dominate in databases.
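The binary-versus-B+-tree comparison follows directly from the height formula. A small sketch using integer arithmetic (avoiding floating-point rounding at exact powers):

```python
def bplus_height(n_entries: int, fanout: int) -> int:
    """Smallest h such that fanout**h >= n_entries: the tree's level count."""
    height, reach = 0, 1
    while reach < n_entries:
        reach *= fanout
        height += 1
    return height

N = 100_000_000
print(bplus_height(N, 2))    # binary tree: 27 levels
print(bplus_height(N, 100))  # fanout 100: 4 levels
```

Raising the fanout from 2 to 100 divides the height by log₂(100) ≈ 6.6, which is exactly the 27-versus-4 gap in the text.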
Let's trace exactly what happens during an index lookup. Understanding this process helps diagnose performance issues and reason about expected costs.
Query:
SELECT name, salary FROM employees WHERE employee_id = 5042;
Assumptions:
- The table holds 10,000,000 rows across roughly 100,000 data pages
- A B+-tree on employee_id (primary clustered index), 4 levels deep
- The root and level-1 internal pages are already cached in memory

Steps:
1. Parse and plan: the optimizer sees the employee_id = 5042 equality predicate and selects an index seek on the primary index.
2. Traverse the tree: read the root page, then one internal page per level, following separator keys toward 5042.
3. Read the leaf page: locate the entry for 5042 (in a clustered index, the leaf holds the full row).
4. Project: extract the name and salary columns and return them to the client.

Cost Analysis:
| Step | Memory/Disk | Time |
|---|---|---|
| Root lookup | Memory (cached) | ~0.001 ms |
| Level-1 lookup | Memory (cached) | ~0.001 ms |
| Level-2 lookup | Disk I/O | ~1-10 ms |
| Leaf lookup | Disk I/O | ~1-10 ms |
| Total | | ~2-20 ms |
Compare this to a table scan that would read 100,000 pages at 1ms each: 100 seconds. The index reduced latency by a factor of 5,000-50,000.
Key Observations:
- Only the lower tree levels typically require disk I/O; the root and upper internal pages stay cached across queries
- Total latency is dominated by one or two random reads, regardless of table size
- The cost is predictable: roughly (uncached levels) × (random I/O time)
Indexes accelerate not just point lookups but also range queries. The mechanism differs slightly but remains fundamentally efficient.
Query:
SELECT * FROM orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-01-31';
Range Scan Algorithm:
Find Start Point: Use tree traversal to locate the first entry where order_date >= '2024-01-01'
Sequential Scan: From that point, scan forward through leaf pages (they're linked!) until order_date > '2024-01-31'
Terminate: Stop when the range is exhausted
The beauty of B+-trees: leaf pages are linked in a doubly-linked list. Once we find the range start, we scan sequentially—no more tree traversals needed.
Cost Analysis for Range Queries:
If the range contains K matching records spread across P pages:
| Component | Cost |
|---|---|
| Locate start | O(log N) page reads |
| Scan range | O(P) page reads |
| Total | O(log N + P) |
Why This Beats Table Scan:
A table scan reads ALL pages regardless of how few records match. A range scan reads:
- O(log N) pages to locate the start of the range
- Only the leaf/data pages that actually contain matching records
If your range is 1% of the table, you read ~1% of the pages (plus the initial lookup cost). This scales beautifully.
Example: in a 10-million-row table spanning 100,000 pages, a range matching 1% of the rows touches roughly 1,000 data pages plus a 4-level descent: about 1 second at 1 ms per page, versus 100 seconds for a full scan.
Range scan efficiency depends heavily on how well the data is clustered by the index key. If records are stored in index order (clustered index), consecutive index entries point to consecutive data pages—minimal I/O. If scattered (non-clustered index), each entry may require a separate random I/O—potentially worse than a table scan for large ranges.
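The clustering effect can be sketched with a back-of-the-envelope I/O model. This is an illustrative approximation, not a real optimizer cost function; the tree height and one-fetch-per-row worst case are assumptions:

```python
import math

def clustered_range_pages(total_pages: int, selectivity: float, height: int = 4) -> int:
    # Matching rows sit on consecutive pages: tree descent + contiguous pages.
    return height + math.ceil(total_pages * selectivity)

def nonclustered_range_pages(matching_rows: int, height: int = 4) -> int:
    # Worst case: every matching row needs its own random data-page read.
    return height + matching_rows

# 10M rows on 100,000 pages; a 1% range matches 100,000 rows
print(clustered_range_pages(100_000, 0.01))   # 1,004 page reads
print(nonclustered_range_pages(100_000))      # 100,004: as bad as a full scan
```

The same 1% range costs about 1,000 reads when clustered but about 100,000 when scattered, which is why a non-clustered range scan can lose to a sequential table scan.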
While B+-trees achieve O(log N) lookup, hash indexes can achieve O(1) average-case lookup—constant time regardless of data size. This seems miraculous but comes with significant trade-offs.
Hash Index Mechanism:
1. Compute bucket = hash(key) mod bucket_count
2. Read the bucket page(s) at that location
3. Search within the bucket for the matching entry

Why O(1)?
The hash function directly computes where to look—no traversal needed. Regardless of whether the table has 1,000 or 1 billion rows, the same number of steps (1-2 page reads) finds the target bucket.
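The mechanism can be illustrated with a toy in-memory hash "index". This is a sketch of the idea, not a real disk-based hash index; the bucket count and CRC32 hash are arbitrary choices:

```python
import zlib

BUCKET_COUNT = 1024
buckets = {i: [] for i in range(BUCKET_COUNT)}  # bucket number -> list of (key, rid)

def bucket_for(key: str) -> int:
    # One hash computation locates the bucket directly: no tree descent,
    # so the cost is the same for 1,000 rows or 1 billion.
    return zlib.crc32(key.encode()) % BUCKET_COUNT

def insert(key: str, rid: int) -> None:
    buckets[bucket_for(key)].append((key, rid))

def lookup(key: str) -> list:
    # Only the one bucket is searched, never the whole table.
    return [rid for k, rid in buckets[bucket_for(key)] if k == key]

insert("user@example.com", 42)
print(lookup("user@example.com"))  # [42]
```

Note that the bucket order has no relationship to key order, which is exactly why hash indexes cannot serve range predicates.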
When to Use Hash Indexes:
Hash indexes are ideal for:
- Equality predicates (WHERE key = value)
- Exact-match lookups on high-cardinality columns (IDs, emails, UUIDs)

They are poor choices for:
- Range predicates (<, >, BETWEEN)
- Prefix matching (LIKE 'abc%')
- Sorting or ORDER BY, since hash order is meaningless
PostgreSQL Example:
-- Create a hash index (PostgreSQL syntax)
CREATE INDEX idx_user_email_hash ON users USING hash (email);
-- This query uses the hash index
SELECT * FROM users WHERE email = 'user@example.com';
-- This query CANNOT use the hash index (range predicate)
SELECT * FROM users WHERE email > 'a' AND email < 'b';
The ultimate acceleration is avoiding table access entirely. If the index contains all columns the query needs, the database can answer the query from the index alone—an index-only scan (also called a covering index scenario).
Traditional Index Access:
1. Search index → Find RID(s)
2. Use RID to fetch record from table → One I/O per record potentially
3. Extract requested columns
Index-Only Scan:
1. Search index → Find matching entries
2. Extract columns directly from index entries
3. Done (no table access)
The savings can be enormous, especially when:
Creating Covering Indexes:
-- Query we want to optimize
SELECT order_id, total_amount
FROM orders
WHERE customer_id = 12345;
-- Index that enables index-only scan
CREATE INDEX idx_orders_covering
ON orders (customer_id)
INCLUDE (order_id, total_amount);
The INCLUDE clause (or equivalent) adds columns to the leaf entries without making them part of the search key. This means:
- Tree traversal searches on customer_id only, keeping the index narrow where it matters
- Leaf entries additionally carry order_id and total_amount, so the query never touches the table

Space-Performance Trade-off:
Covering indexes increase index size. Each leaf entry now includes additional columns. For a heavily-used query pattern, this trade-off is usually worthwhile. For rarely-used queries, the space cost may not justify the benefit.
When reviewing execution plans, look for indications like 'Index Only Scan' (PostgreSQL), 'Using index' (MySQL), or cost estimates that don't include table access. If your query qualifies for an index-only scan but isn't getting one, check: Does the index include all SELECT columns? Are the relevant heap pages marked all-visible in the visibility map (PostgreSQL)? Does the predicate use only indexed columns?
| Scenario | Matching Records | Index Scan I/O | Index-Only Scan I/O | Savings |
|---|---|---|---|---|
| Single record | 1 | 3-4 pages | 2-3 pages | 25-33% |
| Small range | 100 | 3 + ~100 pages | 3-10 pages | 90%+ |
| Large range | 10,000 | 3 + ~5,000 pages | 3 + ~50 pages | 99%+ |
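The savings in the table can be approximated with a simple I/O model. The entries-per-leaf figure and the one-table-fetch-per-match worst case are assumptions for illustration:

```python
import math

def regular_index_io(matches: int, height: int = 3, entries_per_leaf: int = 200) -> int:
    # Tree descent + leaf pages + (worst case) one scattered table fetch per match.
    return height + math.ceil(matches / entries_per_leaf) + matches

def index_only_io(matches: int, height: int = 3, entries_per_leaf: int = 200) -> int:
    # Tree descent + leaf pages only: the index itself answers the query.
    return height + math.ceil(matches / entries_per_leaf)

for matches in (1, 100, 10_000):
    r, io = regular_index_io(matches), index_only_io(matches)
    print(f"{matches:>6} matches: {r:>6} vs {io:>4} pages ({1 - io / r:.0%} saved)")
```

The table fetches dominate as the match count grows, which is why the savings climb from modest for a single record to well over 99% for large ranges.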
Indexes are powerful but not omnipotent. Understanding when indexes cannot help—or actively hurt—is as important as understanding when they help.
Low Selectivity Predicates:
When a predicate matches most rows, an index lookup is slower than a table scan: each matching row can cost a separate random I/O into the table, while a sequential scan reads every page in cheap, prefetched order.
Example:
-- Index probably helps (assuming 'suspended' is rare)
SELECT * FROM accounts WHERE status = 'suspended';
-- Index probably hurts (most accounts are active)
SELECT * FROM accounts WHERE status = 'active';
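Where the crossover between index and scan sits can be estimated with a crude cost model. The 5x random-versus-sequential read penalty is an assumption, and real optimizers use far richer statistics:

```python
def scan_ms(total_pages: int, seq_read_ms: float = 1.0) -> float:
    # Sequential scan: every page, but cheap prefetched reads.
    return total_pages * seq_read_ms

def index_ms(matching_rows: int, height: int = 3, rand_read_ms: float = 5.0) -> float:
    # Non-clustered index: tree descent plus one random table fetch per match.
    return (height + matching_rows) * rand_read_ms

TOTAL_ROWS, TOTAL_PAGES = 10_000_000, 100_000
for sel in (0.001, 0.01, 0.1):
    matches = int(TOTAL_ROWS * sel)
    winner = "index" if index_ms(matches) < scan_ms(TOTAL_PAGES) else "scan"
    print(f"selectivity {sel:>5.1%}: {winner} wins")
```

Under these assumptions the index stops paying off at around 1% selectivity, illustrating why the optimizer may refuse an index for `status = 'active'`.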
Non-Sargable Predicates:
Some predicates cannot use indexes regardless of selectivity. These are called non-sargable (not Search ARGument ABLE):
-- Non-sargable: Function applied to column
WHERE UPPER(email) = 'USER@EXAMPLE.COM'
-- Sargable alternative (if stored in uppercase)
WHERE email = 'USER@EXAMPLE.COM'
-- Non-sargable: Leading wildcard
WHERE name LIKE '%smith'
-- Sargable: Trailing wildcard
WHERE name LIKE 'smith%'
-- Non-sargable: Expression on column
WHERE salary * 1.1 > 100000
-- Sargable alternative
WHERE salary > 100000 / 1.1
Wrong Column Order (Composite Index):
A composite index on (A, B, C) cannot accelerate queries that filter only on B or C: entries are sorted by A first, so rows with a given B value are scattered throughout the index rather than clustered together.
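Why the prefix rule holds can be seen by modeling a composite index as a sorted list of tuples. A toy sketch, not a real B+-tree, with hypothetical columns (dept, team, emp_id):

```python
from bisect import bisect_left

# Composite "index" on (dept, team, emp_id), kept sorted like B+-tree leaves.
idx = sorted([("eng", "web", 3), ("hr", "ops", 9), ("eng", "db", 2), ("eng", "db", 7)])

def seek_dept(dept: str) -> list:
    # Leading column: binary search finds one contiguous run of entries.
    lo = bisect_left(idx, (dept,))
    hi = bisect_left(idx, (dept + "\x00",))  # sentinel just past all (dept, ...) keys
    return idx[lo:hi]

def find_team(team: str) -> list:
    # Non-leading column: sort order gives no contiguous run, so every
    # entry must be checked: equivalent to a full index scan.
    return [row for row in idx if row[1] == team]

print(seek_dept("eng"))  # contiguous slice, found via binary search
print(find_team("db"))   # correct answer, but only after a full pass
```

The leading-column seek narrows to a slice in O(log N) comparisons; the non-leading filter degenerates to O(N), mirroring how a real composite index behaves.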
Lookup acceleration is the raison d'être of indexing. Understanding the mechanisms behind this acceleration enables you to design effective indexes and diagnose performance issues. Let's consolidate the key concepts:
- Table scans are O(N); B+-tree lookups are O(log N), with high fanout keeping trees only a few levels deep
- Range scans cost O(log N + P) by exploiting linked leaf pages, but their efficiency depends on clustering
- Hash indexes offer O(1) equality lookups at the cost of range and ordering support
- Covering indexes eliminate table access entirely for qualifying queries
- Indexes cannot help low-selectivity or non-sargable predicates, and a composite index requires its leading column in the predicate
What's Next:
We now understand what indexes are, how they're organized (search keys and entries), and how they accelerate lookups. The final piece is understanding the trade-offs — the costs and limitations of indexing that must be balanced against benefits. The next page provides a comprehensive analysis of index trade-offs.
You now understand the mechanisms behind index acceleration: tree-based logarithmic lookup, range scan efficiency, hash-based constant time, and index-only optimization. This knowledge empowers you to predict and optimize query performance.