B Plus Tree Operations - Learning Module

Search in B+-tree

The Art of Finding Data in Milliseconds

Every time you execute a SQL query with a WHERE clause on an indexed column, you're invoking one of the most elegant search algorithms in computer science: the B+-tree search. This operation transforms what would be a linear scan through millions of records into a short descent through a handful of tree levels, typically delivering results in milliseconds even for very large tables.

Understanding B+-tree search isn't merely academic—it's the foundation for comprehending query performance, predicting I/O costs, and making informed decisions about index design. A Principal Engineer doesn't just use indexes; they understand precisely how the search algorithm navigates the tree structure and why this navigation is extraordinarily efficient.

What You Will Learn

By the end of this page, you will understand the complete mechanics of B+-tree search, from root traversal to leaf node inspection. You'll be able to trace search paths, calculate I/O costs, analyze time complexity, and recognize why B+-trees guarantee logarithmic search performance in the worst case.

Search Problem Formulation

Before diving into the algorithm, let's precisely formulate what we're trying to accomplish. The B+-tree search problem has two distinct variants:

Point Query (Equality Search): Given a search key K, find all records where the indexed attribute equals K. For example: SELECT * FROM employees WHERE employee_id = 12345.

Range Query: Given a range [K₁, K₂], find all records where the indexed attribute falls within this range. For example: SELECT * FROM orders WHERE order_date BETWEEN '2024-01-01' AND '2024-03-31'.

This page focuses on the point query mechanics. Range queries build upon this foundation and are covered in the next page.

Key Distinction: B+-tree vs B-tree Search

In a B+-tree, all data resides in leaf nodes. Internal nodes serve purely as routing guides. This is fundamentally different from B-trees, where data can be found at any level. The B+-tree search always terminates at a leaf node, even if we encounter the search key in an internal node—we must continue to the leaf to find the actual record pointer.

Formal Problem Statement:

Given:

A B+-tree with order n (maximum n children per node, minimum ⌈n/2⌉ children for non-root nodes)
A search key K

Find:

The leaf node L that would contain key K if it exists
Whether K exists in L
If K exists, return the associated record pointer(s)

The algorithm must guarantee:

Correct navigation through internal nodes
Arrival at the correct leaf node
O(log_n N) I/O operations, where n is the order of the tree and N is the number of entries

B+-tree Structure: The Search Context

To understand search, we must first crystallize the B+-tree structure that the search algorithm navigates:

Internal Node Structure: Each internal node contains:

Keys: K₁, K₂, ..., Kₘ₋₁ (where m is the number of children)
Pointers: P₁, P₂, ..., Pₘ (pointers to child nodes)

The separator key invariant holds:

All keys in subtree P₁ are < K₁
All keys in subtree Pᵢ satisfy Kᵢ₋₁ ≤ keys < Kᵢ (for 1 < i < m)
All keys in subtree Pₘ are ≥ Kₘ₋₁

Leaf Node Structure: Each leaf node contains:

Keys: K₁, K₂, ..., Kₖ (actual indexed values)
Record Pointers: corresponding pointers to data records or RIDs
Sibling Pointer: pointer to the next leaf node (for range scans)

B+-tree Node Components and Their Search Role
Component	Location	Role in Search	Search Interpretation
Separator Keys	Internal Nodes	Routing decisions	Compare with search key to choose path
Child Pointers	Internal Nodes	Navigation links	Follow to descend one level
Actual Keys	Leaf Nodes	Search targets	Match against search key
Record Pointers	Leaf Nodes	Final result	Returned when key matches
Sibling Pointer	Leaf Nodes	Range traversal	Used after point search for ranges
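
The two node layouts and the separator key invariant above can be sketched as plain Python classes. This is a minimal illustration, not the layout of any particular database engine: the names `LeafNode`, `InternalNode`, `next_leaf`, and `separators_ok` are assumptions made for this example, and record pointers are represented as simple strings.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class LeafNode:
    keys: List[int]                          # actual indexed values, sorted
    record_pointers: List[str]               # one pointer (here a string RID) per key
    next_leaf: Optional["LeafNode"] = None   # sibling pointer for range scans


@dataclass
class InternalNode:
    keys: List[int]                                   # m-1 separator keys, sorted
    children: List[Union["InternalNode", LeafNode]]   # m child pointers


def subtree_keys(node):
    """Collect every leaf key stored under node."""
    if isinstance(node, LeafNode):
        return list(node.keys)
    return [k for child in node.children for k in subtree_keys(child)]


def separators_ok(node: InternalNode) -> bool:
    """Check the separator invariant for one internal node:
    every key under children[i] lies in [keys[i-1], keys[i])."""
    if len(node.children) != len(node.keys) + 1:
        return False
    for i, child in enumerate(node.children):
        low = node.keys[i - 1] if i > 0 else None
        high = node.keys[i] if i < len(node.keys) else None
        for k in subtree_keys(child):
            if low is not None and k < low:
                return False
            if high is not None and k >= high:
                return False
    return True
```

A checker like `separators_ok` is only useful for tests and debugging; a real search relies on the invariant rather than verifying it.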

Critical Insight: Separator Keys ≠ Data Keys

Internal node keys are separators, not actual data values. They might be copies of leaf keys, but their sole purpose is routing. In some B+-tree implementations, when a leaf key is deleted, its copy in an internal node may remain as a separator—this doesn't affect correctness because the separator still correctly routes searches.

The Search Algorithm: Step-by-Step

The B+-tree search algorithm is elegantly simple yet powerful. It consists of two phases:

Phase 1: Tree Descent (Finding the Leaf) Starting from the root, traverse down through internal nodes, selecting the appropriate child pointer at each level until reaching a leaf node.

Phase 2: Leaf Search (Finding the Key) Within the leaf node, search for the target key using binary search and return the associated record pointer(s) if found.

B+-Tree Search Algorithm
Pseudocode
FUNCTION BPlusTreeSearch(root, searchKey K)
    // ================================================
    // PHASE 1: TREE DESCENT - Navigate to correct leaf
    // ================================================
    currentNode ← root
    
    WHILE currentNode is NOT a leaf node DO
        // Find the appropriate child pointer to follow
        childIndex ← FindChildIndex(currentNode, K)
        currentNode ← currentNode.children[childIndex]
    END WHILE
    
    // currentNode is now a leaf node
    leafNode ← currentNode
    
    // ================================================
    // PHASE 2: LEAF SEARCH - Find key within leaf
    // ================================================
    RETURN SearchInLeaf(leafNode, K)
END FUNCTION
 
FUNCTION FindChildIndex(internalNode, K)
    // Binary search to find correct child pointer
    // internalNode has keys: K₁, K₂, ..., Kₘ₋₁
    // and children: P₁, P₂, ..., Pₘ
    
    keys ← internalNode.keys    // m-1 keys
    m ← internalNode.childCount // m children
    
    // Find smallest i such that K < keys[i]
    // If no such i exists, use the rightmost child Pₘ
    
    low ← 1
    high ← m    // the answer is always in low..high
    
    WHILE low < high DO
        mid ← ⌊(low + high) / 2⌋    // mid ≤ m-1, so keys[mid] exists
        IF K < keys[mid] THEN
            high ← mid       // answer is mid or further left
        ELSE
            low ← mid + 1    // K ≥ keys[mid], answer is further right
        END IF
    END WHILE
    
    RETURN low    // Follow child pointer P_low (P₁ .. Pₘ)
END FUNCTION
 
FUNCTION SearchInLeaf(leafNode, K)
    // Binary search within leaf node
    keys ← leafNode.keys
    n ← leafNode.keyCount
    
    // Binary search for K
    left ← 1
    right ← n
    
    WHILE left ≤ right DO
        mid ← ⌊(left + right) / 2⌋
        
        IF keys[mid] = K THEN
            // Found! Return associated record pointer(s)
            RETURN (TRUE, leafNode.recordPointers[mid])
        ELSE IF keys[mid] < K THEN
            left ← mid + 1
        ELSE
            right ← mid - 1
        END IF
    END WHILE
    
    // Key not found
    RETURN (FALSE, NULL)
END FUNCTION
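
The pseudocode above can be sketched in runnable Python using the standard `bisect` module. This is a minimal in-memory sketch under stated assumptions: the `Leaf` and `Internal` classes are hypothetical stand-ins for disk pages, indexes are zero-based, and keys are unique. `bisect_right(keys, K)` returns the number of separators less than or equal to K, which is exactly the zero-based index of the child whose range contains K.

```python
from bisect import bisect_left, bisect_right


class Leaf:
    def __init__(self, keys, record_pointers):
        self.keys = keys                        # sorted indexed values
        self.record_pointers = record_pointers  # one pointer per key


class Internal:
    def __init__(self, keys, children):
        self.keys = keys          # sorted separator keys
        self.children = children  # len(children) == len(keys) + 1


def find_child_index(node, key):
    # Count of separators <= key, i.e. the zero-based child to follow.
    return bisect_right(node.keys, key)


def search_in_leaf(leaf, key):
    # Binary search for the first position whose key is >= the target.
    i = bisect_left(leaf.keys, key)
    if i < len(leaf.keys) and leaf.keys[i] == key:
        return True, leaf.record_pointers[i]
    return False, None


def bplus_tree_search(root, key):
    node = root
    # Phase 1: descend until a leaf is reached.
    while isinstance(node, Internal):
        node = node.children[find_child_index(node, key)]
    # Phase 2: search inside the leaf.
    return search_in_leaf(node, key)
```

Using `bisect_right` for routing and `bisect_left` inside the leaf mirrors the two different questions being asked: "which subtree covers K?" versus "is K stored here?".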

Implementation Note: Duplicate Keys

When duplicate keys are allowed (non-unique indexes), the search might find one occurrence, but additional duplicates may exist to the left or right in the same leaf, or in adjacent leaf nodes. A complete search for all duplicates requires scanning in both directions or using a different duplicate handling strategy (like overflow chains).
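
One way to collect all duplicates with only a forward sibling pointer is to descend to the leftmost leaf that could hold the key and then scan right. The sketch below assumes duplicates are stored as repeated leaf entries and that a separator may equal keys in the leaf to its left (a relaxed form of the invariant); the class names and `find_all` are hypothetical, and overflow chains are not modeled.

```python
from bisect import bisect_left


class Leaf:
    def __init__(self, keys, record_pointers, next_leaf=None):
        self.keys = keys
        self.record_pointers = record_pointers
        self.next_leaf = next_leaf  # forward sibling pointer


class Internal:
    def __init__(self, keys, children):
        self.keys = keys
        self.children = children


def find_all(root, key):
    """Return the record pointers of every entry equal to key."""
    node = root
    while isinstance(node, Internal):
        # bisect_left sends a key equal to a separator into the LEFT child,
        # so the descent ends at the leftmost leaf that may contain it.
        node = node.children[bisect_left(node.keys, key)]

    results = []
    i = bisect_left(node.keys, key)  # first entry >= key in this leaf
    while node is not None:
        while i < len(node.keys):
            if node.keys[i] != key:
                return results       # sorted order: no more matches ahead
            results.append(node.record_pointers[i])
            i += 1
        node, i = node.next_leaf, 0  # run may continue in the next leaf
    return results
```

The scan stops at the first entry greater than the key, so the extra cost is proportional to the number of matching entries plus at most one additional leaf.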

Worked Example: Tracing a Search

Let's trace a complete search operation through a concrete B+-tree example. Consider a B+-tree of order 4 (maximum 4 children per internal node, maximum 3 keys per leaf node) indexing employee IDs:

Tree Structure:

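(Tree diagram not rendered. The nodes used in the traces below are: Root [30, 60]; Internal Node A [10, 20]; Internal Node B [40, 50]; Leaf 3 [20, 22, 28]; Leaf 5 [40, 42, 45].)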

Search for Key = 42:

Step 1: Start at Root

Current Node: Root with keys [30, 60]
Search Key: 42
Comparison: 42 ≥ 30 and 42 < 60
Decision: Follow second child pointer P₂ (between keys 30 and 60)

Step 2: At Internal Node B

Current Node: Internal Node B with keys [40, 50]
Search Key: 42
Comparison: 42 ≥ 40 and 42 < 50
Decision: Follow second child pointer P₂

Step 3: At Leaf Node 5

Current Node: Leaf 5 with keys [40, 42, 45]
Binary search for 42
Found at position 2!
Return: (TRUE, record_pointer_for_42)

Search Trace Summary for Key = 42
Step	Node Type	Node Contents	Comparison	Action
1	Root (Internal)	[30, 60]	30 ≤ 42 < 60	Follow P₂ → Node B
2	Internal	[40, 50]	40 ≤ 42 < 50	Follow P₂ → Leaf 5
3	Leaf	[40, 42, 45]	42 = 42 ✓	Key Found!

Search for Key = 25 (Not Present):

Step 1: Start at Root

Keys [30, 60], Search Key 25
25 < 30, follow first child pointer P₁

Step 2: At Internal Node A

Keys [10, 20], Search Key 25
25 ≥ 20, follow third child pointer P₃

Step 3: At Leaf Node 3

Keys [20, 22, 28], Search Key 25
Binary search: 25 not found (would fall between 22 and 28)
Return: (FALSE, NULL)

Notice that the search correctly navigates to Leaf 3, which is where 25 would be stored if it existed. This property is crucial for insertions.

Search Finds the Correct Leaf Even for Missing Keys

The search algorithm locates the leaf node where a key belongs, whether or not the key actually exists. This is essential for insert operations—run the search to find the correct leaf, then insert the key there. The search is thus reused as the first step of insertion.
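
Both traces above can be replayed with a small Python sketch. Only the root, Node A, Node B, Leaf 3, and Leaf 5 come from the worked example; every other leaf and the whole of Node C are invented placeholders so that the tree is complete. Slots are zero-based, so the one-based "position 2" in the trace appears as slot 1.

```python
from bisect import bisect_left, bisect_right


def leaf(name, keys):
    return {"name": name, "keys": keys}


def internal(name, keys, children):
    return {"name": name, "keys": keys, "children": children}


# Leaf 3 and Leaf 5 are taken from the trace; the rest is placeholder data.
node_a = internal("Node A", [10, 20], [
    leaf("Leaf 1", [2, 5]), leaf("Leaf 2", [10, 15]), leaf("Leaf 3", [20, 22, 28])])
node_b = internal("Node B", [40, 50], [
    leaf("Leaf 4", [30, 35]), leaf("Leaf 5", [40, 42, 45]), leaf("Leaf 6", [50, 55])])
node_c = internal("Node C", [70, 80], [
    leaf("Leaf 7", [60, 65]), leaf("Leaf 8", [70, 75]), leaf("Leaf 9", [80, 90])])
root = internal("Root", [30, 60], [node_a, node_b, node_c])


def trace(tree, key):
    """Return (names of visited nodes, found?, slot in the leaf)."""
    node, path = tree, []
    while "children" in node:
        path.append(node["name"])
        node = node["children"][bisect_right(node["keys"], key)]
    path.append(node["name"])
    slot = bisect_left(node["keys"], key)  # where key is, or would be inserted
    found = slot < len(node["keys"]) and node["keys"][slot] == key
    return path, found, slot


print(trace(root, 42))  # (['Root', 'Node B', 'Leaf 5'], True, 1)
print(trace(root, 25))  # (['Root', 'Node A', 'Leaf 3'], False, 2)
```

For the missing key 25, the returned slot 2 is the position between 22 and 28 where an insert would place it, which is why insertion can reuse the search unchanged.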

Complexity Analysis: Why B+-trees Excel

The B+-tree search algorithm's efficiency stems from its logarithmic depth. Let's derive the time and I/O complexity rigorously.

Height Analysis:

For a B+-tree of order n (max n children per node):

Minimum children per non-root internal node: ⌈n/2⌉
Minimum keys in leaf: ⌊(n-1)/2⌋

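With N total keys, a tree of height h holds the most keys when every node is full (n^(h-1) leaves with n-1 keys each), which bounds the height from below:

N ≤ (n-1) × n^(h-1), so h ≥ 1 + log_n(N/(n-1))

The height is largest when every node is at minimum occupancy. A non-leaf root has at least 2 children, so for h ≥ 2 and n ≥ 3 there are at least 2 × ⌈n/2⌉^(h-2) leaves:

N ≥ 2 × ⌈n/2⌉^(h-2) × ⌊(n-1)/2⌋

Solving for h:

h ≤ 2 + log_⌈n/2⌉(N / (2 × ⌊(n-1)/2⌋))
h = O(log_n N)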

For typical B+-trees where n = 100-200:

1 million records: height ≈ 3-4
1 billion records: height ≈ 5-6

B+-tree Height for Various Data Sizes (Order n = 100, Full Nodes)
Number of Records (N)	Height	Disk I/Os for Search (Uncached)
1,000	2	2
100,000	3	3
10,000,000	4	4
1,000,000,000	5	5
100,000,000,000	6	6
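
The heights in this table can be reproduced by finding the smallest h with (n-1) × n^(h-1) ≥ N, which assumes every node is completely full. The sketch below does that with integer arithmetic. The second function is an addition for comparison: it assumes the minimum occupancies listed above plus a non-leaf root with at least two children, and returns the tallest tree those rules allow. Both function names are made up for this example.

```python
def height_full_nodes(num_keys: int, order: int) -> int:
    """Smallest h with (order - 1) * order**(h - 1) >= num_keys."""
    height, capacity = 1, order - 1
    while capacity < num_keys:
        capacity *= order   # one more level multiplies the leaf count by order
        height += 1
    return height


def height_min_occupancy(num_keys: int, order: int) -> int:
    """Tallest tree that num_keys keys can form at minimum occupancy."""
    if order < 3:
        raise ValueError("order must be at least 3")
    min_children = -(-order // 2)      # ceil(order / 2)
    min_leaf_keys = (order - 1) // 2   # floor((order - 1) / 2)
    height = 1
    # A tree of height h + 1 needs at least
    # 2 * min_children**(h - 1) * min_leaf_keys keys.
    while 2 * min_children ** (height - 1) * min_leaf_keys <= num_keys:
        height += 1
    return height


for n_records in (10**3, 10**5, 10**7, 10**9, 10**11):
    print(n_records,
          height_full_nodes(n_records, 100),
          height_min_occupancy(n_records, 100))
```

For 1 billion records and order 100 the two functions return 5 and 6, so sparsely filled nodes cost at most one extra level at this size.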

I/O Complexity:

With nothing cached, each level traversed requires one disk I/O to read a node, so the search reads exactly h nodes, where h is the tree height.

I/O Cost = O(log_n N) = O(height)

This is remarkably efficient. With n = 100 and well-filled nodes, searching among 1 billion records requires about 5 disk reads. At 10ms per random disk seek, that's about 50ms, fast enough for interactive queries.

CPU Complexity:

At each internal node, we perform binary search on up to n-1 keys:

Time per node: O(log n)
Total time: O(h × log n) = O(log_n N × log n) = O(log N)

Since disk I/O dominates, the O(log N) CPU time is negligible.
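
The last simplification in the total-time line uses the change-of-base identity. Written out with base-2 logarithms:

```latex
\log_n N \cdot \log_2 n
  = \frac{\log_2 N}{\log_2 n} \cdot \log_2 n
  = \log_2 N
```

As a numeric check with N = 10^9 and n = 100: log_100(10^9) = 4.5 and log_2(100) ≈ 6.64, so the product is about 29.9, which equals log_2(10^9). The total number of key comparisons is therefore about the same as in a balanced binary tree; the B+-tree's advantage is in how few nodes those comparisons are spread across.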

The Power of High Fanout

The key to B+-tree efficiency is the high fanout (number of children per node). With n=100, each level of the tree can partition the search space by a factor of 100. Compare this to a binary tree where each level only halves the search space. A binary tree would require 30 levels for 1 billion records; a B+-tree with n=100 needs only 5.
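
The 30-versus-5 comparison can be checked with a few lines of Python. This is a simplified model that only asks how many levels are needed before fanout^levels reaches the record count; it ignores node occupancy, and `levels_needed` is a name chosen for this example.

```python
def levels_needed(num_records: int, fanout: int) -> int:
    """Smallest number of levels h with fanout**h >= num_records."""
    levels, reach = 0, 1
    while reach < num_records:
        reach *= fanout   # each level multiplies the reachable records
        levels += 1
    return levels


print(levels_needed(10**9, 2))    # 30
print(levels_needed(10**9, 100))  # 5
```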

B+-tree Search Advantages

•Guaranteed O(log N) worst case — no degenerate cases like unbalanced BSTs
•Minimal disk I/Os — typically 3-5 for billions of records
•Cache-friendly — large nodes exploit disk block reads
•Predictable performance — height varies logarithmically
•Root commonly cached — reduces search to h-1 I/Os in practice

Search Overhead Considerations

•Random access pattern — each level may hit different disk block
•Per-query overhead — not amortized like sequential scan
•Index maintenance cost — search speed trades off with write speed
•Memory for caching — upper levels should stay in buffer pool
•Selectivity dependency — point queries fast, but full scans still O(N)

Real-World Search Optimizations

Production database systems implement several optimizations to make B+-tree search even faster:

Common B+-tree Search Optimizations

•Root Node Pinning — The root node is permanently kept in memory. Since every search starts at the root, pinning eliminates one guaranteed disk I/O per search.
•Upper Level Caching — Internal nodes near the root are heavily accessed. Modern systems keep the top 2-3 levels entirely in the buffer pool, reducing typical search to 1-2 I/Os.
•Binary Search with SIMD — Within a node, some implementations use SIMD instructions to compare the search key against several node keys at once, accelerating the node-local search.
•Interpolation Hints — For numeric keys with known distribution, some systems use interpolation instead of binary search to guess the approximate position within a node.
•Prefix Compression — Keys within a node often share common prefixes. Compressing these prefixes allows more keys per node, increasing effective fanout without increasing node size.
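•Fence Keys — Each node stores a low and a high fence key that bound the range of keys the node may contain. A search can use them to confirm that the key falls within the node it has reached, and they give prefix compression a fixed pair of bounds to work from.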
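
Of the optimizations listed above, prefix compression is the easiest to illustrate in isolation. The sketch below stores the prefix shared by all keys in a node once and keeps only the suffixes. It is a simplified illustration, not the on-page format of any specific engine; the function names and the sample keys are assumptions.

```python
import os


def compress_node_keys(keys):
    """Split a node's string keys into (shared prefix, list of suffixes)."""
    prefix = os.path.commonprefix(keys)  # longest common leading substring
    return prefix, [k[len(prefix):] for k in keys]


def decompress_node_keys(prefix, suffixes):
    """Rebuild the original keys."""
    return [prefix + s for s in suffixes]


keys = ["customer:1001", "customer:1007", "customer:1042"]
prefix, suffixes = compress_node_keys(keys)
print(prefix, suffixes)  # customer:10 ['01', '07', '42']

raw_size = sum(len(k) for k in keys)
packed_size = len(prefix) + sum(len(s) for s in suffixes)
print(raw_size, packed_size)  # 39 17
```

Because the suffixes keep the same relative order as the full keys, binary search inside the node can compare suffixes directly once the search key's prefix has been checked.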

Buffer Pool Impact

In a well-tuned system with adequate buffer pool size, the top levels of frequently-used B+-trees remain cached. For a B+-tree with height 4, if levels 0-2 are cached, searches require only 1 disk I/O (for the leaf node). This is why memory sizing is crucial for database performance.
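
The effect of caching can be written as simple arithmetic: only the levels below the cached ones cost a disk read. A minimal sketch, assuming the top `cached_levels` levels are always resident and using the 10ms-per-read figure from the I/O analysis as a default; both function names are hypothetical.

```python
def disk_reads_per_search(height: int, cached_levels: int) -> int:
    """Levels that still have to be fetched from disk."""
    return max(0, height - cached_levels)


def search_latency_ms(height: int, cached_levels: int,
                      ms_per_read: float = 10.0) -> float:
    """Estimated I/O time for one point query."""
    return disk_reads_per_search(height, cached_levels) * ms_per_read


print(disk_reads_per_search(4, 3))  # 1
print(search_latency_ms(5, 0))      # 50.0
print(search_latency_ms(5, 3))      # 20.0
```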

Adaptive Search Strategies:

Some advanced systems employ adaptive techniques:

Hybrid Search: For very wide nodes (high fanout), systems may use a hybrid of binary search and linear probe. Binary search narrows to a cache-line-sized region, then linear scan identifies the exact position.

Branching Hints: After traversing to a leaf, the search path can be cached as a "hint" for subsequent searches with nearby keys. This is particularly effective for range queries and sequential access patterns.

NUMA-Aware Placement: In multi-socket systems, B+-tree nodes are placed in memory close to the processor that most frequently accesses them, reducing memory access latency.
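
The hybrid search described above can be sketched as follows. Binary search runs until the candidate range is at most `window` entries wide, then a linear scan finishes the job. The default of 8 is an assumption (roughly one 64-byte cache line of 8-byte keys), and `hybrid_search` is a name chosen for this example.

```python
def hybrid_search(keys, target, window=8):
    """Return the index of the first key >= target in a sorted list."""
    low, high = 0, len(keys)
    # Binary phase: halve the range until it fits in the window.
    while high - low > window:
        mid = (low + high) // 2
        if keys[mid] < target:
            low = mid + 1
        else:
            high = mid
    # Linear phase: sequential comparisons over adjacent entries.
    while low < high and keys[low] < target:
        low += 1
    return low


keys = list(range(0, 200, 2))
print(hybrid_search(keys, 58))  # 29
print(hybrid_search(keys, 57))  # 29
```

The linear phase does more comparisons than pure binary search would, but they touch adjacent memory and have predictable branches, which is the trade the technique is making.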

Comparative Analysis: Why B+-trees Dominate

To appreciate B+-tree search, compare it with alternative index structures:

Point Query Search Performance Comparison
Structure	Search Complexity	Disk I/Os (1B records)	Strengths	Weaknesses
B+-tree	O(log N)	~5	Balanced, cache-friendly, supports ranges	Moderate insert overhead
Binary Search Tree	O(log N) avg, O(N) worst	~30 (if binary)	Simple implementation	Degenerates if unbalanced
Hash Index	O(1) average	~1-2	Fastest for equality	No range queries
Linear Scan	O(N)	~millions	No structure needed	Impractical at scale
Sorted Array	O(log N)	~30 (binary search)	Simple, compact	Expensive inserts O(N)

Why B+-trees Win for General-Purpose Indexing:

Guaranteed Balance: Unlike BSTs, B+-trees cannot become unbalanced. Every leaf is exactly the same distance from the root, ensuring consistent O(log N) performance regardless of insertion order.
Disk-Optimized Fanout: By storing hundreds of keys per node (matching disk block size), B+-trees minimize the number of disk accesses. A binary tree holds one key per node, so in the worst case it needs a disk access for every comparison, which is catastrophic for large datasets.
Range Query Support: Unlike hash indexes, B+-trees maintain sorted order within leaves and link leaves together. After finding a start key, range queries simply follow sibling pointers—no additional tree traversals needed.
Mature Implementations: Decades of optimization have produced B+-tree implementations that exploit CPU caches, SIMD, and modern storage hierarchies. The simple algorithm enables sophisticated engineering.

Hash Indexes: The Exception

Hash indexes achieve O(1) average-case point queries, faster than B+-trees. However, they cannot support range queries, inequality comparisons, or ORDER BY without sorting. Most databases default to B+-trees because they're more general-purpose. Hash indexes are used selectively for specific equality-lookup workloads.
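
The trade-off can be seen with Python's built-in structures: a `dict` plays the role of a hash index and a sorted list searched with `bisect` stands in for the ordered leaf level of a B+-tree. This is an analogy rather than a model of either index type, and the employee rows are made-up sample data.

```python
from bisect import bisect_left, bisect_right

# Hypothetical employee rows keyed by employee_id.
rows = {12345: "Ada", 12001: "Grace", 12999: "Edsger", 13500: "Barbara"}

hash_index = dict(rows)       # equality lookups only, no key order
ordered_keys = sorted(rows)   # stands in for the sorted, linked leaves


def point_lookup(key):
    """Equality search: a single hash probe."""
    return hash_index.get(key)


def range_lookup(low, high):
    """All rows with low <= key <= high: one search, then a sequential scan."""
    start = bisect_left(ordered_keys, low)
    end = bisect_right(ordered_keys, high)
    return [(k, rows[k]) for k in ordered_keys[start:end]]


print(point_lookup(12345))         # Ada
print(range_lookup(12000, 13000))  # [(12001, 'Grace'), (12345, 'Ada'), (12999, 'Edsger')]
```

Answering the range query from `hash_index` alone would require examining every entry, because hashing discards the key order that `ordered_keys` preserves.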

Summary: B+-tree Search Mastery

Let's consolidate the essential concepts of B+-tree search:

Key Takeaways

•Two-Phase Algorithm: Search consists of tree descent (navigating internal nodes) and leaf search (binary search within the leaf). Both phases are logarithmic.
•Separator Key Navigation: Internal node keys are separators that route searches left or right. Follow the pointer whose key range contains the search key.
•All Data in Leaves: Unlike B-trees, B+-tree searches always terminate at leaf nodes. Internal node keys are just routing guides, not data.
•O(log_n N) I/Os: With fanout n, height is logarithmic. Typical databases need 3-5 disk reads for billions of records.
•Correct Leaf Discovery: The search finds the leaf where a key belongs, even if the key doesn't exist. This enables efficient insertions.
•Real-World Optimizations: Caching upper levels, SIMD search, and prefix compression further accelerate production systems.

What's Next:

Now that you understand point queries, the next page explores range queries—one of B+-trees' defining advantages. You'll see how the linked leaf structure enables efficient range scans after an initial search, making B+-trees ideal for BETWEEN clauses, ORDER BY, and range-based analytics.

Page Complete

You now understand the complete mechanics of B+-tree point query search. This algorithm runs constantly in any database system that uses B+-tree indexes, and your knowledge of its inner workings will inform index design, query tuning, and performance analysis throughout your career.
