Every database query that benefits from an index ultimately relies on B-tree search. When you execute a query like SELECT * FROM orders WHERE order_id = 12345, the database doesn't scan millions of rows—it navigates a B-tree, making precisely calculated decisions at each node to locate the target record with logarithmic efficiency.
The B-tree search algorithm is deceptively simple in concept yet profound in its implications. Understanding it deeply reveals why B-trees dominate database indexing and how seemingly minor implementation choices cascade into massive performance differences at scale.
By completing this page, you will understand: (1) The complete B-tree search algorithm with formal correctness properties, (2) How search translates to disk I/O in real systems, (3) The mathematical foundations of logarithmic search performance, (4) Variants and optimizations used in production databases, and (5) Common pitfalls and debugging strategies for B-tree search issues.
B-tree search is a generalized binary search extended to multi-way branching. While a binary search tree compares a key against a single node value and chooses left or right, a B-tree compares against an ordered list of keys within each node and selects the appropriate child pointer from among multiple options.
The Core Invariant:
A B-tree maintains the following ordering invariant that makes search possible:
For every internal node with keys K₁, K₂, ..., Kₙ and child pointers P₀, P₁, ..., Pₙ:
- All keys in the subtree pointed to by P₀ are less than K₁
- All keys in the subtree pointed to by Pᵢ (for 1 ≤ i < n) are greater than or equal to Kᵢ and less than Kᵢ₊₁
- All keys in the subtree pointed to by Pₙ are greater than or equal to Kₙ
This invariant is the foundation upon which all B-tree operations rest. Search exploits it to eliminate entire subtrees from consideration at each level.
Different B-tree implementations use slightly different boundary semantics (strict less-than vs. less-than-or-equal). The key insight is that there must be a consistent, well-defined rule that partitions the key space. The exact choice affects edge cases in insertion and deletion but not the fundamental search algorithm.
Visual Understanding:
Consider a B-tree node with keys [20, 40, 60] and four child pointers:
```
        [20 | 40 | 60]
       /    |    |     \
     P₀    P₁    P₂    P₃
      ↓     ↓     ↓     ↓
    keys  keys  keys  keys
    < 20  20-39 40-59 ≥ 60
```
At each node, we perform a within-node search (typically binary search for efficiency) to determine which path to follow.
Let's formalize the B-tree search algorithm with precise pseudocode and then analyze each component:
Algorithm: B-Tree Search
```
BTREE-SEARCH(node, key):
  Input:  node - current B-tree node being examined
          key  - the search key we're looking for
  Output: (node, index) pair if key found, or NULL if not found

  1.  i ← 0
  2.  // Find the smallest index where key ≤ node.keys[i]
  3.  WHILE i < node.numKeys AND key > node.keys[i]:
  4.      i ← i + 1
  5.
  6.  // Check if we found the key at position i
  7.  IF i < node.numKeys AND key = node.keys[i]:
  8.      RETURN (node, i)      // Key found at position i
  9.
  10. // Key not found in this node
  11. IF node.isLeaf:
  12.     RETURN NULL           // Key doesn't exist in tree
  13. ELSE:
  14.     // Read child from disk if not in memory
  15.     DISK-READ(node.children[i])
  16.     RETURN BTREE-SEARCH(node.children[i], key)
```
This recursive algorithm captures the essence of B-tree search. Let's examine each phase:
Line 15 (DISK-READ) is where the real cost lies. Each recursive call potentially triggers a disk read, which takes milliseconds—orders of magnitude slower than in-memory operations. The entire design of B-trees is optimized to minimize the number of these disk reads.
The within-node search (finding the correct key position within a single node) is a crucial optimization point. While the number of disk I/Os dominates overall cost, the CPU time for within-node searches becomes significant with large fan-outs or under high query loads.
Linear Search:
The naive approach scans keys sequentially:
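A minimal sketch (an illustrative helper, not any particular engine's code) of linear within-node search, mirroring the WHILE loop in BTREE-SEARCH:

```typescript
// Return the index of the first key >= searchKey, scanning left to right.
// This index doubles as the child-pointer index to follow when descending.
function linearSearchNode(keys: number[], searchKey: number): number {
  let i = 0;
  while (i < keys.length && searchKey > keys[i]) {
    i++;
  }
  return i; // keys[i] >= searchKey, or i === keys.length if all keys are smaller
}
```

For small nodes this beats binary search in practice: the branch is predictable and the keys stream through the cache line by line.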
Binary Search:
For larger nodes, binary search reduces comparisons:
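A lower-bound binary search sketch (same contract as the linear version above, assumed helper name):

```typescript
// Lower-bound binary search: index of the first key >= searchKey.
// Equivalent result to the linear scan, but O(log t) comparisons per node.
function binarySearchNode(keys: number[], searchKey: number): number {
  let lo = 0;
  let hi = keys.length; // search invariant: answer lies in [lo, hi]
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (keys[mid] < searchKey) {
      lo = mid + 1; // first key >= searchKey must be right of mid
    } else {
      hi = mid;     // keys[mid] >= searchKey: mid is a candidate
    }
  }
  return lo;
}
```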
Interpolation Search:
When keys are uniformly distributed:
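A sketch of interpolation search within a node (illustrative only; it assumes roughly uniform numeric keys and degrades badly on skewed data, which is why production B-trees rarely use it):

```typescript
// Interpolation search: estimate the target position from the key's value
// relative to the node's smallest and largest keys, then narrow as usual.
function interpolationSearchNode(keys: number[], searchKey: number): number {
  let lo = 0;
  let hi = keys.length - 1;
  while (lo <= hi && searchKey >= keys[lo] && searchKey <= keys[hi]) {
    // Linear interpolation between keys[lo] and keys[hi] (guard against /0)
    const pos = lo + Math.floor(
      ((searchKey - keys[lo]) * (hi - lo)) / (keys[hi] - keys[lo] || 1)
    );
    if (keys[pos] === searchKey) return pos;
    if (keys[pos] < searchKey) lo = pos + 1;
    else hi = pos - 1;
  }
  // Not hit exactly: return the lower-bound position (first key >= searchKey)
  while (lo < keys.length && keys[lo] < searchKey) lo++;
  return lo;
}
```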
| Node Size | Linear Search | Binary Search | Recommendation |
|---|---|---|---|
| 4-8 keys | 4-8 comparisons | 2-3 comparisons | Either works; linear often faster due to simplicity |
| 16-32 keys | 16-32 comparisons | 4-5 comparisons | Binary search preferred |
| 64-128 keys | 64-128 comparisons | 6-7 comparisons | Binary search essential |
| 256+ keys | 256+ comparisons | 8+ comparisons | Binary search mandatory |
SIMD-Accelerated Search:
Modern database systems exploit Single Instruction, Multiple Data (SIMD) instructions for within-node search:
```c
// Conceptual SIMD search within a node (AVX2, 32-bit keys)
__m256i key_vec    = _mm256_set1_epi32(search_key);            // broadcast search key to 8 lanes
__m256i node_keys  = _mm256_loadu_si256((__m256i*)node->keys); // load 8 keys at once
__m256i cmp_result = _mm256_cmpgt_epi32(key_vec, node_keys);   // lanes where search_key > key
int mask = _mm256_movemask_ps(_mm256_castsi256_ps(cmp_result)); // cast, then extract lane mask
int pos  = __builtin_popcount(mask); // count of keys < search_key = child index to follow
```
This compares 8 keys simultaneously, dramatically reducing within-node search time for fixed-size integer keys. Vectorized comparisons of this kind appear in research prototypes and in some production engines' hot paths.
A CPU cache line is typically 64 bytes. If keys are 8-byte integers, a cache line holds 8 keys. Accessing the first key in a group brings the entire cache line into L1 cache. Smart implementations align nodes to cache line boundaries and search in cache-line-sized chunks.
Understanding B-tree search complexity requires separating two cost models: disk I/O (the dominant factor) and CPU operations (significant under high load).
Height of a B-Tree:
For a B-tree with minimum degree t (each node has at least t-1 keys and at most 2t-1 keys):
$$h \leq \log_t \left( \frac{n+1}{2} \right)$$
This means a B-tree with t = 100 (typical for disk-based systems) containing 1 billion keys has height at most:
$$h \leq \log_{100}(500,000,000) \approx 4.35$$
So at most 5 disk reads locate any key among 1 billion!
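The bound can be checked numerically; a small sketch (helper name is illustrative):

```typescript
// Maximum height of a B-tree with minimum degree t holding n keys:
// h <= log_t((n + 1) / 2), via change of base.
function maxBTreeHeight(t: number, n: number): number {
  return Math.log((n + 1) / 2) / Math.log(t);
}

const h = maxBTreeHeight(100, 1_000_000_000);
// h is about 4.35, so at most 5 node reads reach any of a billion keys
```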
| Cost Metric | Complexity | Practical Interpretation |
|---|---|---|
| Disk I/O | O(log_t n) | Number of nodes read from disk; typically 3-5 for billion-key trees |
| CPU comparisons (linear within-node) | O(t · log_t n) | Linear scan of each visited node |
| CPU comparisons (binary within-node) | O(log t · log_t n) = O(log n) | Binary search within each visited node |
| Space for search | O(log_t n) | Recursion stack depth or path storage |
Why Disk I/O Dominates:
Consider typical access times (order-of-magnitude figures):
- CPU comparison: ~1 ns
- RAM access: ~100 ns
- SSD random read: ~100 µs
- HDD random read (seek + rotation): ~1-10 ms
A single disk read is roughly 1,000,000× slower than a CPU comparison. This is why B-trees optimize for fewer disk reads (by maximizing node size) even at the cost of more CPU work per node.
Practical Example:
Searching a B-tree with t = 200 for a key among 10⁹ keys:
- Height: h ≤ log₂₀₀(5 × 10⁸) ≈ 3.8, so at most 4 levels
- Cost: at most 4 disk reads—tens of milliseconds even on a spinning disk
- Contrast: an unindexed scan touches up to 10⁹ records
The B-tree transforms an impossible problem into a trivial one.
Most database systems keep the root node (and often the first 2-3 levels) permanently in memory. For a billion-key B-tree, this means only 1-2 disk reads are actually needed, not the theoretical 4-5. This optimization is critical for high-performance systems.
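A back-of-the-envelope estimate of what pinning those upper levels costs in RAM, assuming a uniform fanout of 200 and 8 KiB pages (both hypothetical values):

```typescript
// Memory needed to pin the top `levels` of a B-tree in RAM, assuming a
// uniform fanout and fixed page size (illustrative numbers, not a real engine).
function memoryToPinTopLevels(fanout: number, pageBytes: number, levels: number): number {
  let pages = 0;
  let nodesAtLevel = 1; // level 0 is the root
  for (let l = 0; l < levels; l++) {
    pages += nodesAtLevel;
    nodesAtLevel *= fanout;
  }
  return pages * pageBytes;
}

// Root plus two levels at fanout 200 with 8 KiB pages:
// 1 + 200 + 40,000 = 40,201 pages, roughly 314 MiB — cheap for a server.
const bytes = memoryToPinTopLevels(200, 8192, 3);
```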
While the basic search algorithm is similar for B-trees and B+-trees, there are critical differences that affect implementation and performance:
B-Tree (Classic):
[P₀, K₁, D₁, P₁, K₂, D₂, P₂, ...] where Dᵢ is a data pointer
B+-Tree (Database Standard):
[P₀, K₁, P₁, K₂, P₂, ...] (no data pointers in internal nodes)
B+-Tree Search Modification:
```
BPLUS-TREE-SEARCH(node, key):
  WHILE NOT node.isLeaf:
      i ← FindChildIndex(node, key)   // Binary search for position
      DISK-READ(node.children[i])
      node ← node.children[i]

  // Now at leaf level - search for exact key
  i ← BinarySearch(node.keys, key)
  IF i < node.numKeys AND node.keys[i] = key:
      RETURN (node, i)   // Found: return leaf and index
  ELSE:
      RETURN NULL        // Not found
```
Note that B+-tree search is naturally written iteratively rather than recursively—we descend until reaching a leaf, then perform a final lookup. This simplifies implementation and avoids growing a recursion stack.
B+-trees have higher fanout (more keys per node, thus shorter trees) because internal nodes don't store data pointers. For a node that fits in one disk page, a B+-tree fits more keys, reducing tree height and disk I/Os. This is why virtually all modern database indexes use B+-trees, not classic B-trees.
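The fanout difference is easy to quantify. A sketch with illustrative sizes (8 KiB page, 8-byte keys and pointers—assumptions, not any engine's real layout):

```typescript
// Keys per internal node for a given page size, assuming fixed-width keys,
// child pointers, and (for classic B-trees) inline data pointers.
// n keys need n*(key + childPtr + dataPtr) bytes plus one extra child pointer.
function internalFanout(pageBytes: number, keyBytes: number,
                        ptrBytes: number, dataPtrBytes: number): number {
  const perKey = keyBytes + ptrBytes + dataPtrBytes;
  return Math.floor((pageBytes - ptrBytes) / perKey);
}

const bTreeKeys = internalFanout(8192, 8, 8, 8); // classic: data pointers inline -> 341 keys
const bPlusKeys = internalFanout(8192, 8, 8, 0); // B+: keys and child pointers only -> 511 keys
// The B+-tree node holds ~50% more keys, so the tree is shallower for the same n.
```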
While point queries (finding a single key) are important, database workloads often involve range queries: WHERE age BETWEEN 25 AND 35 or WHERE date >= '2024-01-01'. B+-trees excel at range queries due to their leaf-level structure.
Range Query Algorithm:
```
BPLUS-TREE-RANGE-SEARCH(tree, lowKey, highKey):
  // Step 1: Find the starting leaf
  leaf ← FindLeaf(tree.root, lowKey)

  // Step 2: Collect results by scanning leaves sequentially
  results ← []
  WHILE leaf ≠ NULL:
      FOR i ← 0 TO leaf.numKeys - 1:
          IF leaf.keys[i] > highKey:
              RETURN results            // Past range, done
          IF leaf.keys[i] >= lowKey:
              results.append((leaf.keys[i], leaf.data[i]))
      leaf ← leaf.nextLeaf              // Follow sibling pointer
  RETURN results
```
The key insight: once we find the starting position, we traverse linked leaves without ever ascending the tree. This turns range queries into sequential I/O, which is dramatically faster than random I/O.
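The leaf-scanning phase can be sketched concretely (a minimal leaf type with hypothetical field names; finding the starting leaf is assumed done by the tree descent):

```typescript
// Minimal B+-tree leaf: sorted keys, a parallel data array, and a sibling link.
interface Leaf {
  keys: number[];
  data: string[];
  nextLeaf: Leaf | null;
}

// Scan linked leaves, collecting entries whose key lies in [lowKey, highKey].
// Assumes startLeaf is the leaf the descent for lowKey would land on.
function rangeScan(startLeaf: Leaf | null, lowKey: number, highKey: number): [number, string][] {
  const results: [number, string][] = [];
  let leaf = startLeaf;
  while (leaf !== null) {
    for (let i = 0; i < leaf.keys.length; i++) {
      if (leaf.keys[i] > highKey) return results;        // past the range: done
      if (leaf.keys[i] >= lowKey) results.push([leaf.keys[i], leaf.data[i]]);
    }
    leaf = leaf.nextLeaf; // sequential hop along the sibling chain, never re-ascending
  }
  return results;
}
```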
| Operation | I/O Type | Cost Characteristic |
|---|---|---|
| Find starting leaf | Random (tree descent) | O(log_t n) disk reads |
| Scan matching leaves | Sequential (linked list) | O(k / B) where k = result count, B = keys per leaf |
| Fetch actual records | Random (pointer following) | O(k) in worst case; clustered index avoids this |
Sequential I/O Advantage:
Hard drives and SSDs have vastly different performance characteristics for sequential vs. random access:
| Storage Type | Sequential Read | Random Read | Ratio |
|---|---|---|---|
| HDD | 200 MB/s | 1 MB/s | 200:1 |
| SATA SSD | 550 MB/s | 50 MB/s | 11:1 |
| NVMe SSD | 3500 MB/s | 500 MB/s | 7:1 |
Even on fast NVMe storage, sequential reads are 7× faster. For HDDs, the difference is 200×. B+-tree leaf linking transforms range queries from random access nightmares into fast sequential scans.
Bidirectional Pointers:
Advanced implementations maintain both nextLeaf and prevLeaf pointers, enabling efficient backward scans for:
- ORDER BY ... DESC queries

If all columns needed by a query exist in the index (a 'covering index'), the entire result can be produced by scanning leaves—no record fetches needed. This eliminates the random I/O phase, making range queries exceptionally fast.
Real database systems implement B-tree search with numerous practical considerations that textbook algorithms omit. Let's examine production-grade patterns:
1. Buffer Pool Integration:
Search doesn't directly perform disk I/O—it requests pages from a buffer pool manager:
```cpp
Node* getNode(PageID pid) {
    // Check if page is already in memory
    Frame* frame = bufferPool.lookup(pid);
    if (frame != nullptr) {
        frame->pinCount++;  // Prevent eviction during use
        return frame->page;
    }
    // Page miss: evict a victim and fetch from disk
    frame = bufferPool.evictAndFetch(pid);
    frame->pinCount++;      // Pin the freshly fetched page as well
    return frame->page;
}
```
This allows the upper levels of the tree (frequently accessed) to remain in memory while leaves are fetched on demand.
2. Latching for Concurrency:
Concurrent access requires protecting nodes during traversal:
```cpp
Record* search(Key key) {
    Node* current = root;
    current->latch.lockShared();       // Shared latch - multiple readers OK
    while (!current->isLeaf) {
        int i = findChildIndex(current, key);
        Node* child = getNode(current->children[i]);
        child->latch.lockShared();
        current->latch.unlock();       // Release parent before descending (latch coupling)
        current = child;
    }
    // Search within the leaf while still holding its latch
    Record* result = findInLeaf(current, key);  // hypothetical helper; nullptr if absent
    current->latch.unlock();
    return result;
}
```
This latch coupling (or crabbing) pattern ensures we hold latches on at most two nodes at any time.
Production B-tree nodes often use a slot array (indirection layer) rather than storing keys contiguously. This enables efficient insertion/deletion within a node without moving large amounts of data—just update the slot pointers. Binary search operates on the slot array, then follows the indirection to access actual keys.
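A sketch of binary search over such a slot array (field names are illustrative, not a real page format):

```typescript
// Slot-array indirection: key bytes live wherever they were written in the
// page's heap area; the slot array stores heap indices in key order.
// Inserts shift only small slot entries, never the key bytes themselves.
interface SlottedNode {
  heap: number[];   // keys in arbitrary (insertion) order
  slots: number[];  // indices into heap, sorted by the key they point at
}

// Lower-bound binary search over the slots, following the indirection
// to reach the actual key for each comparison.
function slottedLowerBound(node: SlottedNode, searchKey: number): number {
  let lo = 0, hi = node.slots.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (node.heap[node.slots[mid]] < searchKey) lo = mid + 1;
    else hi = mid;
  }
  return lo; // slot index of the first key >= searchKey
}
```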
When B-tree search behaves unexpectedly, systematic debugging is essential. Common issues include corrupted invariants, incorrect boundary handling, and concurrency bugs.
Invariant Verification:
A correct B-tree satisfies these properties; violations indicate bugs:
- Keys within every node are in strictly sorted order
- Every key in a subtree falls within the bounds implied by its parent's separator keys
- All leaves sit at the same depth
- Every non-root node respects the minimum/maximum occupancy bounds
Implement a verifyTree() function that recursively checks these properties during development.
```typescript
interface VerifyResult {
  valid: boolean;
  error?: string;
  path?: number[]; // Path to problematic node
}

function verifyBTree(
  node: BTreeNode,
  minKey: Key | null,
  maxKey: Key | null,
  depth: number,
  expectedDepth: number | null
): VerifyResult {
  // Check ordering within node
  for (let i = 1; i < node.keys.length; i++) {
    if (node.keys[i] <= node.keys[i - 1]) {
      return { valid: false, error: `Keys not sorted at depth ${depth}`, path: [i] };
    }
  }

  // Check boundary constraints
  if (minKey !== null && node.keys[0] < minKey) {
    return { valid: false, error: `Key ${node.keys[0]} violates lower bound ${minKey}` };
  }
  if (maxKey !== null && node.keys[node.keys.length - 1] >= maxKey) {
    return { valid: false, error: `Key ${node.keys[node.keys.length - 1]} violates upper bound ${maxKey}` };
  }

  // Leaf depth consistency
  if (node.isLeaf) {
    if (expectedDepth !== null && depth !== expectedDepth) {
      return { valid: false, error: `Leaf at depth ${depth}, expected ${expectedDepth}` };
    }
    return { valid: true };
  }

  // Verify children recursively
  for (let i = 0; i <= node.keys.length; i++) {
    const childMin = i === 0 ? minKey : node.keys[i - 1];
    const childMax = i === node.keys.length ? maxKey : node.keys[i];
    const childResult = verifyBTree(node.children[i], childMin, childMax, depth + 1, expectedDepth);
    if (!childResult.valid) {
      return { ...childResult, path: [i, ...(childResult.path || [])] };
    }
  }
  return { valid: true };
}
```
Common Search Bugs and Fixes:
| Symptom | Likely Cause | Debugging Approach |
|---|---|---|
| Key exists but search returns NULL | Off-by-one in boundary comparison | Log visited nodes and compare paths |
| Wrong value returned | Key duplicates with wrong handling | Verify uniqueness or duplicate semantics |
| Infinite loop | Cycle in tree structure | Add visited-node tracking, verify parent/child consistency |
| Performance degradation | Tree becoming unbalanced | Measure actual height vs. theoretical minimum |
| Inconsistent results | Concurrent modification bug | Enable latch debugging, review latch coupling logic |
Implement a debug mode that logs: (1) Each node visited with its key range, (2) Which child pointer is chosen and why, (3) The final decision (found/not found). This trace makes it trivial to identify where search diverges from expected behavior.
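The tip above can be sketched as a traced search (minimal node type and field names are illustrative):

```typescript
// Search with a trace log: record each node visited, the child chosen,
// and the final outcome, so a diverging search is easy to localize.
interface TraceNode {
  keys: number[];
  children: TraceNode[]; // empty for leaves
  isLeaf: boolean;
}

function tracedSearch(node: TraceNode, key: number, log: string[]): boolean {
  let current: TraceNode | null = node;
  while (current !== null) {
    log.push(`visit node [${current.keys.join(", ")}]`);
    let i = 0;
    while (i < current.keys.length && key > current.keys[i]) i++;
    if (i < current.keys.length && current.keys[i] === key) {
      log.push(`found ${key} at index ${i}`);
      return true;
    }
    if (current.isLeaf) {
      log.push(`not found: ${key}`);
      return false;
    }
    log.push(`descend to child ${i}`);
    current = current.children[i];
  }
  return false;
}
```

Diffing the logs of a failing search against a known-good run pinpoints exactly where the paths diverge.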
B-tree search is the foundation upon which all B-tree operations and database indexing rest. The key insights: the ordering invariant lets each node eliminate all but one subtree; large fan-out keeps the tree only a few levels deep, so any key is reachable in a handful of disk reads; B+-tree leaf chaining turns range queries into fast sequential I/O; and production systems layer buffer pools, latch coupling, and within-node search optimizations on top of the textbook algorithm.
You now understand B-tree search at a depth suitable for production implementation and optimization. The next page explores the insertion algorithm, where we'll see how B-trees maintain their balance invariants while growing—including the critical concept of node splitting.