If there's one data structure that powers the modern database world, it's the B-tree. Invented by Rudolf Bayer and Edward McCreight in 1970 at Boeing Scientific Research Laboratories, the B-tree has become the de facto standard index structure in virtually every relational database system: PostgreSQL, MySQL, SQL Server, Oracle, SQLite, and countless others.
The B-tree's dominance isn't accidental. It was specifically designed to minimize disk I/O, making it ideally suited for databases where data far exceeds available memory. More than 50 years later, despite dramatic changes in storage technology, B-trees remain the default choice for most indexing workloads.
In this page, we'll explore B-trees from their mathematical foundations through their practical behavior in production database systems, giving you the deep understanding needed to reason about B-tree index performance and design effective indexing strategies.
By the end of this page, you'll understand the B-tree data structure at a fundamental level, including node organization, search algorithms, insertion with splitting, deletion with merging, the B+ tree variant used in databases, and the factors that determine B-tree performance in practice.
To understand why B-trees dominate database indexing, we must first understand the problem they solve: minimizing disk I/O while maintaining efficient search, insert, and delete operations.
The Memory Hierarchy Reality:
Modern computer systems have a dramatic performance gap between memory and disk: a RAM access costs on the order of 100 nanoseconds, an SSD random read around 0.1 milliseconds, and an HDD seek around 10 milliseconds. Each tier is orders of magnitude slower than the one above it.
When data doesn't fit in memory, database performance is dominated by disk I/O. The goal of any disk-based data structure is to minimize the number of disk reads per operation.
Why Not Binary Search Trees?
A balanced binary search tree (like an AVL tree or Red-Black tree) provides O(log₂ N) search performance. For a billion elements, that's about 30 comparisons—which sounds excellent. But there's a critical problem:
Each comparison might require a disk read.
In a binary tree, each node has only 2 children. To store 1 billion elements, you need a tree of height ~30. If each node is on a different disk page, you need 30 disk reads per lookup—that's 30 × 10ms = 300ms per search on an HDD, or about 3 queries per second.
| Tree Type | Fanout | Height (1B elements) | Disk Reads/Lookup | Time (HDD) | Time (SSD) |
|---|---|---|---|---|---|
| Binary tree | 2 | 30 | 30 | 300ms | 3ms |
| 4-way tree | 4 | 15 | 15 | 150ms | 1.5ms |
| 100-way tree | 100 | 5 | 5 | 50ms | 0.5ms |
| B-tree (500-way) | 500 | 3-4 | 3-4 | 30-40ms | 0.3-0.4ms |
The B-tree Solution:
B-trees solve the disk I/O problem by maximizing the fanout—the number of children per node. Instead of 2 children (binary), B-trees have hundreds or thousands of children per node.
The key insight is that a single disk read retrieves an entire disk page (typically 4KB-16KB). Rather than wasting most of that page on a single tree node with 2 children, B-trees pack as many keys and pointers as possible into each page.
With a fanout of 500, a billion keys fit in a tree of just three or four levels (log₅₀₀(10⁹) ≈ 3.3), so a lookup costs at most three or four disk reads.
The Logarithmic Base Matters:
Note that log₅₀₀(N) = log₂(N) / log₂(500) ≈ log₂(N) / 9. Increasing the fanout from 2 to 500 therefore reduces the tree height by roughly a factor of 9.
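This arithmetic is easy to verify. A quick sketch (illustrative only, using the one-billion-key figure from the table above):

```python
import math

def tree_height(n_keys: int, fanout: int) -> int:
    """Number of levels needed to index n_keys with the given fanout."""
    return math.ceil(math.log(n_keys, fanout))

N = 1_000_000_000
for fanout in (2, 4, 100, 500):
    print(f"fanout {fanout:>3}: height {tree_height(N, fanout)}")
```

Running this reproduces the height column of the comparison table: 30 levels for a binary tree, 15 for 4-way, 5 for 100-way, and 4 for 500-way.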
B-trees trade off in-memory comparison efficiency for disk I/O efficiency. It's faster to do 100 comparisons within a single page in RAM than to do 2 comparisons across 50 pages on disk. This trade-off is wildly profitable—the disk I/O saved dwarfs the extra in-memory comparisons.
A B-tree of order m (also called minimum degree t in some definitions) satisfies the following properties:
Structural Properties:
- Every node has at most m children (and thus at most m-1 keys).
- Every non-root internal node has at least ⌈m/2⌉ children (at least ⌈m/2⌉-1 keys).
- The root has at least 2 children, unless it is a leaf.
- All leaves appear at the same depth: the tree is perfectly balanced.
Node Structure:
Each node contains:
- a header with the node type (internal or leaf) and the current key count n
- a sorted array of n keys
- for internal nodes, n+1 child pointers; for leaf nodes, n values (row pointers or the rows themselves)
The Search Invariant:
For each key kᵢ in an internal node: every key in the subtree rooted at childᵢ is less than kᵢ, and every key in the subtree rooted at childᵢ₊₁ is greater than kᵢ.
This invariant enables binary search within nodes to find the correct child pointer.
```text
B-tree Node Structure (Order m=5, so 2-4 keys per node)

┌────────────────────────────────────────────┐
│ INTERNAL NODE                              │
├────────────────────────────────────────────┤
│ Header: [type=INTERNAL, n=3]               │
├────────────────────────────────────────────┤
│ Keys:     │ 25 │ 50 │ 75 │ -- │            │
├────────────────────────────────────────────┤
│ Pointers: │ p0 │ p1 │ p2 │ p3 │            │
└────────────────────────────────────────────┘
              │    │    │    │
              ▼    ▼    ▼    ▼
             <25 25-50 50-75  >75

┌────────────────────────────────────────────┐
│ LEAF NODE                                  │
├────────────────────────────────────────────┤
│ Header: [type=LEAF, n=4]                   │
├────────────────────────────────────────────┤
│ Keys:   │ 10 │ 15 │ 20 │ 23 │              │
├────────────────────────────────────────────┤
│ Values: │ v1 │ v2 │ v3 │ v4 │              │ ← Row pointers or data
├────────────────────────────────────────────┤
│ Next Leaf: ──────────────────────────────→ │ ← For range scans (B+ tree)
└────────────────────────────────────────────┘
```

Why the Minimum Fill Requirement?
The requirement that non-root nodes be at least half-full serves critical purposes: it bounds the worst-case height, since the effective fanout never drops below ⌈m/2⌉, and it guarantees at least 50% space utilization across the index.
Height Calculation:
For a B-tree of order m with N keys, the height ranges from about log_m N (every node full) to about log_{⌈m/2⌉} N (every node minimally full). For m = 500 and N = 10⁹, that is roughly 3.3 to 3.8 levels, matching the 3-4 figure in the fanout table above.
This height guarantee is what makes B-trees so powerful—regardless of the data distribution, lookups require at most 4 disk reads for billion-element indexes.
Different sources use different terminology. 'Order m' typically means max m children (Knuth's definition). 'Minimum degree t' means non-root nodes have t to 2t children (CLRS). When reading about B-trees, always verify which definition is being used.
Searching a B-tree combines traversal from root to leaf with binary search within each node. The algorithm is elegant and efficient.
Search Algorithm:
```python
def btree_search(node, key):
    # Find the smallest index i where key <= node.keys[i]
    i = 0
    while i < node.n and key > node.keys[i]:
        i += 1

    # Case 1: exact match in this node
    if i < node.n and key == node.keys[i]:
        return (node, i)            # return node and position

    # Case 2: not found in this node
    if node.is_leaf:
        return None                 # key doesn't exist

    # Recursively search the appropriate child
    return btree_search(node.children[i], key)
```
Searching Within a Node:
The inner search (finding position i) can be done via: a linear scan, O(m) comparisons per node, or a binary search, O(log₂ m) comparisons per node.
In practice, for typical B-tree orders (m < 500), linear search is often faster due to better cache behavior and simpler code.
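As a sketch, the binary-search variant maps directly onto Python's standard bisect module (the helper name is hypothetical; it assumes the node's keys are sorted):

```python
import bisect

def find_child_index(keys, key):
    """Index of the child subtree to descend into: the smallest i
    with key <= keys[i], or len(keys) if key exceeds every key."""
    return bisect.bisect_left(keys, key)

# With keys [30, 60, 90], a lookup for 67 descends into child 2
print(find_child_index([30, 60, 90], 67))
```

The same index i works for both cases: if keys[i] equals the search key, it is the match position; otherwise it is the child pointer to follow.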
```text
Search for key 67:

Step 1: Read root
┌───────────────────────┐
│ [30, 60, 90]          │  67 > 60 and 67 < 90 → follow child[2]
└───────────────────────┘
            │
            ▼
Step 2: Read internal node
┌───────────────────────┐
│ [65, 70, 80, 85]      │  67 > 65 and 67 < 70 → follow child[1]
└───────────────────────┘
            │
            ▼
Step 3: Read leaf node
┌───────────────────────┐
│ [66, 67, 68, 69]      │  67 == keys[1] → FOUND
│ [p1, p2, p3, p4]      │  return pointer p2
└───────────────────────┘

Total disk reads: 3 (root often cached → 2)
```

Time Complexity Analysis:

A search visits O(log_m N) nodes, one per level, performing O(log₂ m) comparisons (binary search) or O(m) comparisons (linear scan) within each. Disk I/O, the dominant cost, is O(log_m N).
Practical Performance:
In practice, B-tree search performance is dominated by disk I/O. The CPU comparisons within each node are nearly instantaneous compared to disk access. This is why we optimize for minimizing tree height, even at the cost of more comparisons per node.
Caching Effects:
The root node and upper internal nodes are accessed by every lookup. Databases keep these nodes cached in the buffer pool, often resulting in: the root being effectively always in memory, the first internal level cached the vast majority of the time, and only the lowest levels regularly requiring disk reads.
For a height-4 B-tree with good caching, a typical lookup might require only 1-2 actual disk reads, not 4.
Database buffer pool sizing directly affects B-tree performance. A larger buffer pool means more internal nodes stay cached. The ideal buffer pool can hold the entire index's internal nodes, leaving only leaf nodes requiring disk access.
Insertion in a B-tree is more complex than search because it must maintain all B-tree invariants, particularly the balanced height property and the minimum/maximum key constraints.
Insertion Algorithm Overview:
1. Traverse from the root to the leaf where the key belongs, exactly as in search.
2. If the leaf has room, insert the key in sorted position. Done.
3. If the leaf is full, split it and promote the median key into the parent.
4. If the parent is also full, split it too; splits can cascade all the way to the root.
The Splitting Operation:
When a node with m-1 keys needs to accept a new key:
1. Allocate a new sibling node.
2. Keep the lower half of the keys in the original node.
3. Move the upper half into the new sibling.
4. Promote the median key into the parent as the separator between the two halves.
```text
Splitting a Full Node (Order m=5, max 4 keys per node)

Before: insert key 35 into a full leaf
┌─────────────────────────────────────────┐
│ Parent: [30, 50, 70]                    │
│        p0   p1   p2   p3                │
└─────────────────────────────────────────┘
              │ (p1 points to this full leaf)
              ▼
┌────────────────────────────────┐
│ Leaf: [32, 34, 36, 38] ← FULL  │  insert 35 → overflow!
└────────────────────────────────┘

Step 1: split at the median (35)
┌────────────────────┐   ┌────────────────────┐
│ Left: [32, 34]     │   │ Right: [36, 38]    │
└────────────────────┘   └────────────────────┘
  stays in place           new sibling created
        median key 35 is promoted to the parent

After the split:
┌─────────────────────────────────────────┐
│ Parent: [30, 35, 50, 70]                │ ← 35 promoted here
│        p0   p1   p2   p3   p4           │
└─────────────────────────────────────────┘
              │    │
              ▼    ▼
        [32, 34]  [36, 38]
        original   new sibling
```

Root Splitting (Tree Height Increase):
When a split propagates all the way to the root and the root itself splits, the tree grows taller: a new root is allocated holding only the promoted median key, with the two halves of the old root as its children.
This is the only way a B-tree increases in height. It ensures that all paths from root to leaves remain the same length (perfect balance).
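The leaf-level split itself is mechanical. A minimal sketch on a bare key list (illustrative helper; real code also moves values and child pointers):

```python
def split_leaf(keys):
    """Split an overfull sorted key list at the median.
    Returns (left_keys, median, right_keys); the median is
    promoted into the parent as the new separator key."""
    mid = len(keys) // 2
    return keys[:mid], keys[mid], keys[mid + 1:]

# The example above: [32, 34, 36, 38] plus the new key 35
print(split_leaf([32, 34, 35, 36, 38]))  # → ([32, 34], 35, [36, 38])
```

This follows the classic B-tree convention shown in the diagram, where the median leaves the leaf entirely; a B+ tree would instead keep a copy of the separator in the right leaf.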
Insertion Complexity: locating the leaf costs O(log_m N) disk reads; splits are rare enough that their cost amortizes to O(1) extra work per insert.
Page Splits and I/O:
In database terms, each split involves: writing the new right-sibling page, rewriting the original (now half-empty) page, updating the parent page with the promoted separator, and updating allocation metadata for the new page.
This is 4 I/O operations per split level. In the worst case (splits to root), a single insert might trigger 4 × height = 12-16 I/O operations. However, this is rare—most inserts don't cause any splits.
Inserting keys in random order causes more splits than sequential insertion. Random inserts fill nodes unpredictably, while sequential inserts (like auto-increment IDs) append to the rightmost leaf, minimizing splits. This is why auto-increment primary keys are performance-friendly.
Deletion in a B-tree must maintain the minimum fill constraint: non-root nodes must have at least ⌈m/2⌉-1 keys. When deletion causes underflow, the tree must be rebalanced.
Deletion Algorithm Overview:
Case 1: Key is in a leaf node. Remove it directly; if the leaf falls below the minimum fill, rebalance as described below.
Case 2: Key is in an internal node. Replace it with its in-order predecessor (or successor) from a leaf, then delete that key from the leaf, reducing the problem to Case 1.
Rebalancing on Underflow:
When a node has too few keys after deletion:
Option A: Borrow from sibling (Rotation). If an adjacent sibling has more than the minimum number of keys, rotate through the parent: the separator key moves down into the underflowing node, and the sibling's nearest key moves up to become the new separator.
Option B: Merge with sibling. If no sibling can spare a key, merge the underflowing node with an adjacent sibling, pulling the separator key down from the parent. This removes a key from the parent, which may itself underflow and cascade upward.
```text
Rebalancing After Deletion (Order m=5, min 2 keys per node)

Scenario 1: underflow, borrow from right sibling

Before: delete key 18 from the left leaf
┌───────────────────────────┐
│ Parent: [20, 40]          │
└───────────────────────────┘
     │            │
     ▼            ▼
┌────────────┐  ┌─────────────────┐
│ [15, 18]   │  │ [25, 30, 35]    │ ← right sibling has 3 keys
└────────────┘  └─────────────────┘
  delete 18 → underflow (only 1 key left)

After: borrow via rotation
┌───────────────────────────┐
│ Parent: [25, 40]          │ ← parent's 20 moved down, sibling's 25 moved up
└───────────────────────────┘
     │            │
     ▼            ▼
┌────────────┐  ┌────────────┐
│ [15, 20]   │  │ [30, 35]   │ ← balanced: both have ≥ 2 keys
└────────────┘  └────────────┘

Scenario 2: merge when siblings can't spare a key

Before: delete key 42, all siblings at minimum
┌───────────────────────────────┐
│ Parent: [40, 50]              │
└───────────────────────────────┘
    │          │          │
    ▼          ▼          ▼
┌────────┐ ┌────────┐ ┌────────┐
│[35,38] │ │[42,45] │ │[52,55] │  all at minimum (2 keys)
└────────┘ └────────┘ └────────┘
  delete 42 → underflow, can't borrow

After: merge with left sibling, pull down separator
┌─────────────────────┐
│ Parent: [50]        │ ← 40 pulled down into merged node
└─────────────────────┘
     │            │
     ▼            ▼
┌─────────────────┐ ┌────────┐
│[35, 38, 40, 45] │ │[52,55] │ ← merged node
└─────────────────┘ └────────┘
```

Root Shrinking (Tree Height Decrease):
If the root becomes empty after a merge (its only children merged into one), the merged node becomes the new root. This is the only way a B-tree decreases in height.
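Both rebalancing moves can be sketched on bare key lists (hypothetical helpers, ignoring child pointers and values):

```python
def borrow_from_right(left, sep, right):
    """Rotation: the parent separator moves down into the left node,
    and the right sibling's smallest key moves up as the new separator."""
    left = left + [sep]
    new_sep, right = right[0], right[1:]
    return left, new_sep, right

def merge_siblings(left, sep, right):
    """Merge two siblings, pulling the parent separator down between them."""
    return left + [sep] + right

# The two scenarios from the diagram above:
print(borrow_from_right([15], 20, [25, 30, 35]))  # → ([15, 20], 25, [30, 35])
print(merge_siblings([35, 38], 40, [45]))         # → [35, 38, 40, 45]
```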
Deletion Complexity: O(log_m N) disk reads, matching insertion; borrows and merges are rare, and their cost amortizes to O(1) per delete.
Lazy Deletion in Databases:
Many database systems don't immediately perform deletion rebalancing. Instead, they use lazy deletion: entries are marked dead (or removed without rebalancing), underfull pages are simply tolerated, and a background process (such as PostgreSQL's VACUUM) later reclaims and compacts the space.
This approach reduces immediate deletion overhead at the cost of some space inefficiency and occasional expensive cleanup operations.
Deletion complexity is why many systems avoid it or defer it. Some databases mark rows as deleted but leave index entries until vacuum. Others use version-based approaches (MVCC) where 'deletion' is just creating a new version marked as deleted. Understanding this helps explain index bloat and vacuum importance.
While we've been discussing B-trees, virtually all database systems actually use a variant called the B+ tree. The B+ tree modifies the basic B-tree structure in ways that significantly improve database performance.
Key Differences from B-trees:
| Characteristic | B-tree | B+ tree |
|---|---|---|
| Data location | Internal and leaf nodes | Leaf nodes only |
| Internal node contents | Keys + data + child pointers | Keys + child pointers only |
| Leaf node linking | Not linked | Doubly-linked list |
| Duplicate keys in internal nodes | No | Yes (separator copies) |
| Fanout for same node size | Lower (data takes space) | Higher (more keys per node) |
| Range query efficiency | Tree traversal needed | Sequential leaf scan |
Why Databases Use B+ Trees:
1. Higher Fanout: Since internal nodes don't store data (only keys and pointers), they can fit more keys per node. Higher fanout means shorter trees, fewer disk reads, and faster lookups.
2. Efficient Range Scans: Leaf nodes are linked into a doubly-linked list. Range queries can: descend the tree once to the leaf holding the start key, then follow sibling links forward (or backward) until the end of the range, with no further tree traversal.
In a pure B-tree, range scans require tree traversal for each key, which is much less efficient.
3. Predictable Performance: All data accesses follow the same path length (root → internal → ... → leaf). In a B-tree, internal node hits might be faster than leaf hits, causing variable performance.
4. Easier Caching: With data only in leaves, internal nodes are purely structural. They can be cached more aggressively since they're smaller and used more frequently.
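The fanout gain in point 1 is easy to quantify. A rough sketch with illustrative sizes only (a 16 KB page, 8-byte keys, 8-byte pointers, 100-byte rows; real pages also carry headers and per-slot overhead):

```python
PAGE_BYTES = 16 * 1024

def fanout(key_bytes, ptr_bytes, row_bytes=0):
    """Children per node when each slot holds key + pointer (+ inline row)."""
    return PAGE_BYTES // (key_bytes + ptr_bytes + row_bytes)

btree_fanout = fanout(8, 8, row_bytes=100)  # B-tree: rows live in every node
bplus_fanout = fanout(8, 8)                 # B+ tree: internal nodes hold keys + pointers only
print(btree_fanout, bplus_fanout)           # → 141 1024
```

Moving the rows out of internal nodes raises the fanout from ~141 to ~1024 here, which is exactly what shortens the tree.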
```text
B+ Tree Structure

                 ┌─────────────────────┐
                 │ ROOT (keys only)    │
                 │     [30]  [60]      │
                 └─────────────────────┘
                  /         |         \
       ┌───────────┐  ┌───────────┐  ┌───────────┐
       │ [10, 20]  │  │ [40, 50]  │  │ [70, 80]  │
       └───────────┘  └───────────┘  └───────────┘
         /      \       /      \       /      \
     ┌────┐  ┌────┐  ┌────┐  ┌────┐  ┌────┐  ┌────┐
     │5,9 │↔│12, │↔│32, │↔│45, │↔│65, │↔│75, │
     │    │  │18  │  │38  │  │55  │  │68  │  │85  │
     │data│  │data│  │data│  │data│  │data│  │data│
     └────┘  └────┘  └────┘  └────┘  └────┘  └────┘
       LEAF NODES hold the actual data/pointers,
       LINKED for efficient range scans

Range query: WHERE key BETWEEN 15 AND 50
1. Navigate to the leaf containing 15 (tree traversal)
2. Scan right through linked leaves: 18 → 32 → 38 → 45
3. Stop when key > 50
```

When database documentation mentions 'B-tree indexes,' it almost always means B+ trees. The term 'B-tree' is commonly used as a generic name for this family of self-balancing tree data structures. PostgreSQL, MySQL, SQLite, SQL Server: all use B+ tree variants for their primary index structures.
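The leaf-level range scan described above can be written in a few lines (a hypothetical minimal Leaf class; illustrative keys):

```python
class Leaf:
    """A B+ tree leaf: sorted keys plus a link to the next leaf."""
    def __init__(self, keys, next_leaf=None):
        self.keys = keys
        self.next = next_leaf

def range_scan(start_leaf, low, high):
    """Walk the linked leaf list, yielding every key in [low, high]."""
    leaf = start_leaf
    while leaf is not None:
        for k in leaf.keys:
            if k > high:
                return          # past the range: stop, no backtracking needed
            if k >= low:
                yield k
        leaf = leaf.next        # follow the sibling link

# Leaves [12, 18] → [32, 38] → [45, 55]; scanning 15..50 yields 18, 32, 38, 45
l3 = Leaf([45, 55])
l2 = Leaf([32, 38], l3)
l1 = Leaf([12, 18], l2)
print(list(range_scan(l1, 15, 50)))
```

Note that the tree is traversed only once, to find the starting leaf; everything after that is a sequential walk.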
Understanding B-tree performance characteristics helps you predict query costs and design effective indexes.
Operation Complexity Summary:
| Operation | Disk I/O (Worst) | CPU (Worst) | Notes |
|---|---|---|---|
| Point lookup | O(log_m N) | O(m × log_m N) | Usually 2-4 I/O for billions of rows |
| Range scan | O(log_m N + K) | O(m × log_m N + K) | K = number of results |
| Insert | O(log_m N) | O(m × log_m N) | Amortized; splits are rare |
| Delete | O(log_m N) | O(m × log_m N) | Amortized; often lazy in databases |
| Min/Max | O(log_m N) | O(m × log_m N) | Navigate to leftmost/rightmost leaf |
| Full scan | O(N/B) | O(N) | B = records per leaf page |
Factors Affecting Real-World Performance:
1. Cache Hit Rate: the fraction of node reads served from the buffer pool. Upper levels are nearly always cached, so real lookups typically cost 1-2 disk reads rather than the full tree height.
2. Fill Factor: how full pages are kept when the index is built (commonly around 90%). Lower fill factors leave room for future inserts and reduce page splits, at the cost of a slightly larger and taller index.
3. Key Size: larger keys mean fewer entries per page, lower fanout, and a taller tree. Compact keys (integers rather than long strings) keep the tree shallow.
4. Index Fragmentation: after many random inserts and deletes, pages end up partly empty and scattered on disk, hurting range-scan locality. Rebuilding the index restores density.
B-tree indexes handle 80% of database indexing needs excellently. Understanding B-trees deeply—their structure, performance characteristics, and limitations—provides the foundation for effective index design. The remaining 20% of cases may benefit from specialized indexes (hash, GiST, full-text, etc.).
The B-tree family includes several important variants optimized for specific scenarios.
B* Trees:
B* trees require nodes to be at least 2/3 full (instead of 1/2). They achieve this by redistributing keys among siblings before splitting: when a node overflows, keys are first shifted into a non-full adjacent sibling; only when both are full are the two nodes split into three, each about 2/3 full.
B-link Tree:
B-link trees add right-sibling pointers to internal nodes (not just leaves). This enables lock-free concurrent access: a reader that arrives at a node mid-split simply follows the right-sibling link to find keys that have moved, so searches proceed without waiting on structural modification locks.
Copy-on-Write B-trees (CoW-B-trees):
Instead of modifying nodes in place, CoW-B-trees create new versions of modified nodes: only the path from the changed leaf up to the root is copied, old versions remain readable for concurrent readers and snapshots, and crash consistency is simplified because existing pages are never overwritten.
```text
Copy-on-Write B-tree Update

Original tree (version 1):
        ┌───────┐
   ┌────│ ROOT1 │────┐
   │    └───────┘    │
   ▼                 ▼
┌───────┐        ┌───────┐
│ NODE2 │        │ NODE3 │
└───────┘        └───────┘
   │                 │
   ▼                 ▼
┌───────┐        ┌───────┐
│ LEAF4 │        │ LEAF5 │ ← modify this leaf
└───────┘        └───────┘

After modification (version 2): copy the path from the modified leaf to the root
        ┌───────┐
   ┌────│ ROOT1'│────┐    ← new root (copy)
   │    └───────┘    │
   ▼                 ▼
┌───────┐        ┌───────┐
│ NODE2 │        │ NODE3'│ ← new copy
└───────┘        └───────┘
   │                 │
   ▼                 ▼
┌───────┐        ┌───────┐
│ LEAF4 │        │ LEAF5'│ ← new copy with the modification
└───────┘        └───────┘
(NODE2 and LEAF4 are shared with version 1)

Version 1 (ROOT1) still works; version 2 (ROOT1') has the update.
Only 3 nodes were written, not the entire tree.
```

Prefix-Compressed B-trees:
For string keys with common prefixes, prefix compression stores only the distinguishing suffix:
This significantly reduces space for sorted string keys.
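Prefix compression can be illustrated on a sorted run of string keys (a simplified sketch; real implementations typically store the shared prefix once in the page header and track per-key suffix lengths):

```python
import os

def prefix_compress(keys):
    """Store the common prefix once; keep only the distinguishing suffixes."""
    prefix = os.path.commonprefix(keys)
    return prefix, [k[len(prefix):] for k in keys]

# Three 13-character keys compress to one prefix plus 1-character suffixes
print(prefix_compress(["customer_1001", "customer_1002", "customer_1003"]))
```

Here 39 characters of key data shrink to a 12-character prefix plus three single-character suffixes, which is why the technique pays off so well on sorted string keys.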
Write-Optimized B-trees (Bε-trees, LSM-trees):
Traditional B-trees require random I/O for each insert. Write-optimized variants batch writes: Bε-trees buffer pending updates inside internal nodes and flush them downward in bulk, while LSM-trees accumulate writes in memory and merge sorted runs to disk sequentially.
These variants are covered in detail when we discuss NoSQL databases and write-heavy workloads.
Each database implements B-trees differently. PostgreSQL uses B-link trees with MVCC. MySQL/InnoDB uses clustered B+ trees with page-level locking. Understanding your specific database's implementation helps with performance tuning and debugging.
We've explored B-tree indexes from their foundational motivation through their practical implementation in database systems. Let's consolidate the essential insights:
- B-trees minimize disk I/O by maximizing fanout: even billion-row indexes stay 3-4 levels tall.
- Databases actually use B+ trees, which keep data only in linked leaf nodes, raising fanout and turning range scans into sequential leaf walks.
- Splits and merges keep every root-to-leaf path the same length; they are rare, and their cost amortizes to O(1) per operation.
- Real-world performance is governed by cache hit rate, fill factor, key size, and fragmentation as much as by asymptotic complexity.
- Variants (B*, B-link, copy-on-write, prefix-compressed, write-optimized) tune the same structure for concurrency, space, or write throughput.
What's Next:
With B-tree mastery established, we'll explore an alternative index structure in the next page: Hash indexes. You'll understand when hash indexes outperform B-trees, their fundamental limitations, and why they're used sparingly despite their O(1) lookup promise.
You now possess deep knowledge of the most important data structure in database technology. B-tree understanding is foundational to query optimization, index design, and performance debugging. This knowledge serves you across every relational database system.