Every index entry you see in a database arrived there through B-tree insertion. When you execute INSERT INTO orders VALUES (...), the database must update every index on that table—each update invoking the B-tree insertion algorithm to place the new key in exactly the right location while preserving all structural invariants.
B-tree insertion is more complex than search because it must maintain balance. Unlike a naive binary search tree that can degenerate into a linked list with skewed insertions, a B-tree guarantees logarithmic height regardless of insertion order. This guarantee comes from a beautiful mechanism: node splitting.
By completing this page, you will understand: (1) The complete B-tree insertion algorithm with formal correctness, (2) The mechanics of inserting into nodes at various fill levels, (3) When and how node splitting maintains balance, (4) Proactive vs. reactive splitting strategies, (5) How insertion affects tree height, and (6) Production implementation patterns including concurrent insertion handling.
B-tree insertion follows a two-phase approach: first search down the tree for the leaf where the key belongs, then insert the key, splitting nodes as needed to keep every invariant intact.

The Core Invariants Insertion Must Preserve:

- Occupancy: every node except the root holds between t-1 and 2t-1 keys (the root may hold as few as 1)
- Ordering: keys within each node remain sorted, and each subtree's key range respects its separators
- Balance: all leaves remain at the same depth
The challenge is that inserting a key into a full node would violate the maximum occupancy invariant. The solution is splitting: dividing an overfull node into two nodes of acceptable size.
High-Level Algorithm:
```
BTREE-INSERT(tree, key, value):
    // Case 1: Root is full - split it first (tree grows taller)
    IF tree.root.numKeys = 2t - 1:
        newRoot ← AllocateNode()
        newRoot.isLeaf ← FALSE
        newRoot.children[0] ← tree.root
        SPLIT-CHILD(newRoot, 0)    // Split old root
        tree.root ← newRoot

    // Now root is guaranteed to have space
    INSERT-NONFULL(tree.root, key, value)
```
This elegantly simple structure handles the special case of a full root before descending, ensuring we never need to propagate splits upward after the insertion completes—a technique called proactive splitting or preemptive splitting.
The root is the only node allowed to have fewer than t-1 keys (it can have just 1 key). When the root splits, the tree's height increases by 1. This is the only operation that increases tree height—splits at lower levels simply redistribute keys horizontally.
The core insertion logic handles two cases: inserting into a leaf (base case) and recursively descending through internal nodes.
Algorithm: INSERT-NONFULL
```
INSERT-NONFULL(node, key, value):
    i ← node.numKeys - 1
    IF node.isLeaf:
        // Case A: Insert into leaf node
        // Shift keys right to make room
        WHILE i >= 0 AND key < node.keys[i]:
            node.keys[i+1] ← node.keys[i]
            node.values[i+1] ← node.values[i]
            i ← i - 1
        // Insert new key at position i+1
        node.keys[i+1] ← key
        node.values[i+1] ← value
        node.numKeys ← node.numKeys + 1
        DISK-WRITE(node)
    ELSE:
        // Case B: Internal node - find correct child
        WHILE i >= 0 AND key < node.keys[i]:
            i ← i - 1
        i ← i + 1    // i is now the index of child to descend into
        DISK-READ(node.children[i])
        // Proactive splitting: if child is full, split it now
        IF node.children[i].numKeys = 2t - 1:
            SPLIT-CHILD(node, i)
            // After split, determine which half to insert into
            IF key > node.keys[i]:
                i ← i + 1
        INSERT-NONFULL(node.children[i], key, value)
```
By splitting full nodes on the way down, we guarantee that when we reach the leaf, there is space to insert. This single-pass approach is more efficient than the alternative (insert, then propagate splits upward) because we never re-traverse the path or maintain a stack of ancestors.
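Case A (the leaf insert with its shifting loop) can be sketched in TypeScript on plain parallel arrays. The function name and array representation here are illustrative assumptions, not part of any real library; capacity checking and disk I/O are omitted.

```typescript
// Sketch of Case A: insert into a sorted leaf by shifting keys right.
// Parallel key/value arrays stand in for a real node structure.
function insertIntoLeaf(keys: number[], values: string[], key: number, value: string): void {
  let i = keys.length - 1;
  keys.push(0);        // grow both arrays by one slot
  values.push("");
  while (i >= 0 && key < keys[i]) {
    keys[i + 1] = keys[i];       // shift right to make room
    values[i + 1] = values[i];
    i--;
  }
  keys[i + 1] = key;             // new key lands at position i+1
  values[i + 1] = value;
}
```

This mirrors the WHILE loop in INSERT-NONFULL exactly: inserting 30 into keys [10, 20, 40] shifts 40 right and places 30 at index 2.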
Node splitting is the heart of B-tree balancing. When a node becomes full (has 2t-1 keys), we cannot insert another key without violating the size invariant. Splitting divides the node into two nodes, each with acceptable key counts.
The SPLIT-CHILD Algorithm:
```
SPLIT-CHILD(parent, i):
    // parent.children[i] is full (has 2t-1 keys)
    fullChild ← parent.children[i]
    newChild ← AllocateNode()
    newChild.isLeaf ← fullChild.isLeaf

    // New child gets the upper half of keys (t-1 keys)
    newChild.numKeys ← t - 1
    FOR j ← 0 TO t - 2:
        newChild.keys[j] ← fullChild.keys[t + j]
        newChild.values[j] ← fullChild.values[t + j]    // For leaves

    // If not a leaf, also copy child pointers
    IF NOT fullChild.isLeaf:
        FOR j ← 0 TO t - 1:
            newChild.children[j] ← fullChild.children[t + j]

    // Full child keeps the lower half (t-1 keys)
    fullChild.numKeys ← t - 1

    // Make room in parent for new child pointer
    FOR j ← parent.numKeys DOWNTO i + 1:
        parent.children[j + 1] ← parent.children[j]
    parent.children[i + 1] ← newChild

    // Make room in parent for median key
    FOR j ← parent.numKeys - 1 DOWNTO i:
        parent.keys[j + 1] ← parent.keys[j]
        parent.values[j + 1] ← parent.values[j]

    // Median key moves up to parent
    parent.keys[i] ← fullChild.keys[t - 1]    // The median
    parent.numKeys ← parent.numKeys + 1

    // Write all modified nodes to disk
    DISK-WRITE(fullChild)
    DISK-WRITE(newChild)
    DISK-WRITE(parent)
```
Visual Example: Splitting a Node with t=3
Before split (node with 5 keys is full):

```
Parent:  [... | 30 | ...]
             /
   [10, 20, 25, 28, 29]   ← Full child
```

After split (median 25 moves up):

```
Parent:  [... | 25 | 30 | ...]
             /    \
     [10, 20]      [28, 29]   ← Two half-full children
```
A key observation on I/O cost: a single split requires writing 3 nodes—the original child (now half-sized), the new sibling, and the parent (which gains a key and a child pointer). In the worst case of cascading splits up to the root, we write O(h) nodes, where h is the tree height. This is rare—most insertions cause no splits at all.
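To make SPLIT-CHILD concrete, here is a minimal in-memory TypeScript sketch (keys only, no values or disk I/O); the node shape and names are assumptions for illustration. With t = 3 it reproduces the figure above.

```typescript
const T = 3; // minimum degree: a full node holds 2T - 1 = 5 keys

interface BNode {
  keys: number[];
  children: BNode[]; // empty array for leaves
  leaf: boolean;
}

// Split the full child at parent.children[i]; the median key moves up.
function splitChild(parent: BNode, i: number): void {
  const full = parent.children[i];
  const right: BNode = {
    keys: full.keys.splice(T),                          // upper T-1 keys
    children: full.leaf ? [] : full.children.splice(T), // upper T children
    leaf: full.leaf,
  };
  const median = full.keys.pop()!;                      // keys[T-1] moves up
  parent.keys.splice(i, 0, median);
  parent.children.splice(i + 1, 0, right);
}
```

Splitting [10, 20, 25, 28, 29] under a parent containing 30 yields parent keys [25, 30] over children [10, 20] and [28, 29], exactly as in the figure.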
Let's trace through a series of insertions into a B-tree with minimum degree t=2 (each node holds 1-3 keys). Starting from an empty tree, we'll insert: 10, 20, 30, 40, 25, 15, 35.
Insert 10 (empty tree): create the root as a leaf: [10]

Insert 20: root has space: [10, 20]

Insert 30: root has space: [10, 20, 30] ← root is now full

Insert 40: root is full! Must split before inserting:

```
Step 1: Create a new root
Step 2: Split the old root at the median (20)
Step 3: Tree structure:
            [20]
           /    \
       [10]      [30]
Step 4: Insert 40 into the right leaf:
            [20]
           /    \
       [10]      [30, 40]
```

Insert 25: descend (25 > 20, go right); [30, 40] has space:

```
        [20]
       /    \
   [10]      [25, 30, 40]   ← right leaf now full
```

Insert 15: descend (15 < 20, go left); [10] has space:

```
           [20]
          /    \
  [10, 15]      [25, 30, 40]
```

Insert 35: descend (35 > 20), must go right. But wait—the right child [25, 30, 40] is full! Proactive split of [25, 30, 40]: median 30 moves to the root; left becomes [25], right becomes [40]:

```
         [20, 30]
        /    |    \
[10, 15]   [25]    [40]
```

Now 35 goes into the rightmost child (35 > 30):

```
         [20, 30]
        /    |    \
[10, 15]   [25]    [35, 40]
```
| Operation | Split Required? | Nodes Modified | Height Change |
|---|---|---|---|
| Insert 10 | No (new tree) | 1 (new root) | 0→1 |
| Insert 20 | No | 1 (root) | Same |
| Insert 30 | No | 1 (root) | Same |
| Insert 40 | Yes (root split) | 3 (old root, new root, sibling) | 1→2 |
| Insert 25 | No | 1 (leaf) | Same |
| Insert 15 | No | 1 (leaf) | Same |
| Insert 35 | Yes (leaf split) | 3 (leaf, sibling, parent) | Same |
In this example, only 2 of 7 insertions required splits. As t increases (typical values: 50-200), nodes hold more keys before filling, making splits even rarer. A node with t=100 needs 199 insertions before splitting, and most will be distributed across many nodes.
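The entire trace can be reproduced with a compact in-memory implementation of BTREE-INSERT, INSERT-NONFULL, and SPLIT-CHILD (t = 2, keys only). This is a sketch under assumed names and node shapes, with disk I/O omitted.

```typescript
const T = 2; // minimum degree: nodes hold 1..3 keys

interface BNode {
  keys: number[];
  children: BNode[]; // empty for leaves
  leaf: boolean;
}

function splitChild(parent: BNode, i: number): void {
  const full = parent.children[i];
  const right: BNode = {
    keys: full.keys.splice(T),                          // upper T-1 keys
    children: full.leaf ? [] : full.children.splice(T),
    leaf: full.leaf,
  };
  const median = full.keys.pop()!;                      // keys[T-1] moves up
  parent.keys.splice(i, 0, median);
  parent.children.splice(i + 1, 0, right);
}

function insertNonFull(node: BNode, key: number): void {
  let i = node.keys.length;
  while (i > 0 && key < node.keys[i - 1]) i--;
  if (node.leaf) {
    node.keys.splice(i, 0, key);
    return;
  }
  if (node.children[i].keys.length === 2 * T - 1) {
    splitChild(node, i);           // proactive split on the way down
    if (key > node.keys[i]) i++;   // decide which half to descend into
  }
  insertNonFull(node.children[i], key);
}

function insert(root: BNode, key: number): BNode {
  if (root.keys.length === 2 * T - 1) {
    // Full root: the only operation that grows the tree's height
    const newRoot: BNode = { keys: [], children: [root], leaf: false };
    splitChild(newRoot, 0);
    insertNonFull(newRoot, key);
    return newRoot;
  }
  insertNonFull(root, key);
  return root;
}
```

Inserting 10, 20, 30, 40, 25, 15, 35 in order yields root [20, 30] over leaves [10, 15], [25], [35, 40], matching the final tree in the trace.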
B+-tree insertion differs from B-tree insertion in several important ways due to the structural difference: data resides only in leaves, and internal nodes contain separator keys only.
Key Differences:
Insertion always targets a leaf: We always descend to a leaf node before inserting, regardless of whether we encounter the key as a separator in internal nodes.
Separator key handling during splits: when a leaf splits, the first key of the new right leaf is copied up to the parent as a separator (the key also remains in the leaf). When an internal node splits, the median key is promoted (it moves up and does not remain in either half).
Leaf linking: After a leaf split, we must update sibling pointers to maintain the linked list structure.
```typescript
// B+-Tree leaf split: median is COPIED up
function splitLeafNode(parent: InternalNode, childIndex: number): void {
  const fullLeaf = parent.children[childIndex] as LeafNode;
  const newLeaf = createLeafNode();

  // Transfer upper half of keys AND values to new leaf
  const midpoint = Math.floor(fullLeaf.keys.length / 2);
  newLeaf.keys = fullLeaf.keys.splice(midpoint);
  newLeaf.values = fullLeaf.values.splice(midpoint);

  // COPY (not move) the first key of new leaf to parent
  // This key remains in the new leaf!
  const separatorKey = newLeaf.keys[0];
  insertIntoParent(parent, childIndex, separatorKey, newLeaf);

  // Maintain leaf linked list
  newLeaf.nextLeaf = fullLeaf.nextLeaf;
  fullLeaf.nextLeaf = newLeaf;
  if (newLeaf.nextLeaf) {
    newLeaf.nextLeaf.prevLeaf = newLeaf;
  }
  newLeaf.prevLeaf = fullLeaf;
}

// B+-Tree internal node split: median is PROMOTED up
function splitInternalNode(parent: InternalNode, childIndex: number): void {
  const fullNode = parent.children[childIndex] as InternalNode;
  const newNode = createInternalNode();

  // Find the median key to promote
  const midpoint = Math.floor(fullNode.keys.length / 2);
  const promotedKey = fullNode.keys[midpoint]; // This moves UP

  // Upper half of keys goes to new node (excluding the promoted key)
  newNode.keys = fullNode.keys.splice(midpoint + 1);
  newNode.children = fullNode.children.splice(midpoint + 1);

  // Remove the promoted key from fullNode
  fullNode.keys.pop(); // Remove the median

  // Insert promoted key and new child into parent
  insertIntoParent(parent, childIndex, promotedKey, newNode);
}
```

Getting copy vs. promote wrong leads to subtle bugs: if you promote (move) a key out of a leaf, range queries will skip that key; if you copy a key from an internal node, you get duplicate separators and incorrect routing. This is a common implementation error.
Understanding insertion complexity requires analyzing both the common case (no split) and worst case (cascading splits to root).
Time Complexity:

- Search phase: O(t · log_t n) comparisons (O(log_t n) nodes on the path, up to t comparisons per node with linear scan, or O(log t) with binary search)
- Insert phase (no split): O(t) to shift keys within the leaf
- Insert phase (with splits): O(t · log_t n) in the worst case (a split at every level of the path)
Amortized Analysis:
While a single insertion can trigger O(log_t n) splits, this is rare. Using aggregate analysis:
| Metric | Best Case | Worst Case | Amortized |
|---|---|---|---|
| Disk Reads | O(log_t n) | O(log_t n) | O(log_t n) |
| Disk Writes | O(1) | O(log_t n) | O(1) |
| CPU Comparisons | O(log_t n · log t) | O(log_t n · t) | O(log n) |
| Nodes Modified | 1 (leaf only) | O(log_t n) | O(1) |
| New Nodes Created | 0 | O(log_t n) | O(1/t) |
Space Complexity: O(log_t n) for the recursion stack (or O(1) iteratively). Each split allocates exactly one new node, plus one more when the root splits.
Disk I/O Analysis:
For a B-tree with t=100 containing 1 billion keys, the height is at most 4, so an insertion touches only a handful of nodes on the way down. With the upper levels cached in memory, a typical insertion requires 1-2 disk reads and 1-3 writes.
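The 1-billion-key figure follows from the standard worst-case height bound h ≤ log_t((n+1)/2) (CLRS Theorem 18.1). A quick sketch to check the arithmetic; the function name is illustrative:

```typescript
// Worst-case height bound for a B-tree of minimum degree t holding n keys:
// h <= log_t((n + 1) / 2)
function maxHeight(t: number, n: number): number {
  return Math.floor(Math.log((n + 1) / 2) / Math.log(t));
}

// t = 100, n = 1e9 gives a height of at most 4 below the root,
// which is why an uncached insertion reads only a few nodes.
```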
Insertion can have 'write amplification'—one logical insert causing multiple physical writes. Understanding this is crucial for SSD-based systems where write cycles are limited. Some systems batch insertions or use write-ahead logging to reduce write amplification.
There are two fundamental approaches to handling full nodes during insertion:
Proactive (Preemptive) Splitting:
Split full nodes on the way down, before they're needed:
```
descend(node, key):
    FOR each child on path:
        IF child is full:
            split child now
        descend into appropriate child
    insert into leaf (guaranteed non-full)
```
Advantages:

- Single downward pass; no need to re-traverse the path or track ancestors
- Latches on ancestors can be released early, improving concurrency
- Simpler implementation

Disadvantages:

- May split a full node that the current insertion would not actually overflow, so some splits are unnecessary
- Slightly lower average space utilization
Reactive Splitting:
Descend to leaf, insert, then propagate splits upward:
```
descend(node, key):
    path ← []               // Stack of ancestors
    WHILE not at leaf:
        path.push(node)
        descend to appropriate child
    insert into leaf
    IF leaf overflows:
        propagate splits up using path stack
```
Advantages:

- Splits happen only when a node actually overflows, never unnecessarily
- Optimal space utilization

Disadvantages:

- Requires a stack of ancestors (or parent pointers) to propagate splits upward
- In a concurrent setting, latches on the whole path must be held until all splits finish
| Aspect | Proactive | Reactive |
|---|---|---|
| Passes | Single | One down, possibly one up |
| Ancestor Tracking | Not needed | Required (stack) |
| Latch Holding | Parent + child only | All ancestors until done |
| Unnecessary Splits | Possible | Never |
| Space Utilization | Slightly lower | Optimal |
| Implementation | Simpler | More complex |
| Concurrency | Better (shorter latch holding) | Worse (longer latch chains) |
Most production databases use proactive-style splitting due to its concurrency advantages. The slightly higher space usage is acceptable, and the simplicity reduces bug potential.
Production databases handle hundreds or thousands of concurrent insertions. Without proper synchronization, concurrent modifications corrupt the tree. Several strategies exist:
1. Latch Crabbing (Latch Coupling):
Basic protocol for safe concurrent access:
```cpp
void insertWithCrabbing(Key key, Value value) {
    root->writeLatch();               // Start with a write latch on the root
    Node* current = root;
    while (!current->isLeaf) {
        Node* child = current->getChild(key);
        child->writeLatch();
        // Safe release: if the child cannot split, no ancestor will be
        // modified, so release the parent (and, in full crabbing, any
        // other ancestors still latched)
        if (!child->willSplitOnInsert()) {
            current->writeUnlatch();
        }
        // If the child might split, keep the parent latched
        current = child;
    }
    // Now holding latches on the path from the last potentially-splitting
    // node down to the leaf
    performInsertAndSplits();
    releaseAllLatches();
}
```
This protocol ensures that if a split propagates upward, we still hold the necessary latches.
2. Optimistic Latch Coupling:
Assume splits are rare—use read latches on the way down, then verify:
```cpp
void optimisticInsert(Key key, Value value) {
    // Phase 1: Optimistic descent with read latches
    readLatchPath();
    Node* leaf = findLeaf(key);
    if (leaf->hasSpace()) {
        // Common case: upgrade the leaf latch, insert, done
        leaf->upgradeToWriteLatch();
        insert(leaf, key, value);
        return;
    }
    // Leaf is full - release everything and restart pessimistically
    releaseAllLatches();
    pessimisticInsertWithCrabbing(key, value);
}
```
This dramatically reduces latch contention because most insertions don't cause splits.
3. B-Link Trees:
A structural modification that adds a right-sibling pointer (and a "high key") to every node. A reader that lands on a node just as it splits can follow the sibling pointer to find keys that migrated right, so writers need not hold parent and child latches simultaneously. This design, due to Lehman and Yao, is used in PostgreSQL and several other production systems.
B-tree latch protocols must prevent deadlock. The standard rule: always acquire latches in root-to-leaf order. Never hold a latch on a child and then try to acquire one on its parent. This total ordering prevents cycles.
Inserting millions of keys one at a time is inefficient—every insertion traverses the tree from root. Bulk loading builds a B-tree from sorted data in O(n) time with optimal space utilization.
Algorithm: Bottom-Up Bulk Loading
```
BULK-LOAD(sortedKeys, sortedValues):
    // Step 1: Create all leaf nodes, packed nearly full
    //         (production loaders often target ~90% to leave room for later inserts)
    leaves ← []
    FOR each chunk of (2t-1) keys:
        leaf ← CreateLeafNode(chunk)
        leaves.append(leaf)
        LinkLeafToList(leaf)

    // Step 2: Build internal levels bottom-up
    currentLevel ← leaves
    WHILE len(currentLevel) > 1:
        parentLevel ← []
        FOR each chunk of 2t children:
            parent ← CreateInternalNode()
            parent.children ← chunk
            parent.keys ← [rightmost key of each child except last]
            parentLevel.append(parent)
        currentLevel ← parentLevel

    // Step 3: currentLevel[0] is the root
    RETURN currentLevel[0]
```
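The bottom-up algorithm can be sketched as runnable TypeScript (keys only, t = 2, illustrative node shape). This is a simplification: a production loader must also rebalance the under-full final group that can occur at each level.

```typescript
const T = 2;

interface BNode {
  keys: number[];
  children: BNode[]; // empty for leaves
  leaf: boolean;
}

// Build a B+-tree-style index bottom-up from sorted keys.
function bulkLoad(sortedKeys: number[]): BNode {
  // Step 1: pack leaves left to right, 2T-1 keys per leaf
  let level: BNode[] = [];
  for (let i = 0; i < sortedKeys.length; i += 2 * T - 1) {
    level.push({ keys: sortedKeys.slice(i, i + 2 * T - 1), children: [], leaf: true });
  }
  if (level.length === 0) return { keys: [], children: [], leaf: true };

  // Step 2: build internal levels until one node remains
  while (level.length > 1) {
    const parents: BNode[] = [];
    for (let i = 0; i < level.length; i += 2 * T) {
      const group = level.slice(i, i + 2 * T);
      parents.push({
        // Separator: rightmost key of each child except the last
        keys: group.slice(0, -1).map(c => c.keys[c.keys.length - 1]),
        children: group,
        leaf: false,
      });
    }
    level = parents;
  }
  return level[0]; // Step 3: the single remaining node is the root
}
```

Loading the sorted keys 1..9 produces three packed leaves under a root with separators [3, 6], with no per-key tree traversal at all.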
Advantages of Bulk Loading:
| Metric | Repeated Insertion | Bulk Load |
|---|---|---|
| CPU Operations | ~20 million | ~1 million |
| Disk Writes | ~5 million | ~5,000 |
| Final Tree Height | 3-4 | 3 |
| Space Utilization | ~69% | ~95%+ |
| Time (SSD) | ~10 minutes | ~5 seconds |
Use bulk loading for: initial index creation, rebuilding corrupted indexes, data warehouse loads, and any scenario with >10,000 sorted insertions. Some databases also fast-path appends when they detect sorted insert patterns targeting the rightmost leaf.
B-tree insertion combines elegant algorithms with practical engineering considerations. The essentials: insertion always lands in a leaf; splitting full nodes on the way down guarantees space and keeps the operation a single pass; only a root split increases tree height; splits are rare in practice, and rarer still as t grows; concurrent insertion relies on careful latch protocols; and sorted data should be bulk loaded rather than inserted one key at a time.
You now understand the B-tree insertion algorithm comprehensively. The next page examines node splitting in greater detail—exploring edge cases, optimization techniques, and the mathematical properties that guarantee balance.