Node splitting is the defining operation that gives B-trees their self-balancing property. Unlike naive binary search trees that can degenerate into linked lists with skewed insertions, B-trees distribute keys across the tree through splitting, guaranteeing logarithmic height regardless of insertion patterns.
While we introduced splitting conceptually in the insertion algorithm, this page explores the operation in exhaustive detail—examining the mechanics, edge cases, optimizations, and failure modes that production implementations must handle.
By completing this page, you will understand: (1) The complete splitting algorithm with all edge cases, (2) How splitting propagates through the tree, (3) Differences between leaf and internal node splits, (4) Space efficiency and alternative splitting strategies, (5) Atomicity requirements and crash recovery, and (6) Advanced optimizations like deferred and batched splitting.
When a node exceeds its maximum capacity of 2t-1 keys (for minimum degree t), it must be split into two nodes, each containing an acceptable number of keys. Let's dissect this operation component by component.
Pre-conditions for splitting:
- The node to be split holds exactly 2t-1 keys (it is full).
- The node's parent is not full (guaranteed by proactive splitting), so it can accept the promoted median.
- For a root split, a new empty root is allocated first (covered below).

Post-conditions after splitting:
- The original (left) node holds the lowest t-1 keys.
- A new right sibling holds the highest t-1 keys.
- The median key resides in the parent, separating the two nodes.
- All B-tree invariants (minimum occupancy, key ordering, uniform leaf depth) still hold.
The Mathematics of Even Division:
With 2t-1 keys and a split:
- The left node keeps the lowest t-1 keys
- The median key (index t-1) moves up to the parent
- The right node receives the highest t-1 keys
Verification: (t-1) + 1 + (t-1) = 2t-1 ✓
This perfect division ensures both resulting nodes satisfy the minimum occupancy requirement of t-1 keys.
For child pointers (internal nodes): With 2t children (indices 0 through 2t-1) before the split:
- The left node keeps the first t children (indices 0 through t-1)
- The right node receives the last t children (indices t through 2t-1)
Verification: t + t = 2t ✓
Each node retains one more child pointer than keys, maintaining the key-pointer relationship.
The minimum occupancy of t-1 keys (50% full) ensures that after splitting, both nodes can accept additional insertions before needing to split again. Without this minimum, repeated splits could create nearly-empty nodes, wasting space and increasing tree height.
Here is the complete split algorithm, including the bookkeeping that production implementations need:
SPLIT-CHILD(parent, childIndex):
Input: parent - the parent node of the node being split
childIndex - index of the child to split in parent.children[]
// Step 1: Get the full child and allocate new sibling
fullChild ← parent.children[childIndex]
newSibling ← AllocateNode()
newSibling.isLeaf ← fullChild.isLeaf
newSibling.numKeys ← t - 1
// Step 2: Copy upper half of keys to new sibling
FOR j ← 0 TO t - 2:
newSibling.keys[j] ← fullChild.keys[t + j]
// For B+-tree leaves, also copy values
IF fullChild.isLeaf:
newSibling.values[j] ← fullChild.values[t + j]
// Step 3: For internal nodes, copy child pointers
IF NOT fullChild.isLeaf:
FOR j ← 0 TO t - 1:
newSibling.children[j] ← fullChild.children[t + j]
// Step 4: Truncate the full child
fullChild.numKeys ← t - 1
// Step 5: Make room in parent for new child pointer
FOR j ← parent.numKeys DOWNTO childIndex + 1:
parent.children[j + 1] ← parent.children[j]
parent.children[childIndex + 1] ← newSibling
// Step 6: Make room in parent for median key
FOR j ← parent.numKeys - 1 DOWNTO childIndex:
parent.keys[j + 1] ← parent.keys[j]
// Step 7: Move the median key to parent
parent.keys[childIndex] ← fullChild.keys[t - 1]
parent.numKeys ← parent.numKeys + 1
// Step 8: For B+-tree leaves, maintain sibling pointers
IF fullChild.isLeaf:
newSibling.nextLeaf ← fullChild.nextLeaf
IF fullChild.nextLeaf ≠ NULL:
fullChild.nextLeaf.prevLeaf ← newSibling
fullChild.nextLeaf ← newSibling
newSibling.prevLeaf ← fullChild
// Step 9: Write all modified nodes to disk
DISK-WRITE(fullChild)
DISK-WRITE(newSibling)
DISK-WRITE(parent)
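The pseudocode above maps directly onto a compact in-memory version. The following TypeScript sketch (the `BTreeNode` shape, `splitChild` name, and numeric keys are illustrative assumptions) performs the same key/child arithmetic; disk writes and the B+-tree value/sibling handling are omitted for brevity:

```typescript
// Minimal in-memory node shape assumed for this sketch.
interface BTreeNode {
  keys: number[];
  children: BTreeNode[]; // empty for leaves
  isLeaf: boolean;
}

// Split the full child at parent.children[childIndex] (classic B-tree split).
// Assumes the child holds exactly 2t - 1 keys and the parent is not full.
function splitChild(parent: BTreeNode, childIndex: number, t: number): void {
  const fullChild = parent.children[childIndex];

  // New right sibling takes the upper t - 1 keys (and upper t children).
  const newSibling: BTreeNode = {
    keys: fullChild.keys.slice(t),                                  // keys[t .. 2t-2]
    children: fullChild.isLeaf ? [] : fullChild.children.slice(t),  // children[t .. 2t-1]
    isLeaf: fullChild.isLeaf,
  };

  const median = fullChild.keys[t - 1];

  // Truncate the left (original) node to its lower t - 1 keys.
  fullChild.keys = fullChild.keys.slice(0, t - 1);
  if (!fullChild.isLeaf) {
    fullChild.children = fullChild.children.slice(0, t);
  }

  // Link the sibling and promote the median into the parent.
  parent.children.splice(childIndex + 1, 0, newSibling);
  parent.keys.splice(childIndex, 0, median);
}
```

Using slices instead of index loops keeps the arithmetic visible: `slice(t)` is exactly the "upper half" copy from Steps 2-3, and the two `splice` calls correspond to Steps 5-7.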
If a crash occurs after writing one or two of the three nodes, the tree is inconsistent. Production systems use write-ahead logging (WAL) to ensure atomicity: log the split operation before modifying nodes, replay from the log if a crash occurs mid-split.
Root splitting is unique because it's the only operation that increases the tree's height. When the root is full and needs to split, we create a new root above it.
Algorithm: SPLIT-ROOT
SPLIT-ROOT(tree):
Input: tree - the B-tree with a full root
// Step 1: Allocate new root
oldRoot ← tree.root
newRoot ← AllocateNode()
newRoot.isLeaf ← FALSE
newRoot.numKeys ← 0
// Step 2: Old root becomes first child of new root
newRoot.children[0] ← oldRoot
// Step 3: Split the old root (now a child of new root)
SPLIT-CHILD(newRoot, 0)
// Step 4: Update tree's root reference
tree.root ← newRoot
DISK-WRITE(newRoot)
Before root split:
[10 | 20 | 30 | 40 | 50] ← Full root (5 keys, t=3)
After root split:
[30] ← New root (height increased by 1)
/ \
[10|20] [40|50] ← Old root split into two children
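In TypeScript, reusing the `splitChild` and `BTreeNode` sketches from earlier, the whole operation reduces to a few lines (a sketch, not a definitive implementation):

```typescript
// Root split: the only operation that increases tree height.
function splitRoot(tree: { root: BTreeNode }, t: number): void {
  const newRoot: BTreeNode = { keys: [], children: [tree.root], isLeaf: false };
  splitChild(newRoot, 0, t); // old root becomes child 0 and is split
  tree.root = newRoot;       // height grows by exactly one level
}
```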
Key Insights about Root Splitting:
Height Growth is Rare: Tree height only increases through root splits. For a tree with t=100 and starting from 1 key, you need approximately 100, 10,000, 1,000,000, 100,000,000 keys to reach heights 1, 2, 3, 4 respectively. Root splits happen infrequently at scale.
Root Exceptions: The root is the only node allowed to have fewer than t-1 keys. After a root split, the new root has exactly 1 key.
Height Uniformity: All paths from root to leaves have the same length. Root splitting maintains this because it adds exactly one level above the entire previous tree.
Logarithmic Guarantee: Because height increases by 1 only when the number of keys roughly multiplies by t, height remains O(log_t n). The table below illustrates the scaling for t = 100:
| Height | Minimum Keys | Maximum Keys | Example Scale |
|---|---|---|---|
| 1 | 1 | 199 | Tens to hundreds |
| 2 | ~200 | ~40,000 | Thousands |
| 3 | ~20,000 | ~8,000,000 | Millions |
| 4 | ~2,000,000 | ~1.6 billion | Industrial databases |
| 5 | ~200,000,000 | ~320 billion | Massive data warehouses |
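To sanity-check these magnitudes, a rough height estimate follows from taking logarithms. The TypeScript function below is a sketch that assumes a fixed effective fanout and ignores fill-factor variation:

```typescript
// Back-of-the-envelope: each level multiplies the number of reachable keys
// by roughly the fanout (between t and 2t children per internal node).
function approxHeight(numKeys: number, fanout: number): number {
  return Math.ceil(Math.log(numKeys) / Math.log(fanout));
}

console.log(approxHeight(500_000, 100));       // 3 (the "millions" row)
console.log(approxHeight(1_000_000_000, 200)); // 4 (industrial database scale)
```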
Database systems always cache the root node in memory. For typical trees (height ≤ 5), keeping the top 2-3 levels cached eliminates most disk reads. A root split doesn't significantly impact this caching strategy—it just adds one more always-cached node.
When a node splits, its median key moves to the parent. If the parent was already nearly full, it might exceed capacity and need to split as well. These are called cascading (or propagating) splits.
Example of Cascading Splits (t=2, max 3 keys per node):
Initial state:
[30 | 50 | 70] ← Internal node (full)
/ | | \
[10|20] [35|40|45] [55|60] [75|80|85]
↑ Full leaf - inserting 42 will cause cascade
Step 1: Insert 42 causes leaf split
[30 | 50 | 70] ← Must now absorb the median 40, but it is already full
/ | | | \
[10|20] [35] [42|45] [55|60] [75|80|85]
↑ ↑
Left keeps lower half; right takes upper half (42 lands there)
Median 40 goes to parent
But 40 needs to go into the parent, and the parent is full!
Step 2: Parent (internal node) splits
[50] ← New separator at this level
/ \
[30|40] [70]
/ | \ | \
[10|20] [35] [42|45] [55|60] [75|80|85]
Now the promoted key 50 must be inserted into this internal node's parent (the root). If the root was full too, it would split in turn and grow the tree height.
Proactive Splitting Prevents Cascades:
With proactive splitting, we split any full node encountered during descent, before we need it:
Insert 42 with proactive splitting (t=2):
1. Root [30|50|70] is full → split it first
   Result: new root [50], children [30] and [70]
2. Descend: 42 < 50 → go to left child [30]
   This node has space → proceed
3. Next on the path (42 > 30) is the leaf [35|40|45]: full → split during descent
   Result: leaves [35] and [45]; median 40 rises, making the parent [30|40]
4. Descend again: 42 > 40 → go to leaf [45]; it has room → insert 42 → [42|45]. Done.
Proactive splitting ensures that when we need to insert a median into a parent, the parent always has room—because we already split it if it was full.
In the worst case (without proactive splitting), a single insertion can trigger splits at every level from leaf to root—O(log_t n) splits. With proactive splitting, splits still occur but are performed during descent, not as a separate upward phase. The total work is the same; the execution pattern differs.
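Here is how the proactive strategy looks as code, a TypeScript sketch built on the `BTreeNode` and `splitChild`/`splitRoot` sketches above (the `insertKey` name and linear key search are illustrative assumptions):

```typescript
// Proactive insert: split every full node on the way down, so each
// promoted median always finds a parent with room.
function insertKey(tree: { root: BTreeNode }, key: number, t: number): void {
  if (tree.root.keys.length === 2 * t - 1) {
    splitRoot(tree, t); // the only place where height grows
  }
  let node = tree.root;
  while (!node.isLeaf) {
    let i = node.keys.findIndex((k) => key < k);
    if (i === -1) i = node.keys.length;
    if (node.children[i].keys.length === 2 * t - 1) {
      splitChild(node, i, t);          // split before descending
      if (key > node.keys[i]) i += 1;  // median moved up; pick the correct side
    }
    node = node.children[i];
  }
  // The leaf is guaranteed non-full here.
  let pos = node.keys.findIndex((k) => key < k);
  if (pos === -1) pos = node.keys.length;
  node.keys.splice(pos, 0, key);
}
```

Because every full child is split before we descend into it, a promotion at any level always finds a parent with free space, which is exactly why no separate upward phase is needed.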
While the splitting principle is the same, leaf and internal node splits differ in important implementation details, especially for B+-trees.
Leaf Node Splits:
- In a B+-tree, the separator key is copied up; the key (and its value) stays in the right leaf so the data remains reachable.
- Values are split along with their keys.
- The linked leaf list must be rethreaded around the new sibling.

Internal Node Splits:
- The median key moves up; it becomes a pure separator and does not remain in either half.
- Child pointers are split along with the keys.
- No values or sibling pointers are involved.

The code below contrasts the two cases:
```typescript
// B+-Tree Leaf Split: Median is COPIED
function splitLeafNode(leaf: LeafNode): { newLeaf: LeafNode; separatorKey: Key } {
  const midpoint = Math.floor(leaf.keys.length / 2);
  const newLeaf = new LeafNode();

  // Upper half goes to new leaf (keys AND values)
  newLeaf.keys = leaf.keys.splice(midpoint);
  newLeaf.values = leaf.values.splice(midpoint);

  // The separator is the FIRST key of the new (right) leaf.
  // It stays in the leaf: we are only COPYING it up.
  const separatorKey = newLeaf.keys[0]; // Not removed!

  // Maintain sibling pointers
  newLeaf.nextLeaf = leaf.nextLeaf;
  leaf.nextLeaf = newLeaf;

  return { newLeaf, separatorKey };
}

// B+-Tree Internal Node Split: Median is PROMOTED (moved)
function splitInternalNode(node: InternalNode): { newNode: InternalNode; promotedKey: Key } {
  const midpoint = Math.floor(node.keys.length / 2);
  const newNode = new InternalNode();

  // Key at midpoint moves UP: it is the promoted separator
  const promotedKey = node.keys[midpoint];

  // Keys AFTER midpoint go to new node
  newNode.keys = node.keys.splice(midpoint + 1); // +1: skip the promoted key
  newNode.children = node.children.splice(midpoint + 1);

  // The promoted key is now the last element left in node.keys; remove it,
  // since it lives in the parent from here on.
  node.keys.pop();

  return { newNode, promotedKey };
}
```

| Aspect | Leaf Split | Internal Split |
|---|---|---|
| Median key handling | Copy to parent | Move to parent |
| Key remains in node? | Yes (for data access) | No (becomes pure separator) |
| Values/data | Split with keys | N/A (no values) |
| Child pointers | N/A (leaves have no children) | Split with internal keys |
| Sibling pointers | Must update linked list | Not applicable (usually) |
| Resulting key counts (n keys) | ⌊n/2⌋ left, ⌈n/2⌉ right | ⌊n/2⌋ left, n - ⌊n/2⌋ - 1 right (median promoted) |
Treating leaf and internal splits identically causes bugs. If you 'move' (instead of 'copy') the median from a leaf, that key's data becomes inaccessible—searches for it will fail. If you 'copy' from internal nodes, you get duplicate separators causing routing confusion.
Node splitting directly impacts space utilization. Let's analyze how full nodes typically are and techniques to optimize space usage.
Theoretical Analysis:
A node's occupancy ranges from t-1 keys (just split, ~50% full) to 2t-1 keys (about to split, 100% full). A classic analysis of random insertions shows that the steady-state average is not the midpoint (75%) but about ln(2) ≈ 69% of capacity: because every split creates two new half-full nodes, the node population is biased toward emptier nodes.
Space Utilization Formula:
$$\text{Average fill} = \frac{\text{average keys per node}}{\text{maximum keys per node}} = \frac{\ln(2) \cdot (2t-1)}{2t-1} = \ln(2) \approx 69\%$$
Techniques to Improve Fill Factor:
Delayed Splitting (Redistribution First): When a node is full, try to redistribute keys with a sibling before splitting.
IF sibling has room:
Transfer some keys from full node to sibling
Update parent separator
No split needed
ELSE:
Proceed with split
This can push fill factor to ~75-80%.
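A sketch of the redistribution check for B+-tree leaves follows; the node shapes and the `tryRedistributeRight` name are assumptions, and a real implementation would also try the left sibling:

```typescript
// Try to avoid a split by shifting one key into the right sibling.
// Returns true if redistribution succeeded (no split needed).
function tryRedistributeRight(
  parent: { keys: number[]; children: { keys: number[] }[] },
  childIndex: number,
  maxKeys: number
): boolean {
  const full = parent.children[childIndex];
  const right = parent.children[childIndex + 1];
  if (!right || right.keys.length >= maxKeys) return false; // no room: caller must split

  // Move the full node's largest key to the front of the sibling.
  right.keys.unshift(full.keys.pop()!);

  // B+-tree rule: the separator must equal the right sibling's new minimum.
  parent.keys[childIndex] = right.keys[0];
  return true;
}
```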
B*-Trees (2/3 Full Minimum): B*-trees postpone splitting by first redistributing keys into a sibling; only when two adjacent nodes are both full are they split into three nodes, each roughly 2/3 full. This raises the guaranteed minimum occupancy from 1/2 to 2/3.

Bulk Loading: When an index is built from sorted input, leaves can be packed left to right at the target fill factor and the internal levels constructed bottom-up, so no splits occur during the build at all.

Index Rebuilding: Periodically rebuilding a fragmented index (e.g., PostgreSQL's REINDEX) restores near-full pages; the cost is a background maintenance pass instead of per-insert overhead.
| Technique | Average Fill Factor | Implementation Complexity |
|---|---|---|
| Standard B-tree splits | ~69% | Low |
| Redistribution before split | ~75-80% | Medium |
| B*-tree (2/3 splitting) | ~81% | High |
| Bulk loading | ~95-100% | Low (initial build only) |
| Periodic rebuild | ~95-100% | Low (background process) |
Higher fill factors save space but increase split frequency for write-heavy workloads. Read-heavy workloads benefit from higher fill (fewer nodes to traverse). Write-heavy workloads may prefer lower fill to reduce split overhead. Some databases let you configure target fill factor.
A split modifies three nodes: the original (now left), the new sibling (right), and the parent. If a system crash occurs mid-operation, the tree could be left in an inconsistent state. Production systems employ several strategies to ensure atomicity.
Failure Scenarios:
Crash after writing left node only: the left node has already been truncated to its lower t-1 keys, but the new sibling was never persisted, so the upper half of the keys is lost.

Crash after writing left and right nodes: both halves exist on disk, but the parent holds neither the separator nor a pointer to the new sibling, so the right half is orphaned and unreachable.

Crash after writing all nodes but before updating tree metadata: this matters for root splits, where the tree header still references the old root, making the new level invisible.
Solution 1: Write-Ahead Logging (WAL)
The standard approach in databases:
SPLIT-WITH-WAL(parent, childIndex):
fullChild ← parent.children[childIndex]
// Step 1: Log the intent
logRecord ← CreateLogRecord(
type: SPLIT,
parentPageId: parent.pageId,
childIndex: childIndex,
originalKeys: fullChild.keys,
timestamp: now()
)
WRITE-TO-WAL(logRecord)
FLUSH-WAL() // Ensure log is on stable storage
// Step 2: Perform the actual split
SPLIT-CHILD(parent, childIndex)
// Step 3: Commit
commitRecord ← CreateLogRecord(type: COMMIT, splitLogId: logRecord.id)
WRITE-TO-WAL(commitRecord)
Recovery process: on restart, scan the WAL forward from the last checkpoint. A SPLIT record followed by its COMMIT is redone if its pages were not flushed; a SPLIT record with no matching COMMIT is rolled back using the logged original keys, leaving the pre-split node intact.
Solution 2: Shadow Paging
Create new versions of modified pages without overwriting originals:
SPLIT-WITH-SHADOW-PAGES(parent, childIndex):
// Allocate new pages for all modifications
newParent ← CopyPage(parent)
newLeft ← CopyPage(fullChild) // Will be truncated
newRight ← AllocatePage() // New sibling
// Perform split on new copies
PerformSplit(newParent, newLeft, newRight)
// Atomic update: change one pointer to swap trees
parentOfParent.children[idx] ← newParent // Single atomic write
// Mark old pages as free (after ensuring durability)
DeferredFree(parent, fullChild)
Shadow paging avoids in-place updates but can fragment storage.
Solution 3: B-Link Tree Approach
B-link trees add high-key markers and sibling pointers that make splits visible even if the parent hasn't been updated:
- Every node stores a high key: an upper bound on the keys that may appear in its subtree.
- Every node holds a pointer to its right sibling at the same level.
- A split first installs the new right sibling (the "half-split" state) and only then updates the parent; in between, a reader whose search key exceeds a node's high key simply follows the sibling pointer.
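A sketch of the reader-side "move right" logic, mirroring the Lehman-Yao design (the `BLinkNode` shape is an assumption):

```typescript
interface BLinkNode {
  keys: number[];
  children: BLinkNode[];  // empty for leaves
  isLeaf: boolean;
  highKey: number | null; // upper bound of this node's key range
  rightSibling: BLinkNode | null;
}

// Descend to the leaf that should hold `key`. If a concurrent split has
// pushed the key range rightward before the parent was updated, the high
// key reveals it and we "move right" along the sibling pointer.
function findLeaf(root: BLinkNode, key: number): BLinkNode {
  let node = root;
  while (true) {
    while (node.highKey !== null && key > node.highKey && node.rightSibling) {
      node = node.rightSibling; // split finished below; parent not yet updated
    }
    if (node.isLeaf) return node;
    let i = node.keys.findIndex((k) => key < k);
    if (i === -1) i = node.keys.length;
    node = node.children[i];
  }
}
```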
Most production databases (PostgreSQL, MySQL, SQLite) use write-ahead logging for B-tree modifications. The ARIES recovery algorithm provides a well-understood framework for handling splits, merges, and all other tree modifications with full crash safety.
Production systems employ various optimizations to reduce the overhead of splitting operations.
1. Asymmetric Splits:
Instead of always splitting 50/50, adapt to access patterns:
// For sequential inserts (e.g., auto-increment IDs)
IF insertPattern == ASCENDING:
// Keep most keys in left (already searched), move few to right
splitPoint ← 0.9 * numKeys // 90/10 split
ELSE IF insertPattern == DESCENDING:
splitPoint ← 0.1 * numKeys // 10/90 split
ELSE:
splitPoint ← 0.5 * numKeys // Standard 50/50
Benefit: For sequential inserts, asymmetric splits achieve ~90%+ fill factor and reduce split frequency.
2. Suffix Truncation:
For separator keys in internal nodes, store only enough bytes to distinguish subtrees:
Left leaf: [..., "postgresql"]
Right leaf: ["python", ...]
Full separator: "python" (6 bytes)
Truncated separator: "py" (2 bytes) - sufficient to distinguish
Reduces internal node size, increasing fanout and reducing tree height.
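Choosing a truncated separator can be as simple as taking the shortest prefix of the right node's minimum key that still sorts strictly above the left node's maximum. A TypeScript sketch (the `shortestSeparator` name is an assumption):

```typescript
// Shortest prefix of rightMin that still sorts above leftMax; any such
// string is a valid separator between the two nodes.
function shortestSeparator(leftMax: string, rightMin: string): string {
  for (let len = 1; len <= rightMin.length; len++) {
    const candidate = rightMin.slice(0, len);
    if (candidate > leftMax) return candidate;
  }
  return rightMin; // no shorter prefix works
}

console.log(shortestSeparator("postgresql", "python")); // "py"
```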
3. Prefix Compression:
Store common prefixes once per node:
Uncompressed keys: ["company_sales_2024", "company_sales_2025", "company_sales_2026"]
Prefix: "company_sales_20"
Compressed: ["24", "25", "26"]
Saves significant space for keys with shared prefixes (common in database tables).
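A minimal sketch of per-node prefix compression; it computes the maximal shared prefix, which for the example above is one character longer than the illustrative "company_sales_20":

```typescript
// Factor out the longest common prefix of a node's keys; store the prefix
// once and keep only the suffixes in the key array.
function compressKeys(keys: string[]): { prefix: string; suffixes: string[] } {
  let prefix = keys[0] ?? "";
  for (const key of keys) {
    while (!key.startsWith(prefix)) prefix = prefix.slice(0, -1);
  }
  return { prefix, suffixes: keys.map((k) => k.slice(prefix.length)) };
}

console.log(compressKeys(["company_sales_2024", "company_sales_2025", "company_sales_2026"]));
// { prefix: "company_sales_202", suffixes: ["4", "5", "6"] }
```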
4. Batched Split Decisions:
Rather than splitting immediately when full, buffer insertions and batch splits:
IF node.numKeys >= MAX_KEYS:
pendingSplits.add(node)
IF pendingSplits.size >= BATCH_THRESHOLD OR timeSinceLast >= BATCH_INTERVAL:
FOR each node in pendingSplits:
performSplit(node)
writeAllDirtyPages() // Single flush for multiple splits
Reduces I/O amplification by combining multiple log writes.
5. Copy-on-Write (COW) Splits:
Never modify pages in place; instead, create new copies:
- The split writes fresh copies of the left node, the new sibling, and the parent.
- Readers keep using the old versions until a single atomic pointer swap publishes the new subtree.
- Old pages are reclaimed once no reader references them (the approach taken by LMDB and btrfs).
Track split operations in production: frequency, latency, and patterns. High split rates may indicate suboptimal page size, sequential insert patterns that could benefit from asymmetric splits, or the need for bulk loading instead of individual inserts.
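A minimal instrumentation sketch along those lines, assuming a Node.js runtime (`SplitMetrics` and its fields are hypothetical names):

```typescript
// Wrap each split with a counter and a high-resolution timer so dashboards
// can surface split frequency and latency.
class SplitMetrics {
  splits = 0;
  totalNanos = 0n;

  record<T>(splitFn: () => T): T {
    const start = process.hrtime.bigint();
    try {
      return splitFn();
    } finally {
      this.splits += 1;
      this.totalNanos += process.hrtime.bigint() - start;
    }
  }
}
```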
Split bugs are among the most difficult B-tree issues to diagnose because they can cause subtle, delayed symptoms. Here's a systematic debugging approach:
Common Symptoms of Split Bugs:
| Symptom | Likely Cause |
|---|---|
| Keys disappearing | Median not copied up (B+-tree leaf split bug) |
| Duplicate results | Median copied instead of moved (internal split) |
| Tree height decreasing | Split allocating wrong node type (leaf vs internal) |
| Crash after restart | Incomplete WAL recovery for split operations |
| Performance degradation | Orphaned nodes from incomplete splits |
| Infinite loops in range scans | Sibling pointer corruption during split |
A verification routine like the following, run after each split in debug builds, catches these bugs at their source:

```typescript
interface SplitVerificationResult {
  valid: boolean;
  errors: string[];
}

function verifySplit(
  parent: BTreeNode,
  leftChild: BTreeNode,
  rightChild: BTreeNode,
  separatorIndex: number,
  t: number
): SplitVerificationResult {
  const errors: string[] = [];

  // Verify left child key count
  if (leftChild.keys.length !== t - 1) {
    errors.push(`Left child has ${leftChild.keys.length} keys, expected ${t - 1}`);
  }

  // Verify right child key count
  if (rightChild.keys.length !== t - 1) {
    errors.push(`Right child has ${rightChild.keys.length} keys, expected ${t - 1}`);
  }

  // Verify separator is between left's max and right's min
  const separator = parent.keys[separatorIndex];
  const leftMax = leftChild.keys[leftChild.keys.length - 1];
  const rightMin = rightChild.keys[0];

  if (separator < leftMax) {
    errors.push(`Separator ${separator} is less than left max ${leftMax}`);
  }
  if (separator >= rightMin && !rightChild.isLeaf) {
    // For internal nodes, the separator must be strictly less than right's min
    errors.push(`Separator ${separator} is not less than right min ${rightMin} in internal split`);
  }
  if (separator !== rightMin && rightChild.isLeaf) {
    // For B+-tree leaves, the separator should equal right's min
    errors.push(`Separator ${separator} doesn't match right min ${rightMin} in leaf split`);
  }

  // Verify parent has correct number of children
  if (parent.children.length !== parent.keys.length + 1) {
    errors.push(`Parent has ${parent.children.length} children for ${parent.keys.length} keys`);
  }

  // Verify child pointers are correct
  if (parent.children[separatorIndex] !== leftChild) {
    errors.push(`Parent's child[${separatorIndex}] doesn't point to left child`);
  }
  if (parent.children[separatorIndex + 1] !== rightChild) {
    errors.push(`Parent's child[${separatorIndex + 1}] doesn't point to right child`);
  }

  return { valid: errors.length === 0, errors };
}
```

Thoroughly test: (1) Splitting the root when it's the only node, (2) Cascading splits to the root, (3) Splitting with minimum t (t=2), (4) Concurrent splits on sibling nodes, (5) Crash recovery at each step of the split process. These edge cases reveal most split bugs.
Node splitting is the elegant mechanism that keeps B-trees balanced regardless of insertion order. Let's consolidate the essential knowledge:
- A full node of 2t-1 keys splits into two nodes of t-1 keys each, with the median promoted to the parent.
- B+-tree leaf splits copy the separator up (the data stays in the leaf); internal splits move it up.
- Only a root split increases tree height, which keeps height O(log_t n).
- A split touches three nodes and must be made atomic, via WAL, shadow paging, or B-link half-splits.
- Fill-factor techniques (redistribution, B*-splits, bulk loading) and split optimizations (asymmetric splits, suffix truncation, prefix compression) tune the space/write tradeoff.
You now understand node splitting at a depth suitable for implementation and debugging. The next page explores the delete algorithm—where the inverse of splitting, node merging, maintains balance when keys are removed.