Node splitting is the defining operation that gives B-trees their self-balancing property. Unlike naive binary search trees that can degenerate into linked lists with skewed insertions, B-trees distribute keys across the tree through splitting, guaranteeing logarithmic height regardless of insertion patterns.
While we introduced splitting conceptually in the insertion algorithm, this page explores the operation in exhaustive detail—examining the mechanics, edge cases, optimizations, and failure modes that production implementations must handle.
By completing this page, you will understand: (1) The complete splitting algorithm with all edge cases, (2) How splitting propagates through the tree, (3) Differences between leaf and internal node splits, (4) Space efficiency and alternative splitting strategies, (5) Atomicity requirements and crash recovery, and (6) Advanced optimizations like deferred and batched splitting.
When a node exceeds its maximum capacity of 2t-1 keys (for minimum degree t), it must be split into two nodes, each containing an acceptable number of keys. Let's dissect this operation component by component.
Pre-conditions for splitting:
- The node to be split holds exactly 2t-1 keys (it is full).
- The node's parent is not full (guaranteed by proactive splitting), so it can accept the promoted median.
- For a root split, a new empty root is allocated first (covered below).

Post-conditions after splitting:
- The original (left) node holds the lowest t-1 keys.
- A new right sibling holds the highest t-1 keys.
- The median key resides in the parent, separating the two nodes.
- All B-tree invariants (minimum occupancy, key ordering, uniform leaf depth) still hold.
The Mathematics of Even Division:
With 2t-1 keys and a split:
- The left node keeps the lowest t-1 keys
- The median key (index t-1) moves up to the parent
- The right node receives the highest t-1 keys
Verification: (t-1) + 1 + (t-1) = 2t-1 ✓
This perfect division ensures both resulting nodes satisfy the minimum occupancy requirement of t-1 keys.
For child pointers (internal nodes): With 2t children (indices 0 through 2t-1) before the split:
- The left node keeps the first t children (indices 0 through t-1)
- The right node receives the last t children (indices t through 2t-1)
Verification: t + t = 2t ✓
Each node retains one more child pointer than keys, maintaining the key-pointer relationship.
The minimum occupancy of t-1 keys (50% full) ensures that after splitting, both nodes can accept additional insertions before needing to split again. Without this minimum, repeated splits could create nearly-empty nodes, wasting space and increasing tree height.
Here is the complete split algorithm, including the bookkeeping that production implementations need:
SPLIT-CHILD(parent, childIndex):
Input: parent - the parent node of the node being split
childIndex - index of the child to split in parent.children[]
// Step 1: Get the full child and allocate new sibling
fullChild ← parent.children[childIndex]
newSibling ← AllocateNode()
newSibling.isLeaf ← fullChild.isLeaf
newSibling.numKeys ← t - 1
// Step 2: Copy upper half of keys to new sibling
FOR j ← 0 TO t - 2:
newSibling.keys[j] ← fullChild.keys[t + j]
// For B+-tree leaves, also copy values
IF fullChild.isLeaf:
newSibling.values[j] ← fullChild.values[t + j]
// Step 3: For internal nodes, copy child pointers
IF NOT fullChild.isLeaf:
FOR j ← 0 TO t - 1:
newSibling.children[j] ← fullChild.children[t + j]
// Step 4: Truncate the full child
fullChild.numKeys ← t - 1
// Step 5: Make room in parent for new child pointer
FOR j ← parent.numKeys DOWNTO childIndex + 1:
parent.children[j + 1] ← parent.children[j]
parent.children[childIndex + 1] ← newSibling
// Step 6: Make room in parent for median key
FOR j ← parent.numKeys - 1 DOWNTO childIndex:
parent.keys[j + 1] ← parent.keys[j]
// Step 7: Move the median key to parent
parent.keys[childIndex] ← fullChild.keys[t - 1]
parent.numKeys ← parent.numKeys + 1
// Step 8: For B+-tree leaves, maintain sibling pointers
IF fullChild.isLeaf:
newSibling.nextLeaf ← fullChild.nextLeaf
IF fullChild.nextLeaf ≠ NULL:
fullChild.nextLeaf.prevLeaf ← newSibling
fullChild.nextLeaf ← newSibling
newSibling.prevLeaf ← fullChild
// Step 9: Write all modified nodes to disk
DISK-WRITE(fullChild)
DISK-WRITE(newSibling)
DISK-WRITE(parent)
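The pseudocode above maps directly onto a compact in-memory version. The following TypeScript sketch (the `BTreeNode` shape, `splitChild` name, and numeric keys are illustrative assumptions) performs the same key/child arithmetic; disk writes and the B+-tree value/sibling handling are omitted for brevity:

```typescript
// Minimal in-memory node shape assumed for this sketch.
interface BTreeNode {
  keys: number[];
  children: BTreeNode[]; // empty for leaves
  isLeaf: boolean;
}

// Split the full child at parent.children[childIndex] (classic B-tree split).
// Assumes the child holds exactly 2t - 1 keys and the parent is not full.
function splitChild(parent: BTreeNode, childIndex: number, t: number): void {
  const fullChild = parent.children[childIndex];

  // New right sibling takes the upper t - 1 keys (and upper t children).
  const newSibling: BTreeNode = {
    keys: fullChild.keys.slice(t),                                  // keys[t .. 2t-2]
    children: fullChild.isLeaf ? [] : fullChild.children.slice(t),  // children[t .. 2t-1]
    isLeaf: fullChild.isLeaf,
  };

  const median = fullChild.keys[t - 1];

  // Truncate the left (original) node to its lower t - 1 keys.
  fullChild.keys = fullChild.keys.slice(0, t - 1);
  if (!fullChild.isLeaf) {
    fullChild.children = fullChild.children.slice(0, t);
  }

  // Link the sibling and promote the median into the parent.
  parent.children.splice(childIndex + 1, 0, newSibling);
  parent.keys.splice(childIndex, 0, median);
}
```

Using slices instead of index loops keeps the arithmetic visible: `slice(t)` is exactly the "upper half" copy from Steps 2-3, and the two `splice` calls correspond to Steps 5-7.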
If a crash occurs after writing one or two of the three nodes, the tree is inconsistent. Production systems use write-ahead logging (WAL) to ensure atomicity: log the split operation before modifying nodes, replay from the log if a crash occurs mid-split.
Root splitting is unique because it's the only operation that increases the tree's height. When the root is full and needs to split, we create a new root above it.
Algorithm: SPLIT-ROOT
SPLIT-ROOT(tree):
Input: tree - the B-tree with a full root
// Step 1: Allocate new root
oldRoot ← tree.root
newRoot ← AllocateNode()
newRoot.isLeaf ← FALSE
newRoot.numKeys ← 0
// Step 2: Old root becomes first child of new root
newRoot.children[0] ← oldRoot
// Step 3: Split the old root (now a child of new root)
SPLIT-CHILD(newRoot, 0)
// Step 4: Update tree's root reference
tree.root ← newRoot
DISK-WRITE(newRoot)
Before root split:
[10 | 20 | 30 | 40 | 50] ← Full root (5 keys, t=3)
After root split:
[30] ← New root (height increased by 1)
/ \
[10|20] [40|50] ← Old root split into two children
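In TypeScript, reusing the `splitChild` and `BTreeNode` sketches from earlier, the whole operation reduces to a few lines (a sketch, not a definitive implementation):

```typescript
// Root split: the only operation that increases tree height.
function splitRoot(tree: { root: BTreeNode }, t: number): void {
  const newRoot: BTreeNode = { keys: [], children: [tree.root], isLeaf: false };
  splitChild(newRoot, 0, t); // old root becomes child 0 and is split
  tree.root = newRoot;       // height grows by exactly one level
}
```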
Key Insights about Root Splitting:
Height Growth is Rare: Tree height only increases through root splits. For a tree with t=100 and starting from 1 key, you need approximately 100, 10,000, 1,000,000, 100,000,000 keys to reach heights 1, 2, 3, 4 respectively. Root splits happen infrequently at scale.
Root Exceptions: The root is the only node allowed to have fewer than t-1 keys. After a root split, the new root has exactly 1 key.
Height Uniformity: All paths from root to leaves have the same length. Root splitting maintains this because it adds exactly one level above the entire previous tree.
Logarithmic Guarantee: Because height increases by 1 only when the number of keys roughly multiplies by t, height remains O(log_t n). The table below illustrates the scaling for t = 100:
| Height | Minimum Keys | Maximum Keys | Example Scale |
|---|---|---|---|
| 1 | 1 | 199 | Tens to hundreds |
| 2 | ~200 | ~40,000 | Thousands |
| 3 | ~20,000 | ~8,000,000 | Millions |
| 4 | ~2,000,000 | ~1.6 billion | Industrial databases |
| 5 | ~200,000,000 | ~320 billion | Massive data warehouses |
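To sanity-check these magnitudes, a rough height estimate follows from taking logarithms. The TypeScript function below is a sketch that assumes a fixed effective fanout and ignores fill-factor variation:

```typescript
// Back-of-the-envelope: each level multiplies the number of reachable keys
// by roughly the fanout (between t and 2t children per internal node).
function approxHeight(numKeys: number, fanout: number): number {
  return Math.ceil(Math.log(numKeys) / Math.log(fanout));
}

console.log(approxHeight(500_000, 100));       // 3 (the "millions" row)
console.log(approxHeight(1_000_000_000, 200)); // 4 (industrial database scale)
```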
Database systems always cache the root node in memory. For typical trees (height ≤ 5), keeping the top 2-3 levels cached eliminates most disk reads. A root split doesn't significantly impact this caching strategy—it just adds one more always-cached node.
When a node splits, its median key moves to the parent. If the parent was already nearly full, it might exceed capacity and need to split as well. These are called cascading (or propagating) splits.
Example of Cascading Splits (t=2, max 3 keys per node):
Initial state:
[30 | 50 | 70] ← Internal node (full)
/ | | \
[10|20] [35|40|45] [55|60] [75|80|85]
↑ Full leaf - inserting 42 will cause cascade
Step 1: Insert 42 causes leaf split
[30 | 50 | 70] ← Must now absorb the median 40, but it is already full
/ | | | \
[10|20] [35] [42|45] [55|60] [75|80|85]
↑ ↑
Left keeps lower half; right takes upper half (42 lands there)
Median 40 goes to parent
But 40 needs to go into the parent, and the parent is full!
Step 2: Parent (internal node) splits
[50] ← New separator at this level
/ \
[30|40] [70]
/ | \ | \
[10|20] [35] [42|45] [55|60] [75|80|85]
Now the promoted key 50 must be inserted into this internal node's parent (the root). If the root was full too, it would split in turn and grow the tree height.
Proactive Splitting Prevents Cascades:
With proactive splitting, we split any full node encountered during descent, before we need it:
Insert 42 with proactive splitting (t=2):
1. Root [30|50|70] is full → split it first
   Result: new root [50], children [30] and [70]
2. Descend: 42 < 50 → go to left child [30]
   This node has space → proceed
3. Next on the path (42 > 30) is the leaf [35|40|45]: full → split during descent
   Result: leaves [35] and [45]; median 40 rises, making the parent [30|40]
4. Descend again: 42 > 40 → go to leaf [45]; it has room → insert 42 → [42|45]. Done.
Proactive splitting ensures that when we need to insert a median into a parent, the parent always has room—because we already split it if it was full.
In the worst case (without proactive splitting), a single insertion can trigger splits at every level from leaf to root—O(log_t n) splits. With proactive splitting, splits still occur but are performed during descent, not as a separate upward phase. The total work is the same; the execution pattern differs.
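Here is how the proactive strategy looks as code, a TypeScript sketch built on the `BTreeNode` and `splitChild`/`splitRoot` sketches above (the `insertKey` name and linear key search are illustrative assumptions):

```typescript
// Proactive insert: split every full node on the way down, so each
// promoted median always finds a parent with room.
function insertKey(tree: { root: BTreeNode }, key: number, t: number): void {
  if (tree.root.keys.length === 2 * t - 1) {
    splitRoot(tree, t); // the only place where height grows
  }
  let node = tree.root;
  while (!node.isLeaf) {
    let i = node.keys.findIndex((k) => key < k);
    if (i === -1) i = node.keys.length;
    if (node.children[i].keys.length === 2 * t - 1) {
      splitChild(node, i, t);          // split before descending
      if (key > node.keys[i]) i += 1;  // median moved up; pick the correct side
    }
    node = node.children[i];
  }
  // The leaf is guaranteed non-full here.
  let pos = node.keys.findIndex((k) => key < k);
  if (pos === -1) pos = node.keys.length;
  node.keys.splice(pos, 0, key);
}
```

Because every full child is split before we descend into it, a promotion at any level always finds a parent with free space, which is exactly why no separate upward phase is needed.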
While the splitting principle is the same, leaf and internal node splits differ in important implementation details, especially for B+-trees.
Leaf Node Splits:
- In a B+-tree, the separator key is copied up; the key (and its value) stays in the right leaf so the data remains reachable.
- Values are split along with their keys.
- The linked leaf list must be rethreaded around the new sibling.

Internal Node Splits:
- The median key moves up; it becomes a pure separator and does not remain in either half.
- Child pointers are split along with the keys.
- No values or sibling pointers are involved.

The code below contrasts the two cases:
```typescript
// B+-Tree Leaf Split: Median is COPIED
function splitLeafNode(leaf: LeafNode): { newLeaf: LeafNode; separatorKey: Key } {
  const midpoint = Math.floor(leaf.keys.length / 2);
  const newLeaf = new LeafNode();

  // Upper half goes to new leaf (keys AND values)
  newLeaf.keys = leaf.keys.splice(midpoint);
  newLeaf.values = leaf.values.splice(midpoint);

  // The separator is the FIRST key of the new (right) leaf.
  // It stays in the leaf: we are only COPYING it up.
  const separatorKey = newLeaf.keys[0]; // Not removed!

  // Maintain sibling pointers
  newLeaf.nextLeaf = leaf.nextLeaf;
  leaf.nextLeaf = newLeaf;

  return { newLeaf, separatorKey };
}

// B+-Tree Internal Node Split: Median is PROMOTED (moved)
function splitInternalNode(node: InternalNode): { newNode: InternalNode; promotedKey: Key } {
  const midpoint = Math.floor(node.keys.length / 2);
  const newNode = new InternalNode();

  // Key at midpoint moves UP: it is the promoted separator
  const promotedKey = node.keys[midpoint];

  // Keys AFTER midpoint go to new node
  newNode.keys = node.keys.splice(midpoint + 1); // +1: skip the promoted key
  newNode.children = node.children.splice(midpoint + 1);

  // The promoted key is now the last element left in node.keys; remove it,
  // since it lives in the parent from here on.
  node.keys.pop();

  return { newNode, promotedKey };
}
```

| Aspect | Leaf Split | Internal Split |
|---|---|---|
| Median key handling | Copy to parent | Move to parent |
| Key remains in node? | Yes (for data access) | No (becomes pure separator) |
| Values/data | Split with keys | N/A (no values) |
| Child pointers | N/A (leaves have no children) | Split with internal keys |
| Sibling pointers | Must update linked list | Not applicable (usually) |
| Resulting key counts (n keys) | ⌊n/2⌋ left, ⌈n/2⌉ right | ⌊n/2⌋ left, n - ⌊n/2⌋ - 1 right (median promoted) |
Treating leaf and internal splits identically causes bugs. If you 'move' (instead of 'copy') the median from a leaf, that key's data becomes inaccessible—searches for it will fail. If you 'copy' from internal nodes, you get duplicate separators causing routing confusion.
Node splitting directly impacts space utilization. Let's analyze how full nodes typically are and techniques to optimize space usage.
Theoretical Analysis:
A node's occupancy ranges from t-1 keys (just split, ~50% full) to 2t-1 keys (about to split, 100% full). A classic analysis of random insertions shows that the steady-state average is not the midpoint (75%) but about ln(2) ≈ 69% of capacity: because every split creates two new half-full nodes, the node population is biased toward emptier nodes.
Space Utilization Formula:
$$\text{Average fill} = \frac{\text{average keys per node}}{\text{maximum keys per node}} = \frac{\ln(2) \cdot (2t-1)}{2t-1} = \ln(2) \approx 69\%$$
Techniques to Improve Fill Factor:
Delayed Splitting (Redistribution First): When a node is full, try to redistribute keys with a sibling before splitting.
IF sibling has room:
Transfer some keys from full node to sibling
Update parent separator
No split needed
ELSE:
Proceed with split
This can push fill factor to ~75-80%.
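A sketch of the redistribution check for B+-tree leaves follows; the node shapes and the `tryRedistributeRight` name are assumptions, and a real implementation would also try the left sibling:

```typescript
// Try to avoid a split by shifting one key into the right sibling.
// Returns true if redistribution succeeded (no split needed).
function tryRedistributeRight(
  parent: { keys: number[]; children: { keys: number[] }[] },
  childIndex: number,
  maxKeys: number
): boolean {
  const full = parent.children[childIndex];
  const right = parent.children[childIndex + 1];
  if (!right || right.keys.length >= maxKeys) return false; // no room: caller must split

  // Move the full node's largest key to the front of the sibling.
  right.keys.unshift(full.keys.pop()!);

  // B+-tree rule: the separator must equal the right sibling's new minimum.
  parent.keys[childIndex] = right.keys[0];
  return true;
}
```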
B*-Trees (2/3 Full Minimum): B*-trees postpone splitting by first redistributing keys into a sibling; only when two adjacent nodes are both full are they split into three nodes, each roughly 2/3 full. This raises the guaranteed minimum occupancy from 1/2 to 2/3.

Bulk Loading: When an index is built from sorted input, leaves can be packed left to right at the target fill factor and the internal levels constructed bottom-up, so no splits occur during the build at all.

Index Rebuilding: Periodically rebuilding a fragmented index (e.g., PostgreSQL's REINDEX) restores near-full pages; the cost is a background maintenance pass instead of per-insert overhead.
| Technique | Average Fill Factor | Implementation Complexity |
|---|---|---|
| Standard B-tree splits | ~69% | Low |
| Redistribution before split | ~75-80% | Medium |
| B*-tree (2/3 splitting) | ~81% | High |
| Bulk loading | ~95-100% | Low (initial build only) |
| Periodic rebuild | ~95-100% | Low (background process) |
Higher fill factors save space but increase split frequency for write-heavy workloads. Read-heavy workloads benefit from higher fill (fewer nodes to traverse). Write-heavy workloads may prefer lower fill to reduce split overhead. Some databases let you configure target fill factor.
A split modifies three nodes: the original (now left), the new sibling (right), and the parent. If a system crash occurs mid-operation, the tree could be left in an inconsistent state. Production systems employ several strategies to ensure atomicity.
Failure Scenarios:
Crash after writing left node only: the left node has already been truncated to its lower t-1 keys, but the new sibling was never persisted, so the upper half of the keys is lost.

Crash after writing left and right nodes: both halves exist on disk, but the parent holds neither the separator nor a pointer to the new sibling, so the right half is orphaned and unreachable.

Crash after writing all nodes but before updating tree metadata: this matters for root splits, where the tree header still references the old root, making the new level invisible.
Solution 1: Write-Ahead Logging (WAL)
The standard approach in databases:
SPLIT-WITH-WAL(parent, childIndex):
fullChild ← parent.children[childIndex]
// Step 1: Log the intent
logRecord ← CreateLogRecord(
type: SPLIT,
parentPageId: parent.pageId,
childIndex: childIndex,
originalKeys: fullChild.keys,
timestamp: now()
)
WRITE-TO-WAL(logRecord)
FLUSH-WAL() // Ensure log is on stable storage
// Step 2: Perform the actual split
SPLIT-CHILD(parent, childIndex)
// Step 3: Commit
commitRecord ← CreateLogRecord(type: COMMIT, splitLogId: logRecord.id)
WRITE-TO-WAL(commitRecord)
Recovery process: on restart, scan the WAL forward from the last checkpoint. A SPLIT record followed by its COMMIT is redone if its pages were not flushed; a SPLIT record with no matching COMMIT is rolled back using the logged original keys, leaving the pre-split node intact.
Solution 2: Shadow Paging
Create new versions of modified pages without overwriting originals:
SPLIT-WITH-SHADOW-PAGES(parent, childIndex):
// Allocate new pages for all modifications
newParent ← CopyPage(parent)
newLeft ← CopyPage(fullChild) // Will be truncated
newRight ← AllocatePage() // New sibling
// Perform split on new copies
PerformSplit(newParent, newLeft, newRight)
// Atomic update: change one pointer to swap trees
parentOfParent.children[idx] ← newParent // Single atomic write
// Mark old pages as free (after ensuring durability)
DeferredFree(parent, fullChild)
Shadow paging avoids in-place updates but can fragment storage.
Solution 3: B-Link Tree Approach
B-link trees add high-key markers and sibling pointers that make splits visible even if the parent hasn't been updated:
- Every node stores a high key: an upper bound on the keys that may appear in its subtree.
- Every node holds a pointer to its right sibling at the same level.
- A split first installs the new right sibling (the "half-split" state) and only then updates the parent; in between, a reader whose search key exceeds a node's high key simply follows the sibling pointer.
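A sketch of the reader-side "move right" logic, mirroring the Lehman-Yao design (the `BLinkNode` shape is an assumption):

```typescript
interface BLinkNode {
  keys: number[];
  children: BLinkNode[];  // empty for leaves
  isLeaf: boolean;
  highKey: number | null; // upper bound of this node's key range
  rightSibling: BLinkNode | null;
}

// Descend to the leaf that should hold `key`. If a concurrent split has
// pushed the key range rightward before the parent was updated, the high
// key reveals it and we "move right" along the sibling pointer.
function findLeaf(root: BLinkNode, key: number): BLinkNode {
  let node = root;
  while (true) {
    while (node.highKey !== null && key > node.highKey && node.rightSibling) {
      node = node.rightSibling; // split finished below; parent not yet updated
    }
    if (node.isLeaf) return node;
    let i = node.keys.findIndex((k) => key < k);
    if (i === -1) i = node.keys.length;
    node = node.children[i];
  }
}
```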
Most production databases (PostgreSQL, MySQL, SQLite) use write-ahead logging for B-tree modifications. The ARIES recovery algorithm provides a well-understood framework for handling splits, merges, and all other tree modifications with full crash safety.
Production systems employ various optimizations to reduce the overhead of splitting operations.
1. Asymmetric Splits:
Instead of always splitting 50/50, adapt to access patterns:
// For sequential inserts (e.g., auto-increment IDs)
IF insertPattern == ASCENDING:
// Keep most keys in left (already searched), move few to right
splitPoint ← 0.9 * numKeys // 90/10 split
ELSE IF insertPattern == DESCENDING:
splitPoint ← 0.1 * numKeys // 10/90 split
ELSE:
splitPoint ← 0.5 * numKeys // Standard 50/50
Benefit: For sequential inserts, asymmetric splits achieve ~90%+ fill factor and reduce split frequency.
2. Suffix Truncation:
For separator keys in internal nodes, store only enough bytes to distinguish subtrees:
Left leaf: [..., "postgresql"]
Right leaf: ["python", ...]
Full separator: "python" (6 bytes)
Truncated separator: "py" (2 bytes) - sufficient to distinguish
Reduces internal node size, increasing fanout and reducing tree height.
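Choosing a truncated separator can be as simple as taking the shortest prefix of the right node's minimum key that still sorts strictly above the left node's maximum. A TypeScript sketch (the `shortestSeparator` name is an assumption):

```typescript
// Shortest prefix of rightMin that still sorts above leftMax; any such
// string is a valid separator between the two nodes.
function shortestSeparator(leftMax: string, rightMin: string): string {
  for (let len = 1; len <= rightMin.length; len++) {
    const candidate = rightMin.slice(0, len);
    if (candidate > leftMax) return candidate;
  }
  return rightMin; // no shorter prefix works
}

console.log(shortestSeparator("postgresql", "python")); // "py"
```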
3. Prefix Compression:
Store common prefixes once per node:
Uncompressed keys: ["company_sales_2024", "company_sales_2025", "company_sales_2026"]
Prefix: "company_sales_20"
Compressed: ["24", "25", "26"]
Saves significant space for keys with shared prefixes (common in database tables).
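A minimal sketch of per-node prefix compression; it computes the maximal shared prefix, which for the example above is one character longer than the illustrative "company_sales_20":

```typescript
// Factor out the longest common prefix of a node's keys; store the prefix
// once and keep only the suffixes in the key array.
function compressKeys(keys: string[]): { prefix: string; suffixes: string[] } {
  let prefix = keys[0] ?? "";
  for (const key of keys) {
    while (!key.startsWith(prefix)) prefix = prefix.slice(0, -1);
  }
  return { prefix, suffixes: keys.map((k) => k.slice(prefix.length)) };
}

console.log(compressKeys(["company_sales_2024", "company_sales_2025", "company_sales_2026"]));
// { prefix: "company_sales_202", suffixes: ["4", "5", "6"] }
```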
4. Batched Split Decisions:
Rather than splitting immediately when full, buffer insertions and batch splits:
IF node.numKeys >= MAX_KEYS:
pendingSplits.add(node)
IF pendingSplits.size >= BATCH_THRESHOLD OR timeSinceLast >= BATCH_INTERVAL:
FOR each node in pendingSplits:
performSplit(node)
writeAllDirtyPages() // Single flush for multiple splits
Reduces I/O amplification by combining multiple log writes.
5. Copy-on-Write (COW) Splits:
Never modify pages in place; instead, create new copies:
- The split writes fresh copies of the left node, the new sibling, and the parent.
- Readers keep using the old versions until a single atomic pointer swap publishes the new subtree.
- Old pages are reclaimed once no reader references them (the approach taken by LMDB and btrfs).
Track split operations in production: frequency, latency, and patterns. High split rates may indicate suboptimal page size, sequential insert patterns that could benefit from asymmetric splits, or the need for bulk loading instead of individual inserts.
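A minimal instrumentation sketch along those lines, assuming a Node.js runtime (`SplitMetrics` and its fields are hypothetical names):

```typescript
// Wrap each split with a counter and a high-resolution timer so dashboards
// can surface split frequency and latency.
class SplitMetrics {
  splits = 0;
  totalNanos = 0n;

  record<T>(splitFn: () => T): T {
    const start = process.hrtime.bigint();
    try {
      return splitFn();
    } finally {
      this.splits += 1;
      this.totalNanos += process.hrtime.bigint() - start;
    }
  }
}
```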
Split bugs are among the most difficult B-tree issues to diagnose because they can cause subtle, delayed symptoms. Here's a systematic debugging approach:
Common Symptoms of Split Bugs:
| Symptom | Likely Cause |
|---|---|
| Keys disappearing | Median not copied up (B+-tree leaf split bug) |
| Duplicate results | Median copied instead of moved (internal split) |
| Tree height decreasing | Split allocating wrong node type (leaf vs internal) |
| Crash after restart | Incomplete WAL recovery for split operations |
| Performance degradation | Orphaned nodes from incomplete splits |
| Infinite loops in range scans | Sibling pointer corruption during split |
A verification routine like the following, run after each split in debug builds, catches these bugs at their source:

```typescript
interface SplitVerificationResult {
  valid: boolean;
  errors: string[];
}

function verifySplit(
  parent: BTreeNode,
  leftChild: BTreeNode,
  rightChild: BTreeNode,
  separatorIndex: number,
  t: number
): SplitVerificationResult {
  const errors: string[] = [];

  // Verify left child key count
  if (leftChild.keys.length !== t - 1) {
    errors.push(`Left child has ${leftChild.keys.length} keys, expected ${t - 1}`);
  }

  // Verify right child key count
  if (rightChild.keys.length !== t - 1) {
    errors.push(`Right child has ${rightChild.keys.length} keys, expected ${t - 1}`);
  }

  // Verify separator is between left's max and right's min
  const separator = parent.keys[separatorIndex];
  const leftMax = leftChild.keys[leftChild.keys.length - 1];
  const rightMin = rightChild.keys[0];

  if (separator < leftMax) {
    errors.push(`Separator ${separator} is less than left max ${leftMax}`);
  }
  if (separator >= rightMin && !rightChild.isLeaf) {
    // For internal nodes, the separator must be strictly less than right's min
    errors.push(`Separator ${separator} is not less than right min ${rightMin} in internal split`);
  }
  if (separator !== rightMin && rightChild.isLeaf) {
    // For B+-tree leaves, the separator should equal right's min
    errors.push(`Separator ${separator} doesn't match right min ${rightMin} in leaf split`);
  }

  // Verify parent has correct number of children
  if (parent.children.length !== parent.keys.length + 1) {
    errors.push(`Parent has ${parent.children.length} children for ${parent.keys.length} keys`);
  }

  // Verify child pointers are correct
  if (parent.children[separatorIndex] !== leftChild) {
    errors.push(`Parent's child[${separatorIndex}] doesn't point to left child`);
  }
  if (parent.children[separatorIndex + 1] !== rightChild) {
    errors.push(`Parent's child[${separatorIndex + 1}] doesn't point to right child`);
  }

  return { valid: errors.length === 0, errors };
}
```

Thoroughly test: (1) Splitting the root when it's the only node, (2) Cascading splits to the root, (3) Splitting with minimum t (t=2), (4) Concurrent splits on sibling nodes, (5) Crash recovery at each step of the split process. These edge cases reveal most split bugs.
Node splitting is the elegant mechanism that keeps B-trees balanced regardless of insertion order. Let's consolidate the essential knowledge:
- A full node of 2t-1 keys splits into two nodes of t-1 keys each, with the median promoted to the parent.
- B+-tree leaf splits copy the separator up (the data stays in the leaf); internal splits move it up.
- Only a root split increases tree height, which keeps height O(log_t n).
- A split touches three nodes and must be made atomic, via WAL, shadow paging, or B-link half-splits.
- Fill-factor techniques (redistribution, B*-splits, bulk loading) and split optimizations (asymmetric splits, suffix truncation, prefix compression) tune the space/write tradeoff.
You now understand node splitting at a depth suitable for implementation and debugging. The next page explores the delete algorithm—where the inverse of splitting, node merging, maintains balance when keys are removed.