B Plus Tree Operations - Learning Module

Loading content...

0/252

Insert with Splitting

Growing the Tree from Within

A B+-tree is a living, breathing data structure. Every INSERT statement triggers a cascade of operations that must maintain strict invariants while integrating new data. Unlike simpler structures where insertion is straightforward, B+-tree insertion involves finding the correct location, potentially reorganizing nodes, and sometimes growing the tree upward through a process called splitting.

Understanding B+-tree insertion is critical for database professionals. Insertion patterns directly impact index fragmentation, page splits affect write latency, and the mechanics of splitting influence how concurrent transactions interact. A Principal Engineer must understand these dynamics to design tables, choose fill factors, and diagnose write performance issues.

What You Will Learn

By the end of this page, you will master B+-tree insertion including: locating the target leaf, simple insertion into non-full leaves, splitting overflowing leaves, propagating splits upward through internal nodes, handling root splits that increase tree height, and understanding the performance implications of these operations.

Insertion: The Big Picture

B+-tree insertion follows a conceptually simple but mechanically rich process:

Step 1: Find the Target Leaf Use the standard search algorithm to locate the leaf node where the new key belongs. This is identical to point query search—we find the leaf that would contain the key.

Step 2: Insert into the Leaf Add the new key (and its record pointer) to the leaf in sorted order.

Step 3: Handle Overflow (if needed) If the leaf now exceeds its maximum capacity, it must be split into two leaves, with a separator key promoted to the parent.

Step 4: Propagate Upward (if needed) If the parent internal node overflows due to the new separator key, split it too. This may cascade all the way to the root.

Step 5: Handle Root Split (if needed) If the root overflows, create a new root with the old root's two halves as children. This is the only way the tree's height increases.

Converting Mermaid diagram...

Height Only Increases at the Root

Unlike binary search trees where new nodes attach at the leaves (bottom), B+-trees grow at the top. When the root splits, a new root is created above it. This ensures all leaves remain at the same depth—the defining characteristic that guarantees O(log N) worst-case performance.

Simple Insertion: When There's Room

When the target leaf has fewer than the maximum allowed keys, insertion is straightforward:

Find the correct position within the leaf (where key would maintain sorted order)
Shift existing keys right to make room
Insert the new key and record pointer
Update the key count

No structural changes are needed—the tree's shape remains identical.

Simple Leaf Insertion
Pseudocode
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
FUNCTION InsertIntoLeaf(leaf, newKey, recordPointer)
    // Precondition: leaf has space (keyCount < maxKeys)
    
    // Find the position to insert (maintain sorted order)
    insertPos ← 0
    WHILE insertPos < leaf.keyCount AND leaf.keys[insertPos] < newKey DO
        insertPos ← insertPos + 1
    END WHILE
    
    // Shift existing keys and pointers to make room
    FOR i ← leaf.keyCount DOWNTO insertPos + 1 DO
        leaf.keys[i] ← leaf.keys[i - 1]
        leaf.recordPointers[i] ← leaf.recordPointers[i - 1]
    END FOR
    
    // Insert the new key and pointer
    leaf.keys[insertPos] ← newKey
    leaf.recordPointers[insertPos] ← recordPointer
    leaf.keyCount ← leaf.keyCount + 1
    
    // Mark page as dirty for write-back
    MarkDirty(leaf)
END FUNCTION

Example: Insert Key = 24 into Leaf [20, 22, 28]

Leaf has maximum capacity of 4 keys (currently has 3):

Find position: 24 > 22 and 24 < 28, so position = 2
Shift: Move 28 from position 2 to position 3
Insert: Place 24 at position 2
Result: Leaf now contains [20, 22, 24, 28]

No split necessary—the operation completes with a single page write.

Simple Insertion Operation Costs
Operation	I/O Cost	Explanation
Find target leaf	O(height)	Tree descent via search
Insert into leaf	0 (in-memory)	Modify cached page
Write dirty page	1	Persist modified leaf
Total	O(height) + 1	Typically 4-6 I/Os

Leaf Node Splitting

When a leaf node is already full (has the maximum number of keys) and we need to insert another key, the node must split into two nodes. This is the most complex part of B+-tree insertion.

Splitting Process:

Create a new (empty) leaf node
Place the new key in the correct sorted position among existing keys
Divide the keys between the original leaf and the new leaf
Adjust sibling pointers to maintain the linked list
Promote a separator key to the parent internal node

Leaf Node Split Algorithm
Pseudocode
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
FUNCTION SplitLeafNode(leaf, newKey, recordPointer)
    // leaf is full (keyCount = maxKeys)
    // We need to insert newKey and split
    
    // Step 1: Create temporary array with all keys including new one
    allKeys ← new Array[maxKeys + 1]
    allPointers ← new Array[maxKeys + 1]
    
    insertPos ← FindInsertPosition(leaf, newKey)
    
    // Copy existing keys, inserting new key at correct position
    j ← 0
    FOR i ← 0 TO maxKeys DO
        IF i = insertPos THEN
            allKeys[i] ← newKey
            allPointers[i] ← recordPointer
        ELSE
            allKeys[i] ← leaf.keys[j]
            allPointers[i] ← leaf.recordPointers[j]
            j ← j + 1
        END IF
    END FOR
    
    // Step 2: Create new leaf node
    newLeaf ← AllocateNewPage()
    
    // Step 3: Distribute keys between original and new leaf
    // For B+-tree of order n, each leaf should have at least ⌈n/2⌉ keys
    // Split point: keep first ⌈(maxKeys+1)/2⌉ in original, rest in new
    splitPoint ← CEILING((maxKeys + 1) / 2)
    
    // Keys [0, splitPoint-1] stay in original leaf
    leaf.keyCount ← splitPoint
    FOR i ← 0 TO splitPoint - 1 DO
        leaf.keys[i] ← allKeys[i]
        leaf.recordPointers[i] ← allPointers[i]
    END FOR
    
    // Keys [splitPoint, maxKeys] go to new leaf
    newLeaf.keyCount ← (maxKeys + 1) - splitPoint
    FOR i ← splitPoint TO maxKeys DO
        newLeaf.keys[i - splitPoint] ← allKeys[i]
        newLeaf.recordPointers[i - splitPoint] ← allPointers[i]
    END FOR
    
    // Step 4: Update sibling pointers
    newLeaf.nextSibling ← leaf.nextSibling
    newLeaf.prevSibling ← leaf
    IF leaf.nextSibling ≠ NULL THEN
        leaf.nextSibling.prevSibling ← newLeaf
    END IF
    leaf.nextSibling ← newLeaf
    
    // Step 5: Determine separator key (smallest key in new leaf)
    separatorKey ← newLeaf.keys[0]
    
    RETURN (newLeaf, separatorKey)
END FUNCTION

Separator Key Selection

For B+-tree leaf splits, the separator key promoted to the parent is the smallest key in the new (right) leaf. This key is COPIED to the parent—it remains in the leaf because B+-tree leaves contain all the data. This is different from B-tree splits where the middle key MOVES to the parent.

Worked Example: Complete Leaf Split

Let's trace a complete leaf split with a concrete example.

Setup:

B+-tree with order 4 (max 3 keys per leaf, max 4 children per internal node)
Leaf node [20, 22, 28] is full
Insert key = 25

Step 1: Create temporary array with all keys

Original leaf keys: [20, 22, 28]
New key to insert: 25

Sorted combined: [20, 22, 25, 28] (4 keys)

Step 2: Determine split point

maxKeys = 3, so (maxKeys + 1) = 4
splitPoint = ⌈4/2⌉ = 2

Step 3: Distribute keys

Original leaf (positions 0, 1): [20, 22]
New leaf (positions 2, 3): [25, 28]

Step 4: Determine separator key

Separator = smallest key in new leaf = 25

Converting Mermaid diagram...

Leaf Split Step-by-Step Summary
Step	Action	Result
1	Combine original keys + new key	[20, 22, 25, 28]
2	Calculate split point	2 (first 2 keys stay)
3	Create new leaf	Empty new leaf allocated
4	Distribute keys 0-1 to original	Original: [20, 22]
5	Distribute keys 2-3 to new leaf	New: [25, 28]
6	Link new leaf (prev/next)	Original ↔ New ↔ (old sibling)
7	Identify separator key	25 (smallest in new leaf)
8	Insert separator in parent	Parent gets key 25

Sibling Pointer Updates

The new leaf must be inserted into the linked list: its prev points to the original leaf, its next points to the original's former next sibling. The original leaf's next now points to the new leaf. If there was a subsequent sibling, its prev must be updated to point to the new leaf. These pointer updates are critical for range query correctness.

Internal Node Splitting

When a separator key is inserted into an internal node and that node is already full, the internal node must split too. Internal node splitting differs from leaf splitting in one crucial way:

In leaf splits: The separator key is copied to the parent (it remains in the leaf) In internal node splits: The middle key is pushed up to the parent (it doesn't stay in either child)

This distinction exists because internal node keys are separators (routing guides), not actual data. There's no need to keep a copy in the child.

Internal Node Split Algorithm
Pseudocode
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
FUNCTION SplitInternalNode(node, newKey, newChildPointer)
    // node is a full internal node
    // newKey is the separator to insert
    // newChildPointer is the right child for newKey
    
    // Step 1: Create temporary arrays with all keys and pointers
    // An internal node with m children has m-1 keys
    // Full node: (n-1) keys, n children (where n is the order)
    // After insertion: n keys, n+1 children
    
    allKeys ← new Array[n]        // One more key than max
    allChildren ← new Array[n+1]  // Two more children than max
    
    // Insert new key and child in correct position
    insertPos ← FindInsertPosition(node, newKey)
    
    // Copy keys
    j ← 0
    FOR i ← 0 TO n-1 DO
        IF i = insertPos THEN
            allKeys[i] ← newKey
        ELSE
            allKeys[i] ← node.keys[j]
            j ← j + 1
        END IF
    END FOR
    
    // Copy children
    j ← 0
    FOR i ← 0 TO n DO
        IF i = insertPos + 1 THEN
            allChildren[i] ← newChildPointer
        ELSE
            allChildren[i] ← node.children[j]
            j ← j + 1
        END IF
    END FOR
    
    // Step 2: Determine split point
    // Middle key will be pushed up, not kept in either child
    midPoint ← n / 2    // Integer division
    pushUpKey ← allKeys[midPoint]
    
    // Step 3: Create new internal node
    newNode ← AllocateNewPage()
    
    // Step 4: Distribute keys and children
    // Left node: keys[0..midPoint-1], children[0..midPoint]
    node.keyCount ← midPoint
    FOR i ← 0 TO midPoint - 1 DO
        node.keys[i] ← allKeys[i]
    END FOR
    FOR i ← 0 TO midPoint DO
        node.children[i] ← allChildren[i]
    END FOR
    
    // Right node: keys[midPoint+1..n-1], children[midPoint+1..n]
    newNode.keyCount ← n - midPoint - 1
    FOR i ← 0 TO newNode.keyCount - 1 DO
        newNode.keys[i] ← allKeys[midPoint + 1 + i]
    END FOR
    FOR i ← 0 TO newNode.keyCount DO
        newNode.children[i] ← allChildren[midPoint + 1 + i]
    END FOR
    
    RETURN (newNode, pushUpKey)
END FUNCTION

Critical Difference: Push Up vs Copy Up

In internal node splits, the middle key is REMOVED from both children and pushed to the parent. This is because internal keys are separators—their only purpose is routing. In contrast, leaf splits COPY the separator because the key must remain in the leaf (it's actual data). Confusing these two mechanisms is a common implementation error.

Root Splitting: Growing the Tree Upward

When the propagation of splits reaches the root and the root overflows, the tree must grow by one level. This is the only situation where B+-tree height increases.

Root Split Process:

Split the root using the internal node split algorithm
Create a new root node
The new root has exactly one key (the pushed-up separator) and two children (the old root's two halves)

The tree now has one additional level, and all leaves are still at the same depth.

Root Split Algorithm
Pseudocode
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
FUNCTION SplitRoot(root, newKey, newChildPointer)
    // root is full and we need to insert newKey
    
    // Step 1: Split the root as an internal node
    (newRightChild, pushUpKey) ← SplitInternalNode(root, newKey, newChildPointer)
    
    // Step 2: The old root becomes the left child
    leftChild ← root
    
    // Step 3: Create a new root
    newRoot ← AllocateNewPage()
    newRoot.isLeaf ← FALSE
    newRoot.keyCount ← 1
    newRoot.keys[0] ← pushUpKey
    newRoot.children[0] ← leftChild
    newRoot.children[1] ← newRightChild
    
    // Step 4: Update tree's root pointer
    tree.root ← newRoot
    tree.height ← tree.height + 1
    
    RETURN newRoot
END FUNCTION

Converting Mermaid diagram...

Height Increase is Rare

Root splits are rare events. Consider a B+-tree with fanout 100 storing 10 billion keys. The height is about 6. For the height to increase to 7, the tree would need to grow from 10 billion to 1 trillion keys—adding 990 billion keys with only one root split! Each root split multiplies capacity by the fanout.

The Complete Insertion Algorithm

Now let's see the complete insertion algorithm that ties all the pieces together:

Complete B+-Tree Insertion
Pseudocode
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
FUNCTION BPlusTreeInsert(tree, newKey, recordPointer)
    // ================================================
    // PHASE 1: FIND TARGET LEAF
    // ================================================
    
    // Use search to find the leaf where newKey belongs
    // Also track the path from root to leaf for potential splits
    path ← []    // Stack of (node, childIndex) pairs
    currentNode ← tree.root
    
    WHILE NOT currentNode.isLeaf DO
        childIndex ← FindChildIndex(currentNode, newKey)
        path.push((currentNode, childIndex))
        currentNode ← currentNode.children[childIndex]
    END WHILE
    
    targetLeaf ← currentNode
    
    // ================================================
    // PHASE 2: INSERT INTO LEAF
    // ================================================
    
    IF targetLeaf.keyCount < maxLeafKeys THEN
        // Simple case: leaf has room
        InsertIntoLeaf(targetLeaf, newKey, recordPointer)
        RETURN SUCCESS
    END IF
    
    // Leaf is full - need to split
    (newLeaf, separatorKey) ← SplitLeafNode(targetLeaf, newKey, recordPointer)
    
    // ================================================
    // PHASE 3: PROPAGATE SPLIT UPWARD
    // ================================================
    
    keyToInsert ← separatorKey
    rightChild ← newLeaf
    
    WHILE path is not empty DO
        (parentNode, childIndex) ← path.pop()
        
        IF parentNode.keyCount < maxInternalKeys THEN
            // Parent has room - insert and done
            InsertIntoInternal(parentNode, keyToInsert, rightChild, childIndex)
            RETURN SUCCESS
        END IF
        
        // Parent is full - need to split it too
        (newInternalNode, pushUpKey) ← SplitInternalNode(
            parentNode, keyToInsert, rightChild
        )
        
        // Prepare for next iteration
        keyToInsert ← pushUpKey
        rightChild ← newInternalNode
    END WHILE
    
    // ================================================
    // PHASE 4: ROOT SPLIT (if we propagated to root)
    // ================================================
    
    // If we get here, we split all the way to the root
    SplitRoot(tree.root, keyToInsert, rightChild)
    
    RETURN SUCCESS
END FUNCTION
 
FUNCTION InsertIntoInternal(node, key, rightChild, position)
    // Insert key at position, rightChild becomes children[position+1]
    
    // Shift keys right
    FOR i ← node.keyCount DOWNTO position + 1 DO
        node.keys[i] ← node.keys[i - 1]
    END FOR
    node.keys[position] ← key
    
    // Shift children right
    FOR i ← node.keyCount + 1 DOWNTO position + 2 DO
        node.children[i] ← node.children[i - 1]
    END FOR
    node.children[position + 1] ← rightChild
    
    node.keyCount ← node.keyCount + 1
    MarkDirty(node)
END FUNCTION

Path Tracking for Efficiency

The algorithm tracks the path from root to leaf during descent. This avoids retraversing the tree when propagating splits upward. Each split needs the parent node, which we already have on the stack. Without path tracking, we'd need parent pointers in each node or repeated root-to-node searches.

Insertion Complexity Analysis

Let's rigorously analyze the cost of B+-tree insertion:

B+-Tree Insertion Cost Breakdown
Operation	Worst Case	Typical Case	Notes
Find target leaf	O(log N)	O(log N)	Standard tree descent
Insert in leaf	O(n)	O(n)	Shift within node, n = order
Leaf split	O(n)	Rare	Redistribute n keys between nodes
Internal splits	O(h × n)	Very rare	At most h splits in worst case
Root split	O(n)	Extremely rare	Only when root overflows

I/O Complexity:

Best Case (no split): h reads (find leaf) + 1 write (modified leaf) = O(h)
Worst Case (splits to root): h reads + 2h+1 writes (all splits) = O(h)

Since h = O(log N), insertion is O(log N) I/Os in the worst case.

Why Splits Are Rare:

Assuming random insertions into a tree with nodes at 50% minimum occupancy:

Each node can absorb ~n/2 insertions before splitting
Probability of a single insertion causing a split ≈ 1/n
Probability of cascading to level k ≈ (1/n)^k

For n = 100:

Leaf split: 1% of insertions
Split reaching height 2: 0.01% of insertions
Root split: ~0.0001% of insertions

Insertion Performance Strengths

•Guaranteed O(log N) — even worst case is logarithmic
•Splits are localized — only affected nodes are modified
•Most insertions are simple — no split needed
•Amortized cost is lower — splits spread node creation over many inserts
•Predictable latency — no degenerative cases like BSTs

Insertion Overhead Factors

•Write amplification — splits write 3 pages (2 new + 1 parent)
•Concurrency complexity — splits may block concurrent readers
•Page allocation overhead — new pages need allocation and initialization
•Logging overhead — each modification must be logged for recovery
•Sequential insert pathology — always-increasing keys split the rightmost leaf

The Sequential Insert Problem

When keys are inserted in sorted order (like auto-incrementing IDs), every insertion goes to the rightmost leaf, causing it to split repeatedly. This leads to leaves that are only 50% full—significant space waste. Solutions include: bulk loading (covered later), using fill factors, or randomizing keys with hashing.

Summary: Insertion Mastery

Let's consolidate the key concepts of B+-tree insertion with splitting:

Key Takeaways

•Search First, Then Insert — Every insertion begins with a search to find the target leaf. The search path is cached for split propagation.
•Simple Insertion is Common — Most insertions find a leaf with room and complete with a single page modification.
•Leaf Splits Copy the Separator — When a leaf overflows, it splits and the smallest key of the new leaf is COPIED to the parent.
•Internal Splits Push the Separator — When an internal node overflows, the middle key is REMOVED from both children and pushed to the parent.
•Root Splits Increase Height — When splits reach the root, a new root is created. This is the only way tree height increases.
•O(log N) Guarantee — Even with cascading splits, insertion never exceeds logarithmic I/O complexity.

What's Next:

Insertion is only half of the maintenance story. The next page covers deletion with redistribution—the intricate process of removing keys while maintaining minimum occupancy invariants, including key redistribution and node merging.

Page Complete

You now understand the complete mechanics of B+-tree insertion including splitting, propagation, and root growth. This knowledge is essential for understanding write performance, index fragmentation, and the trade-offs inherent in choosing B+-trees for write-heavy workloads.

Insert with Splitting

Growing the Tree from Within

What You Will Learn

Insertion: The Big Picture

B+-tree insertion follows a conceptually simple but mechanically rich process:

Step 2: Insert into the Leaf Add the new key (and its record pointer) to the leaf in sorted order.

Step 3: Handle Overflow (if needed) If the leaf now exceeds its maximum capacity, it must be split into two leaves, with a separator key promoted to the parent.

Step 4: Propagate Upward (if needed) If the parent internal node overflows due to the new separator key, split it too. This may cascade all the way to the root.

Step 5: Handle Root Split (if needed) If the root overflows, create a new root with the old root's two halves as children. This is the only way the tree's height increases.

Converting Mermaid diagram...

Height Only Increases at the Root

Simple Insertion: When There's Room

When the target leaf has fewer than the maximum allowed keys, insertion is straightforward:

Find the correct position within the leaf (where key would maintain sorted order)
Shift existing keys right to make room
Insert the new key and record pointer
Update the key count

No structural changes are needed—the tree's shape remains identical.

Simple Leaf Insertion
Pseudocode
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
FUNCTION InsertIntoLeaf(leaf, newKey, recordPointer)
    // Precondition: leaf has space (keyCount < maxKeys)
    
    // Find the position to insert (maintain sorted order)
    insertPos ← 0
    WHILE insertPos < leaf.keyCount AND leaf.keys[insertPos] < newKey DO
        insertPos ← insertPos + 1
    END WHILE
    
    // Shift existing keys and pointers to make room
    FOR i ← leaf.keyCount DOWNTO insertPos + 1 DO
        leaf.keys[i] ← leaf.keys[i - 1]
        leaf.recordPointers[i] ← leaf.recordPointers[i - 1]
    END FOR
    
    // Insert the new key and pointer
    leaf.keys[insertPos] ← newKey
    leaf.recordPointers[insertPos] ← recordPointer
    leaf.keyCount ← leaf.keyCount + 1
    
    // Mark page as dirty for write-back
    MarkDirty(leaf)
END FUNCTION

Example: Insert Key = 24 into Leaf [20, 22, 28]

Leaf has maximum capacity of 4 keys (currently has 3):

Find position: 24 > 22 and 24 < 28, so position = 2
Shift: Move 28 from position 2 to position 3
Insert: Place 24 at position 2
Result: Leaf now contains [20, 22, 24, 28]

No split necessary—the operation completes with a single page write.

Simple Insertion Operation Costs
Operation	I/O Cost	Explanation
Find target leaf	O(height)	Tree descent via search
Insert into leaf	0 (in-memory)	Modify cached page
Write dirty page	1	Persist modified leaf
Total	O(height) + 1	Typically 4-6 I/Os

Leaf Node Splitting

When a leaf node is already full (has the maximum number of keys) and we need to insert another key, the node must split into two nodes. This is the most complex part of B+-tree insertion.

Splitting Process:

Create a new (empty) leaf node
Place the new key in the correct sorted position among existing keys
Divide the keys between the original leaf and the new leaf
Adjust sibling pointers to maintain the linked list
Promote a separator key to the parent internal node

Leaf Node Split Algorithm
Pseudocode
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
FUNCTION SplitLeafNode(leaf, newKey, recordPointer)
    // leaf is full (keyCount = maxKeys)
    // We need to insert newKey and split
    
    // Step 1: Create temporary array with all keys including new one
    allKeys ← new Array[maxKeys + 1]
    allPointers ← new Array[maxKeys + 1]
    
    insertPos ← FindInsertPosition(leaf, newKey)
    
    // Copy existing keys, inserting new key at correct position
    j ← 0
    FOR i ← 0 TO maxKeys DO
        IF i = insertPos THEN
            allKeys[i] ← newKey
            allPointers[i] ← recordPointer
        ELSE
            allKeys[i] ← leaf.keys[j]
            allPointers[i] ← leaf.recordPointers[j]
            j ← j + 1
        END IF
    END FOR
    
    // Step 2: Create new leaf node
    newLeaf ← AllocateNewPage()
    
    // Step 3: Distribute keys between original and new leaf
    // For B+-tree of order n, each leaf should have at least ⌈n/2⌉ keys
    // Split point: keep first ⌈(maxKeys+1)/2⌉ in original, rest in new
    splitPoint ← CEILING((maxKeys + 1) / 2)
    
    // Keys [0, splitPoint-1] stay in original leaf
    leaf.keyCount ← splitPoint
    FOR i ← 0 TO splitPoint - 1 DO
        leaf.keys[i] ← allKeys[i]
        leaf.recordPointers[i] ← allPointers[i]
    END FOR
    
    // Keys [splitPoint, maxKeys] go to new leaf
    newLeaf.keyCount ← (maxKeys + 1) - splitPoint
    FOR i ← splitPoint TO maxKeys DO
        newLeaf.keys[i - splitPoint] ← allKeys[i]
        newLeaf.recordPointers[i - splitPoint] ← allPointers[i]
    END FOR
    
    // Step 4: Update sibling pointers
    newLeaf.nextSibling ← leaf.nextSibling
    newLeaf.prevSibling ← leaf
    IF leaf.nextSibling ≠ NULL THEN
        leaf.nextSibling.prevSibling ← newLeaf
    END IF
    leaf.nextSibling ← newLeaf
    
    // Step 5: Determine separator key (smallest key in new leaf)
    separatorKey ← newLeaf.keys[0]
    
    RETURN (newLeaf, separatorKey)
END FUNCTION

Separator Key Selection

Worked Example: Complete Leaf Split

Let's trace a complete leaf split with a concrete example.

Setup:

B+-tree with order 4 (max 3 keys per leaf, max 4 children per internal node)
Leaf node [20, 22, 28] is full
Insert key = 25

Step 1: Create temporary array with all keys

Original leaf keys: [20, 22, 28]
New key to insert: 25

Sorted combined: [20, 22, 25, 28] (4 keys)

Step 2: Determine split point

maxKeys = 3, so (maxKeys + 1) = 4
splitPoint = ⌈4/2⌉ = 2

Step 3: Distribute keys

Original leaf (positions 0, 1): [20, 22]
New leaf (positions 2, 3): [25, 28]

Step 4: Determine separator key

Separator = smallest key in new leaf = 25

Converting Mermaid diagram...

Leaf Split Step-by-Step Summary
Step	Action	Result
1	Combine original keys + new key	[20, 22, 25, 28]
2	Calculate split point	2 (first 2 keys stay)
3	Create new leaf	Empty new leaf allocated
4	Distribute keys 0-1 to original	Original: [20, 22]
5	Distribute keys 2-3 to new leaf	New: [25, 28]
6	Link new leaf (prev/next)	Original ↔ New ↔ (old sibling)
7	Identify separator key	25 (smallest in new leaf)
8	Insert separator in parent	Parent gets key 25

Sibling Pointer Updates

Internal Node Splitting

When a separator key is inserted into an internal node and that node is already full, the internal node must split too. Internal node splitting differs from leaf splitting in one crucial way:

In leaf splits: The separator key is copied to the parent (it remains in the leaf) In internal node splits: The middle key is pushed up to the parent (it doesn't stay in either child)

This distinction exists because internal node keys are separators (routing guides), not actual data. There's no need to keep a copy in the child.

Internal Node Split Algorithm
Pseudocode
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
FUNCTION SplitInternalNode(node, newKey, newChildPointer)
    // node is a full internal node
    // newKey is the separator to insert
    // newChildPointer is the right child for newKey
    
    // Step 1: Create temporary arrays with all keys and pointers
    // An internal node with m children has m-1 keys
    // Full node: (n-1) keys, n children (where n is the order)
    // After insertion: n keys, n+1 children
    
    allKeys ← new Array[n]        // One more key than max
    allChildren ← new Array[n+1]  // Two more children than max
    
    // Insert new key and child in correct position
    insertPos ← FindInsertPosition(node, newKey)
    
    // Copy keys
    j ← 0
    FOR i ← 0 TO n-1 DO
        IF i = insertPos THEN
            allKeys[i] ← newKey
        ELSE
            allKeys[i] ← node.keys[j]
            j ← j + 1
        END IF
    END FOR
    
    // Copy children
    j ← 0
    FOR i ← 0 TO n DO
        IF i = insertPos + 1 THEN
            allChildren[i] ← newChildPointer
        ELSE
            allChildren[i] ← node.children[j]
            j ← j + 1
        END IF
    END FOR
    
    // Step 2: Determine split point
    // Middle key will be pushed up, not kept in either child
    midPoint ← n / 2    // Integer division
    pushUpKey ← allKeys[midPoint]
    
    // Step 3: Create new internal node
    newNode ← AllocateNewPage()
    
    // Step 4: Distribute keys and children
    // Left node: keys[0..midPoint-1], children[0..midPoint]
    node.keyCount ← midPoint
    FOR i ← 0 TO midPoint - 1 DO
        node.keys[i] ← allKeys[i]
    END FOR
    FOR i ← 0 TO midPoint DO
        node.children[i] ← allChildren[i]
    END FOR
    
    // Right node: keys[midPoint+1..n-1], children[midPoint+1..n]
    newNode.keyCount ← n - midPoint - 1
    FOR i ← 0 TO newNode.keyCount - 1 DO
        newNode.keys[i] ← allKeys[midPoint + 1 + i]
    END FOR
    FOR i ← 0 TO newNode.keyCount DO
        newNode.children[i] ← allChildren[midPoint + 1 + i]
    END FOR
    
    RETURN (newNode, pushUpKey)
END FUNCTION

Critical Difference: Push Up vs Copy Up

Root Splitting: Growing the Tree Upward

When the propagation of splits reaches the root and the root overflows, the tree must grow by one level. This is the only situation where B+-tree height increases.

Root Split Process:

Split the root using the internal node split algorithm
Create a new root node
The new root has exactly one key (the pushed-up separator) and two children (the old root's two halves)

The tree now has one additional level, and all leaves are still at the same depth.

Root Split Algorithm
Pseudocode
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
FUNCTION SplitRoot(root, newKey, newChildPointer)
    // root is full and we need to insert newKey
    
    // Step 1: Split the root as an internal node
    (newRightChild, pushUpKey) ← SplitInternalNode(root, newKey, newChildPointer)
    
    // Step 2: The old root becomes the left child
    leftChild ← root
    
    // Step 3: Create a new root
    newRoot ← AllocateNewPage()
    newRoot.isLeaf ← FALSE
    newRoot.keyCount ← 1
    newRoot.keys[0] ← pushUpKey
    newRoot.children[0] ← leftChild
    newRoot.children[1] ← newRightChild
    
    // Step 4: Update tree's root pointer
    tree.root ← newRoot
    tree.height ← tree.height + 1
    
    RETURN newRoot
END FUNCTION

Converting Mermaid diagram...

Height Increase is Rare

The Complete Insertion Algorithm

Now let's see the complete insertion algorithm that ties all the pieces together:

Complete B+-Tree Insertion
Pseudocode
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
FUNCTION BPlusTreeInsert(tree, newKey, recordPointer)
    // ================================================
    // PHASE 1: FIND TARGET LEAF
    // ================================================
    
    // Use search to find the leaf where newKey belongs
    // Also track the path from root to leaf for potential splits
    path ← []    // Stack of (node, childIndex) pairs
    currentNode ← tree.root
    
    WHILE NOT currentNode.isLeaf DO
        childIndex ← FindChildIndex(currentNode, newKey)
        path.push((currentNode, childIndex))
        currentNode ← currentNode.children[childIndex]
    END WHILE
    
    targetLeaf ← currentNode
    
    // ================================================
    // PHASE 2: INSERT INTO LEAF
    // ================================================
    
    IF targetLeaf.keyCount < maxLeafKeys THEN
        // Simple case: leaf has room
        InsertIntoLeaf(targetLeaf, newKey, recordPointer)
        RETURN SUCCESS
    END IF
    
    // Leaf is full - need to split
    (newLeaf, separatorKey) ← SplitLeafNode(targetLeaf, newKey, recordPointer)
    
    // ================================================
    // PHASE 3: PROPAGATE SPLIT UPWARD
    // ================================================
    
    keyToInsert ← separatorKey
    rightChild ← newLeaf
    
    WHILE path is not empty DO
        (parentNode, childIndex) ← path.pop()
        
        IF parentNode.keyCount < maxInternalKeys THEN
            // Parent has room - insert and done
            InsertIntoInternal(parentNode, keyToInsert, rightChild, childIndex)
            RETURN SUCCESS
        END IF
        
        // Parent is full - need to split it too
        (newInternalNode, pushUpKey) ← SplitInternalNode(
            parentNode, keyToInsert, rightChild
        )
        
        // Prepare for next iteration
        keyToInsert ← pushUpKey
        rightChild ← newInternalNode
    END WHILE
    
    // ================================================
    // PHASE 4: ROOT SPLIT (if we propagated to root)
    // ================================================
    
    // If we get here, we split all the way to the root
    SplitRoot(tree.root, keyToInsert, rightChild)
    
    RETURN SUCCESS
END FUNCTION
 
FUNCTION InsertIntoInternal(node, key, rightChild, position)
    // Insert key at position, rightChild becomes children[position+1]
    
    // Shift keys right
    FOR i ← node.keyCount DOWNTO position + 1 DO
        node.keys[i] ← node.keys[i - 1]
    END FOR
    node.keys[position] ← key
    
    // Shift children right
    FOR i ← node.keyCount + 1 DOWNTO position + 2 DO
        node.children[i] ← node.children[i - 1]
    END FOR
    node.children[position + 1] ← rightChild
    
    node.keyCount ← node.keyCount + 1
    MarkDirty(node)
END FUNCTION

Path Tracking for Efficiency

Insertion Complexity Analysis

Let's rigorously analyze the cost of B+-tree insertion:

B+-Tree Insertion Cost Breakdown
Operation	Worst Case	Typical Case	Notes
Find target leaf	O(log N)	O(log N)	Standard tree descent
Insert in leaf	O(n)	O(n)	Shift within node, n = order
Leaf split	O(n)	Rare	Redistribute n keys between nodes
Internal splits	O(h × n)	Very rare	At most h splits in worst case
Root split	O(n)	Extremely rare	Only when root overflows

I/O Complexity:

Best Case (no split): h reads (find leaf) + 1 write (modified leaf) = O(h)
Worst Case (splits to root): h reads + 2h+1 writes (all splits) = O(h)

Since h = O(log N), insertion is O(log N) I/Os in the worst case.

Why Splits Are Rare:

Assuming random insertions into a tree with nodes at 50% minimum occupancy:

Each node can absorb ~n/2 insertions before splitting
Probability of a single insertion causing a split ≈ 1/n
Probability of cascading to level k ≈ (1/n)^k

For n = 100:

Leaf split: 1% of insertions
Split reaching height 2: 0.01% of insertions
Root split: ~0.0001% of insertions

Insertion Performance Strengths

•Guaranteed O(log N) — even worst case is logarithmic
•Splits are localized — only affected nodes are modified
•Most insertions are simple — no split needed
•Amortized cost is lower — splits spread node creation over many inserts
•Predictable latency — no degenerative cases like BSTs

Insertion Overhead Factors

•Write amplification — splits write 3 pages (2 new + 1 parent)
•Concurrency complexity — splits may block concurrent readers
•Page allocation overhead — new pages need allocation and initialization
•Logging overhead — each modification must be logged for recovery
•Sequential insert pathology — always-increasing keys split the rightmost leaf

The Sequential Insert Problem

Summary: Insertion Mastery

Let's consolidate the key concepts of B+-tree insertion with splitting:

Key Takeaways

•Search First, Then Insert — Every insertion begins with a search to find the target leaf. The search path is cached for split propagation.
•Simple Insertion is Common — Most insertions find a leaf with room and complete with a single page modification.
•Leaf Splits Copy the Separator — When a leaf overflows, it splits and the smallest key of the new leaf is COPIED to the parent.
•Internal Splits Push the Separator — When an internal node overflows, the middle key is REMOVED from both children and pushed to the parent.
•Root Splits Increase Height — When splits reach the root, a new root is created. This is the only way tree height increases.
•O(log N) Guarantee — Even with cascading splits, insertion never exceeds logarithmic I/O complexity.

What's Next:

Page Complete