The most remarkable property of B-trees is that they are perfectly balanced—every leaf node sits at exactly the same depth from the root. Unlike binary search trees that can degrade into linked lists, or even AVL and Red-Black trees that allow some height variation, B-trees maintain uniform depth regardless of insertion order.
This isn't magic; it's the result of careful algorithmic design. B-trees achieve balance not through rotations after the fact, but through a proactive balancing strategy built into the insert and delete operations themselves. Understanding this mechanism is key to understanding why B-trees provide such reliable performance.
By the end of this page, you will understand what makes B-trees perfectly balanced, how node splits during insertion maintain balance, how merges during deletion preserve balance, and why this design guarantees O(log n) operations in all cases.
Let's precisely define what we mean by 'balanced' in the context of B-trees.
Definition: Perfect Balance
A tree is perfectly balanced if and only if all leaf nodes have the same depth from the root.
Perfectly Balanced (B-tree):         NOT Perfectly Balanced (BST):

          [50]                                 [50]
         /    \                               /    \
      [25]    [75]                         [25]    [75]
      /  \    /  \                         /           \
     L    L  L    L                     [10]           [90]
                                         /
                                       [5]

All leaves at depth 2                Leaves at depths 2 and 3
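The definition can be checked mechanically. Here is a minimal Python sketch, using plain dicts with a `children` list as a stand-in for tree nodes (all names are hypothetical, chosen for illustration):

```python
def leaf_depths(node, depth=0):
    """Collect the set of depths at which leaves occur."""
    if not node["children"]:
        return {depth}
    return set().union(*(leaf_depths(c, depth + 1) for c in node["children"]))

def is_perfectly_balanced(root):
    # Perfect balance: exactly one distinct leaf depth
    return len(leaf_depths(root)) == 1

leaf = lambda: {"children": []}

# Shapes mirroring the diagram above: a B-tree with 4 leaves at depth 2,
# and a BST whose left spine pushes one leaf a level deeper
btree = {"children": [{"children": [leaf(), leaf()]},
                      {"children": [leaf(), leaf()]}]}
bst = {"children": [{"children": [{"children": [leaf()]}]},
                    {"children": [leaf()]}]}

assert is_perfectly_balanced(btree)       # all leaves at depth 2
assert not is_perfectly_balanced(bst)     # leaves at depths 2 and 3
```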
Contrast with Other Balance Definitions
| Tree Type | Balance Definition | Guarantee |
|---|---|---|
| Unbalanced BST | None | O(n) worst case |
| AVL Tree | Height difference ≤ 1 | O(1.44 log n) height |
| Red-Black Tree | Black-depth equal | O(2 log n) height |
| B-tree | All leaves at same depth | O(log n) exact |
Why Perfect Balance Matters
Predictable Performance: Every search traverses exactly h nodes where h is the tree height. No variance, no worst cases.
Optimal Height: Perfect balance achieves the theoretical minimum height for a given fanout.
I/O Consistency: Database query planners can precisely predict disk access counts. This enables accurate query cost estimation.
Fair Resource Usage: All queries pay the same cost. No queries are penalized by unlucky data distribution.
The Core Insight
B-trees maintain perfect balance through a simple principle:
The tree only grows (or shrinks) at the root.
When the root splits, all leaves move one level deeper simultaneously. When children of the root merge completely, all leaves move one level shallower. No operation ever changes the depth of individual leaves.
Perfect balance is maintained not by complex rebalancing procedures, but by structural rules during insert and delete. Splits propagate up to the root; merges propagate up to the root. Height changes happen globally, never locally.
B-tree balance emerges from two complementary mechanisms that operate during modifications:
Mechanism 1: Node Splitting (During Insert)
When a node becomes too full (exceeds m-1 keys):
1. Split it at the median key into two half-full nodes.
2. Promote the median into the parent.
3. If the parent overflows in turn, repeat there, possibly all the way to the root.
Mechanism 2: Node Merging/Redistribution (During Delete)
When a node becomes too empty (fewer than ⌈m/2⌉ - 1 keys):
1. Try to borrow a key from an adjacent sibling (redistribution).
2. If no sibling can spare a key, merge with a sibling, pulling the separator key down from the parent.
3. If the parent underflows in turn, repeat there, possibly all the way to the root.
```
// Simplified balance maintenance pseudocode
function insert(key):
    leaf = findLeaf(key)
    insertIntoLeaf(leaf, key)
    current = leaf
    while current.isFull():                  // More than m-1 keys
        (left, median, right) = splitNode(current)
        parent = current.parent
        if parent == null:                   // current is root
            newRoot = createNode()
            newRoot.addKey(median)
            newRoot.addChild(left)
            newRoot.addChild(right)
            tree.root = newRoot
            return                           // Height increased by 1!
        else:
            insertIntoInternal(parent, median, right)
            current = parent                 // Check parent next

function delete(key):
    node = findNode(key)
    deleteFromNode(node, key)
    current = node
    while current.isUnderflow() and current != root:
        if canRedistribute(current):
            redistribute(current)            // Borrow from sibling
        else:
            merge(current)                   // Merge with sibling
        current = current.parent             // Check parent next
    if root.numKeys == 0 and root has one child:
        tree.root = root.child               // Height decreased by 1!
```

Notice that height changes ONLY happen at the root—either a new root is created (height +1) or the root is removed (height -1). Operations at other levels move keys around but never change leaf depths. This is why all leaves stay at the same level.
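The insertion half of the pseudocode can be made concrete. Below is a minimal, insert-only Python sketch (class and helper names are my own, not from any particular library): overflowing nodes split at the median, the median propagates upward, and only a root split creates a new level. After any sequence of insertions, every leaf sits at the same depth.

```python
import bisect
import random

class Node:
    def __init__(self, leaf=True):
        self.keys = []
        self.children = []           # empty for leaf nodes
        self.leaf = leaf

class BTree:
    """Insert-only B-tree of order m (max m children, max m-1 keys per node)."""
    def __init__(self, m):
        self.m = m
        self.root = Node(leaf=True)

    def insert(self, key):
        split = self._insert(self.root, key)
        if split:                                    # root split: grow at the top
            median, right = split
            new_root = Node(leaf=False)
            new_root.keys = [median]
            new_root.children = [self.root, right]
            self.root = new_root

    def _insert(self, node, key):
        if node.leaf:
            bisect.insort(node.keys, key)
        else:
            i = bisect.bisect(node.keys, key)
            split = self._insert(node.children[i], key)
            if split:                                # absorb the promoted median
                median, right = split
                node.keys.insert(i, median)
                node.children.insert(i + 1, right)
        if len(node.keys) > self.m - 1:              # overflow: split this node
            mid = len(node.keys) // 2
            median = node.keys[mid]
            right = Node(leaf=node.leaf)
            right.keys = node.keys[mid + 1:]
            node.keys = node.keys[:mid]
            if not node.leaf:
                right.children = node.children[mid + 1:]
                node.children = node.children[:mid + 1]
            return median, right
        return None

def leaf_depths(node, depth=0):
    if node.leaf:
        return {depth}
    return set().union(*(leaf_depths(c, depth + 1) for c in node.children))

tree = BTree(m=5)
for key in random.sample(range(10_000), 1_000):
    tree.insert(key)
assert len(leaf_depths(tree.root)) == 1   # all leaves at the same depth
```

The final assertion holds regardless of insertion order, which is the balanced property in action.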
Node splitting is the fundamental operation that enables B-tree growth while maintaining balance. Let's examine it in detail.
When Does a Split Occur?
A split is triggered when inserting a key would cause a node to exceed its maximum capacity (m-1 keys for order m).
The Split Process
Given a full node with m-1 keys that needs to accommodate one more:
Order: 5 (max 4 keys per node)

Original node: [10 | 20 | 30 | 40]   (full - 4 keys)
Inserting: 25

Step 1: Temporarily insert, creating overflow
        [10 | 20 | 25 | 30 | 40]     (overflow - 5 keys!)

Step 2: Find median (middle key)
        Median = 25 (index 2 of 0-4)

Step 3: Split into two nodes
        Left:  [10 | 20]   (2 keys)
        Right: [30 | 40]   (2 keys)
        Median (25) promoted to parent

Step 4: Update parent
        Before:  Parent [ ... | X | ... ]
                         ...  [full node]  ...
        After:   Parent [ ... | X | 25 | ... ]
                         ...  [10|20] [30|40]  ...

Step 5: If parent overflows, recursively split parent
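Steps 1-3 above fit in a few lines of Python; `split_overflowing` is a hypothetical helper that operates on a bare, already-sorted key list:

```python
def split_overflowing(keys):
    """Split an overflowing key list around its median.
    Returns (left_keys, median, right_keys)."""
    mid = len(keys) // 2
    return keys[:mid], keys[mid], keys[mid + 1:]

# Order 5: [10, 20, 30, 40] is full; inserting 25 overflows to 5 keys
left, median, right = split_overflowing([10, 20, 25, 30, 40])
assert (left, median, right) == ([10, 20], 25, [30, 40])
```

The median goes to the parent (Step 4); the two halves become siblings at the original node's level, which is why no leaf changes depth.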
Split Properties
| Property | Value | Reason |
|---|---|---|
| Keys in left after split | ⌊(m-1)/2⌋ | Even distribution |
| Keys in right after split | ⌈(m-1)/2⌉ | Even distribution |
| Key promoted to parent | Median | Maintains ordering |
| Operations per split | O(m) | Copy half the keys |
| Disk writes | 3 | Left node, right node, parent |
Root Split: Tree Growth
When the root itself splits, we create a new root:
Before (root is full):
[10 | 20 | 30 | 40] <- root, height = 0
After inserting 25:
[25] <- new root, height = 1
/ \
[10|20] [30|40] <- old root split into two
This is the only way tree height increases. Since both children came from the same node at the same level, all paths from new root to leaves are exactly one level longer. Perfect balance is preserved.
In the worst case, a split can cascade all the way to the root. Inserting one key might cause splits at every level: leaf splits → parent splits → grandparent splits → ... → root splits. This is rare (requires a perfectly full path from root to leaf) but handled correctly by the algorithm.
Node merging is the inverse of splitting—it consolidates underfull nodes to maintain the minimum occupancy invariant.
When Does a Merge Occur?
A merge is needed when:
- A deletion leaves a node with fewer than ⌈m/2⌉ - 1 keys (underflow), and
- no adjacent sibling holds more than the minimum number of keys, so redistribution is impossible.
The Merge Process
Order: 5 (minimum 2 keys for non-root nodes)

Before deletion:
          [30 | 60]
         /    |    \
   [10|20] [40|50] [70|80]

Delete 40:
          [30 | 60]                <- parent
         /    |    \
   [10|20]  [50]  [70|80]          <- middle node has only 1 key (underflow!)

Check redistribution: both siblings hold exactly 2 keys (the minimum), so
neither can spare one.

Merge with right sibling:
1. Combine middle node [50] + separator (60) + right sibling [70|80]
2. Pull down separator 60 from parent
3. Merged node: [50|60|70|80]   (4 keys - at most m-1, valid!)
4. Parent loses key 60: [30]

Result:
          [30]
         /    \
   [10|20]  [50|60|70|80]

Parent might now underflow... if so, continue merging up!

Redistribution Before Merge
Before merging, B-trees attempt redistribution (also called rotation or borrowing):
Before (after deletion, left child underflows):
          [30 | 60]
         /    |    \
    [10] [40|50|55] [70|80]    <- left has 1 key (underflow), middle has 3 (one to spare)

Redistribute from middle sibling:
1. Move 30 (separator) down to left child
2. Move 40 (first key of middle) up to parent

Result:
          [40 | 60]
         /    |    \
  [10|30] [50|55] [70|80]      <- left now has 2 keys ✓, middle still has 2 ✓
Redistribution is preferred over merging because:
- It touches only the underfull node, one sibling, and their parent, a constant amount of local work.
- The parent keeps the same number of keys (one separator is swapped, not removed), so underflow can never propagate upward.
Redistribution requires a sibling with more than minimum keys. If all siblings are at exactly minimum capacity, there's no key to borrow. In this case, we must merge: combine the underflowing node with a minimum-capacity sibling, plus the separator from the parent. The merged node will have at most m-1 keys (valid!).
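The redistribute-or-merge decision can be sketched in Python; `repair_underflow` is a hypothetical helper that works on bare key lists, considering only the node's right sibling:

```python
def repair_underflow(node, sep, sibling, min_keys, max_keys):
    """Fix an underfull `node` using its right `sibling` and the parent
    separator `sep`. Returns (node_keys, new_sep, sibling_keys); new_sep is
    None when the nodes were merged and the parent drops the separator."""
    if len(sibling) > min_keys:                 # redistribution: borrow one key
        return node + [sep], sibling[0], sibling[1:]
    merged = node + [sep] + sibling             # merge: pull separator down
    assert len(merged) <= max_keys              # at most m-1 keys: valid
    return merged, None, None

# Order 5 (min 2 keys, max 4): a sibling at minimum forces a merge
assert repair_underflow([50], 60, [70, 80], 2, 4) == ([50, 60, 70, 80], None, None)
# A sibling with a key to spare allows redistribution instead
assert repair_underflow([10], 30, [40, 50, 55], 2, 4) == ([10, 30], 40, [50, 55])
```

In the merge case the parent loses a key, so the caller must re-check the parent for underflow; in the redistribution case it does not, which is why redistribution never cascades.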
Root Shrinking: Tree Height Decrease
If merges propagate up and the root is left with zero keys and one child:
Before (after merge reduced root to one child):
[ ] <- root with 0 keys, 1 child (invalid!)
|
[30 | 60]
After (remove empty root):
[30 | 60] <- this becomes the new root
This is the only way tree height decreases. The old root is discarded, and its single child becomes the new root. Since all leaves were at depth h, they're now at depth h-1. Perfect balance is preserved.
Let's formally prove that B-trees maintain perfect balance.
Theorem: In a B-tree, all leaves have the same depth.
Proof by Induction on Operations:
Base Case: An empty B-tree has height 0 (root is a leaf). A B-tree with one key has the single-node root as its only leaf. Trivially, all leaves (there's only one) are at the same depth.
Induction Hypothesis: Assume that before operation k, all leaves are at depth h.
Inductive Step: We prove the hypothesis holds after operation k.
Case 1: Insertion without root split
After insertion, the key is placed in an existing leaf. If splits occur, they happen at internal nodes. New nodes created by splitting are siblings of existing nodes (same depth). No leaf changes depth.
Result: All leaves still at depth h. ✓
Case 2: Insertion with root split
When the root splits:
- The old root's keys are divided between two sibling nodes, with the median promoted.
- A brand-new root is created above them, holding only that median.
- Every root-to-leaf path now passes through the new root, so every leaf's depth increases from h to h+1 simultaneously.
Result: All leaves at depth h+1. ✓
Case 3: Deletion without root shrink
Merges and redistributions happen at internal nodes. Merged nodes are at the same depth as their siblings. No leaf changes depth.
Result: All leaves still at depth h. ✓
Case 4: Deletion with root shrink
When the root shrinks (old root removed):
- Merging has left the root with zero keys and a single child.
- That single child becomes the new root.
- Every root-to-leaf path loses exactly one edge, so every leaf's depth decreases from h to h-1 simultaneously.
Result: All leaves at depth h-1. ✓
Conclusion: By induction, all leaves always have the same depth. ∎
Height changes are GLOBAL operations that affect all leaves equally. Non-root operations (splits, merges, redistributions) are HORIZONTAL—they reorganize nodes at the same level but never change any node's depth. This structural invariant makes balance maintenance trivial.
Perfect balance combined with minimum occupancy gives us a precise height bound.
Theorem: For a B-tree of order m with n keys, the height h satisfies:
$$h \leq 1 + \log_{\lceil m/2 \rceil} \left( \frac{n+1}{2} \right)$$
Proof:
At each level, nodes have at least ⌈m/2⌉ children (except root, which has at least 2).
| Level | Minimum Nodes |
|---|---|
| 0 (root) | 1 |
| 1 | 2 |
| 2 | 2 × ⌈m/2⌉ |
| 3 | 2 × ⌈m/2⌉² |
| ... | ... |
| h | 2 × ⌈m/2⌉^(h-1) |
The number of leaves is at least 2 × ⌈m/2⌉^(h-1).
A search tree holding n keys has exactly n + 1 leaf positions (external nodes), one for each interval between consecutive keys, so the number of leaves is at most n + 1.
Thus:
n + 1 ≥ 2 × ⌈m/2⌉^(h-1)
Solving for h:
h - 1 ≤ log_{⌈m/2⌉}((n+1)/2)
h ≤ 1 + log_{⌈m/2⌉}((n+1)/2)
This proves the logarithmic height bound. ∎
| Number of Keys (n) | Maximum Height | Maximum Disk Accesses |
|---|---|---|
| 100 | 2 | 2 |
| 10,000 | 3 | 3 |
| 1,000,000 | 4 | 4 |
| 100,000,000 | 5 | 5 |
| 10,000,000,000 | 6 | 6 |
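Assuming the table above was computed for order m = 100 (minimum fanout 50), the bound from the proof reproduces its values; `max_height` is a hypothetical helper, not a standard function:

```python
import math

def max_height(n, m):
    """Worst-case B-tree height (edges from root to leaf) for n keys and
    order m, from h <= 1 + log_{ceil(m/2)}((n + 1) / 2)."""
    t = math.ceil(m / 2)                  # minimum fanout of non-root nodes
    return 1 + math.floor(math.log((n + 1) / 2, t))

# Reproduce the table rows under the assumed order m = 100
for n, h in [(100, 2), (10_000, 3), (1_000_000, 4),
             (100_000_000, 5), (10_000_000_000, 6)]:
    assert max_height(n, 100) == h
```

Changing m shows how strongly fanout matters: doubling the order roughly divides the height's logarithm base growth, shaving a level off very large trees.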
Practical Implications
For a typical database B-tree with order 100-500:
- The minimum fanout is 50-250, so each level multiplies capacity by at least that factor.
- A tree of height 3 already indexes millions of keys; height 4 reaches into the billions.
- The root and upper levels usually stay cached in memory, so a lookup often costs only one or two physical disk reads.
The logarithmic growth is so slow that even massive tables require only a handful of I/Os. This is why B-trees scale so well.
While perfect balance is desirable, maintaining it has costs. Let's examine the trade-offs.
The Cost of Maintaining Balance
| Operation | Cost Without Balance Maintenance | Cost With Balance Maintenance |
|---|---|---|
| Insert | O(h) traversal + O(1) insert | O(h) + potential O(h) splits |
| Delete | O(h) traversal + O(1) delete | O(h) + potential O(h) merges |
| Search | O(h) but h could be O(n) | O(log n) guaranteed |
Balance maintenance adds overhead to modifications but provides guaranteed search performance.
When Splits Cascade
Theoretically, an insert could cause splits at every level:
Worst case: insert along a path of completely full nodes
Before: Every node from root to leaf has m-1 keys
[full root]
↓
[full internal]
↓
[full internal]
↓
[full leaf] ← insert here
Result: Split cascades all the way up
- Split leaf
- Split parent
- Split grandparent
- Split root → new root created
Cost: O(h) splits = O(log n) work
This sounds expensive, but cascading splits are rare in practice, and the amortized cost of splitting is constant per insertion.
Amortized Analysis
Using potential function analysis, we can prove:
Theorem: The amortized cost of insertion (including splits) is O(1) per key.
Intuition:
- A freshly split node is only about half full, so it must absorb roughly m/2 more insertions before it can overflow again.
- Charging each insertion a small constant "prepays" for the split it may eventually trigger, so n insertions cause at most O(n) total split work.
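This intuition can be checked with a toy simulation that tracks only leaf occupancies and ignores internal levels entirely; `count_splits` is a hypothetical model, not a real B-tree:

```python
import random

def count_splits(n, m, seed=0):
    """Toy model of leaf splits: each leaf is just a key count; a leaf
    reaching m keys splits into two roughly half-full leaves (with one
    key conceptually promoted to the parent)."""
    rng = random.Random(seed)
    leaves = [0]
    splits = 0
    for _ in range(n):
        i = rng.randrange(len(leaves))
        leaves[i] += 1
        if leaves[i] == m:                       # overflow: split this leaf
            splits += 1
            leaves[i] = m // 2
            leaves.insert(i + 1, m - 1 - m // 2)
    return splits

# With m = 64, a leaf must absorb ~32 insertions between splits,
# so total splits stay near n / 32 rather than n
assert count_splits(100_000, 64) < 100_000 // 10
```

The split count grows linearly in n with a small constant (about 2/m per insertion), matching the amortized O(1) claim.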
Practical Implications
In production databases:
- Most insertions touch only a single leaf page; splits are the exception, not the rule.
- A node splits only after absorbing roughly m/2 insertions, so splits at each higher level are rarer by another factor of about m/2.
- Full cascades to the root are vanishingly rare for large orders.
Some implementations use 'proactive splitting' during traversal: if a node is full when passing through it, split it preemptively. This guarantees at most one split per insert (no cascading) but may split nodes unnecessarily. The trade-off is between absolute worst case and average case efficiency.
In multi-user database systems, multiple transactions modify the B-tree simultaneously. Maintaining balance while allowing concurrency is a significant challenge.
The Concurrency Challenge
Consider what happens if two transactions try to split the same node:
Time 1: Transaction A reads full node [10|20|30|40]
Time 2: Transaction B reads the same full node [10|20|30|40]
Time 3: Transaction A splits → [10|20] and [30|40], promotes 25
Time 4: Transaction B splits → [10|20] and [30|40], promotes 25
Result: Duplicate nodes! Corrupted tree structure.
Solutions for Concurrent Balance Maintenance
Common approaches include lock coupling (latching a parent until the child is known to be safe to descend into), optimistic descent with validation and restart, and structural designs that tolerate in-progress splits, most notably the B-link tree.
B-link Tree: High Concurrency Variation
The B-link tree (Lehman-Yao, 1981) modifies B-trees for better concurrency:
- Every node carries a pointer to its right sibling and a "high key" bounding the keys it may contain.
- A split is performed as two separate atomic steps: first create and link the right sibling, then update the parent.
- A search that reaches a node whose high key is smaller than the target simply follows the right-link to the sibling.
During a split in B-link tree:
Original: P → [A | B | C | D]
Step 1: Create right sibling with half the keys
P → [A | B] → [C | D]
(right sibling linked, parent not yet updated)
Step 2: Update parent to recognize split
P → [A | B] P → [C | D]
(parent now points to both)
Between Step 1 and Step 2:
- A search for 'C' would miss if looking at left node
- BUT: the right-link allows it to find 'C' in the sibling
- Search continues via sibling link until key is found or exceeded
This allows splits to be non-atomic without corrupting searches. Balance is maintained; concurrent access is safe.
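The recovery rule can be sketched in a few lines of Python; `LinkNode` and `contains` are hypothetical names modeling only the leaf level of a B-link tree:

```python
class LinkNode:
    """Leaf-level node with a right-sibling link and a high key."""
    def __init__(self, keys, high=None, right=None):
        self.keys = keys
        self.high = high        # upper bound on keys this node may contain
        self.right = right      # right-sibling link (Lehman-Yao)

def contains(node, key):
    # If the key exceeds this node's high key, a concurrent split has moved
    # it to the right: chase sibling links instead of reporting a miss.
    while node.high is not None and key > node.high:
        node = node.right
    return key in node.keys

# Half-split state from the diagram: sibling created and linked,
# parent not yet updated
right = LinkNode(keys=["C", "D"])
left = LinkNode(keys=["A", "B"], high="B", right=right)

assert contains(left, "C")      # found via the right-link, not the parent
assert contains(left, "A")      # keys still in the left node are unaffected
```

A reader arriving at the stale left node between the two split steps still finds "C", which is exactly why the two steps need not be atomic together.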
Most production databases (PostgreSQL, MySQL, Oracle) use variants of B-link trees. The Lehman-Yao algorithm is elegantly simple yet provides high concurrency. Understanding it is essential for anyone working on database internals.
We've explored the balanced property that makes B-trees exceptional. Let's consolidate:
- Perfect balance means every leaf sits at the same depth, so every search visits exactly h nodes.
- Balance is maintained structurally: splits handle overflow during insertion; redistribution and merges handle underflow during deletion.
- Height changes only at the root, moving all leaves up or down together, so the invariant can never break locally.
- Minimum occupancy plus perfect balance yields the height bound h ≤ 1 + log_⌈m/2⌉((n+1)/2), guaranteeing O(log n) operations.
What's Next
With the balanced property understood, we proceed to height analysis—deriving precise formulas for tree height, understanding how fanout affects height, and analyzing the relationship between B-tree parameters and performance.
You now understand how B-trees maintain perfect balance through splits and merges, why height changes are global rather than local, and how this guarantees O(log n) performance. You can explain the balance invariant and prove it holds after any sequence of operations. Next, we'll dive deeper into height analysis.