The B+-tree is a remarkable data structure—balanced, efficient, and disk-optimized. Yet like any engineering solution, it represents a set of tradeoffs rather than perfection. One such tradeoff is space utilization: standard B+-trees guarantee only 50% minimum node occupancy after splits, meaning up to half the space in your carefully allocated disk pages may go unused.
Enter the B*-tree (pronounced "B-star tree"), an elegant variant that addresses this inefficiency. By modifying the splitting strategy and redistributing keys between siblings before resorting to splits, B*-trees achieve significantly higher space utilization—guaranteeing at least two-thirds (66.7%) occupancy rather than just one-half.
This seemingly modest improvement has profound implications for database systems, affecting tree height, cache efficiency, I/O costs, and overall system performance. Understanding B*-trees illuminates the broader principle that small algorithmic refinements can yield substantial real-world benefits.
By the end of this page, you will understand the B*-tree's design philosophy, its redistribution-before-splitting mechanism, the mathematical foundations of its space guarantees, and when this variant provides advantages over standard B+-trees. You'll gain insight into how production databases make similar optimization decisions.
To appreciate the B*-tree's contribution, we must first understand the problem it solves. In a standard B+-tree with order m, each internal node contains between ⌈m/2⌉ and m children, and each leaf contains between ⌈(m-1)/2⌉ and m-1 keys.
This means that in the worst case—immediately after a series of splits—nodes may be only half full. Consider the implications:
| Metric | Optimal (100% full) | Worst Case (50% full) | Degradation |
|---|---|---|---|
| Keys per node (m=200) | 199 keys | 100 keys | ~2x more nodes needed |
| Tree height for 10M keys | 3 levels | 4 levels | 33% more I/O per query |
| Disk space usage | 100% efficient | ~50% efficient | 2x storage overhead |
| Cache utilization | Maximum density | Half entries wasted | Reduced hit rate |
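The node-count figures in the table can be reproduced with a quick back-of-the-envelope calculation. This is a sketch assuming 199 keys per leaf, matching the m = 200 example:

```python
import math

def leaves_needed(num_keys, capacity, fill_fraction):
    """Leaf pages needed to hold num_keys at a given average fill."""
    keys_per_leaf = capacity * fill_fraction
    return math.ceil(num_keys / keys_per_leaf)

full = leaves_needed(10_000_000, 199, 1.0)  # every page 100% full
half = leaves_needed(10_000_000, 199, 0.5)  # worst case: 50% full

ratio = half / full  # half-full pages need ~2x as many nodes
```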
Why does this happen?
When a node overflows (exceeds its maximum of m−1 keys), the standard B+-tree splitting algorithm:

1. Allocates a new node.
2. Moves the upper half of the keys into the new node.
3. Promotes (or, at the leaf level, copies) the middle key into the parent as a separator.
This even distribution means both nodes end up approximately half full. While elegant in its simplicity, this approach:

- Guarantees only 50% minimum occupancy immediately after a split
- Eagerly allocates a new page even when an adjacent sibling has spare room
- Ignores sibling occupancy entirely
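The even split described above can be sketched at the leaf level in a few lines (a minimal illustration; `standard_split` is a hypothetical helper, not from any particular codebase):

```python
def standard_split(keys):
    """Standard B+-tree leaf split: divide the overflowing, sorted key
    list evenly; the right node's first key is copied up to the parent
    as the separator."""
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]

left, right, sep = standard_split([10, 20, 25, 30, 40])
# Both halves end up roughly half full regardless of sibling occupancy.
```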
The fundamental insight of the B*-tree is that splitting should be a last resort, not a first response to overflow.
In enterprise databases storing billions of records, the difference between 50% and 67% space utilization translates to gigabytes or terabytes of additional storage, hundreds of additional I/O operations per second, and measurably slower query latencies. What seems like a small percentage difference becomes a major operational cost.
The B*-tree is a height-balanced tree that modifies the standard B+-tree in two critical ways:

1. Redistribution before splitting: an overflowing node first tries to shift keys into a sibling with spare capacity.
2. Two-to-three splits: when redistribution is impossible, two full nodes are split into three, each at least 2/3 full.
Formally, for a B*-tree of order m:

- Every node except the root contains between ⌈(2m−1)/3⌉ and m children (with key counts bounded proportionally at the leaves).
- The root is exempt from the 2/3 bound, since it has no sibling to redistribute with.

Calculating the bounds for a B*-tree of order m = 100:

- B+-tree minimum: ⌈100/2⌉ = 50 children → 50% minimum occupancy
- B*-tree minimum: ⌈(2·100 − 1)/3⌉ = 67 children → 67% minimum occupancy
This 17-percentage-point improvement in minimum occupancy compounds across the entire tree, significantly reducing the number of nodes and tree height.
The 2/3 occupancy guarantee isn't arbitrary—it's the mathematical consequence of the two-to-three splitting strategy. When two full nodes split into three, each new node is at least 2/3 full. This property is maintained through careful redistribution during insertions and deletions.
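These bounds are easy to check numerically. The sketch below uses the standard ⌈m/2⌉ and ⌈(2m−1)/3⌉ formulas for the per-variant minimum:

```python
import math

def min_children(m, variant):
    """Minimum children per non-root node for a tree of order m."""
    if variant == "b+":
        return math.ceil(m / 2)            # standard B+-tree bound
    if variant == "b*":
        return math.ceil((2 * m - 1) / 3)  # B*-tree bound
    raise ValueError(f"unknown variant: {variant}")

m = 100
bplus = min_children(m, "b+")  # 50 children -> 50% occupancy
bstar = min_children(m, "b*")  # 67 children -> 67% occupancy
```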
The core innovation of the B*-tree is its redistribution-first approach to handling node overflow. Rather than immediately splitting an overflowing node, the algorithm first checks if keys can be redistributed to a sibling.
The Redistribution Algorithm:
```
function insert(node, key, value):
    if node is leaf:
        insert key-value into node in sorted order
        if node.size > MAX_KEYS:
            handle_overflow(node)
    else:
        child = find_child(node, key)
        insert(child, key, value)

function handle_overflow(node):
    # Step 1: Try redistribution with left sibling
    left_sibling = get_left_sibling(node)
    if left_sibling exists AND left_sibling.size < MAX_KEYS:
        redistribute_left(node, left_sibling)
        return  # No split needed!

    # Step 2: Try redistribution with right sibling
    right_sibling = get_right_sibling(node)
    if right_sibling exists AND right_sibling.size < MAX_KEYS:
        redistribute_right(node, right_sibling)
        return  # No split needed!

    # Step 3: If redistribution impossible, perform two-to-three split
    if left_sibling exists:
        two_to_three_split(left_sibling, node)
    else if right_sibling exists:
        two_to_three_split(node, right_sibling)
    else:
        # Root case - standard split
        standard_split(node)
```

Redistribution in Action:
Consider two adjacent leaf nodes with a maximum capacity of 4 keys:
BEFORE REDISTRIBUTION:
┌─────────────────┐ ┌─────────────────┐
│ 10 │ 20 │ 30 │ 40 │ │ 50 │ │ │ │
└─────────────────┘ └─────────────────┘
Node A (FULL) Node B (1 key)
Insert key 25 into Node A → OVERFLOW!
AFTER REDISTRIBUTION:
┌─────────────────┐ ┌─────────────────┐
│ 10 │ 20 │ 25 │ │ │ 30 │ 40 │ 50 │ │
└─────────────────┘ └─────────────────┘
Node A (3 keys) Node B (3 keys)
Parent separator updated: 30
The key insight is that by moving keys 30 and 40 to the sibling, we avoided creating a new node entirely. The parent's separator key must be updated to reflect the new boundary.
Redistribution requires only updating existing nodes and their parent separators—typically 3 pages to modify. Splitting requires allocating a new page, modifying 3 existing pages, and potentially propagating splits upward. Redistribution is almost always cheaper in terms of I/O and page allocations.
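The redistribution step above can be sketched in a few lines. This is a leaf-level-only sketch: the separator is copied, not moved, following B+-tree leaf conventions, and `redistribute` is a hypothetical helper:

```python
def redistribute(left, right, max_keys=4):
    """Rebalance keys between two adjacent leaves, avoiding a split.
    Returns the new parent separator (first key of the right node)."""
    pool = sorted(left + right)
    assert len(pool) <= 2 * max_keys, "total exceeds both nodes: split required"
    mid = len(pool) // 2
    left[:], right[:] = pool[:mid], pool[mid:]
    return right[0]

node_a = [10, 20, 30, 40, 25]  # key 25 inserted -> overflow (max 4 keys)
node_b = [50]
separator = redistribute(node_a, node_b)
# Matches the diagram: node_a=[10,20,25], node_b=[30,40,50], separator=30
```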
When redistribution is impossible—both the overflowing node and its sibling are at maximum capacity—the B*-tree employs a unique two-to-three split. Instead of splitting one node into two (as in standard B+-trees), two full nodes are split into three nodes.
The Mathematics:
For a B*-tree with maximum n keys per node:

- Two full siblings plus the newly inserted key hold 2n + 1 keys in total.
- These keys are distributed across three nodes, each receiving roughly (2n + 1)/3 ≈ 2n/3 keys (with separators copied up at the leaf level).
This ensures each resulting node is at least 2/3 full.
Example with max 6 keys per node:
BEFORE: Two full adjacent nodes
┌─────────────────────────┐     ┌─────────────────────────┐
│ 5 │10│15│20│25│30│      │     │35│40│45│50│55│60│       │
└─────────────────────────┘     └─────────────────────────┘
        Node A (FULL)                   Node B (FULL)
Insert key 27 → Both nodes full, redistribution impossible!
SPLITTING: Combine all keys + new key, then split into 3
Combined keys: [5,10,15,20,25,27,30,35,40,45,50,55,60] = 13 keys
Distribute: 13/3 ≈ 4-5 keys per node
AFTER: Three 2/3-full nodes
┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
│ 5 │10│15│20│    │   │25│27│30│35│     │   │40│45│50│55│60│
└─────────────────┘   └─────────────────┘   └─────────────────┘
  Node A (4 keys)     NEW Node C (4 keys)     Node B (5 keys)
Parent receives two new separator keys: 25 and 40
Key observations:
One new node from two full nodes: A standard split creates one new node for every full node; the two-to-three split creates one new node for every two. The tree accumulates fewer nodes overall.
Higher post-split occupancy: Each of the three nodes is at least 2/3 full, not 1/2 full.
Two separators promoted: The parent receives two new keys instead of one, which may trigger parent redistribution or split.
Delayed height growth: Because splits are less frequent and create fewer nodes, the tree height increases more slowly.
The two-to-three split sends two keys to the parent instead of one. This increases the likelihood of parent overflow. However, the parent also uses redistribution-first logic, so splits at internal nodes are still relatively rare. In the worst case, splitting can propagate to the root, increasing tree height by one.
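The two-to-three split itself can be sketched as follows. This is a leaf-level sketch matching the worked example: separators are copies of the first keys of the second and third nodes, and `two_to_three_split` is a hypothetical helper:

```python
def two_to_three_split(left, right, new_key):
    """Split two full leaves plus one new key into three ~2/3-full leaves.
    Returns the three nodes and the two separators promoted to the parent."""
    pool = sorted(left + right + [new_key])
    third = len(pool) // 3  # e.g. 13 keys -> 4, 4, and 5
    a, c, b = pool[:third], pool[third:2 * third], pool[2 * third:]
    return a, c, b, (c[0], b[0])

a, c, b, seps = two_to_three_split(
    [5, 10, 15, 20, 25, 30], [35, 40, 45, 50, 55, 60], 27)
# Matches the diagram: a=[5,10,15,20], c=[25,27,30,35],
# b=[40,45,50,55,60], separators (25, 40)
```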
Deletion in B*-trees follows a similar philosophy to insertion: try redistribution before merging. When deleting a key causes a node to fall below the 2/3 minimum occupancy, the algorithm attempts to borrow keys from a sibling before resorting to merging nodes.
The Deletion Algorithm:
```
function delete(node, key):
    if node is leaf:
        remove key from node
        if node.size < MIN_KEYS:
            handle_underflow(node)
    else:
        child = find_child(node, key)
        delete(child, key)

function handle_underflow(node):
    # Step 1: Try borrowing from left sibling
    left_sibling = get_left_sibling(node)
    if left_sibling exists AND left_sibling.size > MIN_KEYS:
        borrow_from_left(node, left_sibling)
        return  # No merge needed!

    # Step 2: Try borrowing from right sibling
    right_sibling = get_right_sibling(node)
    if right_sibling exists AND right_sibling.size > MIN_KEYS:
        borrow_from_right(node, right_sibling)
        return  # No merge needed!

    # Step 3: If borrowing impossible, merge with a sibling
    if left_sibling exists:
        merge_nodes(left_sibling, node)
    else:
        merge_nodes(node, right_sibling)
```

Three-to-Two Merge:
Just as insertion uses two-to-three splits, deletion uses three-to-two merges when necessary. If three adjacent nodes are all at minimum occupancy and one loses a key, the algorithm can merge all three into two nodes, maintaining the 2/3 occupancy guarantee.
BEFORE: Three minimally-filled nodes
┌──────────┐ ┌──────────┐ ┌──────────┐
│ A B C │ │ D E F │ │ G H I │
└──────────┘ └──────────┘ └──────────┘
(min: 3) (min: 3) (min: 3)
Delete 'E' → Node 2 underflows
AFTER: Two nodes with redistributed keys
┌─────────────────┐ ┌─────────────────┐
│ A B C D F │ │ G H I │
└─────────────────┘ └─────────────────┘
(5 keys: >2/3) (3 keys: 2/3)
Deletion in B*-trees is more complex than in standard B+-trees due to the redistribution logic. However, the same guarantees hold: worst-case O(log n) time complexity, with the constant factors being acceptable for most database workloads. The space savings typically outweigh the slightly increased delete complexity.
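The three-to-two merge can be sketched like the splits above. This simplified version splits the combined pool evenly into two nodes; a real implementation may bias the split, as the diagram does, and `three_to_two_merge` is a hypothetical helper:

```python
def three_to_two_merge(n1, n2, n3):
    """Merge three under-minimum adjacent leaves into two nodes.
    Returns the two nodes and the new parent separator."""
    pool = sorted(n1 + n2 + n3)
    mid = (len(pool) + 1) // 2
    left, right = pool[:mid], pool[mid:]
    return left, right, right[0]

# After deleting a key, three minimally filled nodes collapse into two:
left, right, sep = three_to_two_merge([1, 2, 3], [4, 6], [7, 8, 9])
```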
Let's rigorously analyze the benefits of B*-trees compared to standard B+-trees. We'll examine space utilization, tree height, and I/O costs.
Space Utilization:
| Metric | B+-Tree | B*-Tree | Improvement |
|---|---|---|---|
| Minimum node occupancy | 50% | 66.7% | +33% minimum fill |
| Average occupancy (random inserts) | ~69% | ~81% | +17% average fill |
| Worst-case nodes for N keys | 2N/(m-1) | 1.5N/(m-1) | 33% fewer nodes |
| Effective fanout (m=200) | ~138 avg | ~162 avg | +17% effective fanout |
Tree Height Analysis:
The height of a B-tree directly impacts query performance (each level = one I/O). For N keys with effective fanout f:
Height = ⌈log_f(N)⌉
Example: 100 million keys, m=200
| Tree Type | Effective Fanout | Height | I/Os per Query |
|---|---|---|---|
| B+-Tree | 138 | 4 | 4 |
| B*-Tree | 162 | 4 | 4 |
Example: 10 billion keys, m=200
| Tree Type | Effective Fanout | Height | I/Os per Query |
|---|---|---|---|
| B+-Tree | 138 | 5 | 5 |
| B*-Tree | 162 | 5 | 5 |
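The heights in both tables follow directly from Height = ⌈log_f(N)⌉. A quick numeric check with the effective fanouts above:

```python
import math

def height(num_keys, fanout):
    """Tree height as ceil(log base fanout of num_keys)."""
    return math.ceil(math.log(num_keys) / math.log(fanout))

# 100 million keys, m = 200
h_bplus_100m = height(100_000_000, 138)
h_bstar_100m = height(100_000_000, 162)

# 10 billion keys, m = 200
h_bplus_10b = height(10_000_000_000, 138)
h_bstar_10b = height(10_000_000_000, 162)
```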
In the examples above the heights happen to coincide, but when the key count sits near a level boundary, the B*-tree's higher effective fanout can eliminate an entire level, reducing I/O by 20-25% per query.
Beyond raw I/O counts, B*-trees improve cache utilization. Fewer, denser nodes mean more keys fit in the buffer pool. With the same memory budget, B*-trees cache effectively 17-33% more keys, reducing disk I/O for hot data access patterns.
Operation Complexity:
| Operation | B+-Tree | B*-Tree | Notes |
|---|---|---|---|
| Search | O(log N) | O(log N) | Same asymptotic, slightly fewer levels |
| Insert | O(log N) | O(log N) | More complex with redistribution |
| Delete | O(log N) | O(log N) | More complex with redistribution |
| Range scan | O(log N + K) | O(log N + K) | Same leaf traversal |
The constant factors for insert and delete are higher in B*-trees due to sibling access for redistribution. However, this cost is typically offset by the reduced number of splits/merges.
Implementing B*-trees requires careful attention to several additional details beyond standard B+-tree implementation:
1. Sibling Pointers:
Efficient redistribution requires quick access to sibling nodes. Two approaches exist:

- Explicit sibling pointers stored in each node, forming a doubly linked chain at each level
- Locating siblings through the parent node during the descent
Explicit pointers require maintenance during splits/merges but provide O(1) sibling access.
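A node layout with explicit sibling pointers might look like the following. This is a hypothetical in-memory sketch, not any particular database's page format:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class LeafNode:
    keys: List[int] = field(default_factory=list)
    prev: Optional["LeafNode"] = None  # left sibling, O(1) access
    next: Optional["LeafNode"] = None  # right sibling, O(1) access

def link(left: LeafNode, right: LeafNode) -> None:
    """Maintain the doubly linked leaf chain after a split or merge."""
    left.next, right.prev = right, left

a, b = LeafNode([10, 20]), LeafNode([30, 40])
link(a, b)  # a can now reach b (and vice versa) without going via the parent
```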
2. Locking Strategy:
```
# B*-tree operations may need to access siblings,
# requiring modified locking protocols

function safe_insert(node, key, value):
    # Acquire read lock on parent
    parent = node.parent
    lock_read(parent)

    # Acquire write lock on node
    lock_write(node)

    # Check if redistribution might be needed
    if node.is_potentially_full():
        # Must also lock potential sibling
        sibling = get_sibling(node, parent)
        if sibling:
            lock_write(sibling)

    # Perform insertion
    insert_internal(node, key, value)

    # Release locks in reverse order
    # ...
```

3. Concurrency Challenges:
Redistribution between siblings complicates concurrent access:

- A single insert may need write locks on the node, a sibling, and the parent simultaneously, widening the lock footprint compared to a standard B+-tree.
- Lock acquisition must follow a consistent order (for example, always left-to-right) to avoid deadlocks between concurrent redistributions.
4. Root Handling:
The root node has special rules since it has no siblings for redistribution:

- The root is exempt from the 2/3 minimum occupancy requirement.
- When the root overflows, it performs a standard split, increasing tree height by one; Knuth's original formulation gives the root a larger capacity so that its two children start out at least 2/3 full.
B*-trees introduce implementation complexity that may not be justified for all use cases. Many production databases choose standard B+-trees with slight modifications rather than full B*-tree implementation. The benefit is most pronounced in read-heavy workloads with massive datasets where every percentage of space utilization matters.
B*-trees offer compelling advantages in specific scenarios but aren't universally superior. Understanding when to apply this variant is crucial for effective system design.
Most modern databases don't implement pure B*-trees but incorporate B*-tree principles selectively. For example, PostgreSQL uses space-utilization hints to delay splits, and Oracle has configurable fill factors. The underlying insight—redistribution before splitting—is widely adopted even when the full B*-tree algorithm isn't.
The B*-tree represents an important optimization of the B+-tree, trading some implementation complexity for improved space utilization and potentially reduced tree height.
What's Next:
Having understood B*-trees and their space optimization approach, we'll explore another B-tree variant: Prefix B-trees. These trees reduce key storage by exploiting common prefixes among keys, offering a different dimension of optimization that's particularly valuable for string-based indexes.
You now understand the B*-tree variant, its redistribution-first philosophy, two-to-three splitting mechanism, and the trade-offs involved. This knowledge illuminates how production databases approach the space-efficiency challenge in their index structures.