The B+-tree is a remarkable data structure—balanced, efficient, and disk-optimized. Yet like any engineering solution, it represents a set of tradeoffs rather than perfection. One such tradeoff is space utilization: standard B+-trees guarantee only 50% minimum node occupancy after splits, meaning up to half the space in your carefully allocated disk pages may go unused.
Enter the B*-tree (pronounced "B-star tree"), an elegant variant that addresses this inefficiency. By modifying the splitting strategy and redistributing keys between siblings before resorting to splits, B*-trees achieve significantly higher space utilization—guaranteeing at least two-thirds (66.7%) occupancy rather than just one-half.
This seemingly modest improvement has profound implications for database systems, affecting tree height, cache efficiency, I/O costs, and overall system performance. Understanding B*-trees illuminates the broader principle that small algorithmic refinements can yield substantial real-world benefits.
By the end of this page, you will understand the B*-tree's design philosophy, its redistribution-before-splitting mechanism, the mathematical foundations of its space guarantees, and when this variant provides advantages over standard B+-trees. You'll gain insight into how production databases make similar optimization decisions.
To appreciate the B*-tree's contribution, we must first understand the problem it solves. In a standard B+-tree with order m, each internal node contains between ⌈m/2⌉ and m children, and each leaf contains between ⌈(m-1)/2⌉ and m-1 keys.
This means that in the worst case—immediately after a series of splits—nodes may be only half full. Consider the implications:
| Metric | Optimal (100% full) | Worst Case (50% full) | Degradation |
|---|---|---|---|
| Keys per node (m=200) | 199 keys | 100 keys | ~2x more nodes needed |
| Tree height for 10M keys | 3 levels | 4 levels | 33% more I/O per query |
| Disk space usage | 100% efficient | ~50% efficient | 2x storage overhead |
| Cache utilization | Maximum density | Half entries wasted | Reduced hit rate |
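The node-count figures in the table can be reproduced with a quick back-of-the-envelope calculation. This is a sketch assuming 199 keys per leaf, matching the m = 200 example:

```python
import math

def leaves_needed(num_keys, capacity, fill_fraction):
    """Leaf pages needed to hold num_keys at a given average fill."""
    keys_per_leaf = capacity * fill_fraction
    return math.ceil(num_keys / keys_per_leaf)

full = leaves_needed(10_000_000, 199, 1.0)  # every page 100% full
half = leaves_needed(10_000_000, 199, 0.5)  # worst case: 50% full

ratio = half / full  # half-full pages need ~2x as many nodes
```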
Why does this happen?
When a node overflows (exceeds its maximum of m−1 keys), the standard B+-tree splitting algorithm:

1. Allocates a new node.
2. Moves the upper half of the keys into the new node.
3. Promotes (or, at the leaf level, copies) the middle key into the parent as a separator.
This even distribution means both nodes end up approximately half full. While elegant in its simplicity, this approach:

- Guarantees only 50% minimum occupancy immediately after a split
- Eagerly allocates a new page even when an adjacent sibling has spare room
- Ignores sibling occupancy entirely
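The even split described above can be sketched at the leaf level in a few lines (a minimal illustration; `standard_split` is a hypothetical helper, not from any particular codebase):

```python
def standard_split(keys):
    """Standard B+-tree leaf split: divide the overflowing, sorted key
    list evenly; the right node's first key is copied up to the parent
    as the separator."""
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]

left, right, sep = standard_split([10, 20, 25, 30, 40])
# Both halves end up roughly half full regardless of sibling occupancy.
```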
The fundamental insight of the B*-tree is that splitting should be a last resort, not a first response to overflow.
In enterprise databases storing billions of records, the difference between 50% and 67% space utilization translates to gigabytes or terabytes of additional storage, hundreds of additional I/O operations per second, and measurably slower query latencies. What seems like a small percentage difference becomes a major operational cost.
The B*-tree is a height-balanced tree that modifies the standard B+-tree in two critical ways:

1. Redistribution before splitting: an overflowing node first tries to shift keys into a sibling with spare capacity.
2. Two-to-three splits: when redistribution is impossible, two full nodes are split into three, each at least 2/3 full.
Formally, for a B*-tree of order m:

- Every node except the root contains between ⌈(2m−1)/3⌉ and m children (with key counts bounded proportionally at the leaves).
- The root is exempt from the 2/3 bound, since it has no sibling to redistribute with.

Calculating the bounds for a B*-tree of order m = 100:

- B+-tree minimum: ⌈100/2⌉ = 50 children → 50% minimum occupancy
- B*-tree minimum: ⌈(2·100 − 1)/3⌉ = 67 children → 67% minimum occupancy
This 17-percentage-point improvement in minimum occupancy compounds across the entire tree, significantly reducing the number of nodes and tree height.
The 2/3 occupancy guarantee isn't arbitrary—it's the mathematical consequence of the two-to-three splitting strategy. When two full nodes split into three, each new node is at least 2/3 full. This property is maintained through careful redistribution during insertions and deletions.
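These bounds are easy to check numerically. The sketch below uses the standard ⌈m/2⌉ and ⌈(2m−1)/3⌉ formulas for the per-variant minimum:

```python
import math

def min_children(m, variant):
    """Minimum children per non-root node for a tree of order m."""
    if variant == "b+":
        return math.ceil(m / 2)            # standard B+-tree bound
    if variant == "b*":
        return math.ceil((2 * m - 1) / 3)  # B*-tree bound
    raise ValueError(f"unknown variant: {variant}")

m = 100
bplus = min_children(m, "b+")  # 50 children -> 50% occupancy
bstar = min_children(m, "b*")  # 67 children -> 67% occupancy
```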
The core innovation of the B*-tree is its redistribution-first approach to handling node overflow. Rather than immediately splitting an overflowing node, the algorithm first checks if keys can be redistributed to a sibling.
The Redistribution Algorithm:
```
function insert(node, key, value):
    if node is leaf:
        insert key-value into node in sorted order
        if node.size > MAX_KEYS:
            handle_overflow(node)
    else:
        child = find_child(node, key)
        insert(child, key, value)

function handle_overflow(node):
    # Step 1: Try redistribution with left sibling
    left_sibling = get_left_sibling(node)
    if left_sibling exists AND left_sibling.size < MAX_KEYS:
        redistribute_left(node, left_sibling)
        return  # No split needed!

    # Step 2: Try redistribution with right sibling
    right_sibling = get_right_sibling(node)
    if right_sibling exists AND right_sibling.size < MAX_KEYS:
        redistribute_right(node, right_sibling)
        return  # No split needed!

    # Step 3: If redistribution impossible, perform two-to-three split
    if left_sibling exists:
        two_to_three_split(left_sibling, node)
    else if right_sibling exists:
        two_to_three_split(node, right_sibling)
    else:
        # Root case - standard split
        standard_split(node)
```

Redistribution in Action:
Consider two adjacent leaf nodes with a maximum capacity of 4 keys:
BEFORE REDISTRIBUTION:
┌─────────────────┐ ┌─────────────────┐
│ 10 │ 20 │ 30 │ 40 │ │ 50 │ │ │ │
└─────────────────┘ └─────────────────┘
Node A (FULL) Node B (1 key)
Insert key 25 into Node A → OVERFLOW!
AFTER REDISTRIBUTION:
┌─────────────────┐ ┌─────────────────┐
│ 10 │ 20 │ 25 │ │ │ 30 │ 40 │ 50 │ │
└─────────────────┘ └─────────────────┘
Node A (3 keys) Node B (3 keys)
Parent separator updated: 30
The key insight is that by moving keys 30 and 40 to the sibling, we avoided creating a new node entirely. The parent's separator key must be updated to reflect the new boundary.
Redistribution requires only updating existing nodes and their parent separators—typically 3 pages to modify. Splitting requires allocating a new page, modifying 3 existing pages, and potentially propagating splits upward. Redistribution is almost always cheaper in terms of I/O and page allocations.
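The redistribution step above can be sketched in a few lines. This is a leaf-level-only sketch: the separator is copied, not moved, following B+-tree leaf conventions, and `redistribute` is a hypothetical helper:

```python
def redistribute(left, right, max_keys=4):
    """Rebalance keys between two adjacent leaves, avoiding a split.
    Returns the new parent separator (first key of the right node)."""
    pool = sorted(left + right)
    assert len(pool) <= 2 * max_keys, "total exceeds both nodes: split required"
    mid = len(pool) // 2
    left[:], right[:] = pool[:mid], pool[mid:]
    return right[0]

node_a = [10, 20, 30, 40, 25]  # key 25 inserted -> overflow (max 4 keys)
node_b = [50]
separator = redistribute(node_a, node_b)
# Matches the diagram: node_a=[10,20,25], node_b=[30,40,50], separator=30
```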
When redistribution is impossible—both the overflowing node and its sibling are at maximum capacity—the B*-tree employs a unique two-to-three split. Instead of splitting one node into two (as in standard B+-trees), two full nodes are split into three nodes.
The Mathematics:
For a B*-tree with maximum n keys per node:

- Two full siblings plus the newly inserted key hold 2n + 1 keys in total.
- These keys are distributed across three nodes, each receiving roughly (2n + 1)/3 ≈ 2n/3 keys (with separators copied up at the leaf level).
This ensures each resulting node is at least 2/3 full.
Example with max 6 keys per node:
BEFORE: Two full adjacent nodes
┌─────────────────────────┐     ┌─────────────────────────┐
│ 5 │10│15│20│25│30│      │     │35│40│45│50│55│60│       │
└─────────────────────────┘     └─────────────────────────┘
        Node A (FULL)                   Node B (FULL)
Insert key 27 → Both nodes full, redistribution impossible!
SPLITTING: Combine all keys + new key, then split into 3
Combined keys: [5,10,15,20,25,27,30,35,40,45,50,55,60] = 13 keys
Distribute: 13/3 ≈ 4-5 keys per node
AFTER: Three 2/3-full nodes
┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
│ 5 │10│15│20│    │   │25│27│30│35│     │   │40│45│50│55│60│
└─────────────────┘   └─────────────────┘   └─────────────────┘
  Node A (4 keys)     NEW Node C (4 keys)     Node B (5 keys)
Parent receives two new separator keys: 25 and 40
Key observations:
One new node from two full nodes: A standard split creates one new node for every full node; the two-to-three split creates one new node for every two. The tree accumulates fewer nodes overall.
Higher post-split occupancy: Each of the three nodes is at least 2/3 full, not 1/2 full.
Two separators promoted: The parent receives two new keys instead of one, which may trigger parent redistribution or split.
Delayed height growth: Because splits are less frequent and create fewer nodes, the tree height increases more slowly.
The two-to-three split sends two keys to the parent instead of one. This increases the likelihood of parent overflow. However, the parent also uses redistribution-first logic, so splits at internal nodes are still relatively rare. In the worst case, splitting can propagate to the root, increasing tree height by one.
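The two-to-three split itself can be sketched as follows. This is a leaf-level sketch matching the worked example: separators are copies of the first keys of the second and third nodes, and `two_to_three_split` is a hypothetical helper:

```python
def two_to_three_split(left, right, new_key):
    """Split two full leaves plus one new key into three ~2/3-full leaves.
    Returns the three nodes and the two separators promoted to the parent."""
    pool = sorted(left + right + [new_key])
    third = len(pool) // 3  # e.g. 13 keys -> 4, 4, and 5
    a, c, b = pool[:third], pool[third:2 * third], pool[2 * third:]
    return a, c, b, (c[0], b[0])

a, c, b, seps = two_to_three_split(
    [5, 10, 15, 20, 25, 30], [35, 40, 45, 50, 55, 60], 27)
# Matches the diagram: a=[5,10,15,20], c=[25,27,30,35],
# b=[40,45,50,55,60], separators (25, 40)
```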
Deletion in B*-trees follows a similar philosophy to insertion: try redistribution before merging. When deleting a key causes a node to fall below the 2/3 minimum occupancy, the algorithm attempts to borrow keys from a sibling before resorting to merging nodes.
The Deletion Algorithm:
```
function delete(node, key):
    if node is leaf:
        remove key from node
        if node.size < MIN_KEYS:
            handle_underflow(node)
    else:
        child = find_child(node, key)
        delete(child, key)

function handle_underflow(node):
    # Step 1: Try borrowing from left sibling
    left_sibling = get_left_sibling(node)
    if left_sibling exists AND left_sibling.size > MIN_KEYS:
        borrow_from_left(node, left_sibling)
        return  # No merge needed!

    # Step 2: Try borrowing from right sibling
    right_sibling = get_right_sibling(node)
    if right_sibling exists AND right_sibling.size > MIN_KEYS:
        borrow_from_right(node, right_sibling)
        return  # No merge needed!

    # Step 3: If borrowing impossible, merge with a sibling
    if left_sibling exists:
        merge_nodes(left_sibling, node)
    else:
        merge_nodes(node, right_sibling)
```

Three-to-Two Merge:
Just as insertion uses two-to-three splits, deletion uses three-to-two merges when necessary. If three adjacent nodes are all at minimum occupancy and one loses a key, the algorithm can merge all three into two nodes, maintaining the 2/3 occupancy guarantee.
BEFORE: Three minimally-filled nodes
┌──────────┐ ┌──────────┐ ┌──────────┐
│ A B C │ │ D E F │ │ G H I │
└──────────┘ └──────────┘ └──────────┘
(min: 3) (min: 3) (min: 3)
Delete 'E' → Node 2 underflows
AFTER: Two nodes with redistributed keys
┌─────────────────┐ ┌─────────────────┐
│ A B C D F │ │ G H I │
└─────────────────┘ └─────────────────┘
(5 keys: >2/3) (3 keys: 2/3)
Deletion in B*-trees is more complex than in standard B+-trees due to the redistribution logic. However, the same guarantees hold: worst-case O(log n) time complexity, with the constant factors being acceptable for most database workloads. The space savings typically outweigh the slightly increased delete complexity.
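The three-to-two merge can be sketched like the splits above. This simplified version splits the combined pool evenly into two nodes; a real implementation may bias the split, as the diagram does, and `three_to_two_merge` is a hypothetical helper:

```python
def three_to_two_merge(n1, n2, n3):
    """Merge three under-minimum adjacent leaves into two nodes.
    Returns the two nodes and the new parent separator."""
    pool = sorted(n1 + n2 + n3)
    mid = (len(pool) + 1) // 2
    left, right = pool[:mid], pool[mid:]
    return left, right, right[0]

# After deleting a key, three minimally filled nodes collapse into two:
left, right, sep = three_to_two_merge([1, 2, 3], [4, 6], [7, 8, 9])
```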
Let's rigorously analyze the benefits of B*-trees compared to standard B+-trees. We'll examine space utilization, tree height, and I/O costs.
Space Utilization:
| Metric | B+-Tree | B*-Tree | Improvement |
|---|---|---|---|
| Minimum node occupancy | 50% | 66.7% | +33% minimum fill |
| Average occupancy (random inserts) | ~69% | ~81% | +17% average fill |
| Worst-case nodes for N keys | 2N/(m-1) | 1.5N/(m-1) | 33% fewer nodes |
| Effective fanout (m=200) | ~138 avg | ~162 avg | +17% effective fanout |
Tree Height Analysis:
The height of a B-tree directly impacts query performance (each level = one I/O). For N keys with effective fanout f:
Height = ⌈log_f(N)⌉
Example: 100 million keys, m=200
| Tree Type | Effective Fanout | Height | I/Os per Query |
|---|---|---|---|
| B+-Tree | 138 | 4 | 4 |
| B*-Tree | 162 | 4 | 4 |
Example: 10 billion keys, m=200
| Tree Type | Effective Fanout | Height | I/Os per Query |
|---|---|---|---|
| B+-Tree | 138 | 5 | 5 |
| B*-Tree | 162 | 5 | 5 |
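The heights in both tables follow directly from Height = ⌈log_f(N)⌉. A quick numeric check with the effective fanouts above:

```python
import math

def height(num_keys, fanout):
    """Tree height as ceil(log base fanout of num_keys)."""
    return math.ceil(math.log(num_keys) / math.log(fanout))

# 100 million keys, m = 200
h_bplus_100m = height(100_000_000, 138)
h_bstar_100m = height(100_000_000, 162)

# 10 billion keys, m = 200
h_bplus_10b = height(10_000_000_000, 138)
h_bstar_10b = height(10_000_000_000, 162)
```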
In the examples above the heights happen to coincide, but when the key count sits near a level boundary, the B*-tree's higher effective fanout can eliminate an entire level, reducing I/O by 20-25% per query.
Beyond raw I/O counts, B*-trees improve cache utilization. Fewer, denser nodes mean more keys fit in the buffer pool. With the same memory budget, B*-trees cache effectively 17-33% more keys, reducing disk I/O for hot data access patterns.
Operation Complexity:
| Operation | B+-Tree | B*-Tree | Notes |
|---|---|---|---|
| Search | O(log N) | O(log N) | Same asymptotic, slightly fewer levels |
| Insert | O(log N) | O(log N) | More complex with redistribution |
| Delete | O(log N) | O(log N) | More complex with redistribution |
| Range scan | O(log N + K) | O(log N + K) | Same leaf traversal |
The constant factors for insert and delete are higher in B*-trees due to sibling access for redistribution. However, this cost is typically offset by the reduced number of splits/merges.
Implementing B*-trees requires careful attention to several additional details beyond standard B+-tree implementation:
1. Sibling Pointers:
Efficient redistribution requires quick access to sibling nodes. Two approaches exist:

- Explicit sibling pointers stored in each node, forming a doubly linked chain at each level
- Locating siblings through the parent node during the descent
Explicit pointers require maintenance during splits/merges but provide O(1) sibling access.
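A node layout with explicit sibling pointers might look like the following. This is a hypothetical in-memory sketch, not any particular database's page format:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class LeafNode:
    keys: List[int] = field(default_factory=list)
    prev: Optional["LeafNode"] = None  # left sibling, O(1) access
    next: Optional["LeafNode"] = None  # right sibling, O(1) access

def link(left: LeafNode, right: LeafNode) -> None:
    """Maintain the doubly linked leaf chain after a split or merge."""
    left.next, right.prev = right, left

a, b = LeafNode([10, 20]), LeafNode([30, 40])
link(a, b)  # a can now reach b (and vice versa) without going via the parent
```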
2. Locking Strategy:
```
# B*-tree operations may need to access siblings,
# requiring modified locking protocols

function safe_insert(node, key, value):
    # Acquire read lock on parent
    parent = node.parent
    lock_read(parent)

    # Acquire write lock on node
    lock_write(node)

    # Check if redistribution might be needed
    if node.is_potentially_full():
        # Must also lock potential sibling
        sibling = get_sibling(node, parent)
        if sibling:
            lock_write(sibling)

    # Perform insertion
    insert_internal(node, key, value)

    # Release locks in reverse order
    # ...
```

3. Concurrency Challenges:
Redistribution between siblings complicates concurrent access:

- A single insert may need write locks on the node, a sibling, and the parent simultaneously, widening the lock footprint compared to a standard B+-tree.
- Lock acquisition must follow a consistent order (for example, always left-to-right) to avoid deadlocks between concurrent redistributions.
4. Root Handling:
The root node has special rules since it has no siblings for redistribution:

- The root is exempt from the 2/3 minimum occupancy requirement.
- When the root overflows, it performs a standard split, increasing tree height by one; Knuth's original formulation gives the root a larger capacity so that its two children start out at least 2/3 full.
B*-trees introduce implementation complexity that may not be justified for all use cases. Many production databases choose standard B+-trees with slight modifications rather than full B*-tree implementation. The benefit is most pronounced in read-heavy workloads with massive datasets where every percentage of space utilization matters.
B*-trees offer compelling advantages in specific scenarios but aren't universally superior. Understanding when to apply this variant is crucial for effective system design.
Most modern databases don't implement pure B*-trees but incorporate B*-tree principles selectively. For example, PostgreSQL uses space-utilization hints to delay splits, and Oracle has configurable fill factors. The underlying insight—redistribution before splitting—is widely adopted even when the full B*-tree algorithm isn't.
The B*-tree represents an important optimization of the B+-tree, trading some implementation complexity for improved space utilization and potentially reduced tree height.
What's Next:
Having understood B*-trees and their space optimization approach, we'll explore another B-tree variant: Prefix B-trees. These trees reduce key storage by exploiting common prefixes among keys, offering a different dimension of optimization that's particularly valuable for string-based indexes.
You now understand the B*-tree variant, its redistribution-first philosophy, two-to-three splitting mechanism, and the trade-offs involved. This knowledge illuminates how production databases approach the space-efficiency challenge in their index structures.