If leaf nodes are the treasure vaults of a B+-tree—containing all the actual data—then internal nodes are the signposts and highways that guide queries to the right vault. Internal nodes contain no data themselves; their sole purpose is to route searches efficiently from the root to the appropriate leaf.
This navigation-only design might seem wasteful—why have nodes that don't store data? But this specialization is precisely what enables B+-trees' exceptional performance. By freeing internal nodes from data storage responsibilities, we can optimize them purely for navigation speed, maximizing fanout and minimizing tree height.
By the end of this page, you will understand the detailed structure of internal nodes, how they guide searches, how they're maintained during tree modifications, and the sophisticated optimizations real databases apply to maximize their efficiency.
An internal node in a B+-tree contains keys and child pointers. The keys serve as separators, defining the boundaries between subtrees. The pointers indicate which child to visit for a given search key.
Physical Layout:
┌────────────────────────────────────────────────────────────┐
│ Header │ P₀ │ K₁ │ P₁ │ K₂ │ P₂ │ ... │ Kₙ │ Pₙ │ Free │
└────────────────────────────────────────────────────────────┘
Where:
- Header: Page metadata (type, entry count, etc.)
- P₀: Pointer to leftmost child (keys < K₁)
- Kᵢ: Separator key
- Pᵢ: Pointer to child for keys ≥ Kᵢ and < Kᵢ₊₁
- Pₙ: Pointer to rightmost child (keys ≥ Kₙ)

Note the asymmetry: there is always one more pointer than there are keys. If a node has n keys, it has n+1 child pointers.
| Component | Size | Description |
|---|---|---|
| Page Header | 16-64 bytes | Page ID, type flag, entry count, LSN |
| Leftmost Pointer (P₀) | 8 bytes | Points to children with keys < first separator |
| Separator Key (Kᵢ) | Variable | Divides key space between Pᵢ₋₁ and Pᵢ children |
| Child Pointer (Pᵢ) | 8 bytes | Points to child containing keys ≥ Kᵢ |
| High Key (optional) | Variable | Max key in subtree (for B-link trees) |
| Right Sibling (optional) | 8 bytes | Link to right sibling (for B-link trees) |
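The layout and component table above can be sketched as a minimal in-memory structure (a hypothetical Python sketch; a real engine packs these fields into a raw page buffer):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class InternalNode:
    """In-memory sketch of an internal node; on disk this is a packed page."""
    keys: List[int]                      # separator keys K1..Kn
    children: List[int]                  # child page IDs P0..Pn
    high_key: Optional[int] = None       # optional: max key in subtree (B-link trees)
    right_sibling: Optional[int] = None  # optional: right sibling page ID (B-link trees)

    def __post_init__(self):
        # Invariant: always one more child pointer than keys
        assert len(self.children) == len(self.keys) + 1
```

The invariant check captures the asymmetry noted above: constructing a node with n keys and anything other than n+1 children fails immediately.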
Unlike in B-trees where internal node keys correspond to actual data records, B+-tree internal keys are purely navigational. They need only be sufficient to correctly route searches—they don't need to match any specific data record exactly.
When a search reaches an internal node, it must determine which child pointer to follow. This decision is made by comparing the search key against the separator keys.
Routing Algorithm:
find_child(internal_node, search_key):
keys = internal_node.keys // [K₁, K₂, ..., Kₙ]
pointers = internal_node.pointers // [P₀, P₁, P₂, ..., Pₙ]
// Binary search for first key > search_key
i = binary_search_upper(keys, search_key)
// Follow pointer Pᵢ
return pointers[i]
Detailed Semantics:
Given keys [K₁, K₂, K₃] and pointers [P₀, P₁, P₂, P₃]:
| If search_key is... | Follow pointer... | Why |
|---|---|---|
| < K₁ | P₀ | Key is less than all separators |
| ≥ K₁ and < K₂ | P₁ | Key is in K₁'s subtree |
| ≥ K₂ and < K₃ | P₂ | Key is in K₂'s subtree |
| ≥ K₃ | P₃ | Key is greater than all separators |
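The routing table maps directly onto a binary search. A minimal runnable sketch (Python, with the standard library's `bisect_right` standing in for `binary_search_upper`):

```python
from bisect import bisect_right

def find_child(keys, pointers, search_key):
    # bisect_right returns the index of the first separator > search_key,
    # which is exactly the pointer index to follow (>= Ki routes right)
    return pointers[bisect_right(keys, search_key)]

keys = [10, 20, 30]                  # K1, K2, K3
pointers = ["P0", "P1", "P2", "P3"]  # P0..P3
find_child(keys, pointers, 5)   # -> "P0"  (< K1)
find_child(keys, pointers, 10)  # -> "P1"  (>= K1, < K2)
find_child(keys, pointers, 99)  # -> "P3"  (>= K3)
```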
Binary Search Within Nodes:
With potentially hundreds of keys per internal node, linear search would be inefficient. Internal node search uses binary search for O(log m) comparisons, where m is the number of keys.
Example: An internal node with 200 keys requires at most 8 comparisons (log₂ 200 ≈ 7.6).
SIMD Optimization:
Modern databases use SIMD (Single Instruction, Multiple Data) instructions to search internal nodes even faster: the search key is compared against several separators in a single packed instruction, and the resulting comparison mask is converted into a child index with a movemask and popcount.
This reduces internal node search time to near-constant, making tree traversal dominated by I/O rather than CPU.
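The core SIMD trick, comparing the key against many separators at once and counting the matches, can be sketched in scalar Python (real engines do the comparisons and the count with packed-compare and popcount intrinsics):

```python
def find_child_simd_style(keys, pointers, search_key):
    # Vectorized idea: one packed compare of search_key against all
    # separators yields a bitmask; the popcount of that mask is the
    # child index. Here the compares and the count are written out
    # in scalar form for clarity.
    index = sum(1 for k in keys if k <= search_key)
    return pointers[index]

keys = [10, 20, 30]
pointers = ["P0", "P1", "P2", "P3"]
find_child_simd_style(keys, pointers, 25)  # -> "P2"
```

This gives the same routing semantics as the binary-search version: the number of separators less than or equal to the search key is exactly the pointer index to follow.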
Different implementations use slightly different comparison semantics (< vs ≤). What matters is consistency: the tree build and search must use the same convention. The splitting algorithm must ensure keys end up in the subtree where search will find them.
Internal nodes are created and modified through splits and merges. Understanding these operations clarifies how the tree structure evolves.
Initial State:
When a B+-tree is first created, it consists of a single leaf node (which is also the root). The first internal node appears when this root leaf splits.
Leaf Split Creates Internal Structure:
Before (single root leaf):
[10, 20, 30, 40] ← Root and only leaf
Insert key 25 causes split:
[30] ← New internal root
/ \
[10, 20, 25] [30, 40] ← Two leaf nodes
Step by Step:
1. The full leaf [10, 20, 30, 40] cannot absorb key 25, so it splits into [10, 20, 25] and [30, 40].
2. The first key of the new right leaf (30) is copied up as the separator.
3. Because the split leaf was the root, a new internal root is allocated holding the separator and pointers to both leaves; the tree grows from height 1 to height 2.
Internal Node Split:
When an internal node overflows (too many keys after a child split), it too must split:
Before:
[30, 50, 70] ← Internal node is full
/ | | \
... ... ... ... ← Children
Child split adds key 60:
[30, 50, 60, 70] ← Now overflows
/ | | | \
... ... ... ... ... ← Children
Internal split:
[50] ← Middle key moves up to parent
/ \
[30] [60, 70] ← Two internal nodes
/ \ / | \
... ... ... ... ... ← Children redistributed
Key Difference: In internal node splits, the middle key moves to the parent (unlike leaf splits where it's copied). This maintains the invariant that each key appears exactly once in internal nodes.
Leaf splits: separator key is COPIED up (key remains in right leaf). Internal splits: separator key MOVES up (key is removed from split node). This distinction is fundamental and a common source of implementation bugs.
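The copy-up vs. move-up distinction can be made concrete with a small sketch (split-point conventions vary between implementations; the midpoints here are chosen to match the examples above):

```python
def split_leaf(keys):
    # Leaf split: separator is COPIED up and remains in the right leaf
    mid = (len(keys) + 1) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]  # separator = first key of right leaf

def split_internal(keys):
    # Internal split: separator MOVES up and appears in neither half
    mid = (len(keys) - 1) // 2
    return keys[:mid], keys[mid + 1:], keys[mid]

split_leaf([10, 20, 25, 30, 40])  # -> ([10, 20, 25], [30, 40], 30)
split_internal([30, 50, 60, 70])  # -> ([30], [60, 70], 50)
```

Note that the leaf separator (30) still appears in the right leaf, while the internal separator (50) appears in neither half; it now lives only in the parent.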
When deletions cause an internal node to fall below minimum occupancy, it must either borrow keys from a sibling or merge with a sibling.
Redistribution (Borrowing):
Before: Left sibling has spare keys
[40]
/ \
[10, 20, 30] [50] ← Right child underfull
After redistribution:
[30]
/ \
[10, 20] [40, 50] ← Balanced
Mechanism:
1. The parent's separator (40) moves down to become the smallest key of the underfull right child.
2. The left sibling's largest key (30) moves up to the parent as the new separator.
3. For internal nodes, the left sibling's rightmost child pointer moves to the right child along with the rotation.
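A sketch of the rotation through the parent, using a hypothetical minimal `Node` with `keys` and `children` lists:

```python
class Node:
    def __init__(self, keys, children):
        self.keys, self.children = keys, children

def rotate_from_left(parent, i):
    """Borrow through the parent: separator i divides children i and i+1."""
    left, right = parent.children[i], parent.children[i + 1]
    right.keys.insert(0, parent.keys[i])           # separator moves down
    right.children.insert(0, left.children.pop())  # last child follows it
    parent.keys[i] = left.keys.pop()               # left's max moves up

# Matches the diagram: parent [40], left [10, 20, 30], right [50]
left = Node([10, 20, 30], ["c0", "c1", "c2", "c3"])
right = Node([50], ["c4", "c5"])
parent = Node([40], [left, right])
rotate_from_left(parent, 0)
# parent.keys -> [30]; left.keys -> [10, 20]; right.keys -> [40, 50]
```

After the rotation both children satisfy the n-keys/n+1-children invariant again, and the parent's separator still correctly divides them.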
Merging:
When a sibling doesn't have spare keys to share, merge two internal nodes:
Before: Both siblings at minimum
[60]
/ \
[30, 40] [70] ← Right child underfull
/ | \ / \
... ... ... ... ... ← Children
After merge:
[30, 40, 60, 70] ← Combined node
/ | | | \
... ... ... ... ... ← All children together
Mechanism:
1. The parent's separator (60) is pulled down into the left sibling.
2. The right sibling's keys and child pointers are appended after it.
3. The separator and the pointer to the right sibling are removed from the parent, which may itself become underfull and trigger further rebalancing.
Merges can cascade upward. If the root's last two children merge, the root becomes empty and is deleted, reducing tree height. This is the only way a B+-tree shrinks in height—when the root has exactly two children that merge.
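The merge, including the root-collapse case, can be sketched with the same hypothetical minimal `Node`:

```python
class Node:
    def __init__(self, keys, children):
        self.keys, self.children = keys, children

def merge_children(parent, i):
    """Merge children i and i+1 of an internal parent, pulling the separator down."""
    left, right = parent.children[i], parent.children[i + 1]
    left.keys.append(parent.keys.pop(i))  # separator moves DOWN into the merge
    left.keys.extend(right.keys)
    left.children.extend(right.children)
    parent.children.pop(i + 1)
    # Root collapse: if parent was the root and is now empty,
    # `left` becomes the new root and the tree shrinks by one level.

# Matches the diagram: parent [60], children [30, 40] and [70]
left = Node([30, 40], ["a", "b", "c"])
right = Node([70], ["d", "e"])
parent = Node([60], [left, right])
merge_children(parent, 0)
# left.keys -> [30, 40, 60, 70]; parent.keys -> [] (root would collapse)
```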
Since internal node keys are separators (not data), they can be optimized in ways that actual data keys cannot. These optimizations increase fanout and reduce I/O.
Prefix Compression:
Consider adjacent keys in an internal node. Suppose the left child's largest key is "Johnson" and the right child's smallest key is "Jones".
The separator doesn't need to be the full key "Jones"; it only needs to distinguish the two ranges. "Jon" suffices, since "Johnson" < "Jon" ≤ "Jones".
Space Savings:
| Full Key | Compressed Separator | Savings |
|---|---|---|
| "Christopher" | "Chr" | 73% |
| "2024-01-15T14:30:00Z" | "2024-01-15T14" | 35% |
| "category:electronics:phones:apple" | "category:electronics:p" | 33% |
Suffix Truncation:
For composite or long keys, only enough prefix to separate ranges is stored:
Leaf 1 ends with: key = "database_systems_introduction"
Leaf 2 starts with: key = "database_systems_query_optimization"
Full separator: "database_systems_query_optimization"
Truncated: "database_systems_q"
──────────────────
Savings: 17 bytes (49% reduction)
Minimum Separator:
The optimal separator is the shortest prefix of the right child's minimum key that is strictly greater than the left child's maximum key:
def min_separator(left_max, right_min):
    # Find the shortest string S where left_max < S ≤ right_min
    for length in range(1, len(right_min) + 1):
        candidate = right_min[:length]
        if left_max < candidate:
            return candidate
    return right_min  # Fallback to the full key
This optimization can dramatically increase internal node fanout for string keys.
Separator compression is most valuable when keys are long strings with common prefixes (URLs, file paths, compound keys). For integer keys, there's usually nothing to compress, so these optimizations don't apply.
Because internal nodes are accessed on every search, they're prime candidates for caching. Databases employ several strategies to maximize internal node cache hits.
Analysis: How Much Memory for Internal Nodes?
For a B+-tree of order 500 with 1 billion records:
| Level | Nodes | Size (8 KB pages) | Cumulative |
|---|---|---|---|
| 0 (root) | 1 | 8 KB | 8 KB |
| 1 | ~8 | 64 KB | 72 KB |
| 2 | ~4,000 | ~32 MB | ~32 MB |
| 3 (leaves) | ~2,000,000 | ~16 GB | ~16 GB |
Key Insight: All internal nodes fit in roughly 32 MB of RAM. If your buffer pool is at least that large, every search requires at most one disk I/O (for the leaf). This explains why database administrators often recommend buffer pools larger than the internal node footprint.
Memory Sizing Rule of Thumb:
Internal nodes ≈ Total data size / Fanout ratio
For 1 TB data, fanout 500:
~1 TB / 500 = ~2 GB for internal nodes
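The rule of thumb follows from counting nodes level by level. A quick sketch (assuming one 8 KB page per node and the stated fanout):

```python
def internal_footprint(num_records, records_per_leaf, fanout, page_bytes=8192):
    # Count nodes at each level from the leaves upward; every level
    # above the leaves consists of internal nodes.
    level = -(-num_records // records_per_leaf)  # ceil: number of leaves
    internal_nodes = 0
    while level > 1:
        level = -(-level // fanout)              # ceil: nodes one level up
        internal_nodes += level
    return internal_nodes * page_bytes

# 1 billion records, 500 records per leaf, fanout 500:
internal_footprint(10**9, 500, 500)  # -> 32,841,728 bytes (4,009 pages, ~32 MB)
```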
Cloud databases often run on instances with 8-64 GB RAM but terabytes of data. Understanding internal node sizes helps choose the right instance: enough RAM to cache all internal nodes means each query costs at most one disk I/O (the leaf), dramatically improving performance.
Internal nodes present unique concurrency challenges. They're on the path of every query, making them potential bottlenecks. However, their stability (they change only during splits/merges) enables optimizations.
Latch Coupling (Crabbing):
The classic protocol for safe traversal:
traverse_with_crabbing(root, key):
current = acquire_read_latch(root)
while current is not leaf:
child_id = find_child(current, key)
child = acquire_read_latch(child_id)
release_latch(current) # Safe: child is latched
current = child
return current # Still holding latch
By releasing the parent latch once the child is acquired, we minimize contention. At any moment, a query holds at most 2 latches.
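A runnable sketch of crabbing, with Python threading locks standing in for latches (a hypothetical minimal node; real latches are much cheaper than OS-level locks):

```python
import threading
from bisect import bisect_right

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children  # None for a leaf
        self.latch = threading.Lock()

def traverse_with_crabbing(root, key):
    current = root
    current.latch.acquire()
    while current.children is not None:
        child = current.children[bisect_right(current.keys, key)]
        child.latch.acquire()    # latch the child first...
        current.latch.release()  # ...then release the parent
        current = child
    return current               # caller releases the leaf latch

leaf1, leaf2 = Node([10, 15]), Node([20, 25])
root = Node([20], [leaf1, leaf2])
leaf = traverse_with_crabbing(root, 25)  # returns leaf2, still latched
leaf.latch.release()
```

Note the ordering: the child is latched before the parent is released, so there is never an instant when the path being followed is unprotected.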
Optimistic Latching:
For even better concurrency:
optimistic_traverse(root, key):
path = []
current = root
# Traverse without latches
while current is not leaf:
path.append(current)
current = find_child(current, key)
# Now acquire necessary latches
# If any node changed (version mismatch), restart
This works because internal nodes rarely change—most traversals succeed without restarts.
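The validation step can be sketched with hypothetical `version` counters that a real implementation would bump on every structural change to a node:

```python
from bisect import bisect_right

class Node:
    def __init__(self, keys, children=None):
        self.keys, self.children = keys, children
        self.version = 0  # incremented by every split/merge of this node

def optimistic_traverse(root, key, max_restarts=3):
    for _ in range(max_restarts):
        path, node = [], root
        while node.children is not None:  # descend with no latches held
            path.append((node, node.version))
            node = node.children[bisect_right(node.keys, key)]
        # Validate: if any node on the path changed underneath us, restart
        if all(n.version == v for n, v in path):
            return node
    raise RuntimeError("too many restarts; fall back to latched traversal")

leaf1, leaf2 = Node([10, 15]), Node([20, 25])
root = Node([20], [leaf1, leaf2])
optimistic_traverse(root, 25)  # returns leaf2 without taking any latches
```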
Internal nodes change only during splits/merges—rare compared to leaf modifications. This low change frequency makes optimistic concurrency practical: readers rarely encounter mid-split nodes, so restarts are infrequent.
Internal nodes are the navigation backbone of B+-trees. Their design—pure routing with no data—enables the tree's exceptional performance. Let's consolidate the key insights:
What's Next:
Having explored both leaf nodes and internal nodes in detail, we're now ready for a comprehensive comparison between B+-trees and standard B-trees. The final page of this module, Comparison with B-tree, synthesizes everything we've learned and provides a definitive analysis of why B+-trees dominate in database applications.
You now have a deep understanding of internal nodes: their structure, how they route searches, how they're created and maintained, and how they can be optimized for maximum performance. This knowledge, combined with your understanding of leaf nodes, provides a complete picture of B+-tree architecture. Next, we'll compare B+-trees with B-trees definitively.