If you open any production database system—Oracle, PostgreSQL, MySQL, SQL Server, or even modern systems like CockroachDB and TiDB—you'll find that the B+-tree is the default, and often the only, tree-based index structure offered. This isn't a historical accident or vendor laziness. The B+-tree represents decades of refinement in understanding what database workloads actually need.
The B+-tree evolved from the B-tree with a singular focus: optimizing for the realities of disk-based storage and the query patterns databases actually encounter. While B-trees were a theoretical breakthrough, B+-trees are an engineering triumph—the structure that powers virtually every relational database on Earth.
By the end of this page, you will understand the formal definition of B+-trees, including their mathematical properties, structural invariants, and the fundamental design decisions that make them the dominant index structure in database systems. You'll see exactly how and why B+-trees differ from B-trees at a definitional level.
The B-tree, invented by Rudolf Bayer and Edward McCreight at Boeing Research Labs in 1970, revolutionized how we store and retrieve data on disk. Its balanced structure and high fanout made it dramatically more efficient than binary search trees for disk-based operations.
However, as database systems matured and workloads grew more complex, engineers identified several limitations of the original B-tree design:
Problem 1: Range Query Inefficiency
In a standard B-tree, data records (or pointers to them) can appear at any level of the tree. To perform a range query (e.g., "find all employees with salaries between $50,000 and $75,000"), you might need to traverse back and forth through the tree structure, visiting internal nodes repeatedly.
Problem 2: Variable Node Sizes
When data records are stored in internal nodes, those nodes become larger—sometimes much larger if records are variable-length. This reduces fanout and increases tree height, exactly the opposite of what we want for disk efficiency.
Problem 3: Sequential Access Patterns
Databases frequently need to scan through ordered data sequentially. B-trees make this unnecessarily complex because the "next" record might be in a sibling subtree requiring navigation back up through parent nodes.
The B+-tree's key insight was to separate navigation from storage. Internal nodes exist solely to guide searches—they hold keys but no data. All actual data resides in leaf nodes, which are linked together for efficient sequential access. This separation enables optimizations impossible in standard B-trees.
A B+-tree of order m is a balanced tree structure that satisfies the following properties. Understanding these properties precisely is essential for reasoning about B+-tree behavior, implementing the algorithms correctly, and analyzing their performance.
Definition: A B+-tree of order m is a rooted tree satisfying the following conditions:

1. Every node has at most m children.
2. Every internal node other than the root has at least ⌈m/2⌉ children; an internal node with k children contains exactly k-1 keys.
3. The root has at least two children, unless it is itself a leaf.
4. Every leaf holds between ⌈(m-1)/2⌉ and m-1 keys (except when the root is the only node).
5. All leaves appear at the same depth, and each leaf is linked to its successor to support sequential scans.
6. All data entries (records or pointers to records) are stored in the leaves; internal nodes hold only keys and child pointers.
Different textbooks define 'order' differently! Some define order as the maximum number of children (our definition), others as the maximum number of keys, and still others as the minimum degree. Always verify which convention is being used. In this course, order m means a node has at most m children and m-1 keys.
The formal constraints of a B+-tree lead to precise mathematical bounds that are crucial for performance analysis. These bounds determine the tree's height, which directly controls I/O costs.
| Node Type | Minimum Keys | Maximum Keys | Minimum Children | Maximum Children |
|---|---|---|---|---|
| Root (single node tree) | 1 | m-1 | — | — |
| Root (multi-node tree) | 1 | m-1 | 2 | m |
| Internal Node | ⌈m/2⌉ - 1 | m-1 | ⌈m/2⌉ | m |
| Leaf Node | ⌈(m-1)/2⌉ | m-1 | — | — |
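To make the table concrete, here is a small sketch in Python that computes these occupancy bounds for a given order m; the function name and return structure are illustrative, not taken from any particular database system.

```python
import math

def bplus_tree_bounds(m: int) -> dict:
    """Occupancy bounds for a B+-tree of order m (m = max children per node)."""
    half = math.ceil(m / 2)
    return {
        "internal": {
            "min_keys": half - 1, "max_keys": m - 1,
            "min_children": half, "max_children": m,
        },
        "leaf": {
            "min_keys": math.ceil((m - 1) / 2), "max_keys": m - 1,
        },
        "root_multi_node": {
            "min_keys": 1, "max_keys": m - 1,
            "min_children": 2, "max_children": m,
        },
    }

# Example: order 200 => internal nodes hold 99-199 keys and 100-200 children.
print(bplus_tree_bounds(200)["internal"])
```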
Height Bounds:
For a B+-tree of order m containing n key-value pairs:
Maximum Height: The tree achieves maximum height when every node has the minimum number of entries:
h_max ≤ ⌈log_{⌈m/2⌉}((n+1)/2)⌉ + 1
Minimum Height: The tree achieves minimum height when every node is completely full:
h_min = ⌈log_m(n+1)⌉
Example Calculation:
Consider a B+-tree of order 200 storing 1 million records:
- Minimum height: h_min = ⌈log_200(1,000,001)⌉ = ⌈2.61⌉ = 3 levels
- Maximum height: h_max ≤ ⌈log_100((1,000,001)/2)⌉ + 1 = ⌈2.85⌉ + 1 = 4 levels
This means any record in a million-record database can be found with at most 4 disk reads. This logarithmic growth is what makes B+-trees practical for databases of any size.
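The bounds are easy to verify numerically. The sketch below (Python; the function name is illustrative) evaluates both formulas for order 200 and n = 1,000,000:

```python
import math

def bplus_height_bounds(m: int, n: int) -> tuple[int, int]:
    """Return (h_min, h_max) for a B+-tree of order m holding n entries."""
    h_min = math.ceil(math.log(n + 1, m))                            # every node full
    h_max = math.ceil(math.log((n + 1) / 2, math.ceil(m / 2))) + 1   # every node minimally full
    return h_min, h_max

print(bplus_height_bounds(200, 1_000_000))  # (3, 4): at most 4 levels, i.e. 4 page reads
```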
Each level of tree height adds one disk I/O to every search. A tree of height 4 means 4 disk reads; height 10 means 10 disk reads. Since a random disk I/O typically takes 5-10ms (tens of thousands of times slower than a memory access), keeping height small is paramount. The B+-tree's high fanout ensures height stays small even for billions of records.
The defining characteristic of B+-trees—the feature that most distinguishes them from B-trees—is the rigid separation between navigation and data storage. This isn't just an implementation detail; it's a fundamental architectural decision with far-reaching implications.
Why This Separation Matters:
1. Consistent Search Path Length
In a B+-tree, every search—whether successful or not—traverses exactly the same number of levels: from root to leaf. This predictability simplifies cost estimation for query optimization.
2. Maximized Internal Node Fanout
Since internal nodes don't store data, they can hold more keys and pointers. If data records average 200 bytes but keys are only 8 bytes (assuming an 8KB page for illustration):
- A node that stored full records could hold roughly 8192 / 200 ≈ 40 entries.
- A node that stores only 8-byte keys plus child pointers (roughly 16 bytes per entry) holds around 500 entries.
This 10× improvement in fanout can reduce tree height by an entire level.
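A rough back-of-the-envelope calculation shows where the improvement comes from; in this Python sketch the 8KB page size and the entry sizes are assumptions chosen for illustration.

```python
PAGE_SIZE = 8 * 1024      # assumed page size in bytes
RECORD_SIZE = 200         # average data record size
KEY_SIZE, POINTER_SIZE = 8, 8

# Fanout if a node had to store full records (data mixed into internal nodes)
fanout_with_records = PAGE_SIZE // RECORD_SIZE             # ~40
# Fanout if a node stores only separator keys and child pointers (B+-tree internal node)
fanout_keys_only = PAGE_SIZE // (KEY_SIZE + POINTER_SIZE)  # ~512

print(fanout_with_records, fanout_keys_only)  # 40 512 -> more than a 10x fanout increase
```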
3. Cache-Friendly Navigation
Internal nodes are small enough to remain cached in memory. A database might keep all internal nodes resident while leaf nodes are demand-paged from disk. This means most searches require only one disk read—for the leaf.
4. Simplified Concurrency
Data modifications only affect leaf nodes. This simplifies locking strategies: internal nodes can use lighter-weight latches since they're modified less frequently.
A subtle but crucial aspect of B+-tree definition is how keys function in internal nodes versus leaf nodes. This distinction often causes confusion but is essential for correct understanding.
In Leaf Nodes: Keys represent the actual search key values of the stored data. If you're indexing employee IDs, the leaf contains the actual employee IDs alongside their associated data (or pointers to it).
In Internal Nodes: Keys are separators or routing keys. They need only be sufficient to correctly route searches—they don't necessarily correspond to any actual data record. The separator key between two subtrees can be:
- a copy of a key that actually exists in the leaves (the usual result of a leaf split),
- a key whose record has since been deleted from the leaves, or
- a synthesized value (for example, a shortened or truncated key) that merely partitions the two subtrees correctly.
When a leaf node splits, the middle key is typically copied up to the parent (not moved). This is different from B-tree splits where the middle key moves up. This copy semantics means the same key value can appear in both an internal node (as a separator) and a leaf node (as actual data). This is normal and correct in B+-trees.
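To illustrate the copy-up rule, here is a minimal Python sketch of a leaf split; the function name and the sorted-list representation are illustrative assumptions.

```python
def split_leaf(keys, values):
    """Split a full, sorted leaf; return (left, right, separator).

    The separator is *copied* from the first key of the new right leaf,
    so it still exists at the leaf level as real data -- unlike a B-tree
    split, where the middle key moves up and leaves the node entirely.
    """
    mid = len(keys) // 2
    left = (keys[:mid], values[:mid])
    right = (keys[mid:], values[mid:])
    separator = keys[mid]          # copied up to the parent, kept in the right leaf
    return left, right, separator

left, right, sep = split_leaf([10, 20, 30, 40], ["a", "b", "c", "d"])
print(sep, right[0][0])  # 30 30 -- the same key appears as a separator and as leaf data
```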
Routing Semantics:
Given an internal node with keys [k₁, k₂, k₃] and child pointers [p₀, p₁, p₂, p₃], a search for key K follows:
if K < k₁: follow p₀
else if K < k₂: follow p₁
else if K < k₃: follow p₂
else: follow p₃
Note that the search keys in internal nodes define exclusive upper bounds for all but the rightmost pointer. Different implementations may use ≤ instead of <, but the logic must be consistent throughout the tree.
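The routing rule above can be expressed compactly in code. This Python sketch uses the standard bisect module and assumes the strict-< convention shown in the pseudocode; the function name is hypothetical.

```python
import bisect

def choose_child(keys, children, search_key):
    """Pick the child pointer to follow in an internal node.

    keys     -- sorted separator keys [k1, ..., kn]
    children -- child pointers [p0, ..., pn] (always one more than keys)
    Keys act as exclusive upper bounds: child p_i covers keys in [k_i, k_{i+1}).
    """
    i = bisect.bisect_right(keys, search_key)  # count of separators <= search_key
    return children[i]

# keys [k1, k2, k3] = [10, 20, 30], children [p0 .. p3]
print(choose_child([10, 20, 30], ["p0", "p1", "p2", "p3"], 5))    # p0  (K < k1)
print(choose_child([10, 20, 30], ["p0", "p1", "p2", "p3"], 10))   # p1  (k1 <= K < k2)
print(choose_child([10, 20, 30], ["p0", "p1", "p2", "p3"], 35))   # p3  (K >= k3)
```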
| Aspect | Internal Node Keys | Leaf Node Keys |
|---|---|---|
| Purpose | Route searches to correct subtree | Identify stored data records |
| Correspondence | May or may not match actual data | Always matches actual data key |
| Modification | Can be optimized (prefix compression) | Must preserve exact key value |
| On Deletion | May remain even if data deleted | Removed with data record |
| Duplication | Same key can appear multiple times in internal nodes | Unique within leaf (for unique indexes) |
A defining property of B+-trees is their minimum occupancy guarantee. Unlike simpler structures where nodes can become arbitrarily empty, B+-trees ensure every node (except possibly the root) is at least half full. This guarantee has profound implications for both space efficiency and performance predictability.
Why 50% Matters:
Space Efficiency: With a guaranteed 50% minimum fill, the B+-tree uses at most twice the space of a perfectly packed structure. In practice, random insertions lead to approximately 69% (ln 2) average fill (Yao, 1978), meaning space overhead is only about 45%.
Height Guarantees: Minimum occupancy directly affects maximum height. If nodes could become nearly empty, the tree could grow much taller, destroying the logarithmic performance guarantee.
Performance Predictability: With bounded occupancy, the number of disk reads for any operation is predictable. This enables the query optimizer to make accurate cost estimates.
Example:
For a B+-tree of order 100:
- Internal nodes have between 50 and 100 children (⌈100/2⌉ = 50), i.e. between 49 and 99 keys.
- Leaf nodes hold between 50 and 99 keys (⌈99/2⌉ = 50).
- Even in the worst case, every page in the index is at least half used.
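To see what these fill factors mean in practice, here is a quick estimate of leaf pages needed at different fills; in this Python sketch the leaf capacity of 100 entries and the entry count are illustrative assumptions.

```python
import math

N_ENTRIES = 1_000_000
LEAF_CAPACITY = 100  # assumed maximum entries per leaf page

for fill in (1.00, 0.69, 0.50):  # perfectly packed, typical (Yao 1978), worst-case minimum
    pages = math.ceil(N_ENTRIES / (LEAF_CAPACITY * fill))
    print(f"fill {fill:.0%}: {pages:,} leaf pages")

# fill 100%: 10,000 leaf pages
# fill 69%:  14,493 leaf pages  (~45% overhead vs. packed)
# fill 50%:  20,000 leaf pages  (2x the packed size, the worst case)
```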
While 50% minimum fill wastes some space, attempting higher guarantees would require more frequent rebalancing. The 50% threshold is the sweet spot: it limits wasted space while keeping maintenance operations infrequent. Some systems guarantee a two-thirds minimum fill (the B*-tree variant) for slightly better space utilization, at the cost of redistributing entries among siblings before splitting.
The textbook definition of B+-trees describes an idealized structure. Real database implementations introduce variations to handle practical concerns. Understanding these variations helps you recognize B+-trees in the wild.
| Aspect | Textbook Definition | Practical Implementation |
|---|---|---|
| Order | Fixed parameter m | Derived from page size and key/pointer sizes |
| Node size | Abstract 'slots' | Fixed-size pages (4KB, 8KB, 16KB typical) |
| Fill factor | ≥50% always enforced | Configurable, may allow lower during bulk operations |
| Leaf linking | Doubly-linked list | May include parent pointers for crash recovery |
| Key format | Atomic values | May use prefix compression, suffix truncation |
| Concurrency | Not addressed | Elaborate locking/latching protocols |
| Persistence | Assumed | Write-ahead logging, checkpointing |
Variable-Width Keys:
The formal definition assumes fixed-size keys, but real databases support variable-length strings. This complicates the notion of 'order' since the number of keys per node varies. Implementations typically define order based on a target page size rather than a fixed key count.
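For example, a rough estimate of the effective "order" for a given page layout might look like the following Python sketch; the 16KB page, 8-byte pointers, key sizes, and the flat 100-byte header are illustrative assumptions (real systems also account for per-entry slot overhead).

```python
def effective_order(page_size: int, key_size: int, pointer_size: int,
                    header_size: int = 100) -> int:
    """Approximate maximum number of children of an internal node."""
    usable = page_size - header_size
    return usable // (key_size + pointer_size) + 1   # one more pointer than keys

print(effective_order(16 * 1024, 8, 8))    # ~1018 children with fixed 8-byte keys
print(effective_order(16 * 1024, 64, 8))   # ~227 children with 64-byte string keys
```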
Overflow Pages:
When values are very large (e.g., TEXT or BLOB columns), they may not fit in the leaf page. Implementations use overflow pages linked from the main leaf entry. The key remains in the leaf; only the value spills over.
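One way to picture this is the following hypothetical Python sketch, which is not any specific engine's page format: the leaf keeps the key and either an inline value or a reference to an overflow page.

```python
from dataclasses import dataclass
from typing import Optional

INLINE_LIMIT = 256  # assumed threshold for storing a value directly in the leaf

@dataclass
class LeafEntry:
    key: bytes
    inline_value: Optional[bytes]      # small values live in the leaf page
    overflow_page_id: Optional[int]    # large values spill to overflow pages

def make_leaf_entry(key: bytes, value: bytes, allocate_overflow) -> LeafEntry:
    if len(value) <= INLINE_LIMIT:
        return LeafEntry(key, value, None)
    # The key stays in the leaf; only the value is written to overflow storage.
    return LeafEntry(key, None, allocate_overflow(value))
```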
Different Leaf Structures:
Some implementations store only keys in leaves with separate pointers to data pages (secondary indexes). Others store the entire row in the leaf (clustered indexes). Both are valid B+-trees; only the "value" part of key-value pairs differs.
PostgreSQL, MySQL/InnoDB, Oracle, and SQL Server all implement B+-trees, but with different page sizes, compression schemes, and concurrency protocols. The core B+-tree properties remain the same—all data in leaves, linked leaves, balanced height—but implementation details vary significantly.
Let's consolidate the essential elements of the B+-tree definition:
- All data entries live in leaf nodes; internal nodes hold only separator keys and child pointers.
- Leaves are linked together, enabling efficient range scans and sequential access.
- Every node except the root is at least half full, bounding both space overhead and tree height.
- All leaves sit at the same depth, so every search visits the same number of nodes: one per level.
- High internal-node fanout keeps height logarithmic in the number of records, typically 3-4 levels even for very large tables.
What's Next:
Now that we've established the formal definition of B+-trees, the next page explores the most distinctive feature in depth: all data resides in leaf nodes. We'll examine why this design decision is so powerful and how it enables optimizations impossible in standard B-trees.
You now understand the formal definition of B+-trees, including their structural properties, mathematical bounds, and the key design decisions that distinguish them from B-trees. Next, we'll explore the implications of storing all data exclusively in leaf nodes.