Loading learning content...
B-trees and B+-trees are closely related—the B+-tree evolved directly from the B-tree, inheriting its core concepts of balance and high fanout. Yet the differences between them are profound enough that virtually every modern database system chooses B+-trees exclusively for their primary index structures.
This page provides a comprehensive comparison, synthesizing everything we've learned about B+-trees and contrasting it with B-tree characteristics. By the end, you'll understand not just what differs, but why those differences matter for database workloads, and when (rarely) a B-tree might still be preferred.
By the end of this page, you will have a complete understanding of how B+-trees differ from B-trees across every dimension: structure, search behavior, range queries, space efficiency, concurrency, and real-world performance. You'll be able to explain definitively why B+-trees dominate in database systems.
The structural differences between B-trees and B+-trees are fundamental. They affect every aspect of tree behavior.
| Aspect | B-Tree | B+-Tree |
|---|---|---|
| Data location | All nodes (internal + leaf) | Leaves only |
| Internal node content | Keys + data + child pointers | Keys + child pointers only |
| Leaf node content | Keys + data | Keys + data + sibling links |
| Leaf linking | Not present | Doubly-linked list |
| Key duplication | Each key appears exactly once | Keys copied to internal nodes during splits |
| Node structure uniformity | All nodes have same structure | Two distinct node types |
| Record access path | Variable (any level) | Fixed (always to leaf) |
Visual Comparison:
B-Tree:
[40, R40, 80, R80]
/ |
[10, R10, 30, R30] [60, R60] [90, R90]
(R = Record/Data; data mixed with navigation)
B+-Tree:
[40, 80]
/ |
[10, 30] [40, 60] [80, 90]
↑ ↑ ↑
Data Data Data
↔─────────↔─────────↔
(Linked leaves)
(Internal nodes: navigation only; Leaves: all data, all linked)
The B+-tree has a clear separation: navigation above, storage below. The B-tree interleaves navigation and storage at every level.
In a B-tree, the key 40 appears once and is associated with its data record. In a B+-tree, key 40 in an internal node is merely a separator—the actual key 40 with its data is in a leaf. The internal 40 could even be abbreviated if it still correctly routes searches.
The difference in data location fundamentally changes how searches work.
Why Predictable Depth Matters:
1. Query Optimization:
-- The optimizer needs to estimate:
SELECT * FROM orders WHERE order_id = 12345;
With B+-trees: Cost = tree height = exactly 3 I/Os (for example)
With B-trees: Cost = between 1 and 4 I/Os depending on whether 12345 is at root, mid-level, or leaf. The optimizer must maintain statistics about key distribution across levels—complexity that rarely pays off.
2. SLA Guarantees:
For applications requiring consistent latency (financial trading, real-time systems), B+-trees provide predictable performance. B-trees have variance that may violate latency guarantees.
3. Cache Behavior:
With B+-trees, caching internal nodes benefits all queries equally. With B-trees, caching patterns are less predictable—some queries get lucky (finding data near root), others don't.
B-tree proponents sometimes cite early termination as an advantage. In reality, with fanout of 500, only about 0.2% of keys are in non-leaf levels. The 'advantage' applies to a tiny fraction of searches, while the reduced fanout (from storing data in internal nodes) affects ALL searches.
Range queries expose the most dramatic difference between B-trees and B+-trees. The linked leaf structure of B+-trees fundamentally changes how ranges are processed.
| Metric | B-Tree | B+-Tree | Advantage |
|---|---|---|---|
| Algorithm | In-order traversal | Leaf scan | B+-Tree: simpler |
| I/O pattern | Random jumps up/down tree | Sequential leaf access | B+-Tree: 10-100× faster |
| Pages accessed | Up to 1,000 × height | ~10 (1,000 ÷ 100 per page) | B+-Tree: 100-400× fewer |
| Cache efficiency | Poor (scattered access) | Excellent (sequential) | B+-Tree: far better |
| Prefetch possible | Very difficult | Trivial | B+-Tree: yes |
| Time complexity | O(k × log n) | O(log n + k/B) | B+-Tree: optimal |
In-Order Traversal Cost (B-Tree):
To visit all keys in order in a B-tree:
visit(node):
if node is leaf:
output all keys
else:
for each (pointer, key) pair:
visit(child via pointer)
output key
visit(rightmost child)
This traversal constantly moves between levels. After visiting a subtree, you must return to the parent to continue. Each 'return' may require re-reading a page that's no longer cached.
Leaf Scan (B+-Tree):
To visit all keys in order in a B+-tree:
leaf = leftmost_leaf
while leaf exists:
output all keys in leaf
leaf = leaf.next
No tree traversal needed after finding the start. Pure sequential access through leaf pages. The OS and disk can prefetch aggressively.
Database workloads are dominated by range queries: date ranges, price ranges, alphabetical listings, BETWEEN clauses, ORDER BY, GROUP BY, joins on ranges. The B+-tree's range query advantage isn't a minor optimization—it's the primary reason databases adopted B+-trees.
Space efficiency is nuanced: B+-trees duplicate some keys in internal nodes, but their overall space usage is often better than B-trees. Here's why:
Key Duplication in B+-Trees:
When a leaf splits, the middle key is copied to the parent. This means some keys appear twice (or more times, if at multiple split boundaries).
However:
Fanout Impact on Tree Size:
Consider storing 1 million records with 8-byte keys and 200-byte data:
B-Tree (fanout ~35):
B+-Tree (fanout ~500):
The B+-tree is 5× smaller despite key duplication, because higher fanout dramatically reduces the number of nodes needed.
| Factor | B-Tree | B+-Tree | Winner |
|---|---|---|---|
| Key duplication | None | Some (at split points) | B-Tree |
| Internal node size | Large (contains data) | Small (keys only) | B+-Tree |
| Number of internal nodes | Many (low fanout) | Few (high fanout) | B+-Tree |
| Total internal overhead | 5-20% of data | 0.1-1% of data | B+-Tree |
| Overall space | Higher | Lower | B+-Tree |
Textbooks often highlight B+-tree key duplication as a disadvantage. In practice, the duplicated keys in internal nodes are vastly outweighed by the space savings from higher fanout. Real-world B+-trees typically use 2-5× less space than equivalent B-trees.
Fanout directly determines tree height, which in turn determines I/O cost per search. This is perhaps the most critical performance factor.
Fanout Calculation:
With 8KB pages, 8-byte keys, 8-byte pointers, and 200-byte records:
B-Tree Internal Node:
Entry = key + data + pointer = 8 + 200 + 8 = 216 bytes
Fanout = 8192 / 216 ≈ 37
B+-Tree Internal Node:
Entry = key + pointer = 8 + 8 = 16 bytes
Fanout = 8192 / 16 ≈ 500
That's a 13.5× difference in fanout!
| Records | B-Tree Height (f=37) | B+-Tree Height (f=500) | I/O Saved |
|---|---|---|---|
| 10,000 | 3 | 2 | 1 (33%) |
| 1,000,000 | 4 | 3 | 1 (25%) |
| 100,000,000 | 5 | 4 | 1 (20%) |
| 10,000,000,000 | 7 | 5 | 2 (29%) |
Impact Analysis:
Consider a database with 100 million records running 10,000 queries per second:
B-Tree (height 5): 50,000 disk reads/second
B+-Tree (height 4): 40,000 disk reads/second
Savings: 10,000 disk reads/second = 20% reduction in I/O load
This 20% improvement translates directly to:
Over a year of operation, this single level of height difference saves millions of dollars in infrastructure for large systems.
For disk-based systems, tree height dominates performance. One additional level means one more disk I/O per query. When you're processing millions of queries, saving one I/O per query has enormous cumulative impact. B+-trees minimize height systematically.
The structural differences between B-trees and B+-trees significantly affect concurrent access and maintenance operations.
Concurrency Implications:
B-Tree Challenges:
B+-Tree Advantages:
| Aspect | B-Tree | B+-Tree |
|---|---|---|
| Write location | Any level | Leaves only |
| Internal node locking | Frequent for writes | Rare (only splits) |
| Optimistic reading | Risky (data at any level) | Safe (data only in leaves) |
| Range scan locking | Complex multi-level | Simple leaf-to-leaf |
| Hot key contention | High (internal updates) | Lower (leaf updates only) |
| B-link optimization | Possible but complex | Natural fit |
Maintenance Operations:
Index Defragmentation:
Bulk Loading:
Online Reorganization:
Modern databases handle thousands of concurrent transactions. The B+-tree's clean separation—navigation in internals, data in leaves—enables sophisticated concurrency protocols that would be impractical for B-trees.
Despite B+-trees' clear advantages for databases, there are niche scenarios where B-trees might be considered:
Reality Check:
In practice, these scenarios rarely favor B-trees enough to overcome B+-trees' advantages:
The Result: B+-trees are used in virtually all production database systems. B-trees remain primarily in textbooks and historical contexts.
Many systems and textbooks say 'B-tree' when they actually mean 'B+-tree'. Check the actual implementation: if data is only in leaves and leaves are linked, it's a B+-tree regardless of what it's called. PostgreSQL calls its structure 'btree' but it's actually a B+-tree variant.
Let's consolidate the definitive comparison between B+-trees and B-trees:
| Dimension | B-Tree | B+-Tree | Winner |
|---|---|---|---|
| Fanout | Lower (data in internals) | Higher (10-20×) | B+-Tree |
| Tree height | Taller | Shorter | B+-Tree |
| Point query I/O | 1 to height | Always height | Tie to B+-Tree |
| Range query I/O | O(k × log n) | O(log n + k/B) | B+-Tree dramatically |
| Space efficiency | Lower (paradoxically) | Higher | B+-Tree |
| Memory caching | Less effective | More effective | B+-Tree |
| Concurrency | Complex | Simpler, faster | B+-Tree |
| Maintenance | More complex | Cleaner | B+-Tree |
| Implementation | Simpler (one node type) | Slightly more complex | B-Tree slightly |
Module Conclusion:
You've now completed a comprehensive study of B+-tree structure. You understand the formal definition, why data lives only in leaves, how leaf linking enables range queries, how internal nodes provide pure navigation, and why B+-trees definitively outperform B-trees for database workloads.
The next module will put this knowledge to use, covering B+-tree operations: search, insert, delete, and bulk loading.
Congratulations! You've completed Module 3: B+-Tree Structure. You now have a deep, world-class understanding of how B+-trees are structured and why they dominate in database systems. You can explain the advantages of data-in-leaves-only, leaf linking, and internal node optimization—and you understand definitively why B+-trees outperform B-trees for virtually all database workloads.