In 1972, Rudolf Bayer and Edward McCreight at Boeing Scientific Research Labs published a paper that would fundamentally transform how computers store and retrieve information. They introduced the B-tree—a self-balancing tree data structure designed specifically for systems that read and write large blocks of data, such as disk storage systems.
Fifty years later, some variation of the B-tree powers virtually every major database system in production. From MySQL to PostgreSQL, Oracle to SQL Server, MongoDB to SQLite—the B-tree and its descendants form the backbone of data indexing. Understanding B-trees isn't just academic knowledge; it's understanding how databases actually work at their core.
By the end of this page, you will understand what a B-tree is, why it was invented, how it differs from binary search trees, and why its design principles remain optimal for disk-based storage systems. You'll gain the formal definition that underpins all B-tree variants used in modern databases.
Before we can appreciate B-trees, we must understand the fundamental problem they were designed to solve. This problem remains just as relevant today—perhaps more so—as when B-trees were invented.
The Disk I/O Bottleneck
When databases store millions or billions of records, they cannot fit entirely in main memory (RAM). Data must reside on persistent storage—traditionally hard disks, now often SSDs. Accessing data from disk is orders of magnitude slower than accessing data in memory:
| Storage Type | Random Access Time | Relative Speed |
|---|---|---|
| CPU Cache (L1) | ~1 nanosecond | 1× |
| Main Memory (RAM) | ~100 nanoseconds | 100× slower |
| SSD (NVMe) | ~100 microseconds | 100,000× slower |
| Hard Disk (HDD) | ~10 milliseconds | 10,000,000× slower |
This latency gap between memory and disk is the central challenge. Every disk access represents a massive performance penalty compared to in-memory computation.
For disk-based systems, the number of disk accesses (I/O operations) dominates performance—not CPU cycles. An algorithm that does 1,000 more CPU operations but requires 1 fewer disk access will almost always be faster. B-trees are designed to minimize disk I/O above all else.
Why Binary Search Trees Fail
Binary Search Trees (BSTs) provide O(log₂ n) search time—seemingly efficient. But consider what happens with disk-based storage: if every node lives in its own disk block, a search reads one block per level. For 1 billion records, a balanced BST is about 30 levels deep (log₂ 10⁹ ≈ 30), so a single lookup requires roughly 30 random disk accesses, which is about 300ms on a hard disk at ~10ms per access.
This is catastrophically slow. A database serving thousands of queries per second cannot afford 300ms per query.
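To see the arithmetic, here is a quick back-of-the-envelope check in Python, using the ~10ms HDD figure from the latency table above (illustrative numbers, not measurements):

```python
import math

n = 1_000_000_000          # 1 billion records
hdd_access_ms = 10         # ~10 ms per random HDD access (figure from the table above)

bst_levels = math.ceil(math.log2(n))                # ~30 levels in a balanced BST
print(bst_levels, "disk accesses")                  # 30 disk accesses
print(bst_levels * hdd_access_ms, "ms per lookup")  # 300 ms per lookup
```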
The Block Access Model
Here's the key insight that motivates B-trees: when you access data from disk, you don't read a single byte—you read an entire block (typically 4KB to 16KB). Reading 1 byte costs essentially the same as reading the whole block, because the cost is dominated by seek time and rotational latency.
This suggests a radical redesign: instead of one key per node (as in BSTs), why not pack many keys into each node—filling an entire disk block? Each node access would then provide much more information, reducing total disk accesses.
A B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. Unlike binary trees where each node has at most 2 children, B-tree nodes can have many children—often hundreds or thousands—making the tree extremely shallow.
Etymology and Naming
The 'B' in B-tree has been a source of endless speculation. Bayer and McCreight never explicitly stated its meaning. Common theories include 'Balanced', 'Broad' (or 'Bushy'), 'Boeing' (where the work was done), and 'Bayer' (after its co-inventor).
Most computer scientists accept 'Balanced' as the most fitting interpretation, as perfect balance is the B-tree's defining characteristic.
Informal Definition
Now let's define a B-tree informally:
A B-tree of order m (also called a B-tree of degree m or an m-way tree) is a tree where:
- Every node has at most m children
- Every non-leaf node (except root) has at least ⌈m/2⌉ children
- The root has at least 2 children (if it is not a leaf)
- All leaves appear at the same level (perfect balance)
- A non-leaf node with k children contains k-1 keys
The keys within each node are sorted, and they act as separators that guide searches to the appropriate subtree.
Different textbooks use different conventions. Some define 'order' as the maximum number of children (m), others as the minimum degree (t = ⌈m/2⌉). We use the maximum children convention throughout. Always clarify which definition is being used when reading other sources.
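To make the informal definition concrete, here is a minimal sketch of a node in Python. The name `BTreeNode` and its fields are illustrative choices for this page, not taken from any particular database implementation:

```python
from dataclasses import dataclass, field

@dataclass
class BTreeNode:
    """One node of a B-tree of order m: up to m-1 sorted keys and up to m children."""
    keys: list = field(default_factory=list)      # sorted separator keys
    children: list = field(default_factory=list)  # empty for a leaf, else len(keys) + 1 nodes

    def is_leaf(self) -> bool:
        return not self.children
```

In the order-5 example below, the root would be `BTreeNode(keys=[40, 70], children=[...])` with three leaf nodes as its children.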
Visual Understanding
Consider a B-tree of order 5 (maximum 5 children per node):
              [40 | 70]
             /    |    \
            /     |     \
[10 | 20 | 30] [50 | 60] [80 | 90 | 100]
In this example:
- The root [40 | 70] holds 2 keys and has 3 children.
- Keys less than 40 live in the left subtree, keys between 40 and 70 in the middle subtree, and keys greater than 70 in the right subtree.
- All leaves are on the same level, so the tree is perfectly balanced.
Each key acts as a separator: it divides the key space into regions, each covered by a different child pointer. This is the fundamental navigation mechanism in B-trees.
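As a sketch of that navigation, the following search routine reuses the hypothetical `BTreeNode` above and assumes each node's keys are kept sorted:

```python
from bisect import bisect_left

def btree_search(node, target):
    """Walk from the root toward a leaf, guided by the separator keys."""
    while node is not None:
        i = bisect_left(node.keys, target)             # index of first key >= target
        if i < len(node.keys) and node.keys[i] == target:
            return node                                # target found in this node
        if node.is_leaf():
            return None                                # reached a leaf: target is absent
        node = node.children[i]                        # child i covers the gap that target falls into
    return None
```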
Now let's state the formal definition with mathematical precision. This definition is what you'll find in academic literature and database internals documentation.
Definition: B-tree of Order m
A B-tree of order m (where m ≥ 3) is a rooted tree satisfying the following properties:
1. Every node has at most m children (and therefore at most m−1 keys).
2. Every internal node other than the root has at least ⌈m/2⌉ children.
3. If the root is not a leaf, it has at least 2 children.
4. A non-leaf node with k children contains exactly k−1 keys.
5. The keys within each node are stored in strictly increasing order.
6. Each key separates its two adjacent subtrees: every key in the child to its left is smaller, and every key in the child to its right is larger.
7. All leaves appear at the same depth.
Mathematical Notation
Let's formalize the node structure. For a B-tree node N with k keys, write the keys in sorted order as key₁[N] < key₂[N] < … < keyₖ[N], and the k+1 child pointers as c₀[N], c₁[N], …, cₖ[N], where cᵢ[N] lies between keyᵢ[N] and keyᵢ₊₁[N].
The search invariant can be stated formally:
For any key x in the subtree rooted at cᵢ[N]:
keyᵢ[N] < x < keyᵢ₊₁[N]   (with key₀[N] read as −∞ and keyₖ₊₁[N] as +∞)
This invariant is what makes efficient searching possible—we can eliminate entire subtrees with each comparison.
Each property exists for a reason: (1-2) ensure nodes are neither too full nor too empty, maintaining balance; (3) allows the tree to grow from a single node; (4-6) enable binary search within nodes and correct navigation; (7) guarantees O(log n) height. These constraints work together to achieve provable performance bounds.
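A small checker makes these constraints tangible. This sketch validates a single node (again using the hypothetical `BTreeNode` shape) against properties (1) through (5); the separator invariant (6) and the equal-leaf-depth property (7) require a full tree traversal and are omitted:

```python
import math

def node_obeys_constraints(node, m: int, is_root: bool = False) -> bool:
    """Check one node against the order-m bounds and key ordering (properties 1-5)."""
    k = len(node.keys)
    if k > m - 1:                                      # (1) at most m children, so at most m-1 keys
        return False
    if any(a >= b for a, b in zip(node.keys, node.keys[1:])):
        return False                                   # (5) keys must be strictly increasing
    if node.is_leaf():
        return True                                    # no child constraints to check on a leaf
    if len(node.children) != k + 1:                    # (4) k keys go with k+1 children
        return False
    lower = 2 if is_root else math.ceil(m / 2)         # (3) root minimum vs (2) internal minimum
    return len(node.children) >= lower
```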
| Order (m) | Min Keys (non-root) | Max Keys | Min Children (non-root) | Max Children |
|---|---|---|---|---|
| 3 | 1 | 2 | 2 | 3 |
| 4 | 1 | 3 | 2 | 4 |
| 5 | 2 | 4 | 3 | 5 |
| 100 | 49 | 99 | 50 | 100 |
| 1000 | 499 | 999 | 500 | 1000 |
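The rows of this table follow mechanically from the ⌈m/2⌉ rule; a small helper (illustrative only, not a library function) reproduces them:

```python
import math

def btree_bounds(m: int) -> dict:
    """Key/child bounds for a non-root node in a B-tree of order m."""
    min_children = math.ceil(m / 2)
    return {"min_keys": min_children - 1, "max_keys": m - 1,
            "min_children": min_children, "max_children": m}

print(btree_bounds(5))     # {'min_keys': 2, 'max_keys': 4, 'min_children': 3, 'max_children': 5}
print(btree_bounds(1000))  # {'min_keys': 499, 'max_keys': 999, 'min_children': 500, 'max_children': 1000}
```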
To fully appreciate B-trees, let's systematically compare them with binary search trees—the data structure most programmers learn first.
Structural Comparison
Binary Search Trees constrain each node to have at most 2 children and store exactly 1 key. B-trees generalize this to m children and m-1 keys. This seemingly simple change has profound implications.
| Characteristic | Binary Search Tree | B-tree (order m) |
|---|---|---|
| Max children per node | 2 | m (typically 100-1000+) |
| Keys per node | 1 | up to m-1 |
| Tree height for n keys | O(log₂ n) | O(log_m n) |
| Disk accesses per search | O(log₂ n) | O(log_m n) |
| Balance guarantee | Requires AVL/Red-Black | Built-in (always balanced) |
| Node size | Small (one key) | Large (sized to fill a disk block) |
| Cache/disk efficiency | Poor | Excellent |
| Insertion complexity | O(log n) average, O(n) worst-case if unbalanced | O(log n) worst-case |
Height Comparison Example
Consider storing 1 billion records (n = 10⁹):
Binary Search Tree (balanced): height ≈ log₂(10⁹) ≈ 30, so a search reads about 30 nodes (roughly 30 disk accesses), taking about 300ms at ~10ms each.
B-tree (order 1000): height ≈ log₁₀₀₀(10⁹) = 3, so a search reads about 3 nodes (roughly 3 disk accesses), taking about 30ms.
This is a 10× reduction in disk I/O. In practical terms, the difference between 300ms and 30ms response time—the difference between usable and unusable.
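A quick check of the 10× figure, assuming nothing beyond the record count and the two fanouts:

```python
import math

n = 1_000_000_000
bst_accesses = math.log2(n)          # ≈ 29.9
btree_accesses = math.log(n, 1000)   # ≈ 3.0

print(f"BST ≈ {bst_accesses:.0f} accesses, B-tree ≈ {btree_accesses:.0f} accesses")
print(f"≈ {bst_accesses / btree_accesses:.0f}× fewer disk I/Os")   # ≈ 10× fewer
```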
Memory Hierarchy Efficiency
B-trees are designed around how memory hierarchies actually work:
Block-oriented access: Reading one byte from disk costs almost the same as reading 4,096 bytes. B-tree nodes are sized to match disk blocks, so each I/O retrieves maximum useful data.
Prefetching: Modern CPUs and disk controllers prefetch sequential data. B-tree nodes contain contiguous keys, enabling efficient prefetching.
Cache efficiency: Once a node is loaded into memory, searching within it (using binary search) is extremely fast—pure CPU work with no additional I/O.
B-trees trade CPU work (more comparisons within a node) for reduced I/O (fewer nodes accessed). Since I/O is 100,000× slower than CPU operations, this tradeoff is overwhelmingly beneficial for disk-based systems.
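The trade-off can be made concrete with a rough sketch: with a fanout in the hundreds (255 here, anticipating the worked example later on this page), a B-tree search performs about as many key comparisons as a BST, but concentrates them inside a handful of nodes:

```python
import math

n = 1_000_000_000
fanout = 255                                    # entries per node (see the worked example below)

node_reads = math.ceil(math.log(n, fanout))     # disk I/Os ~ tree height, about 4
cmp_per_node = math.ceil(math.log2(fanout))     # binary search inside one in-memory node, about 8
print(node_reads, "disk reads,", node_reads * cmp_per_node, "key comparisons")
# 4 disk reads, 32 key comparisons -- about as many comparisons as a BST (~30),
# but an order of magnitude fewer disk accesses
```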
The fanout of a tree node is the number of children it has. In B-trees, high fanout is the source of their power. Let's explore why mathematically.
Fanout and Height Relationship
For a B-tree with minimum degree t (where each non-root node has at least t children and therefore at least t−1 keys, and the root holds at least 1 key):
For a tree with n keys and height h:
n ≥ 1 + (t-1) × Σᵢ₌₁ʰ 2t^(i-1) = 1 + 2(t-1) × (t^h - 1)/(t-1) = 2t^h - 1
Solving for h:
h ≤ log_t((n+1)/2)
This proves that height grows logarithmically with base t, not base 2.
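Plugging n = 10⁹ into this bound gives a feel for the numbers; note that the table below rounds up to conservative whole-level counts, so its figures are slightly larger:

```python
import math

n = 1_000_000_000
for t in (2, 10, 100, 500, 1000):
    print(f"t = {t:4d}: h <= {math.log((n + 1) / 2, t):4.1f}")
# t =    2: h <= 28.9
# t =   10: h <=  8.7
# t =  100: h <=  4.3
# t =  500: h <=  3.2
# t = 1000: h <=  2.9
```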
| Fanout (t) | Maximum Height (n = 10⁹) | Disk Accesses per Search |
|---|---|---|
| 2 (BST) | 30 | 30 |
| 10 | 10 | 10 |
| 100 | 5 | 5 |
| 500 | 4 | 4 |
| 1000 | 3 | 3 |
Real-World Fanout Calculation
Let's calculate realistic fanout for a database system:
Assumptions:
- Disk page (node) size: 4,096 bytes (4 KB)
- Node header and metadata: 16 bytes
- Key size: 8 bytes
- Child pointer size: 8 bytes
Available space per node: 4,096 - 16 = 4,080 bytes
Each key-pointer pair: 8 + 8 = 16 bytes
Maximum entries per node: 4,080 / 16 = 255 entries
With a fanout of 255: log₂₅₅(10⁹) ≈ 3.7, so a root-to-leaf search path passes through at most about 4-5 nodes.
This means a billion-record table can be searched with just 5 disk accesses. The root node is almost always cached in memory, reducing this to 4 disk accesses in practice.
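Here is the same arithmetic, parameterized so you can plug in other page layouts. The 4,096-byte page, 16-byte header, and 8-byte keys and pointers are the assumptions stated above, not fixed constants of any particular storage engine:

```python
import math

def node_fanout(page=4096, header=16, key=8, ptr=8):
    """Entries that fit in one node under the page layout assumed above."""
    return (page - header) // (key + ptr)

fanout = node_fanout()                                  # 255 entries per node
levels = math.ceil(math.log(1_000_000_000, fanout))     # about 4 levels
print(fanout, "entries per node;", levels, "levels cover 10^9 records")
```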
Production databases keep the B-tree root node (and often the first 1-2 levels) permanently in memory. For a billion-record table, this means most searches require only 2-3 disk accesses, not 5. This optimization is essentially free since the upper levels represent a tiny fraction of total index size.
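It is easy to see why caching the top of the tree is cheap. Continuing with the assumed 4 KB pages and a fanout of 255, the first two levels fit in roughly a megabyte:

```python
page_bytes = 4096      # node size assumed above
fanout = 255

for depth, pages in enumerate((1, fanout, fanout ** 2)):
    print(f"level {depth}: {pages:>6,} pages ≈ {pages * page_bytes / 2**20:,.1f} MiB")
# level 0:      1 pages ≈ 0.0 MiB   (just the root)
# level 1:    255 pages ≈ 1.0 MiB
# level 2: 65,025 pages ≈ 254.0 MiB
```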
B-trees were invented in 1972—over 50 years ago. In that time, we've seen revolutions in hardware, storage technology, and algorithm design. Yet B-trees remain the dominant index structure. Why?
Adaptability to Hardware Evolution
B-trees have proven remarkably adaptable: the same block-oriented design that minimized seek time on spinning disks maps naturally onto SSD pages and even CPU cache lines; only the node size changes to match the underlying block or page size.
Optimal Balance of Properties
B-trees hit a sweet spot that no other data structure matches: logarithmic worst-case point lookups, efficient ordered range scans, predictable behavior under inserts and deletes, and node sizes that match disk blocks.
Alternatives and Trade-offs
Other index structures exist, but each sacrifices something B-trees provide:
| Structure | What It Improves | What It Sacrifices |
|---|---|---|
| Hash Index | O(1) point lookups | Range queries impossible |
| Skip List | Simpler implementation | Higher memory overhead |
| LSM-Tree | Write performance | Read performance, space amplification |
| Trie | Prefix queries | Space efficiency, general-purpose use |
For general-purpose indexing where both point queries and range scans are needed, B-trees remain unmatched.
Engineering Maturity
Fifty years of implementation have produced extremely optimized B-tree libraries. Every edge case has been encountered, every bug has been fixed, every optimization has been discovered. This accumulated engineering wisdom makes B-trees not just theoretically sound but also practically proven.
B-trees are a rare case of an algorithm that was essentially 'right' from the start. The basic design from 1972 is still optimal. Improvements like B+-trees (which we'll study next) are refinements, not replacements. This stability is a testament to the elegance of the original design.
Understanding what B-trees are requires dispelling what they are not. Let's address common misconceptions that can hinder learning.
A critical distinction: Classic B-trees store data in ALL nodes. B+-trees (covered in subsequent modules) store data ONLY in leaves. Most databases use B+-trees, but call them 'B-trees'. When reading database documentation, assume B+-tree unless explicitly stated otherwise.
We've established the foundational understanding of what B-trees are and why they matter. Let's consolidate the key concepts:
- Disk I/O, not CPU time, dominates the cost of searching large datasets, and disks transfer whole blocks at a time.
- B-trees pack many sorted keys into each node so that every block read does as much navigational work as possible.
- A B-tree of order m keeps every node (except the root) between ⌈m/2⌉ and m children and all leaves at the same depth.
- High fanout makes the tree shallow: height grows as log_m n rather than log₂ n, which translates directly into fewer disk accesses.
What's Next
Now that we understand what a B-tree is, the next page explores the specific properties that make B-trees work—the constraints on node sizes, key counts, and structural relationships that together guarantee logarithmic performance. These properties are the rules that maintain balance as the tree grows and shrinks.
You now understand the B-tree definition: a self-balancing, high-fanout tree designed to minimize disk I/O. You can explain why B-trees replaced binary search trees for disk-based indexing and state the formal properties that define a valid B-tree. Next, we'll dive deeper into these properties and understand how they work together to guarantee performance.