It's remarkable: virtually every major database system—PostgreSQL, MySQL, Oracle, SQL Server, SQLite, MongoDB, and countless others—uses B-trees (or B+ trees) as their primary indexing structure. This isn't coincidence or inertia; it's the result of B-trees being the optimal choice for the access patterns databases must support.
Understanding why databases choose B-trees gives you insight into database internals that translates directly to better index design, query optimization, and architectural decisions. This knowledge separates developers who use databases from those who truly understand them.
By the end of this page, you will understand why B-trees dominate database indexing, how major database systems implement them, what makes B-trees superior to alternatives like hash indexes for most workloads, and how to apply this knowledge to design effective database indexes.
To appreciate why B-trees won, we must first understand what databases need from an index.
The fundamental challenge:
Databases store data in tables—often containing millions or billions of rows. When a query asks for specific rows (e.g., SELECT * FROM users WHERE id = 12345), the database has two options: scan every row and test it against the condition (a full table scan), or use an index to jump directly to the matching rows.
For large tables, the difference is enormous. A billion-row table might require reading terabytes of data for a full scan, versus kilobytes for an indexed lookup.
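The difference is easy to see even in a toy setup. Here is a sketch using Python's built-in sqlite3 module (the table, column, and index names are illustrative); SQLite's EXPLAIN QUERY PLAN reports whether it scans the whole table or seeks through a B-tree index:

```python
import sqlite3

# In-memory database with an illustrative users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(10_000)])

def plan(sql):
    # EXPLAIN QUERY PLAN rows are 4-tuples; the detail string is last.
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[3]

query = "SELECT * FROM users WHERE id = 1234"

# Without an index, SQLite must read every row.
plan_before = plan(query)   # e.g. "SCAN users"

# CREATE INDEX builds a B-tree; the same query now seeks directly to the key.
conn.execute("CREATE INDEX idx_users_id ON users(id)")
plan_after = plan(query)    # e.g. "SEARCH users USING INDEX idx_users_id (id=?)"

print(plan_before)
print(plan_after)
```

The exact plan wording varies by SQLite version, but the SCAN-versus-SEARCH distinction is stable.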
What a database index must support:
- Exact-match lookups (column = value)
- Range queries (column BETWEEN a AND b)
- Sorted retrieval (ORDER BY)
- Prefix searches (LIKE 'prefix%')

B-trees uniquely satisfy ALL these requirements well. No other data structure comes close to matching B-trees across this entire spectrum of needs. This is why they dominate despite being 50+ years old.
Hash tables offer O(1) lookups—faster than B-trees' O(log n). So why don't databases use them?
Hash indexes exist but have limitations:
Many databases offer hash indexes (MySQL's MEMORY engine, PostgreSQL's hash index, etc.), but they're rarely the default because:
| Query Type | B-Tree | Hash Index |
|---|---|---|
| Equality (=) | O(log n) ✓ | O(1) ✓ |
| Range (BETWEEN) | O(log n + k) ✓ | Full scan ✗ |
| Less than (<, <=) | O(log n + k) ✓ | Full scan ✗ |
| Greater than (>, >=) | O(log n + k) ✓ | Full scan ✗ |
| ORDER BY | Index scan ✓ | Sort required ✗ |
| MIN/MAX | O(log n) ✓ | O(n) ✗ |
| LIKE 'prefix%' | Index scan ✓ | Full scan ✗ |
| LIKE '%suffix' | Full scan ✗ | Full scan ✗ |
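The asymmetry in this table can be sketched in a few lines of Python: a dict (hash table) answers equality instantly but must examine every entry for a range, while a sorted array (standing in for B-tree leaves) bounds the range with two binary searches:

```python
from bisect import bisect_left

# Hash map: O(1) equality, but no ordering among keys.
prices = {"apple": 3, "pear": 2, "plum": 4, "fig": 7, "kiwi": 1}
assert prices["pear"] == 2  # equality lookup: both structures do this well

# Range query over keys ("f" <= k < "p"):
# the hash map forces a full scan of every entry...
in_range_hash = sorted(k for k in prices if "f" <= k < "p")

# ...whereas a sorted key array answers it with two binary searches
# and a contiguous slice, like walking B-tree leaves.
keys = sorted(prices)  # ['apple', 'fig', 'kiwi', 'pear', 'plum']
lo, hi = bisect_left(keys, "f"), bisect_left(keys, "p")
in_range_sorted = keys[lo:hi]

print(in_range_hash)    # ['fig', 'kiwi']
print(in_range_sorted)  # ['fig', 'kiwi']
```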
The range query killer:
Most database queries involve ranges. Consider real applications: orders placed in the last 30 days, users who signed up between two dates, products priced under $50, log events since midnight.
Hash indexes are useless for all of these—they'd require scanning the entire table. B-trees handle them effortlessly.
Other hash index limitations:
No ordering: Hash tables scatter data randomly, so retrieving sorted results requires a separate sort operation
Higher memory usage: Hash tables need a load factor below 1 for good performance, wasting space. B-tree pages are guaranteed at least 50% full and average roughly 69% utilization.
Worst-case degradation: Hash collisions can cause O(n) performance. B-trees guarantee O(log n) always.
Difficult resizing: Hash tables need expensive rehashing when growing. B-trees grow gracefully.
Poor for composite keys: Multi-column lookups work poorly with hashing but naturally with B-trees.
When hash indexes make sense: pure equality lookups on a single key with no range, sort, or prefix needs (for example, session-token or cache-key lookups).
B-trees aren't the only balanced tree structure. What about binary search trees, AVL trees, or red-black trees?
Binary-based trees on disk:
We explored this earlier, but let's quantify the problem. For a database with 1 billion rows indexed by a balanced binary tree, the tree is roughly 30 levels deep, so a single lookup can require about 30 disk reads.
With a B-tree (order 1000), the same billion keys fit in roughly 3 levels, so a lookup needs only about 3 disk reads.
That's 10× better throughput from data structure choice alone.
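The arithmetic behind that claim, as a sketch using idealized fully-packed trees:

```python
import math

n = 1_000_000_000  # one billion keys

# Balanced binary tree: ~log2(n) levels, each level a potential disk read.
binary_reads = round(math.log2(n))      # ~30

# B-tree of order 1000: each node fans out ~1000 ways, so ~log_1000(n) levels.
btree_reads = round(math.log(n, 1000))  # ~3

print(binary_reads, btree_reads)        # 30 3
print(binary_reads // btree_reads)      # 10x fewer reads per lookup
```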
What about LSM-Trees?
Log-Structured Merge-trees (LSM-trees) are a significant alternative, used by RocksDB, LevelDB, Apache Cassandra, and HBase, among others.
LSM-tree trade-offs:
| Aspect | B-Tree | LSM-Tree |
|---|---|---|
| Write speed | Slower (random I/O) | Faster (sequential I/O) |
| Read speed | Faster (single location) | Slower (check multiple levels) |
| Space amplification | Lower | Higher (multiple copies during compaction) |
| Write amplification | Lower for updates | Higher (rewrite during compaction) |
| Range queries | Excellent | Good (but requires merging) |
When LSM-trees win: write-heavy and append-heavy workloads such as logging, metrics, and time-series ingestion, where sequential writes dominate.
When B-trees win: read-heavy and mixed workloads with frequent updates to existing records, where predictable read latency and cheap range scans matter.
Most relational databases remain B-tree focused because transactional workloads are typically read-heavy with updates to existing records.
Let's examine how major database systems implement B-trees:
PostgreSQL: the default index access method is a B-tree (a Lehman-Yao B-link-tree variant designed for high concurrency); tables are stored as heaps, with B-tree indexes pointing at heap tuples.
MySQL/InnoDB: every table is itself a clustered B+ tree ordered by primary key; secondary indexes are B+ trees whose leaf entries store primary-key values rather than row addresses.
SQLite: the entire database file is a collection of B-trees; table b-trees keep row data in their leaves (keyed by rowid), while index b-trees store only keys.
Oracle: the standard index type is a B-tree, and index-organized tables store the full table inside a B-tree keyed by primary key.
MongoDB (WiredTiger): collections and indexes are both stored as B-trees, with indexes mapping key values to record identifiers.
A clustered index stores actual table data in leaf nodes (the table IS the B-tree). Non-clustered indexes store row pointers (or primary key values) in leaves. Most tables have one clustered index (usually on primary key) and multiple non-clustered secondary indexes.
Understanding B-tree internals directly improves your index design.
Principle 1: Column order matters for composite indexes
A B-tree index on (last_name, first_name) orders by last_name first, then first_name within each last_name.
```sql
-- Can use the index efficiently:
SELECT * FROM users WHERE last_name = 'Smith';                          -- ✓
SELECT * FROM users WHERE last_name = 'Smith' AND first_name = 'John';  -- ✓
SELECT * FROM users WHERE last_name > 'M';                              -- ✓

-- Cannot use the composite index efficiently:
SELECT * FROM users WHERE first_name = 'John';                          -- ✗
-- (Would need a separate index on first_name)
```
This is the leftmost prefix rule: composite indexes support queries on any leftmost prefix of columns.
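The leftmost prefix rule is easy to verify with SQLite (a sketch; the table contents are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (last_name TEXT, first_name TEXT, email TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [(f"Last{i}", f"First{i % 50}", f"u{i}@example.com")
                  for i in range(1000)])
conn.execute("CREATE INDEX idx_name ON users(last_name, first_name)")

def plan(sql):
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[3]

# Leftmost column -> the composite index is used (a SEARCH).
p1 = plan("SELECT * FROM users WHERE last_name = 'Last500'")
print(p1)  # e.g. "SEARCH users USING INDEX idx_name (last_name=?)"

# Second column alone -> the index cannot help (a full SCAN).
p2 = plan("SELECT * FROM users WHERE first_name = 'First7'")
print(p2)  # e.g. "SCAN users"
```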
Principle 2: Range conditions stop prefix matching
Once you use a range condition on a column, subsequent columns in the index cannot help with filtering:
```sql
-- Index: (country, age, name)
SELECT * FROM users WHERE country = 'USA' AND age > 25 AND name = 'Alice';

-- Index usage:
-- country = 'USA' → index navigates to the 'USA' section              ✓
-- age > 25       → index scans ages > 25 within 'USA'                 ✓
-- name = 'Alice' → cannot use the index; every age > 25 row is
--                  examined, with the filter applied after fetching   ✗
```
Put equality columns before range columns in composite indexes.
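Since a composite index is just keys in sorted tuple order, the effect can be modeled with a sorted Python list (a sketch with made-up rows): equality plus a range bound a contiguous run, but the trailing name condition must be checked row by row inside that run:

```python
from bisect import bisect_left

# A composite index on (country, age, name) is conceptually this sorted list.
index = sorted([
    ("USA", 22, "Alice"), ("USA", 28, "Dave"), ("USA", 30, "Alice"),
    ("USA", 30, "Bob"), ("USA", 41, "Carol"), ("FRA", 35, "Alice"),
])

# country = 'USA' AND age > 25 (integer ages, so age >= 26):
lo = bisect_left(index, ("USA", 26))            # first USA row with age >= 26
hi = bisect_left(index, ("USA", float("inf")))  # just past the last USA row
scanned = index[lo:hi]

# name = 'Alice' cannot narrow the binary search; it only filters the run.
matches = [row for row in scanned if row[2] == "Alice"]

print(len(scanned), len(matches))  # 4 rows scanned, 1 match
```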
Principle 3: Covering indexes eliminate table lookups
If an index contains ALL columns a query needs, the database can answer entirely from the index—never reading the actual table rows.
```sql
-- Index: (customer_id, order_date, total)
SELECT order_date, total
FROM orders
WHERE customer_id = 12345;

-- This is a "covering" query:
--   All needed columns (order_date, total) are in the index
--   No need to fetch the full row from the table heap
--   Dramatic speedup for wide tables with many columns
```
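SQLite reports covering-index use directly in its query plan; a sketch with made-up data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders
                (customer_id INTEGER, order_date TEXT, total REAL, notes TEXT)""")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)",
                 [(i % 100, f"2024-01-{i % 28 + 1:02d}", i * 1.5, "n/a")
                  for i in range(2000)])
conn.execute("CREATE INDEX idx_cust ON orders(customer_id, order_date, total)")

# Every selected column lives in the index, so the table itself is never read.
detail = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT order_date, total FROM orders WHERE customer_id = 42"
).fetchone()[3]
print(detail)  # e.g. "SEARCH orders USING COVERING INDEX idx_cust (customer_id=?)"
```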
Principle 4: Selectivity affects index usefulness
B-tree indexes work best for high-cardinality columns (many distinct values):
- user_id (millions of distinct values) → excellent for a B-tree
- country (~200 distinct values) → good for a B-tree
- is_active (2 values: true/false) → poor for a B-tree; consider a bitmap index or no index

For optimal composite index design, order columns as: Equality conditions first (E), then Sort order columns (S), then Range conditions (R). This maximizes the index's usefulness for both filtering and sorting.
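The equality-then-sort part of that ordering shows up directly in SQLite's plans: when the sort column follows the equality column in the index, no separate sort step is needed (a sketch; the schema is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, order_date TEXT, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i % 50, f"2024-{i % 12 + 1:02d}-01", float(i))
                  for i in range(1000)])
conn.execute("CREATE INDEX idx_es ON orders(customer_id, order_date)")

def plan(sql):
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Equality on customer_id, sort on order_date: index order IS the sort order.
p_indexed = plan("SELECT * FROM orders WHERE customer_id = 7 ORDER BY order_date")

# Sorting on a column outside the index order forces an explicit sort pass.
p_sorted = plan("SELECT * FROM orders WHERE customer_id = 7 ORDER BY total")

print(any("TEMP B-TREE" in d for d in p_indexed))  # False: no sort step needed
print(any("TEMP B-TREE" in d for d in p_sorted))   # True: temp B-tree sort
```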
Real-world B-tree performance depends on factors beyond algorithmic complexity:
Buffer pool effectiveness:
Databases cache frequently-used B-tree pages in memory. Because the upper levels of a B-tree contain relatively few pages, a properly-sized buffer pool can hold the root and all internal levels of even a very large index.
With the upper levels cached, a 4-level B-tree might require only 1 disk read per lookup (the leaf).
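A back-of-envelope sketch, assuming 8 KB pages, internal fanout 1000, and wide rows at ~100 per leaf (so a billion rows need four levels):

```python
n = 1_000_000_000
leaf_capacity = 100  # rows per leaf page (wide rows; an assumption)
fanout = 1000        # children per internal page

leaves = n // leaf_capacity      # 10,000,000 leaf pages
level3 = leaves // fanout        # 10,000 internal pages above the leaves
level2 = level3 // fanout        # 10 internal pages
root = 1

# Caching everything above the leaves is cheap: the upper tree is tiny.
upper_pages = root + level2 + level3  # 10,011 pages
cache_mb = upper_pages * 8 / 1024     # at 8 KB per page

print(upper_pages)      # 10011
print(round(cache_mb))  # ~78 MB to pin all non-leaf levels in memory
```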
Index selectivity:
If a query matches many rows, even an indexed lookup might devolve to reading much of the table:
```sql
-- If 90% of orders are 'completed', this is barely better than a full scan:
SELECT * FROM orders WHERE status = 'completed';
-- The query optimizer may choose a full scan over the index anyway
```
Write amplification:
B-tree modifications can be costly: an insert dirties a leaf page, may cascade splits up through parent pages, and typically also writes to the write-ahead log.
Fragmentation over time:
As B-trees evolve through insertions and deletions, pages grow sparse and logically adjacent leaves drift apart on disk. Periodic maintenance (PostgreSQL's REINDEX or VACUUM commands, for example) rebuilds or compacts indexes to restore utilization.
Real numbers (typical):
| Operation | Time (SSD) | Time (HDD) |
|---|---|---|
| Single key lookup | 0.1-0.5 ms | 2-15 ms |
| Range scan (1000 rows) | 1-5 ms | 10-100 ms |
| Single insert | 0.5-2 ms | 10-30 ms |
| Bulk insert (1000 rows) | 10-50 ms | 200-500 ms |
Every index speeds up reads but slows down writes. Each write must update every relevant index. Tables with 10+ indexes may have 10× slower inserts than unindexed tables. Balance is essential—index what you query, not everything.
B-trees aren't just for relational databases—they power diverse storage systems:
File systems: NTFS, APFS, Btrfs, and ReiserFS all use B-tree variants for directories and file metadata.
Key-value stores: LMDB and BoltDB are built directly on B+ trees; Berkeley DB's primary access method is a B-tree.
Search engines: sorted term dictionaries in inverted indexes are often organized with B-tree-like structures for fast prefix and range access.
Version control: some version-control backends index their object stores with B-tree-style structures for fast lookup by key.
Why B-trees appear everywhere: the same properties that serve databases (shallow height, ordered keys, page-sized nodes matched to block devices) serve any system that keeps sorted data on disk.
When building any system that needs persistent, ordered, indexed data—consider a B-tree. It's rarely the wrong choice.
While B-trees remain dominant, research continues to improve them and explore alternatives:
B-tree optimizations:
Fractal trees: Buffer intermediate nodes to convert random writes to sequential (TokuDB, PerconaFT)
Bw-trees: Lock-free B-trees using atomic operations (SQL Server Hekaton, Azure CosmosDB)
FAST: SIMD-optimized B-tree search using CPU vector instructions
Adaptive indexing: Build indexes incrementally as queries reveal access patterns (Database cracking)
Write-optimized B-trees: Combine B-tree structure with log-structured writes
Alternative index structures:
| Structure | Best For | Limitations |
|---|---|---|
| LSM-trees | Write-heavy workloads | Read amplification |
| Hash indexes | Pure equality | No range queries |
| Bitmap indexes | Low-cardinality analytics | Poor for updates |
| R-trees | Spatial data (GIS) | Complex balancing |
| Tries | String prefixes | Memory overhead |
| Bloom filters | Negative lookups | False positives, no retrieval |
The lesson:
B-trees are the generalist—good at everything, optimal for ordered access patterns. Specialists beat them in narrow domains, as the table above shows: hash indexes for pure equality, bitmap indexes for low-cardinality analytics, R-trees for spatial data, tries for string prefixes.
But for general-purpose database indexing? B-trees remain unmatched.
Modern systems often use multiple structures: B-tree primary indexes, hash for hot key caching, bloom filters to avoid unnecessary B-tree lookups, and specialized indexes for specific query types. Understanding each structure helps you design optimal hybrid solutions.
Your understanding of B-trees translates to concrete skills:
Query optimization: you can read EXPLAIN output and understand why the planner chose an index seek, a range scan, or a full table scan.
Index design: you can choose composite-column order deliberately, build covering indexes, and avoid indexes that writes pay for but queries never use.
Database selection: you can weigh B-tree engines against LSM-based engines by matching their trade-offs to your read/write mix.
System design interviews: B-tree knowledge is valuable for storage-engine questions, index-design discussions, and back-of-envelope capacity and latency estimates.
Example interview question:
"Design a system to store and query 1 billion user sessions, supporting lookups by user ID and time range queries."
B-tree-informed answer: store sessions in a table with a composite B-tree index on (user_id, session_start). Equality on user_id plus a range on session_start is exactly the access pattern B-trees excel at, and at a billion rows the index is still only three to four levels deep, so each lookup costs a handful of page reads.
We've connected B-tree theory to production database reality. The key insights: B-trees dominate because they minimize disk reads while supporting equality, range, sort, and prefix queries in one structure; hash indexes win only for pure equality; LSM-trees win for write-heavy workloads; and effective index design follows the leftmost-prefix, equality-before-range, and covering-index principles.
Module Complete:
You've journeyed from the limitations of binary trees on disk, through the elegant simplicity of 2-3 trees, into the industrial-strength complexity of B-trees, and finally to their practical application in production databases.
This knowledge fundamentally changes how you interact with databases. You now understand why queries are fast or slow, why certain index designs work better, and why databases make the architectural choices they do.
Congratulations! You've mastered the conceptual foundations of 2-3 trees and B-trees. You understand why multi-way search trees exist, how they achieve balance through splitting and merging, why they dominate disk-based storage, and how this knowledge applies to real database systems. This is the knowledge that separates developers who use databases from those who understand them.