It's remarkable: virtually every major database system—PostgreSQL, MySQL, Oracle, SQL Server, SQLite, MongoDB, and countless others—uses B-trees (or B+ trees) as their primary indexing structure. This isn't coincidence or inertia; it's the result of B-trees being the optimal choice for the access patterns databases must support.
Understanding why databases choose B-trees gives you insight into database internals that translates directly to better index design, query optimization, and architectural decisions. This knowledge separates developers who use databases from those who truly understand them.
By the end of this page, you will understand why B-trees dominate database indexing, how major database systems implement them, what makes B-trees superior to alternatives like hash indexes for most workloads, and how to apply this knowledge to design effective database indexes.
To appreciate why B-trees won, we must first understand what databases need from an index.
The fundamental challenge:
Databases store data in tables—often containing millions or billions of rows. When a query asks for specific rows (e.g., SELECT * FROM users WHERE id = 12345), the database has two options: scan every row and test it against the condition (a full table scan), or use an index to jump directly to the matching rows.
For large tables, the difference is enormous. A billion-row table might require reading terabytes of data for a full scan, versus kilobytes for an indexed lookup.
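The difference is easy to see even in a toy setup. Here is a sketch using Python's built-in sqlite3 module (the table, column, and index names are illustrative); SQLite's EXPLAIN QUERY PLAN reports whether it scans the whole table or seeks through a B-tree index:

```python
import sqlite3

# In-memory database with an illustrative users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(10_000)])

def plan(sql):
    # EXPLAIN QUERY PLAN rows are 4-tuples; the detail string is last.
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[3]

query = "SELECT * FROM users WHERE id = 1234"

# Without an index, SQLite must read every row.
plan_before = plan(query)   # e.g. "SCAN users"

# CREATE INDEX builds a B-tree; the same query now seeks directly to the key.
conn.execute("CREATE INDEX idx_users_id ON users(id)")
plan_after = plan(query)    # e.g. "SEARCH users USING INDEX idx_users_id (id=?)"

print(plan_before)
print(plan_after)
```

The exact plan wording varies by SQLite version, but the SCAN-versus-SEARCH distinction is stable.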
What a database index must support:
- Exact-match lookups (column = value)
- Range queries (column BETWEEN a AND b)
- Sorted retrieval (ORDER BY)
- Prefix searches (LIKE 'prefix%')

B-trees uniquely satisfy ALL these requirements well. No other data structure comes close to matching B-trees across this entire spectrum of needs. This is why they dominate despite being 50+ years old.
Hash tables offer O(1) lookups—faster than B-trees' O(log n). So why don't databases use them?
Hash indexes exist but have limitations:
Many databases offer hash indexes (MySQL's MEMORY engine, PostgreSQL's hash index, etc.), but they're rarely the default because:
| Query Type | B-Tree | Hash Index |
|---|---|---|
| Equality (=) | O(log n) ✓ | O(1) ✓ |
| Range (BETWEEN) | O(log n + k) ✓ | Full scan ✗ |
| Less than (<, <=) | O(log n + k) ✓ | Full scan ✗ |
| Greater than (>, >=) | O(log n + k) ✓ | Full scan ✗ |
| ORDER BY | Index scan ✓ | Sort required ✗ |
| MIN/MAX | O(log n) ✓ | O(n) ✗ |
| LIKE 'prefix%' | Index scan ✓ | Full scan ✗ |
| LIKE '%suffix' | Full scan ✗ | Full scan ✗ |
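The asymmetry in this table can be sketched in a few lines of Python: a dict (hash table) answers equality instantly but must examine every entry for a range, while a sorted array (standing in for B-tree leaves) bounds the range with two binary searches:

```python
from bisect import bisect_left

# Hash map: O(1) equality, but no ordering among keys.
prices = {"apple": 3, "pear": 2, "plum": 4, "fig": 7, "kiwi": 1}
assert prices["pear"] == 2  # equality lookup: both structures do this well

# Range query over keys ("f" <= k < "p"):
# the hash map forces a full scan of every entry...
in_range_hash = sorted(k for k in prices if "f" <= k < "p")

# ...whereas a sorted key array answers it with two binary searches
# and a contiguous slice, like walking B-tree leaves.
keys = sorted(prices)  # ['apple', 'fig', 'kiwi', 'pear', 'plum']
lo, hi = bisect_left(keys, "f"), bisect_left(keys, "p")
in_range_sorted = keys[lo:hi]

print(in_range_hash)    # ['fig', 'kiwi']
print(in_range_sorted)  # ['fig', 'kiwi']
```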
The range query killer:
Most database queries involve ranges. Consider real applications: orders placed in the last 30 days, users who signed up between two dates, products priced under $50, log events since midnight.
Hash indexes are useless for all of these—they'd require scanning the entire table. B-trees handle them effortlessly.
Other hash index limitations:
No ordering: Hash tables scatter data randomly, so retrieving sorted results requires a separate sort operation
Higher memory usage: Hash tables need a load factor below 1 for good performance, wasting space. B-tree pages are guaranteed at least 50% full and average roughly 69% utilization.
Worst-case degradation: Hash collisions can cause O(n) performance. B-trees guarantee O(log n) always.
Difficult resizing: Hash tables need expensive rehashing when growing. B-trees grow gracefully.
Poor for composite keys: Multi-column lookups work poorly with hashing but naturally with B-trees.
When hash indexes make sense: pure equality lookups on a single key with no range, sort, or prefix needs (for example, session-token or cache-key lookups).
B-trees aren't the only balanced tree structure. What about binary search trees, AVL trees, or red-black trees?
Binary-based trees on disk:
We explored this earlier, but let's quantify the problem. For a database with 1 billion rows indexed by a balanced binary tree, the tree is roughly 30 levels deep, so a single lookup can require about 30 disk reads.
With a B-tree (order 1000), the same billion keys fit in roughly 3 levels, so a lookup needs only about 3 disk reads.
That's 10× better throughput from data structure choice alone.
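The arithmetic behind that claim, as a sketch using idealized fully-packed trees:

```python
import math

n = 1_000_000_000  # one billion keys

# Balanced binary tree: ~log2(n) levels, each level a potential disk read.
binary_reads = round(math.log2(n))      # ~30

# B-tree of order 1000: each node fans out ~1000 ways, so ~log_1000(n) levels.
btree_reads = round(math.log(n, 1000))  # ~3

print(binary_reads, btree_reads)        # 30 3
print(binary_reads // btree_reads)      # 10x fewer reads per lookup
```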
What about LSM-Trees?
Log-Structured Merge-trees (LSM-trees) are a significant alternative, used by RocksDB, LevelDB, Apache Cassandra, and HBase, among others.
LSM-tree trade-offs:
| Aspect | B-Tree | LSM-Tree |
|---|---|---|
| Write speed | Slower (random I/O) | Faster (sequential I/O) |
| Read speed | Faster (single location) | Slower (check multiple levels) |
| Space amplification | Lower | Higher (multiple copies during compaction) |
| Write amplification | Lower for updates | Higher (rewrite during compaction) |
| Range queries | Excellent | Good (but requires merging) |
When LSM-trees win: write-heavy and append-heavy workloads such as logging, metrics, and time-series ingestion, where sequential writes dominate.
When B-trees win: read-heavy and mixed workloads with frequent updates to existing records, where predictable read latency and cheap range scans matter.
Most relational databases remain B-tree focused because transactional workloads are typically read-heavy with updates to existing records.
Let's examine how major database systems implement B-trees:
PostgreSQL: the default index access method is a B-tree (a Lehman-Yao B-link-tree variant designed for high concurrency); tables are stored as heaps, with B-tree indexes pointing at heap tuples.
MySQL/InnoDB: every table is itself a clustered B+ tree ordered by primary key; secondary indexes are B+ trees whose leaf entries store primary-key values rather than row addresses.
SQLite: the entire database file is a collection of B-trees; table b-trees keep row data in their leaves (keyed by rowid), while index b-trees store only keys.
Oracle: the standard index type is a B-tree, and index-organized tables store the full table inside a B-tree keyed by primary key.
MongoDB (WiredTiger): collections and indexes are both stored as B-trees, with indexes mapping key values to record identifiers.
A clustered index stores actual table data in leaf nodes (the table IS the B-tree). Non-clustered indexes store row pointers (or primary key values) in leaves. Most tables have one clustered index (usually on primary key) and multiple non-clustered secondary indexes.
Understanding B-tree internals directly improves your index design.
Principle 1: Column order matters for composite indexes
A B-tree index on (last_name, first_name) orders by last_name first, then first_name within each last_name.
```sql
-- Can use the index efficiently:
SELECT * FROM users WHERE last_name = 'Smith';                          -- ✓
SELECT * FROM users WHERE last_name = 'Smith' AND first_name = 'John';  -- ✓
SELECT * FROM users WHERE last_name > 'M';                              -- ✓

-- Cannot use the composite index efficiently:
SELECT * FROM users WHERE first_name = 'John';                          -- ✗
-- (Would need a separate index on first_name)
```
This is the leftmost prefix rule: composite indexes support queries on any leftmost prefix of columns.
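The leftmost prefix rule is easy to verify with SQLite (a sketch; the table contents are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (last_name TEXT, first_name TEXT, email TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [(f"Last{i}", f"First{i % 50}", f"u{i}@example.com")
                  for i in range(1000)])
conn.execute("CREATE INDEX idx_name ON users(last_name, first_name)")

def plan(sql):
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[3]

# Leftmost column -> the composite index is used (a SEARCH).
p1 = plan("SELECT * FROM users WHERE last_name = 'Last500'")
print(p1)  # e.g. "SEARCH users USING INDEX idx_name (last_name=?)"

# Second column alone -> the index cannot help (a full SCAN).
p2 = plan("SELECT * FROM users WHERE first_name = 'First7'")
print(p2)  # e.g. "SCAN users"
```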
Principle 2: Range conditions stop prefix matching
Once you use a range condition on a column, subsequent columns in the index cannot help with filtering:
```sql
-- Index: (country, age, name)
SELECT * FROM users WHERE country = 'USA' AND age > 25 AND name = 'Alice';

-- Index usage:
-- country = 'USA' → index navigates to the 'USA' section              ✓
-- age > 25       → index scans ages > 25 within 'USA'                 ✓
-- name = 'Alice' → cannot use the index; every age > 25 row is
--                  examined, with the filter applied after fetching   ✗
```
Put equality columns before range columns in composite indexes.
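Since a composite index is just keys in sorted tuple order, the effect can be modeled with a sorted Python list (a sketch with made-up rows): equality plus a range bound a contiguous run, but the trailing name condition must be checked row by row inside that run:

```python
from bisect import bisect_left

# A composite index on (country, age, name) is conceptually this sorted list.
index = sorted([
    ("USA", 22, "Alice"), ("USA", 28, "Dave"), ("USA", 30, "Alice"),
    ("USA", 30, "Bob"), ("USA", 41, "Carol"), ("FRA", 35, "Alice"),
])

# country = 'USA' AND age > 25 (integer ages, so age >= 26):
lo = bisect_left(index, ("USA", 26))            # first USA row with age >= 26
hi = bisect_left(index, ("USA", float("inf")))  # just past the last USA row
scanned = index[lo:hi]

# name = 'Alice' cannot narrow the binary search; it only filters the run.
matches = [row for row in scanned if row[2] == "Alice"]

print(len(scanned), len(matches))  # 4 rows scanned, 1 match
```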
Principle 3: Covering indexes eliminate table lookups
If an index contains ALL columns a query needs, the database can answer entirely from the index—never reading the actual table rows.
```sql
-- Index: (customer_id, order_date, total)
SELECT order_date, total
FROM orders
WHERE customer_id = 12345;

-- This is a "covering" query:
--   All needed columns (order_date, total) are in the index
--   No need to fetch the full row from the table heap
--   Dramatic speedup for wide tables with many columns
```
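SQLite reports covering-index use directly in its query plan; a sketch with made-up data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders
                (customer_id INTEGER, order_date TEXT, total REAL, notes TEXT)""")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)",
                 [(i % 100, f"2024-01-{i % 28 + 1:02d}", i * 1.5, "n/a")
                  for i in range(2000)])
conn.execute("CREATE INDEX idx_cust ON orders(customer_id, order_date, total)")

# Every selected column lives in the index, so the table itself is never read.
detail = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT order_date, total FROM orders WHERE customer_id = 42"
).fetchone()[3]
print(detail)  # e.g. "SEARCH orders USING COVERING INDEX idx_cust (customer_id=?)"
```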
Principle 4: Selectivity affects index usefulness
B-tree indexes work best for high-cardinality columns (many distinct values):
- user_id (millions of distinct values) → excellent for a B-tree
- country (~200 distinct values) → good for a B-tree
- is_active (2 values: true/false) → poor for a B-tree; consider a bitmap index or no index

For optimal composite index design, order columns as: Equality conditions first (E), then Sort order columns (S), then Range conditions (R). This maximizes the index's usefulness for both filtering and sorting.
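The equality-then-sort part of that ordering shows up directly in SQLite's plans: when the sort column follows the equality column in the index, no separate sort step is needed (a sketch; the schema is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, order_date TEXT, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i % 50, f"2024-{i % 12 + 1:02d}-01", float(i))
                  for i in range(1000)])
conn.execute("CREATE INDEX idx_es ON orders(customer_id, order_date)")

def plan(sql):
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Equality on customer_id, sort on order_date: index order IS the sort order.
p_indexed = plan("SELECT * FROM orders WHERE customer_id = 7 ORDER BY order_date")

# Sorting on a column outside the index order forces an explicit sort pass.
p_sorted = plan("SELECT * FROM orders WHERE customer_id = 7 ORDER BY total")

print(any("TEMP B-TREE" in d for d in p_indexed))  # False: no sort step needed
print(any("TEMP B-TREE" in d for d in p_sorted))   # True: temp B-tree sort
```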
Real-world B-tree performance depends on factors beyond algorithmic complexity:
Buffer pool effectiveness:
Databases cache frequently-used B-tree pages in memory. Because the upper levels of a B-tree contain relatively few pages, a properly-sized buffer pool can hold the root and all internal levels of even a very large index.
With the upper levels cached, a 4-level B-tree might require only 1 disk read per lookup (the leaf).
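A back-of-envelope sketch, assuming 8 KB pages, internal fanout 1000, and wide rows at ~100 per leaf (so a billion rows need four levels):

```python
n = 1_000_000_000
leaf_capacity = 100  # rows per leaf page (wide rows; an assumption)
fanout = 1000        # children per internal page

leaves = n // leaf_capacity      # 10,000,000 leaf pages
level3 = leaves // fanout        # 10,000 internal pages above the leaves
level2 = level3 // fanout        # 10 internal pages
root = 1

# Caching everything above the leaves is cheap: the upper tree is tiny.
upper_pages = root + level2 + level3  # 10,011 pages
cache_mb = upper_pages * 8 / 1024     # at 8 KB per page

print(upper_pages)      # 10011
print(round(cache_mb))  # ~78 MB to pin all non-leaf levels in memory
```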
Index selectivity:
If a query matches many rows, even an indexed lookup might devolve to reading much of the table:
```sql
-- If 90% of orders are 'completed', this is barely better than a full scan:
SELECT * FROM orders WHERE status = 'completed';
-- The query optimizer may choose a full scan over the index anyway
```
Write amplification:
B-tree modifications can be costly: an insert dirties a leaf page, may cascade splits up through parent pages, and typically also writes to the write-ahead log.
Fragmentation over time:
As B-trees evolve through insertions and deletions, pages grow sparse and logically adjacent leaves drift apart on disk. Periodic maintenance (PostgreSQL's REINDEX or VACUUM commands, for example) rebuilds or compacts indexes to restore utilization.
Real numbers (typical):
| Operation | Time (SSD) | Time (HDD) |
|---|---|---|
| Single key lookup | 0.1-0.5 ms | 2-15 ms |
| Range scan (1000 rows) | 1-5 ms | 10-100 ms |
| Single insert | 0.5-2 ms | 10-30 ms |
| Bulk insert (1000 rows) | 10-50 ms | 200-500 ms |
Every index speeds up reads but slows down writes. Each write must update every relevant index. Tables with 10+ indexes may have 10× slower inserts than unindexed tables. Balance is essential—index what you query, not everything.
B-trees aren't just for relational databases—they power diverse storage systems:
File systems: NTFS, APFS, Btrfs, and ReiserFS all use B-tree variants for directories and file metadata.
Key-value stores: LMDB and BoltDB are built directly on B+ trees; Berkeley DB's primary access method is a B-tree.
Search engines: sorted term dictionaries in inverted indexes are often organized with B-tree-like structures for fast prefix and range access.
Version control: some version-control backends index their object stores with B-tree-style structures for fast lookup by key.
Why B-trees appear everywhere: the same properties that serve databases (shallow height, ordered keys, page-sized nodes matched to block devices) serve any system that keeps sorted data on disk.
When building any system that needs persistent, ordered, indexed data—consider a B-tree. It's rarely the wrong choice.
While B-trees remain dominant, research continues to improve them and explore alternatives:
B-tree optimizations:
Fractal trees: Buffer intermediate nodes to convert random writes to sequential (TokuDB, PerconaFT)
Bw-trees: Lock-free B-trees using atomic operations (SQL Server Hekaton, Azure CosmosDB)
FAST: SIMD-optimized B-tree search using CPU vector instructions
Adaptive indexing: Build indexes incrementally as queries reveal access patterns (Database cracking)
Write-optimized B-trees: Combine B-tree structure with log-structured writes
Alternative index structures:
| Structure | Best For | Limitations |
|---|---|---|
| LSM-trees | Write-heavy workloads | Read amplification |
| Hash indexes | Pure equality | No range queries |
| Bitmap indexes | Low-cardinality analytics | Poor for updates |
| R-trees | Spatial data (GIS) | Complex balancing |
| Tries | String prefixes | Memory overhead |
| Bloom filters | Negative lookups | False positives, no retrieval |
The lesson:
B-trees are the generalist—good at everything, optimal for ordered access patterns. Specialists beat them in narrow domains, as the table above shows: hash indexes for pure equality, bitmap indexes for low-cardinality analytics, R-trees for spatial data, tries for string prefixes.
But for general-purpose database indexing? B-trees remain unmatched.
Modern systems often use multiple structures: B-tree primary indexes, hash for hot key caching, bloom filters to avoid unnecessary B-tree lookups, and specialized indexes for specific query types. Understanding each structure helps you design optimal hybrid solutions.
Your understanding of B-trees translates to concrete skills:
Query optimization: you can read EXPLAIN output and understand why the planner chose an index seek, a range scan, or a full table scan.
Index design: you can choose composite-column order deliberately, build covering indexes, and avoid indexes that writes pay for but queries never use.
Database selection: you can weigh B-tree engines against LSM-based engines by matching their trade-offs to your read/write mix.
System design interviews: B-tree knowledge is valuable for storage-engine questions, index-design discussions, and back-of-envelope capacity and latency estimates.
Example interview question:
"Design a system to store and query 1 billion user sessions, supporting lookups by user ID and time range queries."
B-tree-informed answer: store sessions in a table with a composite B-tree index on (user_id, session_start). Equality on user_id plus a range on session_start is exactly the access pattern B-trees excel at, and at a billion rows the index is still only three to four levels deep, so each lookup costs a handful of page reads.
We've connected B-tree theory to production database reality. The key insights: B-trees dominate because they minimize disk reads while supporting equality, range, sort, and prefix queries in one structure; hash indexes win only for pure equality; LSM-trees win for write-heavy workloads; and effective index design follows the leftmost-prefix, equality-before-range, and covering-index principles.
Module Complete:
You've journeyed from the limitations of binary trees on disk, through the elegant simplicity of 2-3 trees, into the industrial-strength complexity of B-trees, and finally to their practical application in production databases.
This knowledge fundamentally changes how you interact with databases. You now understand why queries are fast or slow, why certain index designs work better, and why databases make the architectural choices they do.
Congratulations! You've mastered the conceptual foundations of 2-3 trees and B-trees. You understand why multi-way search trees exist, how they achieve balance through splitting and merging, why they dominate disk-based storage, and how this knowledge applies to real database systems. This is the knowledge that separates developers who use databases from those who understand them.