Understanding B-tree height is not merely academic—it's directly predictive of performance. Since each level of the tree typically requires one disk I/O, knowing the height tells you exactly how many disk accesses a search requires. This enables capacity planning, query optimization, and system design decisions.
In this final page of the B-tree Concept module, we derive precise height formulas, analyze how parameters affect height, and connect theory to practice for database sizing and performance estimation.
By the end of this page, you will derive minimum and maximum height bounds for B-trees, understand how fanout affects performance, calculate expected height for practical scenarios, and apply height analysis to database capacity planning.
The minimum possible height of a B-tree occurs when all nodes are maximally full. This gives us the lower bound—the best case for a given number of keys.
Theorem: For a B-tree of order m with n keys, the minimum height is:
$$h_{min} = \lceil \log_m (n+1) \rceil - 1$$
Derivation:
With all nodes at maximum capacity:
The maximum number of keys a B-tree of height h can hold:
A B-tree of height h with maximum fanout m has:

- Level 0 (root): 1 node, up to m − 1 keys
- Level 1: m nodes, up to m(m − 1) keys
- Level 2: m² nodes, up to m²(m − 1) keys
- ...
- Level h (leaves): m^h nodes, up to m^h(m − 1) keys

Total maximum keys:

$$n_{max} = (m-1)(1 + m + m^2 + \cdots + m^h) = (m-1) \cdot \frac{m^{h+1} - 1}{m - 1} = m^{h+1} - 1$$

For n keys, we need:

$$n \le m^{h+1} - 1 \implies n + 1 \le m^{h+1} \implies h \ge \log_m(n+1) - 1$$

Since h must be an integer:

$$h_{min} = \lceil \log_m (n+1) \rceil - 1$$

| Keys (n) | Order 10 | Order 100 | Order 500 | Order 1000 |
|---|---|---|---|---|
| 100 | 2 | 1 | 0 | 0 |
| 10,000 | 4 | 2 | 1 | 1 |
| 1,000,000 | 6 | 3 | 2 | 2 |
| 100,000,000 | 8 | 4 | 2 | 2 |
| 10,000,000,000 | 10 | 5 | 3 | 3 |
For typical database orders (100-1000), even 10 billion records require a height of only 3-5 in the best case (a height of 0 means every key fits in the root). This demonstrates the power of high fanout: going from order 100 to order 1000 cuts the best-case height for 10 billion keys from 5 to 3.
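The lower bound can be checked with a short sketch. It uses exact integer arithmetic rather than floating-point logarithms, which avoids rounding trouble when n+1 is an exact power of m:

```python
def h_min(n: int, m: int) -> int:
    # smallest height h whose fully-packed order-m tree holds n keys:
    # a full tree of height h contains m**(h+1) - 1 keys
    h = 0
    while m ** (h + 1) - 1 < n:
        h += 1
    return h

print(h_min(1_000_000, 10))         # 6
print(h_min(10_000_000_000, 100))   # 5
print(h_min(100, 500))              # 0 (all keys fit in the root)
```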
The maximum possible height occurs when all nodes are minimally occupied (at the threshold of underflow). This is the worst case we must plan for.
Theorem: For a B-tree of order m with n keys, the maximum height is:
$$h_{max} = \lfloor \log_{\lceil m/2 \rceil} \left( \frac{n+1}{2} \right) \rfloor$$
Derivation:
With all nodes at minimum capacity:
Let t = ⌈m/2⌉ (the minimum degree).
The tallest legal B-tree of height h with minimum fanout t = ⌈m/2⌉ has:

- Level 0 (root): 1 node, at least 1 key, 2 children
- Level 1: 2 nodes, each with at least t − 1 keys
- Level 2: 2t nodes, each with at least t − 1 keys
- ...
- Level h (leaves): 2t^(h−1) nodes, each with at least t − 1 keys

Total minimum keys across all levels:

$$n_{min} = 1 + (t-1)\sum_{d=1}^{h} 2t^{d-1} = 1 + 2(t-1) \cdot \frac{t^h - 1}{t - 1} = 2t^h - 1$$

For a height-h tree to exist with n keys:

$$n \ge 2t^h - 1 \implies t^h \le \frac{n+1}{2} \implies h \le \log_t\left(\frac{n+1}{2}\right)$$

Since h must be an integer:

$$h_{max} = \lfloor \log_t \left( \frac{n+1}{2} \right) \rfloor$$

consistent with the theorem above.

| Keys (n) | Order 10 (t=5) | Order 100 (t=50) | Order 500 (t=250) | Order 1000 (t=500) |
|---|---|---|---|---|
| 100 | 2 | 1 | 0 | 0 |
| 10,000 | 5 | 2 | 1 | 1 |
| 1,000,000 | 8 | 3 | 2 | 2 |
| 100,000,000 | 11 | 4 | 3 | 2 |
| 10,000,000,000 | 13 | 5 | 4 | 3 |
For order 10, maximum height can be substantially more than minimum height (e.g., 8 vs 6 for 1M keys, or 11 vs 8 for 100M keys). For order 1000, the bounds coincide at these scales (e.g., 3 vs 3 for 10B keys). Higher fanout not only reduces height but also shrinks height variance. This is why databases prefer high-fanout B-trees.
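The upper bound can be checked the same way, again with exact integer arithmetic; this is a sketch of the theorem above:

```python
def h_max(n: int, m: int) -> int:
    # largest height h whose minimally-filled tree fits within n keys:
    # such a tree holds at least 2*t**h - 1 keys, where t = ceil(m/2)
    t = -(-m // 2)  # ceil(m / 2) using integer division
    h = 0
    while 2 * t ** (h + 1) - 1 <= n:
        h += 1
    return h

print(h_max(1_000_000, 10))          # 8
print(h_max(10_000_000_000, 1000))   # 3
```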
In practice, B-trees are neither maximally nor minimally full. The average case depends on the fill factor—the fraction of node capacity actually used.
Fill Factor Definition
Fill factor α = (actual keys in node) / (maximum keys per node)
Why ln(2) ≈ 0.69?
For random insertions, a node oscillates between half-full (just after splitting) and full (just before splitting). Averaging over this lifecycle, the expected fill factor converges to ln(2) ≈ 0.693, due to the distribution of split points.
Height With Average Fill Factor
With fill factor α, effective fanout is approximately α × m:
$$h_{avg} \approx \log_{\alpha \cdot m}(n)$$
| Fill Factor | Order 100 | Effective Fanout | Height for 1B keys |
|---|---|---|---|
| 50% (min) | 100 | 50 | 6 |
| 69% (avg) | 100 | 69 | 5-6 |
| 80% (good) | 100 | 80 | 5 |
| 100% (max) | 100 | 100 | 5 |
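The effect of fill factor on height can be estimated with a short sketch (a simplification: real trees vary node by node, so this only approximates the effective fanout):

```python
import math

def approx_height(n: int, order: int, fill: float) -> int:
    # partially filled nodes shrink the effective fanout to fill * order
    effective_fanout = fill * order
    return math.ceil(math.log(n) / math.log(effective_fanout))

for fill in (0.5, 0.69, 0.8, 1.0):
    print(fill, approx_height(10**9, 100, fill))
```

Running this for 1 billion keys at order 100 reproduces the table: height 6 at 50% fill, and 5 at the higher fill factors.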
Impact of Fill Factor
Fill factor affects:
There's a trade-off: very high fill factors mean almost every insertion causes a split, while very low fill factors waste space and increase height.
Many databases allow configuring fill factor for indexes. PostgreSQL's 'fillfactor' storage parameter defaults to 90% for B-trees. Setting it to 70% leaves room for updates, reducing splits but increasing initial size. The right value depends on update patterns.
Fanout (the number of children per node) is the key parameter determining B-tree height. Let's analyze this relationship precisely.
The Logarithmic Relationship
Height h ≈ log_f(n) where f is the average fanout.
Doubling fanout from f to 2f changes height from h = log_f(n) to log_{2f}(n) = h / (1 + log_f 2).

The relative height reduction is therefore log_f(2) / (1 + log_f(2)), roughly 1/log₂(f) for large f.

For f = 100: doubling to 200 reduces height by ~13%. For f = 10: doubling to 20 reduces height by ~23%.
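A quick sanity check of this diminishing-returns formula:

```python
import math

def doubling_saving(f: float) -> float:
    # fractional height reduction when fanout doubles from f to 2f:
    # log_f(2) / (1 + log_f(2))
    r = math.log(2, f)
    return r / (1 + r)

print(f"{doubling_saving(100):.0%}")  # 13%
print(f"{doubling_saving(10):.0%}")   # 23%
```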
| Fanout | Height (n ≈ 1 billion keys) | Disk Accesses | Relative I/O |
|---|---|---|---|
| 2 (binary tree) | 30 | 30 | 10.0x |
| 10 | 9 | 9 | 3.0x |
| 50 | 6 | 6 | 2.0x |
| 100 | 5 | 5 | 1.7x |
| 200 | 4 | 4 | 1.3x |
| 500 | 4 | 4 | 1.3x |
| 1000 | 3 | 3 | 1.0x |
Diminishing Returns
The benefit of increasing fanout is logarithmic: moving from fanout 2 to fanout 100 saves roughly 25 levels, while moving from 100 to 1000 saves only 1-2 more.
Optimal Fanout Considerations
Why not maximize fanout by using huge pages? Because larger pages transfer more bytes per access, consume more buffer-pool space per cached node, and make in-node search and node splits more expensive.
For most systems, fanout of 100-500 (8KB-16KB pages) hits the sweet spot.
For practical database sizing, assume fanout of 100-200 for internal nodes. This gives height ≈ log₁₀₀(n). A billion-record table needs ~5 levels; a trillion-record table needs ~6 levels. Root and first level are usually cached, so actual I/O is 3-4 accesses.
The primary motivation for understanding height is predicting I/O cost. Let's make this connection explicit.
Search I/O Cost
For a point query (find one key), the search reads one node per level, so a cold-cache lookup costs h disk accesses.
With Caching
In practice, upper levels are cached:
Effective I/O = h - (cached levels) ≈ h - 2 for active indexes
Range Query I/O Cost
For a range query returning k keys, the cost is the h accesses to reach the first leaf, plus roughly ⌈k / leaf capacity⌉ additional leaf reads along the leaf level.
| Query Type | Cold Cache I/O | Warm Cache I/O | Example (h=4, m=100) |
|---|---|---|---|
| Point query | h | h - 2 | 4 cold, 2 warm |
| Existence check | h | h - 2 | 4 cold, 2 warm |
| Range (100 keys) | h + 2 | h - 2 + 2 | 6 cold, 4 warm |
| Range (1000 keys) | h + 11 | h - 2 + 11 | 15 cold, 13 warm |
| Full scan | n/m + h | n/m | 101 cold, 100 warm |
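These cost rules can be combined into a small estimator. This is a sketch under simplified assumptions — a fixed leaf capacity of 100 keys per page (the table above assumes slightly fuller traversals, so its range-query numbers differ by a page or two):

```python
import math

def query_ios(h: int, k: int = 1, leaf_capacity: int = 100,
              cached_levels: int = 0) -> int:
    # h page reads to reach the first leaf (minus any cached upper levels),
    # plus additional leaf pages for the remainder of the range
    extra_leaves = math.ceil(k / leaf_capacity) - 1
    return max(h - cached_levels, 0) + extra_leaves

print(query_ios(4))                           # cold point query: 4
print(query_ios(4, cached_levels=2))          # warm point query: 2
print(query_ios(4, k=1000, cached_levels=2))  # warm range of 1000 keys: 11
```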
Latency Estimation
With disk characteristics, we can estimate query latency:
HDD: 10ms average seek + read time
SSD: 0.1ms average access time
Point Query on HDD (h=4, warm cache):
2 disk accesses × 10ms = 20ms
Point Query on SSD (h=4, warm cache):
2 disk accesses × 0.1ms = 0.2ms
Range Query for 1000 keys on HDD:
13 sequential reads × 5ms (sequential faster) = 65ms
Range Query for 1000 keys on SSD:
13 reads × 0.05ms = 0.65ms
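The latency arithmetic above is just I/O count times per-access time; a minimal sketch, assuming serial (non-overlapped) page reads:

```python
def latency_ms(ios: int, per_io_ms: float) -> float:
    # serial page reads: total latency is count times per-access time
    return ios * per_io_ms

print(latency_ms(2, 10.0))  # warm point query on HDD: 20.0 ms
print(latency_ms(2, 0.1))   # warm point query on SSD: 0.2 ms
print(latency_ms(13, 5.0))  # 1000-key range on HDD (sequential reads): 65.0 ms
```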
This is why SSDs transformed database performance—reducing each I/O from milliseconds to microseconds makes B-tree height less critical (but still important for maximum throughput).
Production databases track buffer pool 'hit ratio'—the fraction of page accesses satisfied from memory. A well-tuned OLTP system achieves 99%+ hit ratio, meaning most B-tree accesses don't hit disk at all. Height analysis is worst-case; reality is usually much better.
Height analysis enables precise capacity planning. Given expected data volume and performance requirements, we can calculate required fanout and resources.
Problem: Size an index for a table expected to grow to 500 million rows, with point query latency under 2ms on SSD.
Analysis:
Given:

- n = 500,000,000 rows
- Target latency: < 2ms
- SSD access time: ~0.15ms per I/O
- Assume 2 levels cached (root + level 1)

Step 1: Calculate maximum disk I/Os.
Max I/Os = 2ms / 0.15ms ≈ 13 disk accesses allowed. Actual disk I/Os needed = h − 2 (cached levels), so h − 2 ≤ 13, meaning h ≤ 15.

Step 2: Calculate minimum fanout for height 15.
h = log_f(n) → f = n^(1/h) = (500,000,000)^(1/15) ≈ 4. A minimum fanout of ~4 is trivially achievable.

Step 3: With a typical page size (8KB), calculate the key-size limit.
Page = 8192 bytes, overhead ≈ 100 bytes, pointer = 6 bytes. Even at fanout 8, available space per entry = (8192 − 100) / 8 ≈ 1000 bytes, so max key size ≈ 1000 − 6 = 994 bytes.

Step 4: Calculate expected height with 8KB pages and an 8-byte integer key.
Entry size = 8 + 6 = 14 bytes; fanout = 8092 / 14 ≈ 578. Height = ⌈log_578(500M)⌉ = ⌈3.15⌉ = 4. Disk I/Os = 4 − 2 = 2; latency = 2 × 0.15ms = 0.3ms ✓ (well under 2ms).

Conclusion: Standard 8KB pages easily meet the requirements.

Space Estimation
Height analysis also helps estimate index size:
Total nodes ≈ n / (fill_factor × (m-1)) × (1 + 1/m + 1/m² + ...)
≈ n / (0.69 × (m-1)) × m/(m-1)
≈ n / (0.69 × (m-1))
For n = 500M, m = 578:
Nodes ≈ 500M / (0.69 × 577) ≈ 1.25M nodes
Size ≈ 1.25M × 8KB ≈ 10GB index size
Actual size includes overhead, typically 8-12GB for this case.
This calculation helps plan storage allocation and buffer pool sizing.
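The space estimate can be reproduced with a short sketch; the 100-byte page overhead, 14-byte entry, and 0.69 fill factor are the same assumptions used in the worked example:

```python
import math

def index_size_bytes(n: int, page_size: int = 8192, overhead: int = 100,
                     entry_size: int = 14, fill: float = 0.69) -> int:
    fanout = (page_size - overhead) // entry_size  # ~578 entries per page
    leaves = math.ceil(n / (fill * fanout))        # pages holding the keys
    # upper levels add roughly a 1/(fanout - 1) fraction more pages
    total = math.ceil(leaves * fanout / (fanout - 1))
    return total * page_size

gib = index_size_bytes(500_000_000) / 2**30
print(f"{gib:.1f} GiB")  # ~9.6 GiB, in line with the ~10GB estimate above
```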
For most practical scenarios, B-trees on modern hardware easily meet performance requirements. The analysis helps validate designs and identify exceptional cases (e.g., very wide keys, extreme data volumes) where special attention is needed.
B+-trees (covered in detail in subsequent modules) have slightly different height characteristics because data is stored only in leaves.
Key Difference
| Aspect | B-tree | B+-tree |
|---|---|---|
| Data storage | All nodes | Leaves only |
| Internal node content | Keys + data + pointers | Keys + pointers only |
| Internal fanout | Lower (data takes space) | Higher (more room for keys) |
| Leaf fanout | Same as internal | Potentially different |
| Height for same n | Slightly higher | Usually lower or equal |
B+-tree Height Calculation
For B+-trees, we consider two fanouts: f_internal, the number of children per internal node (keys and child pointers only), and f_leaf, the number of records stored per leaf page.
Height determination:
Number of leaves = ⌈n / f_leaf⌉
Internal tree height to cover that many leaves:
h_internal = ⌈log_{f_internal}(number of leaves)⌉
Total height = h_internal + 1 (for leaf level)
Example Comparison
Configuration:

- Page size: 8KB (8192 bytes)
- Key size: 8 bytes
- Data pointer: 8 bytes
- Child pointer: 8 bytes
- Records: 1 billion (10⁹)

B-tree (classic):

- Entry: key (8) + data (8) + child (8) = 24 bytes
- Entries per page: 8000 / 24 ≈ 333
- Height: ⌈log_333(10⁹)⌉ = ⌈3.57⌉ = 4

B+-tree:

- Internal entry: key (8) + child (8) = 16 bytes → internal fanout: 8000 / 16 = 500
- Leaf entry: key (8) + data pointer (8) = 16 bytes → leaf capacity: 500
- Number of leaves: 10⁹ / 500 = 2,000,000
- Internal tree height: ⌈log_500(2M)⌉ = ⌈2.33⌉ = 3
- Total height: 3 + 1 = 4

Result: same height, but the B+-tree has better range-scan performance due to leaf chaining.

While height is often similar, B+-trees excel at range scans (linked leaves), have more predictable internal fanout (data size doesn't affect traversal), and can use different leaf and internal page organizations. This is why most databases use B+-trees, not classic B-trees.
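The comparison can be checked with a short sketch, using the same 8000 usable bytes per page as the example:

```python
import math

def classic_btree_height(n: int, entry_size: int = 24, page: int = 8000) -> int:
    # key + data + child pointer stored together in every node
    fanout = page // entry_size  # ~333 entries per page
    return math.ceil(math.log(n) / math.log(fanout))

def bplus_tree_height(n: int, key: int = 8, ptr: int = 8,
                      page: int = 8000) -> int:
    internal_fanout = page // (key + ptr)  # 500 children per internal node
    leaf_capacity = page // (key + ptr)    # 500 records per leaf
    leaves = math.ceil(n / leaf_capacity)
    return math.ceil(math.log(leaves) / math.log(internal_fanout)) + 1

print(classic_btree_height(10**9))  # 4
print(bplus_tree_height(10**9))     # 4
```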
Variable-Length Keys and Height
For VARCHAR or composite keys, fanout varies by actual key sizes:
Page capacity: 8000 bytes
Average key: 50 bytes
Pointer: 8 bytes
Average fanout = 8000 / (50 + 8) ≈ 138
BUT: if some keys are 200 bytes:
Worst-case fanout = 8000 / (200 + 8) ≈ 38
Height could vary from 4 to 5 depending on key distribution
Key compression (prefix/suffix) helps maintain consistent fanout.
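The fanout spread for variable-length keys follows directly from the page budget; a sketch with the 8000 usable bytes and 8-byte pointers used above:

```python
def fanout_bounds(page: int = 8000, key_min: int = 50, key_max: int = 200,
                  ptr: int = 8) -> tuple[int, int]:
    # worst case: every key is long; best case: every key is short
    return page // (key_max + ptr), page // (key_min + ptr)

worst, best = fanout_bounds()
print(worst, best)  # 38 137
```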
Multi-Version and Height
MVCC databases may store multiple versions of index entries: dead versions occupy slots until cleanup (e.g., vacuum) reclaims them, lowering the effective fill factor and occasionally adding a level of height.
Partitioned Indexes
Partitioned tables have multiple B-trees (one per partition): each tree indexes only its partition's rows and is correspondingly shorter, but queries that cannot be pruned to a single partition must descend several trees.
Theoretical height calculations assume ideal conditions. Real indexes may be taller due to bloat, fragmentation, and variable key sizes. Monitor actual index statistics (pg_stat_user_indexes, SHOW INDEX, etc.) to verify assumptions and schedule maintenance.
We've completed a rigorous analysis of B-tree height. The key insights:

- Height is bounded: ⌈log_m(n+1)⌉ − 1 ≤ h ≤ ⌊log_{⌈m/2⌉}((n+1)/2)⌋.
- Fanout dominates: orders of 100-1000 keep even billions of keys at height 3-5.
- Random insertions yield an average fill factor of ln(2) ≈ 0.69.
- Each level costs roughly one disk I/O, and caching the upper levels typically saves two of them.
- These formulas translate directly into capacity plans and latency estimates.
Module Complete
This concludes Module 1: B-Tree Concept. You now have a comprehensive understanding of B-tree structure and invariants, why high fanout suits disk-based storage, and how to derive and apply height bounds for performance estimation.
In the next module, we'll examine B-tree operations in detail—the algorithms for searching, inserting, and deleting keys while maintaining all invariants.
Congratulations! You've mastered the foundational concepts of B-trees. You can now analyze B-tree performance, calculate expected height, plan index capacity, and understand why B-trees remain the dominant index structure in database systems. Proceed to Module 2: B-tree Operations to learn the algorithms that bring these concepts to life.