Fanout—the number of child pointers per internal node—is the single most important factor determining B+-tree efficiency. Because tree height grows only logarithmically with record count, even modest fanout improvements can remove an entire level from the tree, and each level removed eliminates one disk I/O from every lookup.
While page size sets an upper bound on fanout, numerous optimization techniques can dramatically increase the actual number of entries that fit in each node. Understanding and applying these techniques separates adequate indexes from highly optimized ones that deliver exceptional performance.
By the end of this page, you will understand how fanout affects performance mathematically, master key compression techniques (prefix, suffix, dictionary), learn structural optimizations used in production databases, and apply these techniques to maximize index efficiency.
The impact of fanout on tree characteristics follows a simple law: a tree of height h can address roughly f^h records, so capacity grows exponentially with height and height grows only logarithmically with record count. This makes even small fanout improvements highly valuable.
The Fanout-Height Relationship:
For a B+-tree with N records and average fanout f (assuming leaf pages also hold roughly f entries):
$$h \approx \lceil\log_f(N)\rceil$$
This logarithmic relationship is easiest to see with concrete numbers. For N = 1 billion records:
| Fanout | Tree Height | Total Nodes | Internal Nodes | I/O per Lookup |
|---|---|---|---|---|
| 100 | 5 | ~10.1M | ~101K | 5 |
| 200 | 4 | ~5.03M | ~25K | 4 |
| 500 | 4 | ~2.01M | ~4K | 4 |
| 1000 | 3 | ~1.00M | ~1K | 3 |
| 2000 | 3 | ~500K | ~250 | 3 |
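As a sanity check, here are two of the table's height values worked by hand, under the same assumption that leaf pages hold about f entries:

$$f = 500:\quad h = \lceil\log_{500}(10^9)\rceil = \left\lceil \frac{9\ln 10}{\ln 500} \right\rceil = \lceil 3.33 \rceil = 4$$

$$f = 1000:\quad h = \lceil\log_{1000}(10^9)\rceil = \lceil 3.00 \rceil = 3$$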
The Multiplicative Effect:

Fanout improvements provide multiplicative benefits: each level removed from the tree is one less page read for every single lookup. For a database serving millions of queries per day, reducing height from 5 to 4 eliminates millions of disk I/O operations.
```python
import math

def analyze_fanout_impact(record_count, fanout_values):
    """
    Analyze the impact of different fanout values on tree characteristics.
    """
    print(f"Analysis for {record_count:,} records")
    print("=" * 70)
    print(f"{'Fanout':<10} {'Height':<8} {'Leaves':<15} {'Internal':<12} {'Total Size':<12}")
    print("-" * 70)

    for f in fanout_values:
        # Calculate tree structure bottom-up
        leaves = math.ceil(record_count / (f - 1))  # leaf capacity = f-1
        height = 1
        level_nodes = leaves
        internal_nodes = 0
        while level_nodes > 1:
            parent_nodes = math.ceil(level_nodes / f)
            internal_nodes += parent_nodes
            level_nodes = parent_nodes
            height += 1
        total_nodes = leaves + internal_nodes
        # Assume 8KB pages
        total_size_gb = total_nodes * 8 / (1024 * 1024)
        print(f"{f:<10} {height:<8} {leaves:<15,} {internal_nodes:<12,} {total_size_gb:<12.2f} GB")

# Analyze for different fanout values
record_count = 1_000_000_000  # 1 billion
fanout_values = [100, 200, 300, 500, 750, 1000]
analyze_fanout_impact(record_count, fanout_values)

# Impact of 20% fanout improvement
print("\n" + "=" * 70)
print("Impact of 20% fanout improvement:")
base_fanout = 400
improved_fanout = 480

for f, label in [(base_fanout, "Baseline"), (improved_fanout, "Optimized")]:
    leaves = math.ceil(record_count / (f - 1))
    height = math.ceil(math.log(leaves, f)) + 1
    print(f"{label}: Fanout {f} → Height {height}, ~{leaves//1000}K leaf pages")
```

In relative terms, a 20% fanout improvement shrinks height only slightly (height scales as 1/log f). But because height is a small integer, a modest gain can tip a tree across a level boundary, say from 4 levels to 3, which is a 25% reduction in I/O per lookup. Small optimizations compound into significant performance gains.
Prefix compression exploits the fact that adjacent keys in a sorted B+-tree often share common prefixes. By storing the shared prefix once and recording only the unique suffixes, we can dramatically reduce key storage requirements.
How Prefix Compression Works:
Consider a page containing these keys:
products/electronics/laptops/dell-xps-15
products/electronics/laptops/lenovo-thinkpad
products/electronics/laptops/macbook-pro
products/electronics/phones/iphone-15
products/electronics/phones/pixel-8
Without compression: 5 keys × ~40 bytes ≈ 200 bytes

With prefix compression: store the shared prefix "products/electronics/" once (21 bytes) plus the five unique suffixes (~91 bytes), roughly halving storage.

With hierarchical compression: even more savings, because the "laptops/" prefix shared by the first three entries is also factored out (front coding, sketched below, is one way to achieve this).
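One standard way to realize those hierarchical savings is front coding, where each key is delta-encoded against its predecessor instead of against a single page-wide prefix. This is a minimal sketch of the idea, not any particular engine's on-disk format:

```python
def front_code(keys):
    """Front coding: represent each sorted key as (shared, suffix), where
    `shared` counts the leading characters in common with the previous key."""
    encoded, prev = [], ""
    for key in keys:
        shared = 0
        limit = min(len(prev), len(key))
        while shared < limit and prev[shared] == key[shared]:
            shared += 1
        encoded.append((shared, key[shared:]))
        prev = key
    return encoded

keys = [
    "products/electronics/laptops/dell-xps-15",
    "products/electronics/laptops/lenovo-thinkpad",
    "products/electronics/laptops/macbook-pro",
    "products/electronics/phones/iphone-15",
    "products/electronics/phones/pixel-8",
]
for shared, suffix in front_code(keys):
    print(f"shared={shared:2d}  suffix={suffix!r}")
```

On this page, only 89 suffix bytes (plus a small per-entry length field) remain of the original 196, and the "laptops/" and "phones/" components are shared automatically. The trade-off is that decoding entry k requires walking forward from the start of the page or from a periodic restart point.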
```python
def find_common_prefix(keys):
    """Find the longest common prefix among a list of keys."""
    if not keys:
        return ""
    prefix = keys[0]
    for key in keys[1:]:
        while not key.startswith(prefix):
            prefix = prefix[:-1]
            if not prefix:
                return ""
    return prefix

def prefix_compress_page(keys):
    """
    Simulate prefix compression on a B+-tree page.
    Returns compression statistics.
    """
    if not keys:
        return {"original_size": 0, "compressed_size": 0, "ratio": 0}

    # Original size
    original_size = sum(len(k) for k in keys)

    # Find common prefix
    prefix = find_common_prefix(keys)
    prefix_len = len(prefix)

    # Compressed storage:
    # - 2 bytes for prefix length
    # - prefix itself
    # - For each key: 1 byte suffix length + suffix
    compressed_size = 2 + prefix_len
    for key in keys:
        suffix = key[prefix_len:]
        compressed_size += 1 + len(suffix)

    ratio = (original_size - compressed_size) / original_size * 100
    return {
        "original_size": original_size,
        "compressed_size": compressed_size,
        "savings": original_size - compressed_size,
        "ratio": ratio,
        "prefix": prefix,
        "prefix_len": prefix_len,
    }

# Example: URL index
urls = [
    "https://example.com/products/category/electronics/item/12345",
    "https://example.com/products/category/electronics/item/12346",
    "https://example.com/products/category/electronics/item/12347",
    "https://example.com/products/category/electronics/item/12348",
    "https://example.com/products/category/electronics/item/12349",
]

result = prefix_compress_page(urls)
print("URL Index Prefix Compression")
print(f"Original size: {result['original_size']} bytes")
print(f"Compressed size: {result['compressed_size']} bytes")
print(f"Savings: {result['savings']} bytes ({result['ratio']:.1f}%)")
print(f"Common prefix: '{result['prefix']}'")

# Example: UUID index (worst case - random, no common prefix)
import uuid
uuids = [str(uuid.uuid4()) for _ in range(5)]
uuid_result = prefix_compress_page(uuids)
print(f"\nUUID Compression: {uuid_result['ratio']:.1f}% (minimal savings expected)")
```

| Key Type | Typical Prefix Sharing | Compression Ratio | Fanout Improvement |
|---|---|---|---|
| URLs / Paths | Very High | 60-80% | 2.5-5× |
| Sequential IDs | High | 40-60% | 1.7-2.5× |
| Timestamps | High | 50-70% | 2-3× |
| Email addresses | Medium | 30-50% | 1.4-2× |
| Random UUIDs | None | 0% | 1× (no benefit) |
| Hash values | None | 0% | 1× (no benefit) |
Prefix compression requires decompression during search (CPU overhead), may complicate concurrent access (prefix changes require page restructuring), and provides no benefit for random keys like UUIDs. Modern CPUs handle decompression efficiently, but the complexity must be considered.
Suffix truncation (also called separator key compression or fence key compression) is a powerful technique specific to internal nodes of B+-trees. The key insight: internal node keys only need to distinguish between child subtrees—they don't need to be complete keys.
The Insight:
In a B+-tree internal node, a key K separates children: all keys < K go left, all keys ≥ K go right. We only need enough of K to make this distinction—not the full key value.
Consider two adjacent child pages whose keys straddle midnight on January 16: the left page ends with transaction_2024_01_15_... keys and the right page begins with "transaction_2024_01_16_def456uvw".

Full separator: "transaction_2024_01_16_def456uvw"
Minimal separator: "transaction_2024_01_16"

The truncated version is sufficient to route searches correctly: every key in the left subtree compares less than it, and every key in the right subtree compares greater than or equal to it.
```python
def compute_minimal_separator(left_max, right_min):
    """
    Compute the minimal separator key between two adjacent child ranges.

    The separator must be:
    - > left_max (strictly greater)
    - <= right_min (less than or equal)

    We find the shortest prefix of right_min that satisfies these conditions.
    """
    # Find the first character position where they differ
    min_len = min(len(left_max), len(right_min))
    for i in range(min_len):
        if left_max[i] < right_min[i]:
            # We can truncate here - prefix up to i (inclusive) is sufficient
            return right_min[:i+1]
        elif left_max[i] > right_min[i]:
            # This shouldn't happen in a valid B+-tree (left_max < right_min)
            raise ValueError("Invalid ordering: left_max >= right_min")

    # If we get here, one is a prefix of the other
    # right_min must be longer (since left_max < right_min)
    return right_min[:min_len + 1]

def analyze_suffix_truncation(separators):
    """Analyze savings from suffix truncation on separator keys."""
    total_full = 0
    total_truncated = 0

    for left_max, right_min in separators:
        minimal = compute_minimal_separator(left_max, right_min)
        full_len = len(right_min)
        truncated_len = len(minimal)
        total_full += full_len
        total_truncated += truncated_len

        savings = (full_len - truncated_len) / full_len * 100
        print(f"Full:    '{right_min[:30]}...' ({full_len})")
        print(f"Minimal: '{minimal}' ({truncated_len})")
        print(f"Savings: {savings:.1f}%\n")

    overall_savings = (total_full - total_truncated) / total_full * 100
    print(f"Overall: {total_full} → {total_truncated} bytes ({overall_savings:.1f}% reduction)")

# Example: Log entry keys (timestamp + ID)
separators = [
    ("2024-01-15T23:59:59.999Z_id_99999", "2024-01-16T00:00:00.001Z_id_00001"),
    ("2024-01-16T00:59:59.999Z_id_99999", "2024-01-16T01:00:00.001Z_id_00001"),
    ("2024-01-16T01:59:59.999Z_id_99999", "2024-01-16T02:00:00.001Z_id_00001"),
]

print("Suffix Truncation Analysis: Time-Series Index")
print("=" * 60)
analyze_suffix_truncation(separators)
```

PostgreSQL 12+ implements suffix truncation for B-tree internal pages. For indexes on long text columns, this can reduce internal index size by 70% or more, significantly improving buffer pool efficiency and reducing tree height.
Key normalization transforms keys into a canonical binary format optimized for comparison and storage. This technique enables simpler comparison operations and can reduce key sizes.
Why Normalize Keys?
Native data types have varying comparison rules:

- Signed integers: two's-complement byte patterns do not sort correctly as raw bytes (negative values look "large")
- Little-endian integers: the least-significant byte comes first, so bytewise order disagrees with numeric order
- Floats: the IEEE 754 sign/exponent/mantissa layout needs a transform before bytes sort numerically
- Strings: collation rules (case, accents, locale) differ from raw byte order

Normalized keys allow simple byte-for-byte comparison (memcmp) regardless of original type.
```python
def normalize_integer(value, signed=True, width=8):
    """
    Normalize an integer for ordered byte comparison.

    For memcmp to work correctly:
    - Flip sign bit for signed integers (makes negative < positive)
    - Use big-endian byte order (MSB first for proper ordering)
    """
    if signed:
        # Flip the sign bit so memcmp gives correct ordering
        # For 8-byte signed: XOR with 0x8000000000000000
        flip_mask = 1 << (width * 8 - 1)
        adjusted = value ^ flip_mask
        # Handle negative wraparound
        if adjusted < 0:
            adjusted += (1 << (width * 8))
    else:
        adjusted = value
    return adjusted.to_bytes(width, byteorder='big')

def normalize_string(s, max_len=8):
    """
    Normalize a string for ordered byte comparison.

    Simple approach: truncate/pad to fixed length.
    Production systems use more sophisticated encoding.
    """
    # Encode as UTF-8 and truncate/pad
    encoded = s.encode('utf-8')[:max_len]
    return encoded.ljust(max_len, b'\x00')

def normalize_composite_key(timestamp, user_id, txn_id):
    """
    Normalize a composite key: (timestamp, user_id, txn_id)
    All components concatenated in order for single memcmp comparison.
    """
    ts_bytes = normalize_integer(int(timestamp * 1000000), signed=False, width=8)
    user_bytes = normalize_integer(user_id, signed=False, width=4)
    txn_bytes = normalize_integer(txn_id, signed=False, width=8)
    return ts_bytes + user_bytes + txn_bytes

# Demonstrate that normalized comparison matches semantic comparison
test_integers = [-100, -1, 0, 1, 100]
print("Integer Normalization Test")
print("=" * 50)
normalized = [(n, normalize_integer(n)) for n in test_integers]

# Sort by normalized bytes
sorted_by_bytes = sorted(normalized, key=lambda x: x[1])
sorted_values = [x[0] for x in sorted_by_bytes]
print(f"Original order: {test_integers}")
print(f"After byte-sort: {sorted_values}")
print(f"Correct: {sorted_values == sorted(test_integers)}")
```

| Data Type | Normalization Strategy | Size Impact | Example |
|---|---|---|---|
| INTEGER | Flip sign bit, big-endian | No change | -100 → 0x7FFFFF9C |
| FLOAT | Sign/exponent encoding | No change | IEEE 754 transform |
| VARCHAR | Collation encoding | May increase | ICU collation keys |
| TIMESTAMP | Epoch nanoseconds | Fixed 8 bytes | Consistent comparison |
| UUID | Binary form | 16 bytes | Remove hyphens |
| COMPOSITE | Concatenate normalized | Sum of parts | See code example |
String normalization for collation (locale-aware sorting) can actually increase key size: German 'ß' collates like 'ss', and accented characters can expand to multi-byte collation weights. This is a trade-off between correct ordering and storage efficiency.
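Python's `locale.strxfrm` offers a quick, environment-dependent way to observe this. The sketch below assumes the de_DE.UTF-8 locale is installed and falls back to the byte-order "C" locale otherwise; the exact key lengths vary by platform:

```python
import locale

# Assumes de_DE.UTF-8 is available; fall back to plain byte ordering if not.
try:
    locale.setlocale(locale.LC_COLLATE, "de_DE.UTF-8")
except locale.Error:
    locale.setlocale(locale.LC_COLLATE, "C")

for word in ["Strasse", "Straße", "résumé"]:
    key = locale.strxfrm(word)  # sort key honoring the active collation
    print(f"{word!r}: {len(word)} chars -> collation key of {len(key)} units")
```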
Dictionary compression replaces repeated key values or key components with shorter integer codes. This is particularly effective for low-cardinality columns (few distinct values) or keys with repetitive patterns.
How Dictionary Compression Works:
Instead of storing full values, maintain a dictionary mapping:
Dictionary:
1 → "North America"
2 → "Europe"
3 → "Asia Pacific"
4 → "Latin America"
Original keys: ["North America", "Europe", "North America", "Asia Pacific", ...]
Compressed: [1, 2, 1, 3, ...]
Each key shrinks from ~11 bytes on average to a small integer code: 1 byte suffices for the four regions here, and 1-4 bytes in general, depending on cardinality.
```python
import random

class DictionaryEncoder:
    """
    Simple dictionary encoder for B+-tree keys.
    Used for low-cardinality column values.
    """
    def __init__(self):
        self.value_to_code = {}
        self.code_to_value = {}
        self.next_code = 1

    def encode(self, value):
        """Encode a value to its dictionary code."""
        if value not in self.value_to_code:
            self.value_to_code[value] = self.next_code
            self.code_to_value[self.next_code] = value
            self.next_code += 1
        return self.value_to_code[value]

    def decode(self, code):
        """Decode a dictionary code to its value."""
        return self.code_to_value.get(code)

    def bytes_per_code(self):
        """Number of bytes needed for codes."""
        if self.next_code <= 256:
            return 1
        elif self.next_code <= 65536:
            return 2
        else:
            return 4

def analyze_dictionary_compression(values):
    """Analyze space savings from dictionary compression."""
    encoder = DictionaryEncoder()

    # Original size
    original_size = sum(len(str(v).encode('utf-8')) for v in values)

    # Encode all values
    codes = [encoder.encode(v) for v in values]

    # Dictionary size
    dict_size = sum(len(str(v).encode('utf-8')) + 4
                    for v in encoder.value_to_code.keys())

    # Compressed size
    bytes_per = encoder.bytes_per_code()
    compressed_size = len(codes) * bytes_per + dict_size

    cardinality = len(encoder.value_to_code)

    return {
        "original_size": original_size,
        "compressed_size": compressed_size,
        "dict_size": dict_size,
        "cardinality": cardinality,
        "bytes_per_code": bytes_per,
        "ratio": (original_size - compressed_size) / original_size * 100,
    }

# Example: Country column (low cardinality)
countries = ["United States", "United Kingdom", "Germany", "France", "Japan",
             "Canada", "Australia", "Brazil", "India", "China"]

# 1 million rows with random countries
values = [random.choice(countries) for _ in range(1_000_000)]

result = analyze_dictionary_compression(values)
print("Dictionary Compression: Country Column (1M rows)")
print(f"Cardinality: {result['cardinality']} distinct values")
print(f"Original: {result['original_size']:,} bytes")
print(f"Compressed: {result['compressed_size']:,} bytes")
print(f"Dictionary overhead: {result['dict_size']:,} bytes")
print(f"Savings: {result['ratio']:.1f}%")
```

| Column Type | Cardinality | Original Size | Code Size | Typical Savings |
|---|---|---|---|---|
| Status (Active/Inactive) | 2 | 7 bytes avg | 1 byte | 85% |
| Country | ~200 | 10 bytes avg | 1 byte | 90% |
| State/Province | ~5000 | 8 bytes avg | 2 bytes | 75% |
| Product Category | ~100 | 15 bytes avg | 1 byte | 93% |
| Email Domain | ~1M | 12 bytes avg | 3 bytes | 75% |
| User ID (high card) | Millions | 8 bytes | 4 bytes | 50% |
Dictionary compression requires maintaining the dictionary in memory, adds encoding/decoding overhead, and doesn't help with high-cardinality columns. It's most effective for columns with <10,000 distinct values appearing in millions of rows.
Beyond key compression, structural modifications to the B+-tree itself can improve fanout and overall efficiency.
1. Variable-Length Internal Node Format:
Instead of fixed-size slots, internal nodes can use variable-length entries:
```c
/* Fixed-format internal node (traditional) */
struct FixedInternalNode {
    uint16_t num_keys;
    uint32_t child_pointers[MAX_FANOUT];
    char keys[MAX_FANOUT - 1][MAX_KEY_SIZE];  /* Wastes space! */
};

/* Variable-format internal node (optimized) */
struct VarInternalNode {
    uint16_t num_keys;
    uint16_t free_space_offset;
    /* Slot array: offsets to actual key data */
    uint16_t key_offsets[1];  /* Dynamic size */
    /* After key_offsets array: */
    /* - Child pointers (packed) */
    /* - Key data (variable length, grows from end of page) */
};

/* Benefits:
 * - Short keys use less space
 * - More entries fit per page
 * - Suffix-truncated separators pack efficiently
 *
 * Trade-offs:
 * - More complex slot management
 * - Insertion requires shifting
 * - Must track free space
 */
```

2. Pointer Swizzling:
Replace on-disk page pointers with direct in-memory pointers for cached nodes: while a child page is resident in the buffer pool, the parent's reference is "swizzled" into the child's memory address, and it is "unswizzled" back to a page ID before the child can be evicted (see the sketch below).

This eliminates buffer pool lookup overhead for frequently accessed nodes.
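A toy model of the mechanism, in Python for readability (real engines do this with tagged pointers in C/C++; `buffer_pool` and its `get` method are hypothetical stand-ins):

```python
class ChildRef:
    """A child pointer that holds a disk page ID and, when the child is
    cached, a direct ("swizzled") in-memory reference to the node."""

    def __init__(self, page_id):
        self.page_id = page_id
        self.node = None  # direct reference once swizzled

def follow(ref, buffer_pool):
    """Resolve a child reference to a node object."""
    if ref.node is not None:
        return ref.node                   # swizzled: no buffer pool lookup
    node = buffer_pool.get(ref.page_id)   # hash lookup, possibly a disk read
    ref.node = node                       # swizzle for next time
    return node

def on_evict(ref):
    """Before the child page is evicted, the reference must be unswizzled."""
    ref.node = None
```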
3. Hybrid Node Formats:

Some systems use different formats for different node types, for example:

- Internal nodes: compressed, suffix-truncated separators packed for maximum fanout
- Leaf nodes: a layout optimized for range scans and in-place updates
- Root and upper levels: pinned in memory, where pointer swizzling pays off most
PostgreSQL 13+ implements B-tree deduplication: when a key appears multiple times, it stores the key once with a list of TIDs. This is similar to dictionary compression but at the tree level. For non-unique indexes with many duplicates, this can reduce index size by 50% or more.
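The idea is easy to see in miniature. This sketch groups duplicate keys into posting lists; it illustrates the concept only, not PostgreSQL's actual on-disk layout:

```python
from itertools import groupby
from operator import itemgetter

def deduplicate_leaf(entries):
    """Collapse sorted (key, tid) pairs into (key, [tid, ...]) posting lists."""
    return [(key, [tid for _, tid in group])
            for key, group in groupby(sorted(entries), key=itemgetter(0))]

entries = [("blue", (7, 2)), ("red", (3, 1)), ("blue", (7, 5)),
           ("red", (4, 8)), ("blue", (9, 1))]
for key, tids in deduplicate_leaf(entries):
    print(f"{key}: {len(tids)} TIDs, key stored once instead of {len(tids)} times")
```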
Understanding your actual index fanout is essential for optimization. Here's how to measure it in production databases.
```sql
-- PostgreSQL: Estimate index fanout from statistics
SELECT
    indexrelname AS index_name,
    pg_relation_size(indexrelid) AS index_bytes,
    idx_scan AS index_scans,
    idx_tup_read AS tuples_via_index,
    idx_tup_fetch AS tuples_fetched,
    -- Estimate pages and entries
    (pg_relation_size(indexrelid) / 8192)::int AS index_pages,
    (SELECT reltuples FROM pg_class WHERE oid = indexrelid)::bigint AS estimated_entries
FROM pg_stat_user_indexes
WHERE indexrelname LIKE 'idx_%'
ORDER BY pg_relation_size(indexrelid) DESC;

-- PostgreSQL: Detailed B-tree statistics (requires pageinspect extension)
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- Get B-tree meta page info
SELECT * FROM bt_metap('idx_users_email');

-- Get statistics for internal pages (level > 0)
SELECT blkno, type, live_items, dead_items, avg_item_size, free_size
FROM bt_page_stats('idx_users_email', 1);  -- Page 1 (root)

-- MySQL InnoDB: Index statistics
SELECT
    table_name,
    index_name,
    stat_name,
    stat_value,
    stat_description
FROM mysql.innodb_index_stats
WHERE database_name = 'mydb'
  AND table_name = 'users'
  AND index_name = 'idx_email';

-- Oracle: Index statistics
SELECT
    index_name,
    blevel,                    -- B-tree level (height - 1)
    leaf_blocks,               -- Number of leaf pages
    distinct_keys,
    avg_leaf_blocks_per_key,
    avg_data_blocks_per_key
FROM user_indexes
WHERE index_name = 'IDX_USERS_EMAIL';
```

| Check | Good | Needs Attention | Action |
|---|---|---|---|
| Average entries/page | ≥70% of theoretical max | <50% of max | Check fill factor, fragmentation |
| Index height | ≤ log_f(N) + 1 | Higher than expected | Consider key compression |
| Key size | Match actual data | Significant padding | Use appropriate types |
| Duplicate handling | Deduplication enabled | Many duplicate keys | Enable deduplication |
| Page utilization | ≥80% | <60% | REINDEX or VACUUM |
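To interpret those numbers, a rough calculation like the following (a sketch, assuming uniform key distribution and similar leaf/internal fanout) compares the measured height against what the measured occupancy predicts:

```python
import math

def fanout_health(num_entries, leaf_pages, measured_height):
    """Compare measured tree height with the height implied by occupancy."""
    entries_per_leaf = num_entries / leaf_pages          # measured fanout proxy
    # Levels above the leaves, plus the leaf level itself
    expected_height = math.ceil(math.log(leaf_pages, entries_per_leaf)) + 1
    status = "OK" if measured_height <= expected_height else "investigate"
    return entries_per_leaf, expected_height, status

# Example figures of the kind reported by bt_metap / user_indexes
per_leaf, expected, status = fanout_health(
    num_entries=50_000_000, leaf_pages=180_000, measured_height=4)
print(f"~{per_leaf:.0f} entries/leaf, expected height {expected}: {status}")
```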
Schedule regular index health assessments. Fanout can degrade over time due to fragmentation, bloat, and changing data patterns. Many production issues stem from indexes that have degraded significantly from their initial optimal state.
We've explored the key techniques for maximizing B+-tree fanout, each of which directly improves index performance.
What's Next:
With fanout optimization understood, we'll move to I/O cost analysis—the comprehensive framework for understanding and predicting the disk I/O costs of B+-tree operations.
You now understand the techniques used by database systems to maximize B+-tree fanout, from key compression to structural optimizations. This knowledge enables you to design efficient indexes and diagnose performance issues. Next, we'll analyze the I/O costs of B+-tree operations in detail.