Throughout this chapter, we've explored Binary Search Trees in depth—the BST property, operations, deletion algorithms, complexity analysis, and advanced patterns like order statistics and augmentation. Now we step back to see the bigger picture.
The BST isn't just a data structure—it's a foundation. The core ideas you've learned extend to an entire family of tree structures, each optimized for different scenarios: AVL trees, Red-Black trees, B-trees and B+ trees, splay trees, and treaps.
This page connects the BST concepts you've mastered to this broader family, showing how the same fundamental ideas manifest in different contexts.
By the end of this page, you will understand how BST principles underlie all balanced tree variants, see the key tradeoffs between different balanced structures, know when to choose which tree variant for your application, and recognize how your BST knowledge transfers to more advanced structures.
Every balanced tree variant shares the BST's core property:
For every node, all keys in the left subtree are less than the node's key, and all keys in the right subtree are greater.
This property is what enables O(log n) search—at each node, we eliminate half the remaining candidates. It's so fundamental that we often call these structures 'search trees' regardless of their specific balancing mechanism.
What stays constant across all balanced tree types:
| Property | Shared by All |
|---|---|
| Ordering property | Yes: left < node < right |
| O(log n) search | Yes (with balance) |
| In-order gives sorted output | Yes |
| Supports range queries | Yes |
| Min/Max at extremes | Yes (leftmost/rightmost) |
| Augmentation compatible | Yes (with care for rebalancing) |
Because all balanced trees share the BST property, everything you've learned about BST search, in-order traversal, successor/predecessor finding, range queries, and augmentation applies to every variant. You're not starting over—you're extending a solid foundation.
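As a small illustration of that transfer, here is a sketch of a BST range query (assuming a minimal numeric-key node shape); the same code runs unmodified on an AVL, Red-Black, or treap node that happens to carry extra balancing fields:

```typescript
interface Node {
  key: number;
  left: Node | null;
  right: Node | null;
}

// Collect all keys in [lo, hi] in sorted order, pruning any subtree
// the ordering property proves cannot contain a match.
function rangeQuery(node: Node | null, lo: number, hi: number, out: number[] = []): number[] {
  if (node === null) return out;
  if (node.key > lo) rangeQuery(node.left, lo, hi, out);  // left subtree may hold keys >= lo
  if (node.key >= lo && node.key <= hi) out.push(node.key);
  if (node.key < hi) rangeQuery(node.right, lo, hi, out); // right subtree may hold keys <= hi
  return out;
}

const leaf = (key: number): Node => ({ key, left: null, right: null });
const root: Node = {
  key: 50,
  left: { key: 30, left: leaf(20), right: leaf(40) },
  right: { key: 70, left: null, right: leaf(80) },
};

console.log(rangeQuery(root, 25, 60)); // → [30, 40, 50]
```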
AVL trees (named after inventors Adelson-Velsky and Landis, 1962) are the oldest self-balancing BST. They maintain a strict balance invariant:
AVL Property: For every node, the heights of its left and right subtrees differ by at most 1.
The balance factor of a node is height(left) - height(right). Valid balance factors are {-1, 0, 1}.
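To make the invariant concrete, here is a sketch (with hypothetical helper names) of how the balance factor is computed and how a single right rotation repairs a left-heavy subtree:

```typescript
interface AVLNode {
  key: number;
  left: AVLNode | null;
  right: AVLNode | null;
  height: number; // height of the subtree rooted here; 1 for a leaf
}

const height = (n: AVLNode | null): number => (n === null ? 0 : n.height);
const update = (n: AVLNode): void => { n.height = 1 + Math.max(height(n.left), height(n.right)); };
const balanceFactor = (n: AVLNode): number => height(n.left) - height(n.right);

// Right rotation: repairs a left-heavy node (balance factor +2).
// Returns the new root of this subtree.
function rotateRight(y: AVLNode): AVLNode {
  const x = y.left!; // non-null whenever y is left-heavy
  y.left = x.right;
  x.right = y;
  update(y); // y is now below x, so recompute its height first
  update(x);
  return x;
}

// A left-leaning chain 3 -> 2 -> 1 has balance factor +2 at the root...
const n1: AVLNode = { key: 1, left: null, right: null, height: 1 };
const n2: AVLNode = { key: 2, left: n1, right: null, height: 2 };
const n3: AVLNode = { key: 3, left: n2, right: null, height: 3 };
// ...and one right rotation restores balance:
const newRoot = rotateRight(n3);
console.log(newRoot.key, balanceFactor(newRoot)); // → 2 0
```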
How it connects to BST:
```typescript
interface AVLNode<K, V> {
  key: K;
  value: V;
  left: AVLNode<K, V> | null;
  right: AVLNode<K, V> | null;
  height: number; // AVL-specific augmentation
}

/**
 * AVL search is IDENTICAL to BST search.
 * The balance property doesn't change how we search—
 * it just guarantees the height is O(log n).
 */
function avlSearch<K, V>(
  node: AVLNode<K, V> | null,
  key: K,
  compare: (a: K, b: K) => number
): V | null {
  if (node === null) return null;
  const cmp = compare(key, node.key);
  if (cmp === 0) return node.value;
  if (cmp < 0) return avlSearch(node.left, key, compare);
  return avlSearch(node.right, key, compare);
}

/**
 * BST in-order traversal works unchanged.
 * Produces sorted output due to BST property.
 */
function avlInorder<K, V>(
  node: AVLNode<K, V> | null,
  callback: (key: K, value: V) => void
): void {
  if (node === null) return;
  avlInorder(node.left, callback);
  callback(node.key, node.value);
  avlInorder(node.right, callback);
}

/**
 * The difference is in INSERT and DELETE:
 * After standard BST operation, check balance and rotate if needed.
 */
```

AVL characteristics:
| Aspect | Detail |
|---|---|
| Height guarantee | h ≤ 1.44 × log₂(n + 2) |
| Search | O(log n) guaranteed |
| Insert | O(log n) — may need up to 2 rotations |
| Delete | O(log n) — may need up to O(log n) rotations |
| Space overhead | 1 integer (height) per node |
| Best for | Read-heavy workloads with occasional writes |
When to use AVL: read-heavy workloads where lookups dominate, and real-time or latency-sensitive systems that need the strictest worst-case height guarantee.
Red-Black trees relax the AVL balance requirement in exchange for faster modifications. Each node is colored red or black, and the tree maintains five properties:
1. Every node is either red or black.
2. The root is black.
3. Every leaf (NIL sentinel) is black.
4. A red node's children are both black (no two reds in a row).
5. Every path from a node down to its descendant leaves contains the same number of black nodes.
Key insight: Property 5 ensures the tree is 'black-balanced,' which, combined with Property 4, limits height to 2 × log₂(n + 1).
Connection to BST: search, traversal, and range queries are plain BST code; the colors exist only to guide rebalancing during insert and delete.
Red-Black vs AVL:
| Aspect | AVL | Red-Black |
|---|---|---|
| Height bound | ~1.44 log n | ~2 log n |
| Lookup speed | Slightly faster (shorter height) | Slightly slower |
| Insert rotations | Up to 2 | Up to 2 |
| Delete rotations | Up to O(log n) | Up to 3 |
| Insert/Delete speed | Slower | Faster |
| Color storage | Not needed | 1 bit per node |
| Typically used | Databases, search-heavy apps | Standard libraries, OS kernels |
Why Red-Black is more common:
Red-Black trees limit delete rotations to O(1), making them faster for write-heavy workloads. This is why:
- C++'s std::map and std::set use Red-Black trees
- Java's TreeMap and TreeSet use Red-Black trees

Both AVL and Red-Black trees are significantly more complex to implement than basic BSTs. In production, use standard library implementations. Understanding the concepts helps you choose the right structure and debug issues, but implementing from scratch is rarely necessary.
B-Trees generalize the BST concept from binary to multi-way search trees. In a B-tree of order m, each node stores up to m - 1 sorted keys and up to m children, every node except the root stays at least half full, and all leaves sit at the same depth.
The BST connection is clear:
A BST is essentially a B-tree of order 2 (each node has up to 2 children, 1 key). B-trees generalize this to larger nodes.
BST Node (B-tree order 2):          B-tree Node (order 5):

        [K]                            [K₁|K₂|K₃|K₄]
       /   \                          /   |    |    |   \
     <K     >K                     <K₁ K₁-K₂ K₂-K₃ K₃-K₄ >K₄
Why B-trees exist: The disk access problem
When data is stored on disk (HDD or SSD), the cost model changes: data is fetched in whole blocks (typically 4KB or more), and each block access costs milliseconds on an HDD versus nanoseconds for RAM. Node accesses, not key comparisons, dominate the running time.
In a BST with n nodes, search requires O(log₂ n) node accesses. For n = 1 billion, log₂(10⁹) ≈ 30, so a search touches about 30 nodes, potentially 30 separate disk reads.
B-trees with large order m reduce this to O(log_m n) accesses: with m = 256, log₂₅₆(10⁹) ≈ 3.7, so a search needs only about 4 block reads.
By making nodes large enough to match disk block sizes (typically 4KB-64KB), B-trees minimize disk I/O.
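The arithmetic behind that claim can be checked directly; the block and entry sizes below are illustrative assumptions, not fixed constants:

```typescript
// Assumed sizes, for illustration only.
const blockSize = 4096;   // bytes per disk block
const entrySize = 16;     // bytes per key + child pointer
const n = 1_000_000_000;  // one billion keys

const branching = Math.floor(blockSize / entrySize);              // children per B-tree node
const bstLevels = Math.ceil(Math.log2(n));                        // binary tree depth
const btreeLevels = Math.ceil(Math.log(n) / Math.log(branching)); // multi-way tree depth

console.log(branching, bstLevels, btreeLevels); // → 256 30 4
```

Thirty disk seeks versus four is the entire reason B-trees exist.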
```typescript
/**
 * B-tree node conceptual structure.
 * This is simplified for understanding—real implementations
 * are more complex for performance.
 */
interface BTreeNode<K, V> {
  keys: K[];                   // Sorted keys in this node
  values: V[];                 // Values corresponding to keys
  children: BTreeNode<K, V>[]; // Child pointers (keys.length + 1 of them)
  isLeaf: boolean;             // Whether this is a leaf node
}

/**
 * B-tree search conceptually mirrors BST search.
 * At each node, find which child subtree contains the key.
 */
function btreeSearch<K, V>(
  node: BTreeNode<K, V>,
  key: K,
  compare: (a: K, b: K) => number
): V | null {
  // Scan within the node to find the key or the child index
  let i = 0;
  while (i < node.keys.length && compare(key, node.keys[i]) > 0) {
    i++;
  }
  // Check if we found the key in this node
  if (i < node.keys.length && compare(key, node.keys[i]) === 0) {
    return node.values[i];
  }
  // If leaf, key doesn't exist
  if (node.isLeaf) {
    return null;
  }
  // Recurse to appropriate child
  return btreeSearch(node.children[i], key, compare);
}

/**
 * Key insight: This is the same logic as BST search!
 *
 * BST:    if (key < node.key) go left; else if (key > node.key) go right;
 * B-tree: find i where keys[i-1] < key < keys[i]; go to children[i]
 *
 * The fundamental "compare and choose direction" is identical.
 */
```

B-trees (and the B+ variant) are the foundation of virtually every database index: PostgreSQL, MySQL, Oracle, SQLite, and more. File systems like NTFS, HFS+, and ext4 also use B-tree variants. Understanding BST concepts directly informs understanding these critical systems.
B+ Trees are a refinement of B-trees specifically optimized for database use:
B+ Tree Structure:
[30|60|90] ← Internal: keys only
/ | | \
[10|20] [40|50] [70|80] [100|110] ← Leaves: key-value pairs
↔ ↔ ↔ ↔ ← Linked for range scans
Why B+ trees dominate databases:
| Feature | Benefit |
|---|---|
| Values only at leaves | Internal nodes are smaller → higher branching → fewer disk reads |
| Leaf linked list | Range queries are sequential disk reads (fast!) |
| Predictable leaf access | Every search touches the same number of levels |
| Cache-friendly | Full internal node caching practical for hot paths |
Range query example:
SELECT * FROM users WHERE age BETWEEN 25 AND 35
In a BST, a range query requires an in-order traversal that jumps around memory or disk. B+ tree leaf links enable sequential access—a massive performance win, especially for spinning disks.
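A sketch of the leaf-level scan (using a hypothetical linked-leaf shape) shows why this is just sequential reading:

```typescript
// Hypothetical B+ tree leaf: sorted keys plus a link to the next leaf.
interface Leaf {
  keys: number[];
  next: Leaf | null;
}

// Range scan: the tree descent (not shown) locates the first candidate
// leaf; from there the scan just follows next-pointers sequentially.
function scanLeaves(start: Leaf | null, lo: number, hi: number): number[] {
  const out: number[] = [];
  for (let leaf = start; leaf !== null; leaf = leaf.next) {
    for (const k of leaf.keys) {
      if (k > hi) return out; // keys are globally sorted: we can stop
      if (k >= lo) out.push(k);
    }
  }
  return out;
}

// Leaves as in the structure diagram: [10,20] -> [40,50] -> [70,80] -> [100,110]
const l4: Leaf = { keys: [100, 110], next: null };
const l3: Leaf = { keys: [70, 80], next: l4 };
const l2: Leaf = { keys: [40, 50], next: l3 };
const l1: Leaf = { keys: [10, 20], next: l2 };

console.log(scanLeaves(l1, 15, 75)); // → [20, 40, 50, 70]
```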
B+ tree search, insertion (with splits), and deletion (with merges) are direct extensions of BST concepts. The BST property still holds: left subtrees have smaller keys, right subtrees have larger. You're not learning a new paradigm—you're scaling up familiar ideas.
Splay trees take a fundamentally different approach to balance. Instead of maintaining a strict invariant, they restructure based on access patterns:
After every operation (search, insert, delete), splay the accessed node to the root.
Splaying is a series of rotations that moves a node to the root while roughly halving the depth of nodes on the access path. This provides O(log n) amortized time per operation and very fast repeated access to recently used keys.
Connection to BST:
Splay trees are pure BSTs—the ordering property is unchanged, and search follows the same logic. The only addition is the splay step itself: no per-node metadata (heights, colors, priorities) is stored.
When splay trees excel: workloads with strong temporal locality, where a small working set of keys is accessed repeatedly (caches, most-recently-used indexes).
When to avoid: uniform access patterns, read paths that must not modify the tree (e.g., concurrent readers), and systems that need strict per-operation worst-case bounds.
Splay trees have a remarkable property: accessing an item that was accessed t operations ago takes O(log t) amortized time, regardless of total tree size. This means frequently accessed items are effectively O(1) to access. This 'working set property' makes splay trees ideal for caches and LRU-like structures.
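Real splaying uses paired zig-zig/zig-zag rotations to get the depth-halving guarantee; the simpler 'move-to-root' sketch below (single rotations only, hypothetical helper names) conveys the core idea that the accessed key bubbles up to the root while the BST ordering is preserved:

```typescript
interface Node {
  key: number;
  left: Node | null;
  right: Node | null;
}

const rotateRight = (y: Node): Node => { const x = y.left!; y.left = x.right; x.right = y; return x; };
const rotateLeft = (x: Node): Node => { const y = x.right!; x.right = y.left; y.left = x; return y; };

// Search for key and rotate the found node up to the root.
// (True splay trees rotate two levels at a time; this is the
// simplified single-rotation variant, for illustration only.)
function moveToRoot(node: Node | null, key: number): Node | null {
  if (node === null || node.key === key) return node;
  if (key < node.key) {
    node.left = moveToRoot(node.left, key);
    return node.left !== null ? rotateRight(node) : node;
  } else {
    node.right = moveToRoot(node.right, key);
    return node.right !== null ? rotateLeft(node) : node;
  }
}

function inorder(n: Node | null, out: number[] = []): number[] {
  if (n === null) return out;
  inorder(n.left, out);
  out.push(n.key);
  inorder(n.right, out);
  return out;
}

const tree: Node = {
  key: 50,
  left: { key: 30, left: { key: 20, left: null, right: null }, right: { key: 40, left: null, right: null } },
  right: { key: 70, left: null, right: null },
};
const splayed = moveToRoot(tree, 40)!;
console.log(splayed.key, inorder(splayed)); // → 40 [20, 30, 40, 50, 70]
```

Note that the accessed key ends up at the root while the in-order sequence is untouched.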
Treaps (tree + heap) combine BST ordering on keys with max-heap ordering on randomly assigned priorities:
Treap structure (key:priority):
50:0.95 ← Highest priority at root
/ \
30:0.72 70:0.88
/ \ \
20:0.15 40:0.31 80:0.42
BST property: 20 < 30 < 40 < 50 < 70 < 80 ✓
Heap property: Each node's priority ≥ children's ✓
Why randomized balance works:
With priorities chosen uniformly at random, the tree's shape is distributed like that of a BST built from a random insertion order, and such random BSTs have height O(log n) with high probability.
Treap characteristics:
| Aspect | Detail |
|---|---|
| Expected height | O(log n) |
| Search | O(log n) expected |
| Insert | O(log n) expected (BST insert + rotation up) |
| Delete | O(log n) expected (rotation down + remove) |
| Simplicity | Very simple to implement |
| Worst case | O(n) possible but exponentially unlikely |
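A sketch of treap insertion (hypothetical names; in practice the priority defaults to Math.random(), fixed here only to make the example deterministic):

```typescript
interface TreapNode {
  key: number;
  priority: number;
  left: TreapNode | null;
  right: TreapNode | null;
}

const rotateRight = (y: TreapNode): TreapNode => { const x = y.left!; y.left = x.right; x.right = y; return x; };
const rotateLeft = (x: TreapNode): TreapNode => { const y = x.right!; x.right = y.left; y.left = x; return y; };

// Standard BST insert, then rotate the new node up while it
// violates the max-heap property on priorities.
function insert(node: TreapNode | null, key: number, priority = Math.random()): TreapNode {
  if (node === null) return { key, priority, left: null, right: null };
  if (key < node.key) {
    const child = insert(node.left, key, priority);
    node.left = child;
    if (child.priority > node.priority) node = rotateRight(node);
  } else {
    const child = insert(node.right, key, priority);
    node.right = child;
    if (child.priority > node.priority) node = rotateLeft(node);
  }
  return node;
}

let root: TreapNode | null = null;
root = insert(root, 50, 0.5);
root = insert(root, 30, 0.9);
root = insert(root, 70, 0.7);
console.log(root.key); // → 30 (the highest priority ends up at the root)
```

Search, range queries, and traversal on the result are ordinary BST code; the priorities never enter into it.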
When to use treaps: when you want a balanced BST with minimal implementation effort, when expected (rather than worst-case) O(log n) bounds are acceptable, or when you need easy split and merge operations.
The heap property on priorities is used only for balancing—all key-based operations (search, range query, successor, etc.) work exactly as in a standard BST. Your BST knowledge transfers completely; the random priorities are 'behind the scenes' for structural purposes only.
With so many balanced tree variants, how do you choose? Here's a decision framework:
| Scenario | Recommended | Reason |
|---|---|---|
| General purpose, in-memory | Red-Black (use library) | Best all-around for mixed workloads |
| Read-heavy, rare writes | AVL | Strictest balance → fastest reads |
| Disk-based storage | B+ Tree | Optimized for block I/O |
| Database index | B+ Tree | Range scans, high branching factor |
| Cache with access locality | Splay Tree | Recently accessed items stay hot |
| Simple implementation needed | Treap | Easy to code, good expected performance |
| Concurrent access | Specialized (e.g., lock-free) | Standard trees need modification |
| Real-time systems | AVL | Strictest worst-case guarantees |
Decision flowchart:
Data in memory or on disk?
│
├─ Memory: Read/write ratio?
│ ├─ Read-heavy: AVL
│ ├─ Write-heavy: Red-Black
│ └─ Unknown: Red-Black (default)
│
└─ Disk: Need range scans?
├─ Yes: B+ Tree
└─ No (point lookups only): B-Tree
Special requirements?
├─ Access locality matters: Splay
├─ Simple implementation: Treap
├─ Concurrent access: Specialized structure
└─ None: Use language standard library
The 90% answer: Use your language's standard library TreeMap/std::map/equivalent. It's Red-Black, well-tested, and sufficient for nearly all in-memory use cases.
Unless you have measured evidence that tree operations are your bottleneck, use the standard library. The difference between AVL and Red-Black is microseconds. Focus on algorithm choice (tree vs. hash table) before tree variant selection.
As we conclude this exploration, let's crystallize the unified mental model that connects all balanced search trees:
The key insight:
The BST property is the invariant that enables efficient search. Balancing is the mechanism that guarantees the tree stays efficient over time. Different balancing mechanisms trade off lookup speed against modification cost, worst-case against expected guarantees, and space overhead against implementation simplicity.
But the core remains constant. Master the BST, and you understand the foundation of every search tree.
We've connected BST concepts to the broader family of balanced trees: the shared ordering property, the different balancing mechanisms, and the scenarios each variant serves best.
Module complete:
This concludes our exploration of BST Variants & Extensions. You've learned to handle duplicates with multiple strategies, implement order statistics for rank-based queries, design custom augmentations for specialized needs, and understand how BSTs connect to the entire family of balanced search trees.
The BST is not just a data structure—it's a paradigm for organizing comparable data that scales from educational examples to the databases and file systems powering the digital world.
Congratulations! You've mastered BST Variants & Extensions. You now understand duplicate handling strategies, order statistics trees, augmented BST patterns, and how BST concepts extend to AVL, Red-Black, B-trees, and beyond. This knowledge forms the foundation for understanding virtually any tree-based data structure you'll encounter.