Database Management SystemsB-Trees & B+-Trees

B-Tree Concept

LevelIntermediate

Duration60 mins

TopicB-Trees & B+-Trees

2 / 5

B-tree Properties

The Rules That Guarantee Performance

Every B-tree implementation must uphold a specific set of invariants—properties that remain true before and after every operation. These aren't arbitrary constraints; they're carefully chosen rules that together guarantee the logarithmic height, efficient space utilization, and reliable performance that make B-trees the foundation of modern database indexing.

Understanding these properties deeply is essential because they explain why B-tree algorithms work the way they do. Every split during insertion, every merge during deletion, every rotation during rebalancing—all exist to maintain these invariants.

What You Will Learn

By the end of this page, you will understand each B-tree property in depth, know why each property is necessary, see how properties interact to guarantee performance, and be able to identify invalid B-trees by spotting property violations.

Order and Degree Conventions

Before diving into properties, we must address a significant source of confusion: different textbooks and implementations use different terminology. Understanding these conventions is essential for reading any B-tree literature or documentation.

The Two Major Conventions

Convention 1: Maximum Children (Order m)

Used by Knuth, most database textbooks, and our curriculum
A B-tree of order m has at most m children per node
Each node has at most m-1 keys
Non-root nodes have at least ⌈m/2⌉ children (at least ⌈m/2⌉ - 1 keys)

Convention 2: Minimum Degree (t)

Used by CLRS (Cormen et al.) and algorithms courses
Every non-root node has at least t-1 keys and at most 2t-1 keys
Equivalent to order m = 2t in the first convention
The root has at least 1 key (can be empty only if tree is empty)

Convention Translation
Property	Order Convention (m)	Minimum Degree Convention (t)
Maximum keys per node	m - 1	2t - 1
Minimum keys (non-root)	⌈m/2⌉ - 1	t - 1
Maximum children	m	2t
Minimum children (non-root)	⌈m/2⌉	t
Example: m=5 or t=3	max 4 keys, min 2 keys	max 5 keys, min 2 keys

Watch Out!

The conventions are NOT equivalent for odd values! A B-tree of 'order 5' has max 4 keys, but a B-tree of 'degree 3' has max 5 keys. Always verify which convention a source uses before applying formulas. In this curriculum, we use the maximum children (order m) convention.

Why the Confusion Exists

Bayer and McCreight's original paper used the maximum children convention. When Cormen et al. wrote their influential algorithms textbook, they chose the minimum degree convention because it simplifies certain proofs. Both are valid; both are widely used.

Practical Note: Database documentation typically uses 'order' loosely to mean 'approximately how many keys/children per node.' For actual implementation, what matters is:

Block size (how large is each node?)
Key size and pointer size
Calculated maximum entries that fit in a block

The theoretical conventions help with analysis; practical implementations derive limits from hardware constraints.

Property 1: Bounded Node Size

Formal Statement: Every node in a B-tree contains at most m-1 keys and has at most m children.

This is the upper bound property—it limits how full any node can become.

Why This Property Is Essential

Disk Block Alignment: Nodes must fit within disk blocks. If nodes could grow unboundedly, we'd need multiple I/O operations per node.
Memory Allocation: Fixed maximum size allows predictable memory allocation. The data structure doesn't require dynamic resizing of individual nodes.
Efficient In-Node Search: Binary search within a node is O(log m). Unbounded m would degrade in-node search performance.

maximum-node-capacity.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
-- Example: PostgreSQL B-tree page calculation
-- Default page size: 8KB (8192 bytes)
-- Index tuple header: ~8 bytes
-- Key (integer): 4 bytes
-- Child pointer: 6 bytes (PostgreSQL TID)
-- Per-entry overhead: ~4 bytes
 
-- Available space per page: ~8000 bytes (after header)
-- Space per entry: 4 + 6 + 4 = ~14 bytes
-- Maximum entries: 8000 / 14 ≈ 571
 
-- Therefore: PostgreSQL integer B-tree has order ≈ 570
-- Each node can hold up to 570 children and 569 keys

What Happens at Maximum Capacity

When an insertion would exceed the maximum, the node must split:

Create a new sibling node
Move half the keys to the sibling
Promote the median key to the parent
If the parent overflows, recursively split up the tree

This splitting mechanism is how B-trees grow in height—only when the root splits does height increase. This guarantees that height increases are rare and global (all leaves remain at the same level).

The Invariant Maintained:

For all nodes N: keys(N) ≤ m-1 AND children(N) ≤ m

Practical Fullness

In practice, B-tree nodes are often 50-60% full on average due to random insertions. Some implementations allow 'fill factor' settings to leave room for future insertions, reducing splits at the cost of slightly more nodes.

Property 2: Minimum Occupancy

Formal Statement: Every non-root internal node has at least ⌈m/2⌉ children (and therefore at least ⌈m/2⌉ - 1 keys).

This is the lower bound property—it prevents nodes from becoming too sparse.

Why This Property Is Essential

Space Efficiency: Guarantees at least 50% utilization. Without this, a malicious sequence of insertions and deletions could create nearly-empty nodes, wasting disk space and increasing tree height.
Height Bound: The minimum occupancy directly determines the maximum tree height. More children per node means fewer levels needed.
Balanced Structure: Combined with the leaf-level property, minimum occupancy ensures the tree remains balanced regardless of insertion/deletion patterns.

Minimum Occupancy by Order
Order (m)	⌈m/2⌉	Min Children	Min Keys	Utilization
3	2	2	1	50%
4	2	2	1	50%
5	3	3	2	50%
100	50	50	49	50%
256	128	128	127	50%

What Happens Below Minimum Capacity

When a deletion would reduce a node below minimum occupancy, one of three things happens:

1. Redistribution (Rotation): If a sibling has extra keys, borrow one through the parent.

Before:  [10 | 20]      After:   [15 | 20]
         /  |  \                 /  |  \
      [5] [15|17] [25]       [5|10] [17] [25]

(Borrow 15 from right sibling; 10 moves down from parent)

2. Merge: If no sibling can spare keys, merge with a sibling, pulling down a parent key.

Before:  [10 | 20 | 30]    After:   [10 | 30]
         /  |  \   \                /  |   \
      [5] [15] [25] [35]       [5] [15|20|25] [35]

(Merge [15] with [25], pull down 20 from parent)

3. Recursive Propagation: If merging causes the parent to underflow, continue up the tree.

The Root Exception

The root is special: it can have as few as 1 key (2 children) or even 0 keys if the tree contains a single node. This exception exists because:

The tree must start somewhere (single-node trees are valid)
When the last child of the root merges, the root shrinks, reducing tree height

The 50% Guarantee

Minimum occupancy guarantees that a B-tree never wastes more than 50% of its allocated space. In practice, random insertions typically achieve 67-69% utilization (ln(2) ≈ 0.693). Sequential insertions can achieve even higher utilization with leaf-splitting strategies.

Property 3: Key-Child Relationship

Formal Statement: A node with k children contains exactly k-1 keys.

This property establishes the structural relationship between keys and child pointers within each node.

The Separator Role of Keys

Keys in a B-tree node serve as separators or fence posts that divide the key space into regions. Think of it like a fence:

       Keys:    [  K₁  |  K₂  |  K₃  ]
                  |      |      |      |
    Children:   C₀     C₁     C₂     C₃
                  ↓      ↓      ↓      ↓
              < K₁   K₁≤x<K₂  K₂≤x<K₃  ≥ K₃

3 keys create 4 regions (children)
k-1 keys create k regions
Each child pointer leads to a subtree containing keys in that region

The Search Invariant

•All keys in subtree C₀: key < K₁
•All keys in subtree Cᵢ (for 0 < i < k-1): Kᵢ ≤ key < Kᵢ₊₁
•All keys in subtree Cₖ₋₁: key ≥ Kₖ₋₁
•This invariant enables efficient search: compare the target with separator keys to determine which subtree to explore

Why k-1 Keys for k Children?

Mathematically, if you have k intervals, you need k-1 separators:

2 intervals → 1 separator (like binary search)
3 intervals → 2 separators
k intervals → k-1 separators

This is analogous to:

A road with k segments needs k-1 mile markers
An array of k sections needs k-1 dividing indices
A sorted sequence of k teams needs k-1 ranking positions between them

Leaf Node Interpretation

Leaf nodes are slightly different: they have keys but no child pointers (or null child pointers). Some definitions say leaves have k keys and k+1 null pointers; others say leaves simply have k keys.

Conceptually:

Internal nodes: keys are separators for navigation
Leaf nodes: keys are the actual data (in classic B-trees) or point to data (in B+-trees)

The Invariant Maintained:

For all non-leaf nodes N: children(N) = keys(N) + 1
For all leaf nodes N: children(N) = 0

Implementation Detail

In code, B-tree nodes often store children as an array of size m (or m+1) and keys as an array of size m-1 (or m). A 'count' or 'numKeys' field tracks how many slots are actually used. Empty slots are typically null or contain sentinel values.

Property 4: Key Ordering

Formal Statement: Keys within each node are stored in non-decreasing (sorted) order: K₁ ≤ K₂ ≤ K₃ ≤ ... ≤ Kₖ₋₁.

This property enables efficient in-node search and maintains the global ordering necessary for correct navigation.

Why Sorted Order Matters

1. Binary Search Within Nodes

With sorted keys, finding the appropriate child pointer is O(log k) using binary search:

Node: [5 | 12 | 23 | 47 | 89]

Search for 30:
- Compare with middle (23): 30 > 23 → search right half
- Compare with middle of right (47): 30 < 47 → search left portion
- Position found: between 23 and 47 → follow pointer between them

Total comparisons: O(log k) ≈ O(log m)

2. Sequential Access Support

Sorted order within nodes supports range queries. Once you find the starting point, you scan forward through sorted keys.

binary-search-in-node.pseudo
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
function findChildIndex(node, searchKey):
    keys = node.keys
    left = 0
    right = node.numKeys - 1
    
    // Binary search for the position
    while left <= right:
        mid = (left + right) / 2
        if searchKey == keys[mid]:
            return mid + 1  // Exact match, go to right subtree
        else if searchKey < keys[mid]:
            right = mid - 1
        else:
            left = mid + 1
    
    // searchKey not found, return insertion position
    // left now points to first key greater than searchKey
    return left
 
// The returned index is the child pointer to follow

Maintaining Sorted Order During Insertion

When inserting a new key into a node, the key must be placed in the correct position to maintain sorted order:

Before: [5 | 12 | 47 | 89]  (inserting 23)

1. Find insertion position: between 12 and 47 (index 2)
2. Shift elements right: [5 | 12 | _ | 47 | 89]
3. Insert key: [5 | 12 | 23 | 47 | 89]

After: [5 | 12 | 23 | 47 | 89] ✓ Sorted order maintained

This shifting operation is O(m) in the worst case but happens in memory after the node is loaded, so it's fast compared to disk I/O.

Duplicate Keys

The 'non-decreasing' allows for duplicate keys in classic B-tree definitions. Different databases handle duplicates differently:

Allow duplicates with ties: K₁ = K₂ = K₃ is valid
Unique key with TID: Store (key, tuple_id) pairs to break ties
Overflow pages: Store duplicate key values in linked overflow structures

The Invariant Maintained:

For all nodes N with keys K₁, K₂, ..., Kₖ:
  K₁ ≤ K₂ ≤ ... ≤ Kₖ

Comparison Cost

For simple keys (integers, fixed-length strings), comparisons are fast. For complex keys (variable-length strings, composite keys), comparison cost matters. Some B-tree implementations cache prefix information or use other tricks to minimize comparison overhead.

Property 5: Subtree Ordering

Formal Statement: For a node with keys K₁, K₂, ..., Kₖ₋₁ and children C₀, C₁, ..., Cₖ₋₁:

All keys in subtree rooted at C₀ are less than K₁
All keys in subtree rooted at Cᵢ (for 0 < i < k-1) are ≥ Kᵢ and < Kᵢ₊₁
All keys in subtree rooted at Cₖ₋₁ are ≥ Kₖ₋₁

This property extends the key ordering from within nodes to the entire tree structure. It's the property that makes B-tree search work correctly.

Visual Representation

Converting Mermaid diagram...

The Global Ordering Property

Subtree ordering creates a global sorted order across the entire tree. An in-order traversal of a B-tree visits all keys in sorted order:

Recursively visit subtree C₀
Visit key K₁
Recursively visit subtree C₁
Visit key K₂
... and so on

This is why B-trees excel at range queries: once you find the starting key, all subsequent keys in range are reachable by forward traversal.

Search Algorithm Correctness

Subtree ordering is what makes B-tree search correct. When searching for key K:

Start at root
At each node, find position i where Kᵢ₋₁ ≤ K < Kᵢ (or edge cases)
Follow child pointer Cᵢ
Repeat until finding K or reaching a leaf

The subtree ordering property guarantees that if K exists in the tree, following this algorithm will find it. More importantly, it guarantees that if K is not found by this algorithm, it doesn't exist anywhere in the tree.

btree-search.pseudo
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
function search(node, targetKey):
    i = 0
    
    // Find the first key greater than targetKey
    while i < node.numKeys AND targetKey > node.keys[i]:
        i = i + 1
    
    // Check if we found exact match
    if i < node.numKeys AND targetKey == node.keys[i]:
        return (node, i)  // Found!
    
    // If leaf, key doesn't exist
    if node.isLeaf:
        return NOT_FOUND
    
    // Otherwise, recurse into appropriate child
    // Subtree ordering guarantees correctness!
    return search(node.children[i], targetKey)

The Power of Invariants

Every B-tree operation (insert, delete, split, merge) must preserve subtree ordering. This is verified during debugging by checking that an in-order traversal produces a sorted sequence. If in-order traversal ever produces unsorted output, the B-tree is corrupted.

Property 6: Perfect Leaf Balance

Formal Statement: All leaf nodes appear at the same depth (the tree has uniform depth).

This is perhaps the most distinctive property of B-trees compared to other tree structures. While AVL trees and Red-Black trees maintain bounded height difference, B-trees maintain zero height difference—all paths from root to leaf have identical length.

Why Perfect Balance Is Revolutionary

Consider the implications:

BST / AVL / Red-Black

•Leaves can be at different depths
•Search time varies: O(log n) average but path lengths differ
•Worstcase path may be 2x longest path
•Rotations during insert/delete to rebalance
•Balancing is reactive (fix after breaking)

B-tree

•ALL leaves at exactly the same depth
•Search time is exactly O(log n) for ALL searches
•Every search traverses identical number of levels
•Splits and merges maintain balance inherently
•Balancing is proactive (structure prevents imbalance)

How B-trees Maintain Perfect Balance

The key insight is that B-trees grow and shrink at the root, not at the leaves:

Tree Growth (Height Increase):

Insertions happen at leaf level
If leaf overflows, it splits; median goes to parent
If parent overflows, it splits; median goes to grandparent
This propagates up to root
If root overflows, it splits; a new root is created with one key
Result: ALL leaves are now one level deeper

Tree Shrinkage (Height Decrease):

Deletions happen at leaf level (after swapping if necessary)
If leaf underflows, merge or redistribute
Merge propagates up if parent underflows
If root has only one child after merge, that child becomes new root
Result: ALL leaves are now one level shallower

The Invariant Maintained:

For all leaf nodes L₁, L₂: depth(L₁) = depth(L₂) = h
where h is the tree height

Predictable Performance

Perfect balance means predictable I/O cost. A database query optimizer can precisely calculate the number of disk accesses needed to traverse a B-tree index: it's exactly equal to the tree height plus one (for the leaf). No variance, no surprises.

Properties Working Together

The six properties we've discussed don't exist in isolation—they work together as a cohesive system that guarantees B-tree performance. Let's see how they interact.

Property Interactions
Property	Primary Purpose	Works With	Guarantees
Bounded Node Size	Limit max size	Minimum Occupancy	Node fits in disk block
Minimum Occupancy	Prevent sparse nodes	Bounded Size, Leaf Balance	50%+ utilization, height bound
Key-Child Relationship	Structure integrity	Key Ordering, Subtree Ordering	Correct navigation
Key Ordering	Efficient in-node search	Subtree Ordering	Binary search within node
Subtree Ordering	Global sorted order	All other properties	Correct search, range queries
Perfect Leaf Balance	Uniform depth	Min Occupancy, Bounded Size	Predictable O(log n) height

The Height Guarantee

Combining minimum occupancy and leaf balance, we can derive the height bound:

Each internal node has at least ⌈m/2⌉ children
With n keys, height h satisfies: n + 1 ≤ 2 × (⌈m/2⌉)^h
Solving: h ≤ log_{⌈m/2⌉}((n+1)/2)
For large m: h ≈ log_m(n)

The Space Guarantee

Combining all properties:

Maximum space: n keys stored in at most n + n/(⌈m/2⌉-1) + ... ≈ n × (1 + 2/(m-2)) nodes
Minimum utilization: ~50% of allocated node space contains data
Wastage bounded: at most 50% overhead regardless of insertion/deletion pattern

The Time Guarantee

All operations (search, insert, delete):

Traverse at most h levels: O(log_m n) disk accesses
At each level, O(log m) comparisons for in-node search
Total comparisons: O(log_m n × log m) = O(log n)
Dominant cost: O(log_m n) disk I/O

The Complete Picture

Together, these properties make B-trees optimal for disk-based systems: guaranteed logarithmic height with respect to a large base (minimizing I/O), guaranteed 50%+ space utilization (efficient disk usage), and guaranteed sorted order (enabling range queries). No single property could provide these guarantees alone.

Summary: B-tree Properties

We've explored each B-tree property in depth. Let's consolidate:

Key Takeaways

•Bounded node size ensures nodes fit in disk blocks, enabling single-I/O node access.
•Minimum occupancy guarantees 50%+ utilization and bounds tree height through sufficient fanout.
•Key-child relationship (k-1 keys for k children) establishes the separator-pointer structure.
•Key ordering within nodes enables efficient O(log m) binary search.
•Subtree ordering extends ordering globally, making correct search possible.
•Perfect leaf balance guarantees all searches traverse exactly h levels—no variance.
•Properties work together to provide provable O(log_m n) I/O, 50%+ utilization, and correct semantics.

What's Next

With properties understood, the next page examines node structure in detail—how individual B-tree nodes are laid out in memory and on disk, what metadata they contain, and how internal vs. leaf nodes differ in structure and purpose.

Page Complete

You now understand the six essential B-tree properties and how they work together. You can identify property violations, explain why each constraint exists, and derive the performance guarantees that follow from these invariants. Next, we'll see how these properties manifest in concrete node structure.

2 / 5

Loading learning content...

Database Management SystemsB-Trees & B+-Trees

B-Tree Concept

LevelIntermediate

Duration60 mins

TopicB-Trees & B+-Trees

2 / 5

B-tree Properties

The Rules That Guarantee Performance

What You Will Learn

Order and Degree Conventions

The Two Major Conventions

Convention 1: Maximum Children (Order m)

Used by Knuth, most database textbooks, and our curriculum
A B-tree of order m has at most m children per node
Each node has at most m-1 keys
Non-root nodes have at least ⌈m/2⌉ children (at least ⌈m/2⌉ - 1 keys)

Convention 2: Minimum Degree (t)

Used by CLRS (Cormen et al.) and algorithms courses
Every non-root node has at least t-1 keys and at most 2t-1 keys
Equivalent to order m = 2t in the first convention
The root has at least 1 key (can be empty only if tree is empty)

Convention Translation
Property	Order Convention (m)	Minimum Degree Convention (t)
Maximum keys per node	m - 1	2t - 1
Minimum keys (non-root)	⌈m/2⌉ - 1	t - 1
Maximum children	m	2t
Minimum children (non-root)	⌈m/2⌉	t
Example: m=5 or t=3	max 4 keys, min 2 keys	max 5 keys, min 2 keys

Watch Out!

Why the Confusion Exists

Practical Note: Database documentation typically uses 'order' loosely to mean 'approximately how many keys/children per node.' For actual implementation, what matters is:

Block size (how large is each node?)
Key size and pointer size
Calculated maximum entries that fit in a block

The theoretical conventions help with analysis; practical implementations derive limits from hardware constraints.

Property 1: Bounded Node Size

Formal Statement: Every node in a B-tree contains at most m-1 keys and has at most m children.

This is the upper bound property—it limits how full any node can become.

Why This Property Is Essential

Disk Block Alignment: Nodes must fit within disk blocks. If nodes could grow unboundedly, we'd need multiple I/O operations per node.
Memory Allocation: Fixed maximum size allows predictable memory allocation. The data structure doesn't require dynamic resizing of individual nodes.
Efficient In-Node Search: Binary search within a node is O(log m). Unbounded m would degrade in-node search performance.

maximum-node-capacity.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
-- Example: PostgreSQL B-tree page calculation
-- Default page size: 8KB (8192 bytes)
-- Index tuple header: ~8 bytes
-- Key (integer): 4 bytes
-- Child pointer: 6 bytes (PostgreSQL TID)
-- Per-entry overhead: ~4 bytes
 
-- Available space per page: ~8000 bytes (after header)
-- Space per entry: 4 + 6 + 4 = ~14 bytes
-- Maximum entries: 8000 / 14 ≈ 571
 
-- Therefore: PostgreSQL integer B-tree has order ≈ 570
-- Each node can hold up to 570 children and 569 keys

What Happens at Maximum Capacity

When an insertion would exceed the maximum, the node must split:

Create a new sibling node
Move half the keys to the sibling
Promote the median key to the parent
If the parent overflows, recursively split up the tree

The Invariant Maintained:

For all nodes N: keys(N) ≤ m-1 AND children(N) ≤ m

Practical Fullness

Property 2: Minimum Occupancy

Formal Statement: Every non-root internal node has at least ⌈m/2⌉ children (and therefore at least ⌈m/2⌉ - 1 keys).

This is the lower bound property—it prevents nodes from becoming too sparse.

Why This Property Is Essential

Space Efficiency: Guarantees at least 50% utilization. Without this, a malicious sequence of insertions and deletions could create nearly-empty nodes, wasting disk space and increasing tree height.
Height Bound: The minimum occupancy directly determines the maximum tree height. More children per node means fewer levels needed.
Balanced Structure: Combined with the leaf-level property, minimum occupancy ensures the tree remains balanced regardless of insertion/deletion patterns.

Minimum Occupancy by Order
Order (m)	⌈m/2⌉	Min Children	Min Keys	Utilization
3	2	2	1	50%
4	2	2	1	50%
5	3	3	2	50%
100	50	50	49	50%
256	128	128	127	50%

What Happens Below Minimum Capacity

When a deletion would reduce a node below minimum occupancy, one of three things happens:

1. Redistribution (Rotation): If a sibling has extra keys, borrow one through the parent.

Before:  [10 | 20]      After:   [15 | 20]
         /  |  \                 /  |  \
      [5] [15|17] [25]       [5|10] [17] [25]

(Borrow 15 from right sibling; 10 moves down from parent)

2. Merge: If no sibling can spare keys, merge with a sibling, pulling down a parent key.

Before:  [10 | 20 | 30]    After:   [10 | 30]
         /  |  \   \                /  |   \
      [5] [15] [25] [35]       [5] [15|20|25] [35]

(Merge [15] with [25], pull down 20 from parent)

3. Recursive Propagation: If merging causes the parent to underflow, continue up the tree.

The Root Exception

The root is special: it can have as few as 1 key (2 children) or even 0 keys if the tree contains a single node. This exception exists because:

The tree must start somewhere (single-node trees are valid)
When the last child of the root merges, the root shrinks, reducing tree height

The 50% Guarantee

Property 3: Key-Child Relationship

Formal Statement: A node with k children contains exactly k-1 keys.

This property establishes the structural relationship between keys and child pointers within each node.

The Separator Role of Keys

Keys in a B-tree node serve as separators or fence posts that divide the key space into regions. Think of it like a fence:

       Keys:    [  K₁  |  K₂  |  K₃  ]
                  |      |      |      |
    Children:   C₀     C₁     C₂     C₃
                  ↓      ↓      ↓      ↓
              < K₁   K₁≤x<K₂  K₂≤x<K₃  ≥ K₃

3 keys create 4 regions (children)
k-1 keys create k regions
Each child pointer leads to a subtree containing keys in that region

The Search Invariant

•All keys in subtree C₀: key < K₁
•All keys in subtree Cᵢ (for 0 < i < k-1): Kᵢ ≤ key < Kᵢ₊₁
•All keys in subtree Cₖ₋₁: key ≥ Kₖ₋₁
•This invariant enables efficient search: compare the target with separator keys to determine which subtree to explore

Why k-1 Keys for k Children?

Mathematically, if you have k intervals, you need k-1 separators:

2 intervals → 1 separator (like binary search)
3 intervals → 2 separators
k intervals → k-1 separators

This is analogous to:

A road with k segments needs k-1 mile markers
An array of k sections needs k-1 dividing indices
A sorted sequence of k teams needs k-1 ranking positions between them

Leaf Node Interpretation

Conceptually:

Internal nodes: keys are separators for navigation
Leaf nodes: keys are the actual data (in classic B-trees) or point to data (in B+-trees)

The Invariant Maintained:

For all non-leaf nodes N: children(N) = keys(N) + 1
For all leaf nodes N: children(N) = 0

Implementation Detail

Property 4: Key Ordering

Formal Statement: Keys within each node are stored in non-decreasing (sorted) order: K₁ ≤ K₂ ≤ K₃ ≤ ... ≤ Kₖ₋₁.

This property enables efficient in-node search and maintains the global ordering necessary for correct navigation.

Why Sorted Order Matters

1. Binary Search Within Nodes

With sorted keys, finding the appropriate child pointer is O(log k) using binary search:

Node: [5 | 12 | 23 | 47 | 89]

Search for 30:
- Compare with middle (23): 30 > 23 → search right half
- Compare with middle of right (47): 30 < 47 → search left portion
- Position found: between 23 and 47 → follow pointer between them

Total comparisons: O(log k) ≈ O(log m)

2. Sequential Access Support

Sorted order within nodes supports range queries. Once you find the starting point, you scan forward through sorted keys.

binary-search-in-node.pseudo
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
function findChildIndex(node, searchKey):
    keys = node.keys
    left = 0
    right = node.numKeys - 1
    
    // Binary search for the position
    while left <= right:
        mid = (left + right) / 2
        if searchKey == keys[mid]:
            return mid + 1  // Exact match, go to right subtree
        else if searchKey < keys[mid]:
            right = mid - 1
        else:
            left = mid + 1
    
    // searchKey not found, return insertion position
    // left now points to first key greater than searchKey
    return left
 
// The returned index is the child pointer to follow

Maintaining Sorted Order During Insertion

When inserting a new key into a node, the key must be placed in the correct position to maintain sorted order:

Before: [5 | 12 | 47 | 89]  (inserting 23)

1. Find insertion position: between 12 and 47 (index 2)
2. Shift elements right: [5 | 12 | _ | 47 | 89]
3. Insert key: [5 | 12 | 23 | 47 | 89]

After: [5 | 12 | 23 | 47 | 89] ✓ Sorted order maintained

This shifting operation is O(m) in the worst case but happens in memory after the node is loaded, so it's fast compared to disk I/O.

Duplicate Keys

The 'non-decreasing' allows for duplicate keys in classic B-tree definitions. Different databases handle duplicates differently:

Allow duplicates with ties: K₁ = K₂ = K₃ is valid
Unique key with TID: Store (key, tuple_id) pairs to break ties
Overflow pages: Store duplicate key values in linked overflow structures

The Invariant Maintained:

For all nodes N with keys K₁, K₂, ..., Kₖ:
  K₁ ≤ K₂ ≤ ... ≤ Kₖ

Comparison Cost

Property 5: Subtree Ordering

Formal Statement: For a node with keys K₁, K₂, ..., Kₖ₋₁ and children C₀, C₁, ..., Cₖ₋₁:

All keys in subtree rooted at C₀ are less than K₁
All keys in subtree rooted at Cᵢ (for 0 < i < k-1) are ≥ Kᵢ and < Kᵢ₊₁
All keys in subtree rooted at Cₖ₋₁ are ≥ Kₖ₋₁

This property extends the key ordering from within nodes to the entire tree structure. It's the property that makes B-tree search work correctly.

Visual Representation

Converting Mermaid diagram...

The Global Ordering Property

Subtree ordering creates a global sorted order across the entire tree. An in-order traversal of a B-tree visits all keys in sorted order:

Recursively visit subtree C₀
Visit key K₁
Recursively visit subtree C₁
Visit key K₂
... and so on

This is why B-trees excel at range queries: once you find the starting key, all subsequent keys in range are reachable by forward traversal.

Search Algorithm Correctness

Subtree ordering is what makes B-tree search correct. When searching for key K:

Start at root
At each node, find position i where Kᵢ₋₁ ≤ K < Kᵢ (or edge cases)
Follow child pointer Cᵢ
Repeat until finding K or reaching a leaf

btree-search.pseudo
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
function search(node, targetKey):
    i = 0
    
    // Find the first key greater than targetKey
    while i < node.numKeys AND targetKey > node.keys[i]:
        i = i + 1
    
    // Check if we found exact match
    if i < node.numKeys AND targetKey == node.keys[i]:
        return (node, i)  // Found!
    
    // If leaf, key doesn't exist
    if node.isLeaf:
        return NOT_FOUND
    
    // Otherwise, recurse into appropriate child
    // Subtree ordering guarantees correctness!
    return search(node.children[i], targetKey)

The Power of Invariants

Property 6: Perfect Leaf Balance

Formal Statement: All leaf nodes appear at the same depth (the tree has uniform depth).

Why Perfect Balance Is Revolutionary

Consider the implications:

BST / AVL / Red-Black

•Leaves can be at different depths
•Search time varies: O(log n) average but path lengths differ
•Worstcase path may be 2x longest path
•Rotations during insert/delete to rebalance
•Balancing is reactive (fix after breaking)

B-tree

•ALL leaves at exactly the same depth
•Search time is exactly O(log n) for ALL searches
•Every search traverses identical number of levels
•Splits and merges maintain balance inherently
•Balancing is proactive (structure prevents imbalance)

How B-trees Maintain Perfect Balance

The key insight is that B-trees grow and shrink at the root, not at the leaves:

Tree Growth (Height Increase):

Insertions happen at leaf level
If leaf overflows, it splits; median goes to parent
If parent overflows, it splits; median goes to grandparent
This propagates up to root
If root overflows, it splits; a new root is created with one key
Result: ALL leaves are now one level deeper

Tree Shrinkage (Height Decrease):

Deletions happen at leaf level (after swapping if necessary)
If leaf underflows, merge or redistribute
Merge propagates up if parent underflows
If root has only one child after merge, that child becomes new root
Result: ALL leaves are now one level shallower

The Invariant Maintained:

For all leaf nodes L₁, L₂: depth(L₁) = depth(L₂) = h
where h is the tree height

Predictable Performance

Properties Working Together

The six properties we've discussed don't exist in isolation—they work together as a cohesive system that guarantees B-tree performance. Let's see how they interact.

Property Interactions
Property	Primary Purpose	Works With	Guarantees
Bounded Node Size	Limit max size	Minimum Occupancy	Node fits in disk block
Minimum Occupancy	Prevent sparse nodes	Bounded Size, Leaf Balance	50%+ utilization, height bound
Key-Child Relationship	Structure integrity	Key Ordering, Subtree Ordering	Correct navigation
Key Ordering	Efficient in-node search	Subtree Ordering	Binary search within node
Subtree Ordering	Global sorted order	All other properties	Correct search, range queries
Perfect Leaf Balance	Uniform depth	Min Occupancy, Bounded Size	Predictable O(log n) height

The Height Guarantee

Combining minimum occupancy and leaf balance, we can derive the height bound:

Each internal node has at least ⌈m/2⌉ children
With n keys, height h satisfies: n + 1 ≤ 2 × (⌈m/2⌉)^h
Solving: h ≤ log_{⌈m/2⌉}((n+1)/2)
For large m: h ≈ log_m(n)

The Space Guarantee

Combining all properties:

Maximum space: n keys stored in at most n + n/(⌈m/2⌉-1) + ... ≈ n × (1 + 2/(m-2)) nodes
Minimum utilization: ~50% of allocated node space contains data
Wastage bounded: at most 50% overhead regardless of insertion/deletion pattern

The Time Guarantee

All operations (search, insert, delete):

Traverse at most h levels: O(log_m n) disk accesses
At each level, O(log m) comparisons for in-node search
Total comparisons: O(log_m n × log m) = O(log n)
Dominant cost: O(log_m n) disk I/O

The Complete Picture

Summary: B-tree Properties

We've explored each B-tree property in depth. Let's consolidate:

Key Takeaways

•Bounded node size ensures nodes fit in disk blocks, enabling single-I/O node access.
•Minimum occupancy guarantees 50%+ utilization and bounds tree height through sufficient fanout.
•Key-child relationship (k-1 keys for k children) establishes the separator-pointer structure.
•Key ordering within nodes enables efficient O(log m) binary search.
•Subtree ordering extends ordering globally, making correct search possible.
•Perfect leaf balance guarantees all searches traverse exactly h levels—no variance.
•Properties work together to provide provable O(log_m n) I/O, 50%+ utilization, and correct semantics.

What's Next

Page Complete

2 / 5