If leaf nodes are the treasure vaults of a B+-tree—containing all the actual data—then internal nodes are the signposts and highways that guide queries to the right vault. Internal nodes contain no data themselves; their sole purpose is to route searches efficiently from the root to the appropriate leaf.
This navigation-only design might seem wasteful—why have nodes that don't store data? But this specialization is precisely what enables B+-trees' exceptional performance. By freeing internal nodes from data storage responsibilities, we can optimize them purely for navigation speed, maximizing fanout and minimizing tree height.
By the end of this page, you will understand the detailed structure of internal nodes, how they guide searches, how they're maintained during tree modifications, and the sophisticated optimizations real databases apply to maximize their efficiency.
An internal node in a B+-tree contains keys and child pointers. The keys serve as separators, defining the boundaries between subtrees. The pointers indicate which child to visit for a given search key.
Physical Layout:
┌────────────────────────────────────────────────────────────┐
│ Header │ P₀ │ K₁ │ P₁ │ K₂ │ P₂ │ ... │ Kₙ │ Pₙ │ Free │
└────────────────────────────────────────────────────────────┘
Where:
- Header: Page metadata (type, entry count, etc.)
- P₀: Pointer to leftmost child (keys < K₁)
- Kᵢ: Separator key
- Pᵢ: Pointer to child for keys ≥ Kᵢ and < Kᵢ₊₁
- Pₙ: Pointer to rightmost child (keys ≥ Kₙ)

Note the asymmetry: there is always one more pointer than there are keys. If a node has n keys, it has n+1 child pointers.
| Component | Size | Description |
|---|---|---|
| Page Header | 16-64 bytes | Page ID, type flag, entry count, LSN |
| Leftmost Pointer (P₀) | 8 bytes | Points to children with keys < first separator |
| Separator Key (Kᵢ) | Variable | Divides key space between Pᵢ₋₁ and Pᵢ children |
| Child Pointer (Pᵢ) | 8 bytes | Points to child containing keys ≥ Kᵢ |
| High Key (optional) | Variable | Max key in subtree (for B-link trees) |
| Right Sibling (optional) | 8 bytes | Link to right sibling (for B-link trees) |
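The layout and component table above can be sketched as a minimal in-memory structure (a hypothetical Python sketch; a real engine packs these fields into a raw page buffer):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class InternalNode:
    """In-memory sketch of an internal node; on disk this is a packed page."""
    keys: List[int]                      # separator keys K1..Kn
    children: List[int]                  # child page IDs P0..Pn
    high_key: Optional[int] = None       # optional: max key in subtree (B-link trees)
    right_sibling: Optional[int] = None  # optional: right sibling page ID (B-link trees)

    def __post_init__(self):
        # Invariant: always one more child pointer than keys
        assert len(self.children) == len(self.keys) + 1
```

The invariant check captures the asymmetry noted above: constructing a node with n keys and anything other than n+1 children fails immediately.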
Unlike in B-trees where internal node keys correspond to actual data records, B+-tree internal keys are purely navigational. They need only be sufficient to correctly route searches—they don't need to match any specific data record exactly.
When a search reaches an internal node, it must determine which child pointer to follow. This decision is made by comparing the search key against the separator keys.
Routing Algorithm:
find_child(internal_node, search_key):
keys = internal_node.keys // [K₁, K₂, ..., Kₙ]
pointers = internal_node.pointers // [P₀, P₁, P₂, ..., Pₙ]
// Binary search for first key > search_key
i = binary_search_upper(keys, search_key)
// Follow pointer Pᵢ
return pointers[i]
Detailed Semantics:
Given keys [K₁, K₂, K₃] and pointers [P₀, P₁, P₂, P₃]:
| If search_key is... | Follow pointer... | Why |
|---|---|---|
| < K₁ | P₀ | Key is less than all separators |
| ≥ K₁ and < K₂ | P₁ | Key is in K₁'s subtree |
| ≥ K₂ and < K₃ | P₂ | Key is in K₂'s subtree |
| ≥ K₃ | P₃ | Key is greater than all separators |
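The routing table maps directly onto a binary search. A minimal runnable sketch (Python, with the standard library's `bisect_right` standing in for `binary_search_upper`):

```python
from bisect import bisect_right

def find_child(keys, pointers, search_key):
    # bisect_right returns the index of the first separator > search_key,
    # which is exactly the pointer index to follow (>= Ki routes right)
    return pointers[bisect_right(keys, search_key)]

keys = [10, 20, 30]                  # K1, K2, K3
pointers = ["P0", "P1", "P2", "P3"]  # P0..P3
find_child(keys, pointers, 5)   # -> "P0"  (< K1)
find_child(keys, pointers, 10)  # -> "P1"  (>= K1, < K2)
find_child(keys, pointers, 99)  # -> "P3"  (>= K3)
```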
Binary Search Within Nodes:
With potentially hundreds of keys per internal node, linear search would be inefficient. Internal node search uses binary search for O(log m) comparisons, where m is the number of keys.
Example: An internal node with 200 keys requires at most 8 comparisons (log₂ 200 ≈ 7.6).
SIMD Optimization:
Modern databases use SIMD (Single Instruction, Multiple Data) instructions to search internal nodes even faster: the search key is compared against several separators in a single packed instruction, and the resulting comparison mask is converted into a child index with a movemask and popcount.
This reduces internal node search time to near-constant, making tree traversal dominated by I/O rather than CPU.
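The core SIMD trick, comparing the key against many separators at once and counting the matches, can be sketched in scalar Python (real engines do the comparisons and the count with packed-compare and popcount intrinsics):

```python
def find_child_simd_style(keys, pointers, search_key):
    # Vectorized idea: one packed compare of search_key against all
    # separators yields a bitmask; the popcount of that mask is the
    # child index. Here the compares and the count are written out
    # in scalar form for clarity.
    index = sum(1 for k in keys if k <= search_key)
    return pointers[index]

keys = [10, 20, 30]
pointers = ["P0", "P1", "P2", "P3"]
find_child_simd_style(keys, pointers, 25)  # -> "P2"
```

This gives the same routing semantics as the binary-search version: the number of separators less than or equal to the search key is exactly the pointer index to follow.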
Different implementations use slightly different comparison semantics (< vs ≤). What matters is consistency: the tree build and search must use the same convention. The splitting algorithm must ensure keys end up in the subtree where search will find them.
Internal nodes are created and modified through splits and merges. Understanding these operations clarifies how the tree structure evolves.
Initial State:
When a B+-tree is first created, it consists of a single leaf node (which is also the root). The first internal node appears when this root leaf splits.
Leaf Split Creates Internal Structure:
Before (single root leaf):
[10, 20, 30, 40] ← Root and only leaf
Insert key 25 causes split:
[30] ← New internal root
/ \
[10, 20, 25] [30, 40] ← Two leaf nodes
Step by Step:
1. The full leaf [10, 20, 30, 40] cannot absorb key 25, so it splits into [10, 20, 25] and [30, 40].
2. The first key of the new right leaf (30) is copied up as the separator.
3. Because the split leaf was the root, a new internal root is allocated holding the separator and pointers to both leaves; the tree grows from height 1 to height 2.
Internal Node Split:
When an internal node overflows (too many keys after a child split), it too must split:
Before:
[30, 50, 70] ← Internal node is full
/ | | \
... ... ... ... ← Children
Child split adds key 60:
[30, 50, 60, 70] ← Now overflows
/ | | | \
... ... ... ... ... ← Children
Internal split:
[50] ← Middle key moves up to parent
/ \
[30] [60, 70] ← Two internal nodes
/ \ / | \
... ... ... ... ... ← Children redistributed
Key Difference: In internal node splits, the middle key moves to the parent (unlike leaf splits where it's copied). This maintains the invariant that each key appears exactly once in internal nodes.
Leaf splits: separator key is COPIED up (key remains in right leaf). Internal splits: separator key MOVES up (key is removed from split node). This distinction is fundamental and a common source of implementation bugs.
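The copy-up vs. move-up distinction can be made concrete with a small sketch (split-point conventions vary between implementations; the midpoints here are chosen to match the examples above):

```python
def split_leaf(keys):
    # Leaf split: separator is COPIED up and remains in the right leaf
    mid = (len(keys) + 1) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]  # separator = first key of right leaf

def split_internal(keys):
    # Internal split: separator MOVES up and appears in neither half
    mid = (len(keys) - 1) // 2
    return keys[:mid], keys[mid + 1:], keys[mid]

split_leaf([10, 20, 25, 30, 40])  # -> ([10, 20, 25], [30, 40], 30)
split_internal([30, 50, 60, 70])  # -> ([30], [60, 70], 50)
```

Note that the leaf separator (30) still appears in the right leaf, while the internal separator (50) appears in neither half; it now lives only in the parent.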
When deletions cause an internal node to fall below minimum occupancy, it must either borrow keys from a sibling or merge with a sibling.
Redistribution (Borrowing):
Before: Left sibling has spare keys
[40]
/ \
[10, 20, 30] [50] ← Right child underfull
After redistribution:
[30]
/ \
[10, 20] [40, 50] ← Balanced
Mechanism:
1. The parent's separator (40) moves down to become the smallest key of the underfull right child.
2. The left sibling's largest key (30) moves up to the parent as the new separator.
3. For internal nodes, the left sibling's rightmost child pointer moves to the right child along with the rotation.
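A sketch of the rotation through the parent, using a hypothetical minimal `Node` with `keys` and `children` lists:

```python
class Node:
    def __init__(self, keys, children):
        self.keys, self.children = keys, children

def rotate_from_left(parent, i):
    """Borrow through the parent: separator i divides children i and i+1."""
    left, right = parent.children[i], parent.children[i + 1]
    right.keys.insert(0, parent.keys[i])           # separator moves down
    right.children.insert(0, left.children.pop())  # last child follows it
    parent.keys[i] = left.keys.pop()               # left's max moves up

# Matches the diagram: parent [40], left [10, 20, 30], right [50]
left = Node([10, 20, 30], ["c0", "c1", "c2", "c3"])
right = Node([50], ["c4", "c5"])
parent = Node([40], [left, right])
rotate_from_left(parent, 0)
# parent.keys -> [30]; left.keys -> [10, 20]; right.keys -> [40, 50]
```

After the rotation both children satisfy the n-keys/n+1-children invariant again, and the parent's separator still correctly divides them.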
Merging:
When a sibling doesn't have spare keys to share, merge two internal nodes:
Before: Both siblings at minimum
[60]
/ \
[30, 40] [70] ← Right child underfull
/ | \ / \
... ... ... ... ... ← Children
After merge:
[30, 40, 60, 70] ← Combined node
/ | | | \
... ... ... ... ... ← All children together
Mechanism:
1. The parent's separator (60) is pulled down into the left sibling.
2. The right sibling's keys and child pointers are appended after it.
3. The separator and the pointer to the right sibling are removed from the parent, which may itself become underfull and trigger further rebalancing.
Merges can cascade upward. If the root's last two children merge, the root becomes empty and is deleted, reducing tree height. This is the only way a B+-tree shrinks in height—when the root has exactly two children that merge.
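The merge, including the root-collapse case, can be sketched with the same hypothetical minimal `Node`:

```python
class Node:
    def __init__(self, keys, children):
        self.keys, self.children = keys, children

def merge_children(parent, i):
    """Merge children i and i+1 of an internal parent, pulling the separator down."""
    left, right = parent.children[i], parent.children[i + 1]
    left.keys.append(parent.keys.pop(i))  # separator moves DOWN into the merge
    left.keys.extend(right.keys)
    left.children.extend(right.children)
    parent.children.pop(i + 1)
    # Root collapse: if parent was the root and is now empty,
    # `left` becomes the new root and the tree shrinks by one level.

# Matches the diagram: parent [60], children [30, 40] and [70]
left = Node([30, 40], ["a", "b", "c"])
right = Node([70], ["d", "e"])
parent = Node([60], [left, right])
merge_children(parent, 0)
# left.keys -> [30, 40, 60, 70]; parent.keys -> [] (root would collapse)
```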
Since internal node keys are separators (not data), they can be optimized in ways that actual data keys cannot. These optimizations increase fanout and reduce I/O.
Prefix Compression:
Consider adjacent keys in an internal node. Suppose the left child's largest key is "Johnson" and the right child's smallest key is "Jones".
The separator doesn't need to be the full key "Jones"; it only needs to distinguish the two ranges. "Jon" suffices, since "Johnson" < "Jon" ≤ "Jones".
Space Savings:
| Full Key | Compressed Separator | Savings |
|---|---|---|
| "Christopher" | "Chr" | 73% |
| "2024-01-15T14:30:00Z" | "2024-01-15T14" | 35% |
| "category:electronics:phones:apple" | "category:electronics:p" | 33% |
Suffix Truncation:
For composite or long keys, only enough prefix to separate ranges is stored:
Leaf 1 ends with: key = "database_systems_introduction"
Leaf 2 starts with: key = "database_systems_query_optimization"
Full separator: "database_systems_query_optimization"
Truncated: "database_systems_q"
──────────────────
Savings: 17 bytes (49% reduction)
Minimum Separator:
The optimal separator is the shortest prefix of the right child's minimum key that is strictly greater than the left child's maximum key:
def min_separator(left_max, right_min):
    # Find the shortest string S where left_max < S ≤ right_min
    for length in range(1, len(right_min) + 1):
        candidate = right_min[:length]
        if left_max < candidate:
            return candidate
    return right_min  # Fallback to the full key
This optimization can dramatically increase internal node fanout for string keys.
Separator compression is most valuable when keys are long strings with common prefixes (URLs, file paths, compound keys). For integer keys, there's usually nothing to compress, so these optimizations don't apply.
Because internal nodes are accessed on every search, they're prime candidates for caching. Databases employ several strategies to maximize internal node cache hits.
Analysis: How Much Memory for Internal Nodes?
For a B+-tree of order 500 with 1 billion records:
| Level | Nodes | Size (8 KB pages) | Cumulative |
|---|---|---|---|
| 0 (root) | 1 | 8 KB | 8 KB |
| 1 | ~8 | 64 KB | 72 KB |
| 2 | ~4,000 | ~32 MB | ~32 MB |
| 3 (leaves) | ~2,000,000 | ~16 GB | ~16 GB |
Key Insight: All internal nodes fit in roughly 32 MB of RAM. If your buffer pool is at least that large, every search requires at most one disk I/O (for the leaf). This explains why database administrators often recommend buffer pools larger than the internal node footprint.
Memory Sizing Rule of Thumb:
Internal nodes ≈ Total data size / Fanout ratio
For 1 TB data, fanout 500:
~1 TB / 500 = ~2 GB for internal nodes
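The rule of thumb follows from counting nodes level by level. A quick sketch (assuming one 8 KB page per node and the stated fanout):

```python
def internal_footprint(num_records, records_per_leaf, fanout, page_bytes=8192):
    # Count nodes at each level from the leaves upward; every level
    # above the leaves consists of internal nodes.
    level = -(-num_records // records_per_leaf)  # ceil: number of leaves
    internal_nodes = 0
    while level > 1:
        level = -(-level // fanout)              # ceil: nodes one level up
        internal_nodes += level
    return internal_nodes * page_bytes

# 1 billion records, 500 records per leaf, fanout 500:
internal_footprint(10**9, 500, 500)  # -> 32,841,728 bytes (4,009 pages, ~32 MB)
```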
Cloud databases often run on instances with 8-64 GB RAM but terabytes of data. Understanding internal node sizes helps choose the right instance: enough RAM to cache all internal nodes means each query costs at most one disk I/O (the leaf), dramatically improving performance.
Internal nodes present unique concurrency challenges. They're on the path of every query, making them potential bottlenecks. However, their stability (they change only during splits/merges) enables optimizations.
Latch Coupling (Crabbing):
The classic protocol for safe traversal:
traverse_with_crabbing(root, key):
current = acquire_read_latch(root)
while current is not leaf:
child_id = find_child(current, key)
child = acquire_read_latch(child_id)
release_latch(current) # Safe: child is latched
current = child
return current # Still holding latch
By releasing the parent latch once the child is acquired, we minimize contention. At any moment, a query holds at most 2 latches.
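A runnable sketch of crabbing, with Python threading locks standing in for latches (a hypothetical minimal node; real latches are much cheaper than OS-level locks):

```python
import threading
from bisect import bisect_right

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children  # None for a leaf
        self.latch = threading.Lock()

def traverse_with_crabbing(root, key):
    current = root
    current.latch.acquire()
    while current.children is not None:
        child = current.children[bisect_right(current.keys, key)]
        child.latch.acquire()    # latch the child first...
        current.latch.release()  # ...then release the parent
        current = child
    return current               # caller releases the leaf latch

leaf1, leaf2 = Node([10, 15]), Node([20, 25])
root = Node([20], [leaf1, leaf2])
leaf = traverse_with_crabbing(root, 25)  # returns leaf2, still latched
leaf.latch.release()
```

Note the ordering: the child is latched before the parent is released, so there is never an instant when the path being followed is unprotected.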
Optimistic Latching:
For even better concurrency:
optimistic_traverse(root, key):
path = []
current = root
# Traverse without latches
while current is not leaf:
path.append(current)
current = find_child(current, key)
# Now acquire necessary latches
# If any node changed (version mismatch), restart
This works because internal nodes rarely change—most traversals succeed without restarts.
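The validation step can be sketched with hypothetical `version` counters that a real implementation would bump on every structural change to a node:

```python
from bisect import bisect_right

class Node:
    def __init__(self, keys, children=None):
        self.keys, self.children = keys, children
        self.version = 0  # incremented by every split/merge of this node

def optimistic_traverse(root, key, max_restarts=3):
    for _ in range(max_restarts):
        path, node = [], root
        while node.children is not None:  # descend with no latches held
            path.append((node, node.version))
            node = node.children[bisect_right(node.keys, key)]
        # Validate: if any node on the path changed underneath us, restart
        if all(n.version == v for n, v in path):
            return node
    raise RuntimeError("too many restarts; fall back to latched traversal")

leaf1, leaf2 = Node([10, 15]), Node([20, 25])
root = Node([20], [leaf1, leaf2])
optimistic_traverse(root, 25)  # returns leaf2 without taking any latches
```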
Internal nodes change only during splits/merges—rare compared to leaf modifications. This low change frequency makes optimistic concurrency practical: readers rarely encounter mid-split nodes, so restarts are infrequent.
Internal nodes are the navigation backbone of B+-trees. Their design—pure routing with no data—enables the tree's exceptional performance. Let's consolidate the key insights:
What's Next:
Having explored both leaf nodes and internal nodes in detail, we're now ready for a comprehensive comparison between B+-trees and standard B-trees. The final page of this module, Comparison with B-tree, synthesizes everything we've learned and provides a definitive analysis of why B+-trees dominate in database applications.
You now have a deep understanding of internal nodes: their structure, how they route searches, how they're created and maintained, and how they can be optimized for maximum performance. This knowledge, combined with your understanding of leaf nodes, provides a complete picture of B+-tree architecture. Next, we'll compare B+-trees with B-trees definitively.