In a B+-tree, internal nodes serve a singular purpose: to direct searches toward the correct leaf node as efficiently as possible. They contain no actual data—only keys (separators) and pointers that form a roadmap through the tree.
This design philosophy—separating navigation from data storage—is fundamental to B+-tree efficiency. Internal nodes are small, few in number, and almost always resident in the buffer cache. They're the 'fast path' that makes billion-row lookups complete in mere milliseconds.
Understanding internal nodes reveals why B+-trees achieve such high fanout, how tree traversal actually works, and why the root and upper levels can be treated as essentially 'free' in I/O cost calculations.
By the end of this page, you will understand the structure of internal nodes, how separator keys guide searches, why internal nodes are almost always cached, and the relationship between internal node fanout and tree efficiency. You'll be able to reason about tree traversal at a structural level.
Internal nodes (also called inner nodes, branch nodes, or directory nodes) are all non-leaf nodes in the B+-tree. They form the hierarchical structure that enables logarithmic-time search.
Formal Definition:
An internal node is a tree node that contains only separator keys and child pointers. Each separator key K partitions the key space between two adjacent children: keys less than K belong to the subtree on its left, and keys greater than or equal to K belong to the subtree on its right.
Key Properties:
| Term | Definition | Significance |
|---|---|---|
| Root Node | The single internal node at tree top | Every search starts here; always cached |
| Branch Node | Any internal node between root and leaves | Guides descent; high fanout |
| Separator Key | Key value marking child subtree boundary | Enables binary search within node |
| Child Pointer | Reference to next-level node | Typically 6-8 bytes per pointer |
| Fence Keys | Lowest and highest keys in subtree | Some implementations store for validation |
Internal nodes have a simpler structure than leaves—no sibling links, no data payload, just keys and pointers arranged for efficient binary search.
```
┌────────────────────────────────────────────────────────────────┐
│ PAGE HEADER                                                    │
│ ┌─────────────┬─────────────┬─────────────┬───────────────┐    │
│ │ Page ID     │ Page Type   │ Entry Count │ Level         │    │
│ │ (8 bytes)   │ (INTERNAL)  │ (2 bytes)   │ (distance to  │    │
│ │             │             │             │  leaves)      │    │
│ └─────────────┴─────────────┴─────────────┴───────────────┘    │
├────────────────────────────────────────────────────────────────┤
│ KEY-POINTER PAIRS                                              │
│                                                                │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐     ┌─────────┐            │
│ │ Ptr₀    │ │ Ptr₁    │ │ Ptr₂    │ ... │ Ptrₙ    │            │
│ │ (6-8 B) │ │ (6-8 B) │ │ (6-8 B) │     │ (6-8 B) │            │
│ └────┬────┘ └────┬────┘ └────┬────┘     └────┬────┘            │
│      │           │           │               │                 │
│      ▼           ▼           ▼               ▼                 │
│  Keys < K₁  K₁ ≤ Keys < K₂  K₂ ≤ Keys < K₃   Keys ≥ Kₙ         │
│                                                                │
│     ┌─────────┐ ┌─────────┐     ┌─────────┐                    │
│     │ Key₁    │ │ Key₂    │ ... │ Keyₙ    │                    │
│     │(4-var B)│ │(4-var B)│     │(4-var B)│                    │
│     └─────────┘ └─────────┘     └─────────┘                    │
│                                                                │
├────────────────────────────────────────────────────────────────┤
│ ALTERNATIVE LAYOUT                                             │
│ Some systems interleave:  [Ptr₀|Key₁|Ptr₁|Key₂|Ptr₂|...]       │
│ Others store separately:  [Ptr₀|Ptr₁|...|Ptrₙ][Key₁|...|Keyₙ]  │
└────────────────────────────────────────────────────────────────┘

SEARCH ALGORITHM (within a single internal node):
1. Binary search for the smallest Key_i where Key_i > search_key
2. If such a Key_i exists at position i: follow Ptr_{i-1}
3. If no such Key_i exists: follow Ptr_n (rightmost)

Example: searching for 'M' with keys [D, H, P, T]
- M > D, M > H, M < P
- Follow Ptr₂ (child containing H ≤ keys < P)
```

A fundamental property: an internal node with n keys has n+1 child pointers. Each key acts as a 'separator' between two adjacent children. This means fanout (child count) = key count + 1. When we say a node has fanout 500, it holds 499 keys and 500 pointers.
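To make the within-node rule concrete, here is a minimal Python sketch (not any particular engine's implementation) that picks the child pointer to follow, using the same keys [D, H, P, T] as the example above. The function name and layout are illustrative only.

```python
from bisect import bisect_right

def find_child_index(separator_keys, search_key):
    """Return the index of the child pointer to follow.

    With n separator keys there are n + 1 children:
      children[0]  holds keys <  separator_keys[0]
      children[i]  holds keys >= separator_keys[i-1] and < separator_keys[i]
      children[n]  holds keys >= separator_keys[n-1]
    """
    # The number of separators <= search_key is exactly the child index.
    return bisect_right(separator_keys, search_key)

keys = ["D", "H", "P", "T"]           # 4 separators -> 5 children (fanout 5)
print(find_child_index(keys, "M"))    # 2 -> child covering H <= key < P
print(find_child_index(keys, "A"))    # 0 -> leftmost child (keys < D)
print(find_child_index(keys, "T"))    # 4 -> rightmost child (keys >= T)
```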
The keys stored in internal nodes aren't arbitrary—they're carefully chosen separator keys that efficiently partition the key space.
Separator Key Properties:
Delimiting function — Key Kᵢ separates child subtree i-1 (containing keys < Kᵢ) from subtree i (containing keys ≥ Kᵢ)
Need not exist in data — Separators must correctly partition the key space but needn't be actual data values. Some systems use 'fence keys' or 'suffix truncation' to create minimal separators.
Sorted within node — Keys are maintained in sorted order to enable O(log fanout) binary search within each node.
How Separator Keys Are Chosen:
| Strategy | Description | Advantages |
|---|---|---|
| Copy-up | Copy first key of right child as separator | Simple, always valid separator |
| Push-up | Move first key of right child (removing from child) | Saves space; used in B-trees (not B+) |
| Truncated | Use shortest prefix distinguishing children | Higher fanout with long keys |
| Fence keys | Store min and max of each child subtree | Enables validation and optimization |
Truncation for Higher Fanout:
Consider two adjacent leaf nodes with these key ranges:
```
['algorithm', 'apple', 'application']   ← left leaf
['database', 'design', 'developer']     ← right leaf
```

A naive separator would be 'database' (8 bytes). But 'd' (1 byte) is sufficient to distinguish any key in the left child from any key in the right!
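The following Python sketch shows one simple way such a minimal separator could be derived from the largest key of the left child and the smallest key of the right child. It assumes plain byte-wise string comparison of distinct keys; real engines (for example, PostgreSQL's pivot-tuple truncation) handle collations, multi-column keys, and duplicates with more care.

```python
def minimal_separator(left_max, right_min):
    """Shortest prefix of right_min that still sorts strictly above left_max.

    The result is a valid separator: every key in the left child compares
    below it, every key in the right child compares at or above it.
    Simplified sketch assuming byte-wise comparison of distinct keys.
    """
    for length in range(1, len(right_min) + 1):
        candidate = right_min[:length]
        if candidate > left_max:
            return candidate
    return right_min  # no shorter prefix works; keep the full key

# Left leaf ends at 'application', right leaf starts at 'database'
print(minimal_separator("application", "database"))  # 'd' -> 1 byte instead of 8
```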
This suffix truncation can dramatically improve internal node fanout when indexing string columns:
| Separator Type | Key Size | Fanout (8KB page) |
|---|---|---|
| Full key (avg 30 bytes) | 30 bytes | ~230 |
| Truncated (avg 5 bytes) | 5 bytes | ~740 |
That's 3x higher fanout, potentially reducing tree height by one level.
PostgreSQL 12+ implements B-tree suffix truncation automatically for text columns. The 'pivot tuple' (separator) stores only enough of the key to distinguish children. This is particularly effective for URLs, file paths, or any data with common prefixes, significantly reducing index size and improving performance.
Because internal nodes contain no data payload, they achieve higher fanout than leaf nodes in most scenarios. This difference is crucial for tree efficiency.
| Scenario | Internal Node Fanout | Leaf Node Fanout | Ratio |
|---|---|---|---|
| INT key, non-clustered index | ~800 | ~600 | 1.3x |
| BIGINT key, clustered index | ~500 | ~100* | 5x |
| VARCHAR(100) key, non-clustered | ~250 | ~200 | 1.25x |
| VARCHAR(100) with truncation | ~600 | ~200 | 3x |
| Composite 3-column key | ~200 | ~150 | 1.3x |
*Clustered index leaf fanout is low because leaves contain entire rows, not just keys.
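The same arithmetic explains the footnote: leaf fanout collapses when leaves store whole rows. A small sketch with assumed sizes (8 KB pages, 50-byte header, hypothetical entry sizes):

```python
def leaf_fanout(entry_size, page_size=8192, header=50):
    """Approximate entries per leaf page for a given entry size in bytes."""
    return (page_size - header) // entry_size

print(leaf_fanout(entry_size=8 + 6))   # ~581: non-clustered leaf (BIGINT key + row pointer)
print(leaf_fanout(entry_size=80))      # ~101: clustered leaf storing full 80-byte rows
```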
Fanout Calculation for Internal Nodes:
Internal Fanout = ⌊(Page Size - Header) / (Key Size + Pointer Size)⌋ + 1
For 8KB page, 50B header, 8B key, 6B pointer:
Fanout = ⌊(8192 - 50) / (8 + 6)⌋ + 1 = 581 + 1 = 582
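Expressed as code, the same back-of-the-envelope calculation looks like this (the sizes are the assumptions used above, not measured values):

```python
def internal_fanout(page_size=8192, header=50, key_size=8, pointer_size=6):
    """Maximum children per internal node: n keys separate n + 1 pointers."""
    keys = (page_size - header) // (key_size + pointer_size)
    return keys + 1

print(internal_fanout())              # 582 for the 8 KB / 8-byte-key example
print(internal_fanout(key_size=30))   # ~227 with 30-byte string separators
print(internal_fanout(key_size=5))    # ~741 with truncated separators
```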
Why Internal Fanout Matters Most:
Tree height is determined by internal node fanout, since descent traverses only internal nodes. Leaf fanout affects storage size and range-scan performance, but not descent cost. Therefore, keeping separator keys small, and internal fanout correspondingly high, is the most direct way to minimize tree height.
This is why INCLUDE columns (PostgreSQL 11+, SQL Server) are stored only in leaves—they add query-covering ability without impacting internal fanout.
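A rough height estimate makes the point concrete: for a fixed number of leaf pages, only the internal fanout appears in the logarithm. The leaf-page count and fanout values below are illustrative, reusing the truncation figures from earlier.

```python
import math

def tree_height(leaf_pages, internal_fanout):
    """Levels from root to leaf, assuming internal nodes are packed full."""
    if leaf_pages <= 1:
        return 1
    return 1 + math.ceil(math.log(leaf_pages, internal_fanout))

leaf_pages = 200_000                   # e.g. ~40M rows at ~200 entries per leaf
print(tree_height(leaf_pages, 230))    # 4 levels with full-key separators
print(tree_height(leaf_pages, 740))    # 3 levels with truncated separators
```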
The root node is an internal node with special properties. It's the entry point for every index operation and receives unique treatment by the database system.
Root Node Properties:
• Every index operation begins its descent at the root, so it is effectively always cached (some systems pin it in memory permanently).
• Its page ID is recorded in the index metadata (system catalogs), so it can be located without any search.
• It is the only node whose split increases tree height.
• In a small index, the root may itself be a leaf, making the entire index a single page.
Root Split: The Height Increase Event:
Tree height increases only when the root splits. Here is the mechanism:

Before (root is full, height = 2; assume a node holds at most four keys in this toy example):

```
           [Root: D | H | M | T]
          /      |     |     |    \
     [Leaf] [Leaf] [Leaf] [Leaf] [Leaf]
```

After inserting 'P': a leaf splits, pushing a fifth separator into the already-full root, which forces the root itself to split (height = 3):

```
                   [New Root: M]
                  /             \
       [Left: D | H]         [Right: P | T]
       /     |     \          /     |     \
  [Leaf] [Leaf] [Leaf]   [Leaf] [Leaf] [Leaf]
```
Implications:
Height increases are rare — The root must first fill with separator entries before it can split. With fanout 500, the root splits only once level 2 already holds 500 children, as the capacity sketch after this list illustrates.
Height never decreases during operations — Deleting data does not shrink the tree; the upper levels remain even as pages empty out. Only a rebuild such as REINDEX restores optimal height.
Root-only trees are common — Small tables fit in one page, making root = only leaf. This single-page index has O(1) lookup!
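To see why height increases are so rare, here is a rough capacity estimate. It assumes fanout 500 at every internal level and roughly 500 entries per leaf; real trees are less uniform, but the orders of magnitude hold.

```python
def max_entries(height, internal_fanout=500, leaf_entries=500):
    """Upper bound on indexed entries for a tree of the given height.

    height = 1 means the root is itself a leaf (single-page index).
    """
    return internal_fanout ** (height - 1) * leaf_entries

for h in range(1, 5):
    print(h, max_entries(h))
# height 1: 500             (root is the only page)
# height 2: 250,000
# height 3: 125,000,000
# height 4: 62,500,000,000  (the root splits again only beyond this)
```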
The root page ID is stored in index metadata (system catalogs). When queries access the index, they first read metadata to find the root, then begin descent. Since both metadata and root are always cached, this adds no I/O to the operation.
One of the most important practical properties of internal nodes is their cache behavior. Understanding this explains why indexes are faster than raw I/O arithmetic suggests.
| Tree Level | Node Count | Total Size | Access Frequency | Cache Status |
|---|---|---|---|---|
| Level 1 (Root) | 1 | 8 KB | Every query | 100% cached (pinned) |
| Level 2 | 500 | 4 MB | ~Every query | 99%+ cached |
| Level 3 | 250,000 | 2 GB | Hit by subset of queries | Partially cached |
| Leaves | 125M | ~1 TB | Each query hits 1-few | Rarely cached |
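The level-by-level sizes in the table come from simple arithmetic. The sketch below assumes a uniform fanout of 500 and 8 KB pages, matching the table's scenario.

```python
PAGE_SIZE = 8 * 1024   # bytes per page
FANOUT = 500           # children per internal node (assumed uniform)

def pages_per_level(leaf_pages, fanout=FANOUT):
    """Page counts from the root (level 1) down to the leaves."""
    levels = [leaf_pages]
    while levels[-1] > 1:
        levels.append(-(-levels[-1] // fanout))   # ceiling division
    return levels[::-1]                           # root first

for depth, pages in enumerate(pages_per_level(125_000_000), start=1):
    print(f"level {depth}: {pages:>12,} pages, {pages * PAGE_SIZE / 2**30:8.2f} GiB")
# level 1:            1 pages,     0.00 GiB   (the root: one 8 KB page)
# level 2:          500 pages,     0.00 GiB   (~4 MB)
# level 3:      250,000 pages,     1.91 GiB   (~2 GB)
# level 4:  125,000,000 pages,   953.67 GiB   (~1 TB of leaves)
```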
Why Internal Nodes Stay Cached:
Small total size — Internal nodes make up well under 1% of the index. Even for the very large tree in the table above, all internal levels total only ~2 GB; for a typical 100-million-row table they amount to just a few megabytes. Either way, they fit comfortably in a modest buffer pool.
Frequent access — Every index lookup traverses the same upper levels. Even with key skew, the root and level-2 nodes are hit by virtually all queries.
LRU behavior — Frequently accessed pages remain in cache. Internal nodes are the most frequently accessed pages in the entire database.
Explicit pinning — Some databases pin root pages in memory, never evicting them.
Effective I/O Cost:
For a height-4 B+-tree with well-cached internal nodes:
| Metric | Value |
|---|---|
| Theoretical I/O | 4 pages (root → leaf) |
| Actual I/O | 1-2 pages (typically just the leaf) |
| Why | Levels 1-2 always cached; level 3 often cached |
This is why database performance often appears better than worst-case analysis predicts.
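The same observation can be written as an expected-value calculation. The per-level cache hit rates below are illustrative assumptions, not measurements from any particular system.

```python
# Probability that each level of a height-4 tree is already in the buffer pool:
# root and level 2 effectively always resident, level 3 mostly, leaves rarely.
hit_rates = [1.00, 0.99, 0.80, 0.05]   # root, level 2, level 3, leaf

expected_physical_reads = sum(1 - p for p in hit_rates)
print(f"expected physical reads per lookup: {expected_physical_reads:.2f}")  # ~1.16
```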
A practical rule: ensure your buffer pool is large enough to hold all internal nodes of your most critical indexes, plus the working set of leaf pages. For most workloads, this means sizing the buffer pool to hold:
• All internal nodes (~0.1-1% of total index size)
• Hot leaf pages (varies by access pattern)
• Data pages for queries the index does not cover
The pg_statio_user_indexes and pg_buffercache views (PostgreSQL) or sys.dm_os_buffer_descriptors (SQL Server) can show actual cache residency.
Understanding how searches actually traverse internal nodes reveals the elegance of B+-tree design and enables reasoning about access patterns.
```
FUNCTION b_plus_tree_search(tree, search_key):
    // Start at root
    current_node = tree.root_page

    // Descend through internal nodes
    WHILE current_node.type == INTERNAL:
        // Binary search within node for correct child
        child_index = binary_search_for_child(current_node, search_key)
        // Follow pointer to next level
        current_node = current_node.children[child_index]

    // Now at leaf level - linear or binary search for key
    RETURN leaf_search(current_node, search_key)

FUNCTION binary_search_for_child(internal_node, search_key):
    // Find smallest index i where keys[i] > search_key
    // If no such i exists, this returns the rightmost child index
    keys = internal_node.keys          // [K₁, K₂, ..., Kₙ]
    low, high = 0, keys.length - 1

    WHILE low <= high:
        mid = (low + high) / 2         // integer division
        IF keys[mid] > search_key:
            high = mid - 1
        ELSE:
            low = mid + 1

    // 'low' is now the index of the first key > search_key
    // children[low] holds keys in [keys[low-1], keys[low])
    RETURN low

// Example trace: search for key 'M' in a node with keys [D, H, P, T]
//
// Iteration 1: mid=1 (H), M > H → low=2
// Iteration 2: mid=2 (P), M < P → high=1
// Iteration 3: low > high, exit
// Result: low=2, follow children[2] (contains keys H ≤ k < P)
```

Traversal Cost Analysis:
| Operation | CPU Cost | I/O Cost |
|---|---|---|
| Binary search in node | O(log₂ fanout) = ~9 comparisons | 0 (node already in memory) |
| Page fetch | O(1) | 0-1 (cached or not) |
| Per-level total | ~9 key comparisons | 0-1 page reads |
| Full traversal | O(h × log₂ F) = ~36 comparisons | 0-h page reads |
For a typical tree (h = 4, F = 500), that works out to roughly 36 key comparisons and at most 4 page reads per lookup, of which usually only 1-2 are physical I/O once the upper levels are cached.
Comparison Operations:
Key comparison is the CPU-bound aspect of traversal, and its cost depends on the key type:
• Integer keys (INT, BIGINT) compare in a single CPU instruction.
• String keys compare byte by byte, and collation-aware comparisons are considerably more expensive.
• Composite keys compare column by column until the first difference.
This is another reason to prefer simple key types—not just for fanout, but for comparison speed during traversal.
Modern databases are beginning to use SIMD (Single Instruction Multiple Data) instructions to parallelize key comparisons within internal nodes. Instead of binary search visiting keys sequentially, SIMD can compare against 8+ keys simultaneously, reducing the ~9 comparisons per node to ~2-3. This optimization is transparent but explains why newer database versions show improved point query performance.
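As a rough illustration of the idea (not how any specific engine implements it), a vectorized comparison can stand in for SIMD: instead of probing separators one at a time, the child index falls out of a single data-parallel comparison against every key in the node. The key values below are made up.

```python
import numpy as np

separators = np.array([110, 220, 330, 440, 550, 660, 770, 880])  # hypothetical node keys

def find_child_vectorized(separators, search_key):
    """Compare the search key against every separator in one data-parallel step.

    The count of separators <= search_key is the child index to follow,
    exactly as in the binary-search version shown earlier.
    """
    return int(np.count_nonzero(separators <= search_key))

print(find_child_vectorized(separators, 400))  # 3 -> child covering [330, 440)
print(find_child_vectorized(separators, 50))   # 0 -> leftmost child
```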
Internal nodes are the silent workhorses of B+-tree indexes — pure navigation structures that guide every search with minimal overhead. Their design enables the remarkable efficiency that makes B+-trees the backbone of database indexing.
What's Next:
With internal nodes understood, we'll explore performance metrics — the measurements and calculations that quantify index efficiency. You'll learn how to analyze I/O costs, estimate search times, and compare index configurations using concrete metrics.
You now understand internal nodes — the navigation layer that makes B+-tree searches efficient. This knowledge enables you to reason about tree traversal, understand caching behavior, and appreciate why B+-trees achieve such remarkable performance. Next, we examine the performance metrics that quantify index efficiency.