While internal nodes guide searches through the tree, leaf nodes are where the journey ends. Every index lookup ultimately arrives at a leaf node to retrieve the data—or the pointer to the data—that satisfies the query.
Leaf nodes are special in multiple ways: they contain all the indexed values (not just separator keys), they're linked together for efficient range scans, and their structure differs significantly between clustered and non-clustered indexes. Understanding leaf nodes transforms your ability to reason about query performance, especially for range queries and covering indexes.
This page explores the anatomy of leaf nodes, their linking mechanism, how they interact with table data, and why their design makes B+-trees superior for database workloads compared to vanilla B-trees.
By the end of this page, you will understand the internal structure of leaf nodes, how the leaf-to-leaf linked list enables range scans, the difference between leaf content in clustered vs non-clustered indexes, and how leaf nodes impact query performance patterns.
In a B+-tree, leaf nodes occupy the lowest (deepest) level of the tree. Unlike internal nodes, which serve purely navigational purposes, leaf nodes contain the actual indexed data entries.
Formal Definition:
A leaf node is a tree node that has no child pointers to lower levels. It contains key-value pairs where the key is the indexed column(s) and the value is either the actual row data (clustered index) or a pointer to the row location (non-clustered index).
Key Properties of Leaf Nodes:
| Property | Leaf Nodes | Internal Nodes |
|---|---|---|
| Purpose | Store data entries | Guide navigation |
| Contains all keys? | Yes | No (separator keys only) |
| Child pointers | None | Yes (to next level) |
| Sibling pointers | Both prev and next | Typically none |
| Data/payload | Row data or row pointers | None |
| Percentage of pages | ~99%+ | ~1% or less |
| Caching priority | Lower (too numerous) | Higher (small, reused) |
A leaf node page has a specific internal layout optimized for both direct key lookups and sequential scans. Understanding this structure reveals how databases achieve efficient data access.
Typical Leaf Node Layout:
```
┌─────────────────────────────────────────────────────────────┐
│ PAGE HEADER                                                 │
│ ┌─────────────┬─────────────┬─────────────┬──────────────┐  │
│ │ Page ID     │ Page Type   │ Item Count  │ Free Space   │  │
│ │ (8 bytes)   │ (LEAF)      │ (2 bytes)   │ Offset       │  │
│ └─────────────┴─────────────┴─────────────┴──────────────┘  │
├─────────────────────────────────────────────────────────────┤
│ SIBLING POINTERS                                            │
│ ┌───────────────────────┬─────────────────────────────────┐ │
│ │ Previous Leaf Page ID │ Next Leaf Page ID               │ │
│ │ (8 bytes)             │ (8 bytes)                       │ │
│ └───────────────────────┴─────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────┤
│ SLOT DIRECTORY                                              │
│ ┌────────┬────────┬────────┬────────┬─────────────────────┐ │
│ │ Slot 0 │ Slot 1 │ Slot 2 │  ...   │ (grows downward)    │ │
│ │ offset │ offset │ offset │        │                     │ │
│ └────────┴────────┴────────┴────────┴─────────────────────┘ │
│     │                                                       │
│     ▼ (offsets point into data area)                        │
│                       FREE SPACE                            │
│                                                             │
│     ▲ (data grows upward)                                   │
├─────────────────────────────────────────────────────────────┤
│ DATA ENTRIES                                                │
│ ┌──────────────────────────────────────────────────────┐    │
│ │ Entry N:   [Key | Pointer/Data]                      │    │
│ ├──────────────────────────────────────────────────────┤    │
│ │ Entry N-1: [Key | Pointer/Data]                      │    │
│ ├──────────────────────────────────────────────────────┤    │
│ │ ...                                                  │    │
│ ├──────────────────────────────────────────────────────┤    │
│ │ Entry 0:   [Key | Pointer/Data]                      │    │
│ └──────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────┘
```
The "slot directory pointing to data" pattern (called "slotted pages") is universal in database storage. It allows maintaining sorted order in the slot directory without physically moving data entries. When a key is deleted, only the slot is removed; the data space becomes free for reuse. This minimizes I/O by avoiding page reorganization.
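The slotted-page idea can be captured in a minimal Python sketch. The `SlottedLeaf` class below is hypothetical, not any engine's actual layout: the point is that entries are written once into a data area and never moved, while a small sorted slot directory provides key order.

```python
import bisect

class SlottedLeaf:
    """Toy slotted-page leaf: entries append to a data area;
    a sorted slot directory of (key, data_index) pairs gives key order."""

    def __init__(self):
        self.data = []    # data area: entries stay where they were written
        self.slots = []   # slot directory: sorted (key, index-into-data) pairs

    def insert(self, key, value):
        self.data.append((key, value))                        # written once, never moved
        bisect.insort(self.slots, (key, len(self.data) - 1))  # only slots shift

    def delete(self, key):
        i = bisect.bisect_left(self.slots, (key,))
        if i < len(self.slots) and self.slots[i][0] == key:
            del self.slots[i]          # drop the slot; data space becomes reusable

    def scan(self):
        # Slot order yields sorted output without sorting the data area.
        return [self.data[idx] for _, idx in self.slots]

leaf = SlottedLeaf()
for k in [30, 10, 20]:
    leaf.insert(k, f"row-{k}")
leaf.delete(20)
print(leaf.scan())   # [(10, 'row-10'), (30, 'row-30')]
```

Note that deleting key 20 touches only the slot directory; the stale entry in the data area is simply ignored, mirroring how a real page marks the space free for reuse.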
The defining feature that separates B+-trees from B-trees is the linked list connecting all leaf nodes. This seemingly simple addition has profound implications for database performance.
How the Linked List Works:
When the B+-tree is constructed, leaf nodes are linked in key order:
Leaf₁ ←→ Leaf₂ ←→ Leaf₃ ←→ ... ←→ Leaf_n
Where Leaf₁ contains the smallest keys,
Leaf_n contains the largest keys,
and each leaf knows its neighbors.
Why This Matters for Range Queries:
Consider the query: SELECT * FROM users WHERE age BETWEEN 25 AND 35
Without leaf linking (standard B-tree): after the first matching leaf is found, there is no direct path to the next one. Each subsequent leaf in the range requires a fresh root-to-leaf traversal, costing roughly height × leaves I/Os.
With leaf linking (B+-tree): one root-to-leaf traversal locates the start of the range; every additional leaf is then reached by following a single next pointer, costing roughly height + (leaves − 1) I/Os.
| Range Size | B-Tree I/Os | B+-Tree I/Os | Improvement |
|---|---|---|---|
| 10 rows (1 leaf) | 4 | 4 | None |
| 100 rows (10 leaves) | 40 | 13 | 3x faster |
| 1,000 rows (100 leaves) | 400 | 103 | 4x faster |
| 10,000 rows (1000 leaves) | 4,000 | 1,003 | 4x faster |
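The I/O counts in the table above follow from a simple cost model, sketched below. A tree height of 4 is assumed, matching the table; real heights depend on fanout and row count.

```python
def btree_range_ios(height, leaves):
    # Plain B-tree: no sibling links, so each leaf in the range
    # is reached by a fresh root-to-leaf descent.
    return height * leaves

def bplustree_range_ios(height, leaves):
    # B+-tree: one descent to the first leaf, then one 'next'
    # pointer follow per additional leaf.
    return height + (leaves - 1)

for leaves in (1, 10, 100, 1000):
    print(leaves, btree_range_ios(4, leaves), bplustree_range_ios(4, leaves))
# 1 4 4
# 10 40 13
# 100 400 103
# 1000 4000 1003
```

As the range grows, the B+-tree cost converges to one I/O per leaf, while the B-tree cost stays at one full descent per leaf.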
Modern storage systems heavily optimize for sequential access. Following leaf pointers often results in sequential disk reads (especially if leaves were allocated together initially), further improving range scan performance beyond the I/O count reduction. SSDs reduce this gap but don't eliminate it—sequential reads still outperform random even on NVMe.
Bidirectional Linking:
Most B+-tree implementations use doubly-linked leaves:
Without backward pointers, queries like SELECT * FROM products ORDER BY price DESC LIMIT 10 would require scanning the entire index or using additional logic.
Leaf Link Maintenance:
When leaves split or merge, sibling pointers must be updated:
Before split of Leaf₂:
Leaf₁ ←→ Leaf₂ ←→ Leaf₃
After split into Leaf₂a and Leaf₂b:
Leaf₁ ←→ Leaf₂a ←→ Leaf₂b ←→ Leaf₃
This maintenance is localized—only the immediate neighbors need pointer updates, making it O(1) additional work per split.
What leaf nodes actually contain depends critically on whether the index is clustered or non-clustered. This distinction affects storage, performance, and query optimization strategies.
Leaf Entry Formats:
Clustered Index (InnoDB example):
[Primary Key] [Col2] [Col3] ... [ColN]
└── Key ──┘ └────── Row Data ─────┘
Non-Clustered Index (heap table):
[Index Key] [Row ID / Physical Location]
└── Points to heap page + slot
Non-Clustered Index (with clustered table):
[Index Key] [Primary Key]
└── Used to look up clustered index
Important Implication:
In InnoDB (MySQL) and SQL Server, non-clustered index leaves store the primary key as the row locator, not a physical address. This means: fetching columns not in the index requires a second traversal through the clustered index (a bookmark lookup); row movement from page splits never invalidates secondary indexes, since the primary key does not change; and the size of the primary key is paid again in every secondary index entry.
If your primary key is a compound key of 5 VARCHAR(50) columns, every non-clustered index carries ~200+ bytes per entry just for the row locator. This dramatically reduces leaf fanout and inflates index storage. Consider a surrogate INT/BIGINT primary key with a unique constraint on the natural key columns.
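A back-of-envelope calculation shows why locator size matters for fanout. The numbers below are illustrative assumptions: a 16 KB page (InnoDB's default), a 40-byte secondary index key, and a flat 10-byte per-entry overhead standing in for record headers.

```python
PAGE_BYTES = 16 * 1024   # InnoDB's default page size (assumption for this sketch)

def entries_per_leaf(key_bytes, locator_bytes, overhead=10):
    # 'overhead' approximates per-entry record headers (illustrative guess)
    return PAGE_BYTES // (key_bytes + locator_bytes + overhead)

# Hypothetical secondary index with a 40-byte key:
# - surrogate BIGINT primary key as row locator (8 bytes)
# - compound natural key of 5 x VARCHAR(50) (~250 bytes) as row locator
print(entries_per_leaf(40, 8))     # 282 entries per leaf
print(entries_per_leaf(40, 250))   # 54 entries per leaf
```

Under these assumptions the fat locator cuts leaf fanout by roughly 5x, which means roughly 5x more leaf pages for the same row count, and correspondingly more I/O for every range scan over that index.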
Leaf nodes are where data modification actually occurs. Understanding these operations reveals why certain access patterns perform better than others.
Insert Operations:
```
FUNCTION insert_into_leaf(leaf, key, value):
    IF leaf.has_space():
        leaf.insert_entry(key, value)
        RETURN

    // Leaf is full - split required
    new_leaf = allocate_new_page()

    // Distribute entries: first half stays, second half moves
    entries = leaf.get_all_entries() + [(key, value)]
    entries.sort_by_key()
    midpoint = len(entries) // 2
    leaf.set_entries(entries[:midpoint])
    new_leaf.set_entries(entries[midpoint:])

    // Update linked list
    new_leaf.next = leaf.next
    new_leaf.prev = leaf
    IF leaf.next IS NOT NULL:
        leaf.next.prev = new_leaf
    leaf.next = new_leaf

    // Propagate split to parent
    separator_key = new_leaf.first_key()
    insert_into_parent(leaf.parent, separator_key, new_leaf)

// Note: This may cascade up the tree if parent is also full,
// potentially reaching the root and increasing tree height.
```
Delete Operations:
Leaf deletions are generally simpler than inserts:
In practice, many databases skip merging — maintaining slightly under-filled leaves is often cheaper than the merge/split oscillation that can occur with frequent insert/delete patterns (this is called 'lazy deletion').
Performance Implications:
| Operation | Best Case | Worst Case | Impact |
|---|---|---|---|
| Point Insert | 1 leaf write | log_F(N) writes (splits cascade to root) | Usually 1-2 writes |
| Sequential Insert | Appends to rightmost leaf | Occasional split | Very efficient |
| Random Insert | Any leaf may split | Fragmentation accumulates | Can trigger rebuilds |
| Point Delete | 1 leaf write | Merge + parent update | Usually 1 write |
| Range Delete | Multiple leaf modifications | Many merges possible | May warrant rebuild after |
Leaf density—how fully packed leaf nodes are—significantly impacts both storage efficiency and range scan performance. Density is set initially by the fill factor and degrades over time through fragmentation.
Impact of Low Leaf Density:
| Density | Pages Needed for 1M Rows | Range Scan I/O | Storage Overhead |
|---|---|---|---|
| 100% | 1,000 pages | 1,000 reads | Baseline |
| 75% | 1,333 pages | 1,333 reads | +33% |
| 50% | 2,000 pages | 2,000 reads | +100% |
| 25% | 4,000 pages | 4,000 reads | +300% |
Half-empty leaves double your I/O for range scans and double your storage costs.
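The table's figures follow directly from spreading the same rows over emptier pages: pages needed = full pages / density. A short sketch reproducing the numbers:

```python
FULL_PAGES = 1000   # pages for 1M rows at 100% density, as in the table

def pages_at_density(density):
    # Same row count spread over emptier pages.
    return round(FULL_PAGES / density)

for d in (1.00, 0.75, 0.50, 0.25):
    p = pages_at_density(d)
    print(f"{d:.0%} density: {p:,} pages ({p / FULL_PAGES - 1:+.0%} overhead)")
# 100% density: 1,000 pages (+0% overhead)
# 75% density: 1,333 pages (+33% overhead)
# 50% density: 2,000 pages (+100% overhead)
# 25% density: 4,000 pages (+300% overhead)
```

Because a range scan reads every page holding the range, scan I/O scales with page count, not row count—so the overhead column is also the range scan penalty.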
When Density Degrades:
Random inserts trigger page splits that leave both halves roughly half full, and deletions leave holes that may never be refilled; both erode density over time.
Measuring Leaf Density:
• PostgreSQL: SELECT * FROM pgstattuple('index_name') shows avg_leaf_density
• SQL Server: sys.dm_db_index_physical_stats shows avg_page_space_used_in_percent
• MySQL: ANALYZE TABLE + SHOW INDEX shows rough cardinality but not density directly
When leaf density drops below 60-70%, consider REINDEX or REBUILD to restore efficiency.
The contents of leaf nodes directly determine whether an index can satisfy a query entirely—a covering index—or whether additional table access is required.
Covering Index Definition:
A covering index (or index-only scan) is when all columns needed by a query exist in the index leaf entries, eliminating the need to access the base table.
How Leaf Content Enables Covering:
| Index Type | Leaf Contains | Covers Query If... |
|---|---|---|
| Clustered | All row columns | Always (leaves ARE the table) |
| Non-clustered (basic) | Index key + row locator | Query only needs index key columns |
| Non-clustered + INCLUDE | Key + included columns | Query needs key or included columns |
| Composite non-clustered | All composite key columns | Query needs any subset of key columns |
```sql
-- Query: SELECT email, created_at FROM users WHERE email LIKE 'john%'

-- Option 1: Basic index (NOT covering)
CREATE INDEX idx_email ON users(email);
-- Leaf contains: [email] [PK pointer]
-- Requires bookmark lookup to get created_at

-- Option 2: Composite index (covering)
CREATE INDEX idx_email_created ON users(email, created_at);
-- Leaf contains: [email, created_at] [PK pointer]
-- All needed columns in leaf - no table access!

-- Option 3: INCLUDE columns (covering, optimized)
CREATE INDEX idx_email_incl ON users(email) INCLUDE (created_at);
-- Leaf contains: [email] [PK pointer] [created_at]
-- created_at stored in leaf but not in internal nodes
-- Better fanout than full composite since internal nodes smaller

-- Check for covering scan in execution plan:
-- Look for "Index Scan" without "Key Lookup" or "RID Lookup"
```
INCLUDE columns are stored only in leaf nodes, not internal nodes. This is important:
• Composite key: Column is in both internal and leaf nodes, supports filtering/ordering
• INCLUDE: Column only in leaves, supports covering but not filtering
Use INCLUDE when you need the column for output but won't filter/sort on it. Internal nodes stay smaller, maintaining higher fanout.
Leaf nodes are where the tree meets the data — the terminal level that every query ultimately reaches. Understanding their structure and behavior is essential for index design and query optimization.
What's Next:
With leaf nodes understood, we'll explore internal nodes — the navigational backbone of the B+-tree that guides searches to the correct leaves. You'll learn how internal nodes differ structurally, why they're almost always cached, and how separator keys enable efficient tree traversal.
You now understand leaf nodes — where B+-trees store actual data or data pointers. This knowledge enables you to reason about range query performance, understand covering indexes, diagnose fragmentation issues, and make informed decisions about index key design. Next, we examine internal nodes that guide the path to leaves.