While internal nodes guide searches through the tree, leaf nodes are where the journey ends. Every index lookup ultimately arrives at a leaf node to retrieve the data—or the pointer to the data—that satisfies the query.
Leaf nodes are special in multiple ways: they contain all the indexed values (not just separator keys), they're linked together for efficient range scans, and their structure differs significantly between clustered and non-clustered indexes. Understanding leaf nodes transforms your ability to reason about query performance, especially for range queries and covering indexes.
This page explores the anatomy of leaf nodes, their linking mechanism, how they interact with table data, and why their design makes B+-trees superior for database workloads compared to vanilla B-trees.
By the end of this page, you will understand the internal structure of leaf nodes, how the leaf-to-leaf linked list enables range scans, the difference between leaf content in clustered vs non-clustered indexes, and how leaf nodes impact query performance patterns.
In a B+-tree, leaf nodes occupy the lowest (deepest) level of the tree. Unlike internal nodes, which serve purely navigational purposes, leaf nodes contain the actual indexed data entries.
Formal Definition:
A leaf node is a tree node that has no child pointers to lower levels. It contains key-value pairs where the key is the indexed column(s) and the value is either the actual row data (clustered index) or a pointer to the row location (non-clustered index).
Key Properties of Leaf Nodes:
| Property | Leaf Nodes | Internal Nodes |
|---|---|---|
| Purpose | Store data entries | Guide navigation |
| Contains all keys? | Yes | No (separator keys only) |
| Child pointers | None | Yes (to next level) |
| Sibling pointers | Both prev and next | Typically none |
| Data/payload | Row data or row pointers | None |
| Percentage of pages | ~99%+ | ~1% or less |
| Caching priority | Lower (too numerous) | Higher (small, reused) |
A leaf node page has a specific internal layout optimized for both direct key lookups and sequential scans. Understanding this structure reveals how databases achieve efficient data access.
Typical Leaf Node Layout:
```
┌─────────────────────────────────────────────────────────────┐
│ PAGE HEADER                                                 │
│ ┌─────────────┬─────────────┬─────────────┬──────────────┐  │
│ │ Page ID     │ Page Type   │ Item Count  │ Free Space   │  │
│ │ (8 bytes)   │ (LEAF)      │ (2 bytes)   │ Offset       │  │
│ └─────────────┴─────────────┴─────────────┴──────────────┘  │
├─────────────────────────────────────────────────────────────┤
│ SIBLING POINTERS                                            │
│ ┌───────────────────────┬─────────────────────────────────┐ │
│ │ Previous Leaf Page ID │ Next Leaf Page ID               │ │
│ │ (8 bytes)             │ (8 bytes)                       │ │
│ └───────────────────────┴─────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────┤
│ SLOT DIRECTORY                                              │
│ ┌────────┬────────┬────────┬────────┬─────────────────────┐ │
│ │ Slot 0 │ Slot 1 │ Slot 2 │  ...   │ (grows downward)    │ │
│ │ offset │ offset │ offset │        │                     │ │
│ └────────┴────────┴────────┴────────┴─────────────────────┘ │
│     │                                                       │
│     ▼ (offsets point into data area)                        │
│                       FREE SPACE                            │
│                                                             │
│     ▲ (data grows upward)                                   │
├─────────────────────────────────────────────────────────────┤
│ DATA ENTRIES                                                │
│ ┌──────────────────────────────────────────────────────┐    │
│ │ Entry N:   [Key | Pointer/Data]                      │    │
│ ├──────────────────────────────────────────────────────┤    │
│ │ Entry N-1: [Key | Pointer/Data]                      │    │
│ ├──────────────────────────────────────────────────────┤    │
│ │ ...                                                  │    │
│ ├──────────────────────────────────────────────────────┤    │
│ │ Entry 0:   [Key | Pointer/Data]                      │    │
│ └──────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────┘
```
The "slot directory pointing to data" pattern (called "slotted pages") is universal in database storage. It allows maintaining sorted order in the slot directory without physically moving data entries. When a key is deleted, only the slot is removed; the data space becomes free for reuse. This minimizes I/O by avoiding page reorganization.
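The slotted-page idea can be captured in a minimal Python sketch. The `SlottedLeaf` class below is hypothetical, not any engine's actual layout: the point is that entries are written once into a data area and never moved, while a small sorted slot directory provides key order.

```python
import bisect

class SlottedLeaf:
    """Toy slotted-page leaf: entries append to a data area;
    a sorted slot directory of (key, data_index) pairs gives key order."""

    def __init__(self):
        self.data = []    # data area: entries stay where they were written
        self.slots = []   # slot directory: sorted (key, index-into-data) pairs

    def insert(self, key, value):
        self.data.append((key, value))                        # written once, never moved
        bisect.insort(self.slots, (key, len(self.data) - 1))  # only slots shift

    def delete(self, key):
        i = bisect.bisect_left(self.slots, (key,))
        if i < len(self.slots) and self.slots[i][0] == key:
            del self.slots[i]          # drop the slot; data space becomes reusable

    def scan(self):
        # Slot order yields sorted output without sorting the data area.
        return [self.data[idx] for _, idx in self.slots]

leaf = SlottedLeaf()
for k in [30, 10, 20]:
    leaf.insert(k, f"row-{k}")
leaf.delete(20)
print(leaf.scan())   # [(10, 'row-10'), (30, 'row-30')]
```

Note that deleting key 20 touches only the slot directory; the stale entry in the data area is simply ignored, mirroring how a real page marks the space free for reuse.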
The defining feature that separates B+-trees from B-trees is the linked list connecting all leaf nodes. This seemingly simple addition has profound implications for database performance.
How the Linked List Works:
When the B+-tree is constructed, leaf nodes are linked in key order:
Leaf₁ ←→ Leaf₂ ←→ Leaf₃ ←→ ... ←→ Leaf_n
Where Leaf₁ contains the smallest keys,
Leaf_n contains the largest keys,
and each leaf knows its neighbors.
Why This Matters for Range Queries:
Consider the query: SELECT * FROM users WHERE age BETWEEN 25 AND 35
Without leaf linking (standard B-tree): after the first matching leaf is found, there is no direct path to the next one. Each subsequent leaf in the range requires a fresh root-to-leaf traversal, costing roughly height × leaves I/Os.
With leaf linking (B+-tree): one root-to-leaf traversal locates the start of the range; every additional leaf is then reached by following a single next pointer, costing roughly height + (leaves − 1) I/Os.
| Range Size | B-Tree I/Os | B+-Tree I/Os | Improvement |
|---|---|---|---|
| 10 rows (1 leaf) | 4 | 4 | None |
| 100 rows (10 leaves) | 40 | 13 | 3x faster |
| 1,000 rows (100 leaves) | 400 | 103 | 4x faster |
| 10,000 rows (1000 leaves) | 4,000 | 1,003 | 4x faster |
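The I/O counts in the table above follow from a simple cost model, sketched below. A tree height of 4 is assumed, matching the table; real heights depend on fanout and row count.

```python
def btree_range_ios(height, leaves):
    # Plain B-tree: no sibling links, so each leaf in the range
    # is reached by a fresh root-to-leaf descent.
    return height * leaves

def bplustree_range_ios(height, leaves):
    # B+-tree: one descent to the first leaf, then one 'next'
    # pointer follow per additional leaf.
    return height + (leaves - 1)

for leaves in (1, 10, 100, 1000):
    print(leaves, btree_range_ios(4, leaves), bplustree_range_ios(4, leaves))
# 1 4 4
# 10 40 13
# 100 400 103
# 1000 4000 1003
```

As the range grows, the B+-tree cost converges to one I/O per leaf, while the B-tree cost stays at one full descent per leaf.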
Modern storage systems heavily optimize for sequential access. Following leaf pointers often results in sequential disk reads (especially if leaves were allocated together initially), further improving range scan performance beyond the I/O count reduction. SSDs reduce this gap but don't eliminate it—sequential reads still outperform random even on NVMe.
Bidirectional Linking:
Most B+-tree implementations use doubly-linked leaves:
Without backward pointers, queries like SELECT * FROM products ORDER BY price DESC LIMIT 10 would require scanning the entire index or using additional logic.
Leaf Link Maintenance:
When leaves split or merge, sibling pointers must be updated:
Before split of Leaf₂:
Leaf₁ ←→ Leaf₂ ←→ Leaf₃
After split into Leaf₂a and Leaf₂b:
Leaf₁ ←→ Leaf₂a ←→ Leaf₂b ←→ Leaf₃
This maintenance is localized—only the immediate neighbors need pointer updates, making it O(1) additional work per split.
What leaf nodes actually contain depends critically on whether the index is clustered or non-clustered. This distinction affects storage, performance, and query optimization strategies.
Leaf Entry Formats:
Clustered Index (InnoDB example):
[Primary Key] [Col2] [Col3] ... [ColN]
└── Key ──┘ └────── Row Data ─────┘
Non-Clustered Index (heap table):
[Index Key] [Row ID / Physical Location]
└── Points to heap page + slot
Non-Clustered Index (with clustered table):
[Index Key] [Primary Key]
└── Used to look up clustered index
Important Implication:
In InnoDB (MySQL) and SQL Server, non-clustered index leaves store the primary key as the row locator, not a physical address. This means: fetching columns not in the index requires a second traversal through the clustered index (a bookmark lookup); row movement from page splits never invalidates secondary indexes, since the primary key does not change; and the size of the primary key is paid again in every secondary index entry.
If your primary key is a compound key of 5 VARCHAR(50) columns, every non-clustered index carries ~200+ bytes per entry just for the row locator. This dramatically reduces leaf fanout and inflates index storage. Consider a surrogate INT/BIGINT primary key with a unique constraint on the natural key columns.
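A back-of-envelope calculation shows why locator size matters for fanout. The numbers below are illustrative assumptions: a 16 KB page (InnoDB's default), a 40-byte secondary index key, and a flat 10-byte per-entry overhead standing in for record headers.

```python
PAGE_BYTES = 16 * 1024   # InnoDB's default page size (assumption for this sketch)

def entries_per_leaf(key_bytes, locator_bytes, overhead=10):
    # 'overhead' approximates per-entry record headers (illustrative guess)
    return PAGE_BYTES // (key_bytes + locator_bytes + overhead)

# Hypothetical secondary index with a 40-byte key:
# - surrogate BIGINT primary key as row locator (8 bytes)
# - compound natural key of 5 x VARCHAR(50) (~250 bytes) as row locator
print(entries_per_leaf(40, 8))     # 282 entries per leaf
print(entries_per_leaf(40, 250))   # 54 entries per leaf
```

Under these assumptions the fat locator cuts leaf fanout by roughly 5x, which means roughly 5x more leaf pages for the same row count, and correspondingly more I/O for every range scan over that index.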
Leaf nodes are where data modification actually occurs. Understanding these operations reveals why certain access patterns perform better than others.
Insert Operations:
```
FUNCTION insert_into_leaf(leaf, key, value):
    IF leaf.has_space():
        leaf.insert_entry(key, value)
        RETURN

    // Leaf is full - split required
    new_leaf = allocate_new_page()

    // Distribute entries: first half stays, second half moves
    entries = leaf.get_all_entries() + [(key, value)]
    entries.sort_by_key()
    midpoint = len(entries) // 2
    leaf.set_entries(entries[:midpoint])
    new_leaf.set_entries(entries[midpoint:])

    // Update linked list
    new_leaf.next = leaf.next
    new_leaf.prev = leaf
    IF leaf.next IS NOT NULL:
        leaf.next.prev = new_leaf
    leaf.next = new_leaf

    // Propagate split to parent
    separator_key = new_leaf.first_key()
    insert_into_parent(leaf.parent, separator_key, new_leaf)

// Note: This may cascade up the tree if parent is also full,
// potentially reaching the root and increasing tree height.
```
Delete Operations:
Leaf deletions are generally simpler than inserts:
In practice, many databases skip merging — maintaining slightly under-filled leaves is often cheaper than the merge/split oscillation that can occur with frequent insert/delete patterns (this is called 'lazy deletion').
Performance Implications:
| Operation | Best Case | Worst Case | Impact |
|---|---|---|---|
| Point Insert | 1 leaf write | log_F(N) writes (splits cascade to root) | Usually 1-2 writes |
| Sequential Insert | Appends to rightmost leaf | Occasional split | Very efficient |
| Random Insert | Any leaf may split | Fragmentation accumulates | Can trigger rebuilds |
| Point Delete | 1 leaf write | Merge + parent update | Usually 1 write |
| Range Delete | Multiple leaf modifications | Many merges possible | May warrant rebuild after |
Leaf density—how fully packed leaf nodes are—significantly impacts both storage efficiency and range scan performance. Density is set initially by the fill factor and degrades over time through fragmentation.
Impact of Low Leaf Density:
| Density | Pages Needed for 1M Rows | Range Scan I/O | Storage Overhead |
|---|---|---|---|
| 100% | 1,000 pages | 1,000 reads | Baseline |
| 75% | 1,333 pages | 1,333 reads | +33% |
| 50% | 2,000 pages | 2,000 reads | +100% |
| 25% | 4,000 pages | 4,000 reads | +300% |
Half-empty leaves double your I/O for range scans and double your storage costs.
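The table's figures follow directly from spreading the same rows over emptier pages: pages needed = full pages / density. A short sketch reproducing the numbers:

```python
FULL_PAGES = 1000   # pages for 1M rows at 100% density, as in the table

def pages_at_density(density):
    # Same row count spread over emptier pages.
    return round(FULL_PAGES / density)

for d in (1.00, 0.75, 0.50, 0.25):
    p = pages_at_density(d)
    print(f"{d:.0%} density: {p:,} pages ({p / FULL_PAGES - 1:+.0%} overhead)")
# 100% density: 1,000 pages (+0% overhead)
# 75% density: 1,333 pages (+33% overhead)
# 50% density: 2,000 pages (+100% overhead)
# 25% density: 4,000 pages (+300% overhead)
```

Because a range scan reads every page holding the range, scan I/O scales with page count, not row count—so the overhead column is also the range scan penalty.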
When Density Degrades:
Random inserts trigger page splits that leave both halves roughly half full, and deletions leave holes that may never be refilled; both erode density over time.
Measuring Leaf Density:
• PostgreSQL: SELECT * FROM pgstattuple('index_name') shows avg_leaf_density
• SQL Server: sys.dm_db_index_physical_stats shows avg_page_space_used_in_percent
• MySQL: ANALYZE TABLE + SHOW INDEX shows rough cardinality but not density directly
When leaf density drops below 60-70%, consider REINDEX or REBUILD to restore efficiency.
The contents of leaf nodes directly determine whether an index can satisfy a query entirely—a covering index—or whether additional table access is required.
Covering Index Definition:
A covering index (or index-only scan) is when all columns needed by a query exist in the index leaf entries, eliminating the need to access the base table.
How Leaf Content Enables Covering:
| Index Type | Leaf Contains | Covers Query If... |
|---|---|---|
| Clustered | All row columns | Always (leaves ARE the table) |
| Non-clustered (basic) | Index key + row locator | Query only needs index key columns |
| Non-clustered + INCLUDE | Key + included columns | Query needs key or included columns |
| Composite non-clustered | All composite key columns | Query needs any subset of key columns |
```sql
-- Query: SELECT email, created_at FROM users WHERE email LIKE 'john%'

-- Option 1: Basic index (NOT covering)
CREATE INDEX idx_email ON users(email);
-- Leaf contains: [email] [PK pointer]
-- Requires bookmark lookup to get created_at

-- Option 2: Composite index (covering)
CREATE INDEX idx_email_created ON users(email, created_at);
-- Leaf contains: [email, created_at] [PK pointer]
-- All needed columns in leaf - no table access!

-- Option 3: INCLUDE columns (covering, optimized)
CREATE INDEX idx_email_incl ON users(email) INCLUDE (created_at);
-- Leaf contains: [email] [PK pointer] [created_at]
-- created_at stored in leaf but not in internal nodes
-- Better fanout than full composite since internal nodes smaller

-- Check for covering scan in execution plan:
-- Look for "Index Scan" without "Key Lookup" or "RID Lookup"
```
INCLUDE columns are stored only in leaf nodes, not internal nodes. This is important:
• Composite key: Column is in both internal and leaf nodes, supports filtering/ordering
• INCLUDE: Column only in leaves, supports covering but not filtering
Use INCLUDE when you need the column for output but won't filter/sort on it. Internal nodes stay smaller, maintaining higher fanout.
Leaf nodes are where the tree meets the data — the terminal level that every query ultimately reaches. Understanding their structure and behavior is essential for index design and query optimization.
What's Next:
With leaf nodes understood, we'll explore internal nodes — the navigational backbone of the B+-tree that guides searches to the correct leaves. You'll learn how internal nodes differ structurally, why they're almost always cached, and how separator keys enable efficient tree traversal.
You now understand leaf nodes — where B+-trees store actual data or data pointers. This knowledge enables you to reason about range query performance, understand covering indexes, diagnose fragmentation issues, and make informed decisions about index key design. Next, we examine internal nodes that guide the path to leaves.