How does a database know where to find a specific record on a page? If records can be variable-length and their positions can change during compaction, how do indexes maintain valid pointers? The answer lies in one of the most elegant abstractions in database design: the slot directory.
The slot directory is a level of indirection between logical record identifiers and physical byte offsets. This seemingly simple concept—an array of pointers—enables the entire machinery of efficient record management, including in-place updates, garbage collection, and stable external references. Without the slot directory, databases would face a choice between rigid fixed-length records or chaotic, fragile direct pointers.
By the end of this page, you will understand the complete mechanics of slot directories, including their structure, how they support record addressing, the relationship between slot numbers and tuple IDs (TIDs/RIDs), and how they enable compaction without invalidating external references. You'll also explore edge cases and implementation variations across major database systems.
Before understanding the solution, let's deeply understand the problem. Consider a page containing multiple records of varying sizes:
The Direct Addressing Problem
Imagine we simply stored records contiguously and addressed them by byte offset:
Page with direct addressing:
┌──────────────────────────────────────────────────────────┐
│ Offset 24: Record A (100 bytes) │
│ Offset 124: Record B (250 bytes) │
│ Offset 374: Record C (75 bytes) │
│ Offset 449: Record D (300 bytes) │
└──────────────────────────────────────────────────────────┘
Index entry for Record C: (PageID=47, Offset=374)
Now, what happens when we delete Record B?
The Cascading Disaster
With compaction, Records C and D slide down to fill the hole: Record C moves from offset 374 to 124, and Record D from 449 to 199.
But the index entry for Record C still says offset 374! Every reference to these records is now invalid. We face an impossible choice: either never reclaim the holes left by deletions (wasting space), or find and rewrite every external reference each time a record moves (prohibitively expensive and fragile).
Computer science has a famous saying: 'All problems in computer science can be solved by another level of indirection.' The slot directory is a prime example. By inserting a stable layer between references and data, we gain the flexibility to reorganize data without breaking references.
The slot directory is an array of fixed-size entries, each containing information about one record on the page. The array typically grows from just after the page header toward the center of the page.
Slot Entry Contents
Each slot entry contains at minimum:
| Field | Size | Purpose |
|---|---|---|
| Offset | 2-4 bytes | Physical byte offset of record within page |
| Length | 2 bytes | Size of record in bytes (optional; can be derived from the next slot's offset) |
| Flags | 1-2 bytes | Status bits: in-use, dead, redirected, etc. |
Total slot entry size is typically 4-8 bytes per slot.
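As a concrete model, the slot entry and its O(1) lookup can be sketched as follows. The field and flag names (`USED`, `DEAD`, `resolve`) are illustrative, not taken from any particular system:

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative flag values -- real systems pack these into bit fields
USED, DEAD = 1, 2

@dataclass
class Slot:
    offset: int   # physical byte offset of the record within the page
    length: int   # record size in bytes
    flags: int    # USED, DEAD, ...

@dataclass
class Page:
    size: int = 8192
    header_size: int = 24
    slots: List[Slot] = field(default_factory=list)

    def resolve(self, slot_num: int) -> Slot:
        """Resolve a slot number to its entry: a single O(1) array index."""
        slot = self.slots[slot_num]
        if slot.flags != USED:
            raise LookupError(f"slot {slot_num} is not live")
        return slot
```

The point of the indirection is visible in `resolve`: callers hold only the slot number, so the `offset` field can change underneath them without breaking anything.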
Key Observations
Dead slots cannot be immediately reused because slot numbers may be referenced externally (in indexes, by other transactions). Reusing a slot number could cause an old reference to reach a new, unrelated record. Slot reuse requires careful coordination—typically only after all potential references are guaranteed gone (e.g., after vacuum in PostgreSQL).
The combination of page number and slot number forms the Tuple Identifier (TID) or Row Identifier (RID)—the universal way to refer to a specific record in the database.
TID Format
TID = (BlockNumber, OffsetNumber)
       \_________/  \__________/
        Page ID      Slot Number
       (4-6 bytes)    (2 bytes)

Example: (47, 3) means "Page 47, Slot 3"
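Because both components are small integers, a TID packs into a single word for compact storage in index entries. This sketch assumes a 32-bit block number and a 16-bit slot number, matching the ranges in the diagram; the packing scheme itself is illustrative:

```python
def tid_pack(block: int, slot: int) -> int:
    """Pack (block, slot) into one 48-bit integer:
    block number in the high 32 bits, slot number in the low 16."""
    assert 0 <= block < 2**32 and 0 <= slot < 2**16
    return (block << 16) | slot

def tid_unpack(tid: int) -> tuple:
    """Inverse of tid_pack: recover (block, slot)."""
    return tid >> 16, tid & 0xFFFF
```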
Where TIDs Appear
TIDs are used throughout the database system:
| Context | How TID Is Used | Example |
|---|---|---|
| B-tree Index Leaf | Index entry contains key + TID pointing to heap | (key='Smith', TID=(47,3)) |
| Index-Only Scan | TID checked against visibility map | If page 47 all-visible, skip heap fetch |
| MVCC Tuple Chains | Old versions point to newer via TID | Tuple at (47,3) → successor at (47,15) |
| Foreign Key Checks | Internal FK validation uses TIDs | Child row references parent at (100,7) |
| CTID System Column | PostgreSQL exposes TID as 'ctid' pseudo-column | SELECT ctid FROM orders → (47,3) |
| HOT Chain | Heap-Only Tuple updates chain via TID | (47,3) → (47,12) → (47,18) on same page |
| Vacuum/Autovacuum | Dead TIDs collected for cleanup | Remove index entries pointing to dead TIDs |
TID Stability Guarantees
The slot directory ensures that a TID remains valid as long as the row version it names is alive: the slot number survives page compaction, in-place updates, and deletions of neighboring records.
This stability is critical for index entries, long-running scans, and MVCC version chains, all of which hold TIDs across page reorganizations.
While TIDs are stable during a row's lifetime, they can change after VACUUM FULL, CLUSTER, or similar reorganization operations that move rows between pages. Applications should never store TIDs externally as permanent keys—use primary keys instead. TIDs are for internal database use.
Let's examine how the slot directory is manipulated during common operations:
INSERT: Allocating a New Slot
Operation: INSERT new 150-byte record
Before:
Lower = 40 (24-byte header + 4 slots × 4 bytes/slot)
Upper = 7000 (first free byte before records)
Free space = 7000 - 40 = 6960 bytes
Step 1: Check space
Need: 150 (record) + 4 (new slot) = 154 bytes
Have: 6960 bytes ✓
Step 2: Allocate record space
New Upper = 7000 - 150 = 6850
Copy record to offset 6850
Step 3: Allocate slot
New slot 4 at offset 40
Slot 4 = {offset=6850, length=150, flags=USED}
New Lower = 40 + 4 = 44
Step 4: Update header
Lower: 40 → 44
Upper: 7000 → 6850
After:
Free space = 6850 - 44 = 6806 bytes
New TID = (PageID, 4)
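The four steps above can be sketched as a single function over a simplified page model. The `Lower`/`Upper` bookkeeping mirrors the walkthrough; class and field names are illustrative:

```python
SLOT_SIZE = 4
USED = 1

class Page:
    def __init__(self, size=8192, header=24):
        self.data = bytearray(size)
        self.lower = header      # end of the slot directory
        self.upper = size        # start of the record region
        self.slots = []          # [offset, length, flags] per slot

    def free_space(self):
        return self.upper - self.lower

    def insert(self, record: bytes) -> int:
        """Insert a record; return the new slot number (the TID suffix)."""
        # Step 1: check space (record bytes + one new slot entry)
        if len(record) + SLOT_SIZE > self.free_space():
            raise MemoryError("page full")
        # Step 2: allocate record space from the top down
        self.upper -= len(record)
        self.data[self.upper:self.upper + len(record)] = record
        # Step 3: append a slot entry at the end of the directory
        self.slots.append([self.upper, len(record), USED])
        # Step 4: update the header boundaries
        self.lower += SLOT_SIZE
        return len(self.slots) - 1
```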
DELETE: Marking a Slot Dead
Operation: DELETE record at slot 2
Before:
Slot 2 = {offset=7500, length=200, flags=USED}
Step 1: Mark slot dead
Slot 2 = {offset=7500, length=200, flags=DEAD}
Step 2: (Optional) Clear record content
Some systems zero out the record bytes for security
Note: Lower and Upper unchanged!
The 200 bytes at offset 7500 are now 'dead space'
They cannot be reclaimed until:
- Page compaction occurs, OR
- Vacuum frees the slot for reuse
After:
Logical free space unchanged (Upper-Lower)
Actual reclaimable space = original + 200 dead bytes
Dead slot remains to preserve TID stability
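The delete path touches nothing but the flag, which a short sketch makes explicit (dict-based page model and names are illustrative):

```python
USED, DEAD = 1, 2

def delete_record(page: dict, slot_num: int) -> None:
    """Mark a slot dead without touching Lower/Upper: the record's
    bytes become dead space until compaction or vacuum reclaims them."""
    slot = page["slots"][slot_num]
    if slot["flags"] != USED:
        raise LookupError(f"slot {slot_num} is not live")
    slot["flags"] = DEAD   # step 1: flip the flag only
    # Lower and Upper are deliberately unchanged: the slot number must
    # stay valid for any external TID that still points at it.
```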
UPDATE: Modifying in Place or Chain
Updates are complex because the new version may not be the same size as the old one, and concurrent readers may still need the old version. Two common scenarios:
Scenario A: In-Place Update (record shrinks or same size)
Slot 3 = {offset=7200, length=100, flags=USED}
Update to 80-byte record
Result:
Copy new 80 bytes to offset 7200
Slot 3 = {offset=7200, length=80, flags=USED}
20 bytes at 7280-7299 become internal fragmentation
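Scenario A reduces to an overwrite-if-it-fits check, sketched here (function and field names are illustrative):

```python
USED = 1

def update_in_place(page: bytearray, slot: dict, new_record: bytes) -> bool:
    """Overwrite a record in place when the new version fits within the
    old footprint; return False when it has grown, so the caller can
    fall back to inserting a new version elsewhere."""
    if len(new_record) > slot["length"]:
        return False
    off = slot["offset"]
    page[off:off + len(new_record)] = new_record
    # Any shrinkage becomes internal fragmentation after the record,
    # reclaimed later by compaction.
    slot["length"] = len(new_record)
    return True
```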
Scenario B: MVCC Update (PostgreSQL style)
Old tuple at Slot 3, need new version
Result:
Insert new tuple → gets Slot 7 at offset 6500
Slot 3.flags = DEAD, but the slot is kept so in-flight MVCC readers can still reach the old version
Old tuple's t_ctid points to (PageID, 7)
Slot 3 = {offset=7200, length=100, flags=DEAD, chain→7}
PostgreSQL's Heap-Only Tuple (HOT) optimization creates update chains entirely within one page. If an update doesn't change indexed columns, the new version is added to the same page, linked from the old slot. Index entries remain valid (pointing to the chain head), avoiding expensive index updates. HOT can dramatically reduce index maintenance overhead for update-heavy workloads.
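Resolving a TID through a HOT chain means following redirects within the page until a normal slot is reached. This sketch is loosely modeled on PostgreSQL's redirect line pointers; the `next` field and flag values are illustrative, not the real on-disk encoding:

```python
# Flag values loosely modeled on lp_flags; "next" is an illustrative
# field standing in for the redirect target stored in the line pointer
LP_NORMAL, LP_REDIRECT = 1, 2

def resolve_hot(slots: list, slot_num: int) -> int:
    """Follow redirect slots within one page until a normal slot is
    reached, returning its number. Index entries can keep pointing at
    the chain head even as the row is repeatedly updated."""
    seen = set()
    while slots[slot_num]["flags"] == LP_REDIRECT:
        if slot_num in seen:
            raise RuntimeError("redirect cycle detected")
        seen.add(slot_num)
        slot_num = slots[slot_num]["next"]
    return slot_num
```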
Compaction (also called page pruning or defragmentation) reclaims the fragmented space left by deletions and updates. The slot directory makes this safe by providing a stable indirection layer.
Compaction Algorithm
Page Compaction Procedure:
1. LOCK page exclusively
2. SCAN slot directory, build list of live records:
     live_records = []
     for slot_num, slot in enumerate(slots):
         if slot.flags == USED:
             live_records.append({slot_num, slot.offset, slot.length})
3. SORT live_records by offset, highest first (nearest the page end)
4. COMPACT records toward the page end:
     new_offset = PAGE_SIZE         // start from the end
     for record in live_records:    // in sorted order
         new_offset -= record.length
         memmove(page + new_offset, page + record.offset, record.length)
         slots[record.slot_num].offset = new_offset
5. UPDATE header:
     Upper = new_offset             // new boundary of record region
                                    // Lower unchanged (slot directory same size)
6. ZERO dead space (security, debugging):
     memset(page + Lower, 0, Upper - Lower)
7. WRITE WAL record (full-page image for safety)
8. RELEASE lock
Why slot numbers don't change:
The critical insight is that we only update the offset field in each slot entry. The slot number (index in the array) remains constant. Any external reference using (PageID, SlotNumber) remains valid—it now resolves to the record's new physical location.
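The compaction procedure, minus the locking and WAL steps, can be sketched directly (names and the dict-based slot model are illustrative):

```python
USED, DEAD = 1, 2

def compact(page: bytearray, slots: list, header_size=24, slot_size=4) -> int:
    """Slide live records to the end of the page, updating only each
    slot's offset field. Slot numbers never change, so external TIDs
    remain valid. Returns the new Upper boundary."""
    live = [s for s in slots if s["flags"] == USED]
    # Move the record nearest the page end first, so destinations
    # never collide with sources still waiting to move
    live.sort(key=lambda s: s["offset"], reverse=True)
    new_offset = len(page)
    for s in live:
        new_offset -= s["length"]
        page[new_offset:new_offset + s["length"]] = \
            page[s["offset"]:s["offset"] + s["length"]]
        s["offset"] = new_offset   # only the offset changes
    upper = new_offset
    lower = header_size + len(slots) * slot_size
    page[lower:upper] = bytes(upper - lower)   # zero the dead space
    return upper
```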
Databases don't compact on every deletion. Common triggers include: (1) INSERT fails due to fragmented but reclaimable space, (2) Explicit VACUUM or maintenance command, (3) Dead tuple threshold exceeded, (4) Background page cleaner process. Eager compaction wastes CPU on pages that may see more deletions; lazy compaction may leave pages inefficiently used longer.
While the slot directory concept is universal, implementations vary across database systems:
| Database | Slot Entry Name | Entry Size | Special Features |
|---|---|---|---|
| PostgreSQL | ItemId (Line Pointer) | 4 bytes | 15-bit offset, 2-bit flags, 15-bit length; redirect pointers for HOT |
| MySQL InnoDB | Record Directory | Variable | Implicit in record headers; uses next-record pointers |
| Oracle | Row Directory | 2 bytes/entry | Offset only; row header contains length |
| SQL Server | Slot Array | 2 bytes/slot | Grows from page end; offset to record start |
| SQLite | Cell Pointer Array | 2 bytes/cell | Sorted by key for B-tree pages; unsorted for others |
PostgreSQL's ItemId in Detail
PostgreSQL's line pointer (ItemId) is a 4-byte packed structure:
typedef struct ItemIdData {
unsigned lp_off:15, // Offset to tuple (0-32767)
lp_flags:2, // LP_UNUSED, LP_NORMAL, LP_REDIRECT, LP_DEAD
lp_len:15; // Tuple length in bytes
} ItemIdData;
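The packing arithmetic can be emulated to see how all three fields fit in a single 32-bit word. The bit order within the word is an implementation detail that varies by compiler and platform; this sketch just demonstrates that 15 + 2 + 15 bits pack and unpack losslessly:

```python
LP_UNUSED, LP_NORMAL, LP_REDIRECT, LP_DEAD = 0, 1, 2, 3

def itemid_pack(lp_off: int, lp_flags: int, lp_len: int) -> int:
    """Pack a line pointer into one 32-bit word:
    15 bits of offset | 2 bits of flags | 15 bits of length."""
    assert 0 <= lp_off < 2**15 and 0 <= lp_flags < 4 and 0 <= lp_len < 2**15
    return (lp_off << 17) | (lp_flags << 15) | lp_len

def itemid_unpack(word: int) -> tuple:
    """Inverse of itemid_pack: recover (lp_off, lp_flags, lp_len)."""
    return word >> 17, (word >> 15) & 0x3, word & 0x7FFF
```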
The lp_flags field enables sophisticated behaviors:
- LP_UNUSED: the slot is free and may be reused for a new tuple
- LP_NORMAL: the slot points to a live tuple
- LP_REDIRECT: the slot forwards to another slot on the same page (HOT chain head)
- LP_DEAD: the tuple is gone; the slot awaits index cleanup before reuse
InnoDB's Implicit Directory
InnoDB doesn't have a dense per-record slot directory. Instead, each record's header stores a next-record pointer, chaining all records on the page in key order; a sparse page directory, with one entry for every few records, supports binary search over that chain.
PostgreSQL's explicit directory enables O(1) access to any slot but consumes 4 bytes per record. InnoDB's linked-list approach saves per-record overhead but makes random slot access O(n). Since InnoDB pages are primarily accessed via B-tree traversal (which leads directly to records), the linked list is acceptable. PostgreSQL's heap tables benefit from direct slot access for sequential scans and TID lookups.
Real-world slot directory management involves numerous edge cases:
The Slot Bloat Problem
Consider a pattern where a page sees many insert-delete cycles:
Initial: 100 records, 100 slots
Delete all 100 records: 0 live records, 100 dead slots
Insert 100 new records: 100 live records, 200 total slots
Repeat many times...
After N cycles: ~100 live records, N×100 slot entries
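The churn pattern above can be simulated in a few lines to confirm the unbounded growth. The model is deliberately minimal (no vacuum, so dead slots are never reclaimed); names are illustrative:

```python
USED, DEAD = 1, 2

def churn(cycles: int, batch: int = 100) -> int:
    """Simulate insert-batch/delete-all cycles on one page where dead
    slots are never reclaimed; return the accumulated slot count."""
    slots = []
    for _ in range(cycles):
        for _ in range(batch):          # insert a batch of new records
            slots.append({"flags": USED})
        for s in slots:                 # delete everything on the page
            if s["flags"] == USED:
                s["flags"] = DEAD
    return len(slots)
```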
The slot directory grows without bound because dead slots persist until vacuum. Mitigations include vacuuming (which marks dead slots reusable once no index entries or transactions reference them), opportunistic page pruning, and truncating trailing unused slots where the system supports it.
In PostgreSQL, you can query pg_stat_user_tables for dead tuple counts and bloat estimates. High n_dead_tup values indicate vacuum isn't keeping pace. The pageinspect extension lets you examine actual slot directory contents: SELECT lp, lp_flags, lp_off, lp_len FROM heap_page_items(get_raw_page('tablename', 0));
The slot directory is a deceptively simple structure with profound implications. Key takeaways:
- A slot entry is a small, fixed-size pointer (offset, length, flags) into the page's record region.
- The (PageID, SlotNumber) pair forms the TID, the stable handle used by indexes, MVCC chains, and vacuum.
- Compaction rewrites only the offset fields, so records can move freely within a page without invalidating a single TID.
- Dead slots must linger until no references to them remain, which is why slot directories bloat under heavy churn.
What's Next:
Now that you understand how records are addressed within pages, we'll examine the records themselves. How are individual column values laid out within a record? What about NULL handling, variable-length fields, and type encoding? The next page explores record layout in detail.
You now understand the slot directory—the indirection layer that makes variable-length record management possible. This knowledge is essential for understanding MVCC, index operations, vacuum processes, and storage optimization. Next, we'll dive into how records themselves are structured.