How does a database know where to find a specific record on a page? If records can be variable-length and their positions can change during compaction, how do indexes maintain valid pointers? The answer lies in one of the most elegant abstractions in database design: the slot directory.

The slot directory is a level of indirection between logical record identifiers and physical byte offsets. This seemingly simple concept—an array of pointers—enables the entire machinery of efficient record management, including in-place updates, garbage collection, and stable external references. Without the slot directory, databases would face a choice between rigid fixed-length records and chaotic, fragile direct pointers.
By the end of this page, you will understand the complete mechanics of slot directories, including their structure, how they support record addressing, the relationship between slot numbers and tuple IDs (TIDs/RIDs), and how they enable compaction without invalidating external references. You'll also explore edge cases and implementation variations across major database systems.
Before understanding the solution, let's deeply understand the problem. Consider a page containing multiple records of varying sizes.

The Direct Addressing Problem

Imagine we simply stored records contiguously and addressed them by byte offset:

```
Page with direct addressing:
┌──────────────────────────────────────────┐
│ Offset 24:  Record A (100 bytes)         │
│ Offset 124: Record B (250 bytes)         │
│ Offset 374: Record C (75 bytes)          │
│ Offset 449: Record D (300 bytes)         │
└──────────────────────────────────────────┘

Index entry for Record C: (PageID=47, Offset=374)
```

Now, what happens when we delete Record B?
The Cascading Disaster

With compaction, Records C and D move to new offsets:
- Record C: 374 → 124
- Record D: 449 → 199

But the index entry for Record C still says offset 374! Every reference to these records is now invalid. We face an impossible choice:

1. Never compact — accept permanent space loss and eventual page exhaustion
2. Update all references — scan all indexes, all foreign-key chains, all cached pointers; prohibitively expensive
3. Use fixed-length records — fixed slot positions become possible, but the format is inflexible and wasteful
4. Add an indirection layer — the slot directory solution
Computer science has a famous saying: 'All problems in computer science can be solved by another level of indirection.' The slot directory is a prime example. By inserting a stable layer between references and data, we gain the flexibility to reorganize data without breaking references.
The slot directory is an array of fixed-size entries, each containing information about one record on the page. The array typically grows from just after the page header toward the center of the page.
Slot Entry Contents

Each slot entry contains at minimum:

| Field | Size | Purpose |
|-------|------|---------|
| Offset | 2-4 bytes | Physical byte offset of the record within the page |
| Length | 2 bytes | Size of the record in bytes (optional; can be derived from the next slot's offset) |
| Flags | 1-2 bytes | Status bits: in-use, dead, redirected, etc. |

Total slot entry size is typically 4-8 bytes per slot.
Key Observations

1. Slot numbers are stable — slot 0 is always slot 0, regardless of where its record moves
2. Physical offsets can change — during compaction, records move, but slot entries are updated accordingly
3. Dead slots remain — a deleted record's slot is marked DEAD; the entry persists but points nowhere
4. Records are not ordered — record storage order need not match slot order
5. External references use (PageID, SlotNum) — this pair is called a TID (tuple ID) or RID (row ID)
Dead slots cannot be immediately reused because slot numbers may be referenced externally (in indexes, by other transactions). Reusing a slot number could cause an old reference to reach a new, unrelated record. Slot reuse requires careful coordination—typically only after all potential references are guaranteed gone (e.g., after vacuum in PostgreSQL).
The combination of page number and slot number forms the Tuple Identifier (TID) or Row Identifier (RID)—the universal way to refer to a specific record in the database.

TID Format

```
TID = (BlockNumber, OffsetNumber)
       \__________/ \__________/
         Page ID     Slot Number
        (4-6 bytes)   (2 bytes)

Example: (47, 3) means "Page 47, Slot 3"
```

Where TIDs Appear

TIDs are used throughout the database system:
| Context | How TID Is Used | Example |
|---|---|---|
| B-tree Index Leaf | Index entry contains key + TID pointing to heap | (key='Smith', TID=(47,3)) |
| Index-Only Scan | TID checked against visibility map | If page 47 all-visible, skip heap fetch |
| MVCC Tuple Chains | Old versions point to newer via TID | Tuple at (47,3) → successor at (47,15) |
| Foreign Key Checks | Internal FK validation uses TIDs | Child row references parent at (100,7) |
| CTID System Column | PostgreSQL exposes TID as 'ctid' pseudo-column | SELECT ctid FROM orders → (47,3) |
| HOT Chain | Heap-Only Tuple updates chain via TID | (47,3) → (47,12) → (47,18) on same page |
| Vacuum/Autovacuum | Dead TIDs collected for cleanup | Remove index entries pointing to dead TIDs |
TID Stability Guarantees

The slot directory ensures that a TID remains valid as long as:
1. The record exists (live, or dead but not yet vacuumed)
2. The slot number has not been reused

This stability is critical for:
- Index consistency — index entries don't need updates when heap records move within a page
- Cursor stability — open cursors can rely on TIDs to refetch rows
- Concurrency — transactions can hold TID references without locking
While TIDs are stable during a row's lifetime, they can change after VACUUM FULL, CLUSTER, or similar reorganization operations that move rows between pages. Applications should never store TIDs externally as permanent keys—use primary keys instead. TIDs are for internal database use.
Let's examine how the slot directory is manipulated during common operations:
INSERT: Allocating a New Slot

```
Operation: INSERT new 150-byte record

Before:
  Lower = 40   (24-byte header + 4 slots × 4 bytes/slot)
  Upper = 7000 (first free byte before records)
  Free space = 7000 - 40 = 6960 bytes

Step 1: Check space
  Need: 150 (record) + 4 (new slot) = 154 bytes
  Have: 6960 bytes ✓

Step 2: Allocate record space
  New Upper = 7000 - 150 = 6850
  Copy record to offset 6850

Step 3: Allocate slot
  New slot 4 at offset 40
  Slot 4 = {offset=6850, length=150, flags=USED}
  New Lower = 40 + 4 = 44

Step 4: Update header
  Lower: 40 → 44
  Upper: 7000 → 6850

After:
  Free space = 6850 - 44 = 6806 bytes
  New TID = (PageID, 4)
```
DELETE: Marking a Slot Dead

```
Operation: DELETE record at slot 2

Before:
  Slot 2 = {offset=7500, length=200, flags=USED}

Step 1: Mark slot dead
  Slot 2 = {offset=7500, length=200, flags=DEAD}

Step 2: (Optional) Clear record content
  Some systems zero out the record bytes for security

Note: Lower and Upper unchanged!
  The 200 bytes at offset 7500 are now dead space.
  They cannot be reclaimed until:
    - page compaction occurs, OR
    - vacuum frees the slot for reuse

After:
  Contiguous free space unchanged (Upper - Lower)
  Reclaimable space = contiguous free space + 200 dead bytes
  Dead slot remains to preserve TID stability
```
UPDATE: Modifying in Place or Chaining Versions

Updates are complex because:
- If the new record fits in the old record's space: an in-place update is possible
- If the new record is larger: delete old + insert new (potentially on a different page!)
- With MVCC: the old version is retained and a new version is added

```
Scenario A: In-Place Update (record shrinks or stays the same size)
  Slot 3 = {offset=7200, length=100, flags=USED}
  Update to an 80-byte record

  Result:
    Copy new 80 bytes to offset 7200
    Slot 3 = {offset=7200, length=80, flags=USED}
    20 bytes at 7280-7299 become internal fragmentation

Scenario B: MVCC Update (PostgreSQL style)
  Old tuple at Slot 3; a new version is needed

  Result:
    Insert new tuple → gets Slot 7 at offset 6500
    Old tuple's t_ctid points to (PageID, 7)
    Slot 3 stays readable for concurrent transactions; once no
    transaction can see the old version, it is marked DEAD:
    Slot 3 = {offset=7200, length=100, flags=DEAD, chain→7}
```
PostgreSQL's Heap-Only Tuple (HOT) optimization creates update chains entirely within one page. If an update doesn't change indexed columns, the new version is added to the same page, linked from the old slot. Index entries remain valid (pointing to the chain head), avoiding expensive index updates. HOT can dramatically reduce index maintenance overhead for update-heavy workloads.
Compaction (also called page pruning or defragmentation) reclaims the fragmented space left by deletions and updates. The slot directory makes this safe by providing a stable indirection layer.
Compaction Algorithm

```
Page Compaction Procedure:

1. LOCK page exclusively

2. SCAN slot directory, build list of live records:
     live_records = []
     for slot in slots:
         if slot.flags == LIVE:
             live_records.append({slot_num, slot.offset, slot.length})

3. SORT live records by offset (highest first = closest to page end)

4. COMPACT records toward the page end:
     new_offset = PAGE_SIZE        // start from the end
     for record in live_records (sorted):
         new_offset -= record.length
         memmove(page + new_offset, page + record.offset, record.length)
         slots[record.slot_num].offset = new_offset

5. UPDATE header:
     Upper = new_offset            // new boundary of the record region
     // Lower unchanged (slot directory is the same size)

6. ZERO dead space (security, debugging):
     memset(page + Lower, 0, Upper - Lower)

7. WRITE WAL record (full-page image for safety)

8. RELEASE lock
```

Why slot numbers don't change:

The critical insight is that we only update the offset field in each slot entry. The slot number (the index in the array) remains constant. Any external reference using (PageID, SlotNumber) remains valid—it now resolves to the record's new physical location.
Databases don't compact on every deletion. Common triggers include: (1) INSERT fails due to fragmented but reclaimable space, (2) Explicit VACUUM or maintenance command, (3) Dead tuple threshold exceeded, (4) Background page cleaner process. Eager compaction wastes CPU on pages that may see more deletions; lazy compaction may leave pages inefficiently used longer.
While the slot directory concept is universal, implementations vary across database systems:
| Database | Slot Entry Name | Entry Size | Special Features |
|---|---|---|---|
| PostgreSQL | ItemId (Line Pointer) | 4 bytes | 15-bit offset, 2-bit flags, 15-bit length; redirect chains for HOT |
| MySQL InnoDB | Record Directory | Variable | Implicit in record headers; uses next-record pointers |
| Oracle | Row Directory | 2 bytes/entry | Offset only; row header contains length |
| SQL Server | Slot Array | 2 bytes/slot | Grows from page end; offset to record start |
| SQLite | Cell Pointer Array | 2 bytes/cell | Sorted by key for B-tree pages; unsorted for others |
PostgreSQL's ItemId in Detail

PostgreSQL's line pointer (ItemId) is a 4-byte packed structure:

```c
typedef struct ItemIdData {
    unsigned lp_off:15,    // Offset to tuple (0-32767)
             lp_flags:2,   // LP_UNUSED, LP_NORMAL, LP_REDIRECT, LP_DEAD
             lp_len:15;    // Tuple length in bytes
} ItemIdData;
```

The lp_flags field enables sophisticated behaviors:
- LP_UNUSED — slot is free for reuse
- LP_NORMAL — points to a regular tuple
- LP_REDIRECT — points to another slot (for HOT chains)
- LP_DEAD — tuple is dead, awaiting vacuum

InnoDB's Implicit Directory

InnoDB doesn't keep a dense per-record slot directory. Instead:
- Records form a singly-linked list via "next record" pointers in each record header
- The page header contains a pointer to the first record
- Finding a record requires traversing the list (but the B-tree structure usually navigates directly)
- Infimum/supremum pseudo-records mark the list boundaries
PostgreSQL's explicit directory enables O(1) access to any slot but consumes 4 bytes per record. InnoDB's linked-list approach saves per-record overhead but makes random slot access O(n). Since InnoDB pages are primarily accessed via B-tree traversal (which leads directly to records), the linked list is acceptable. PostgreSQL's heap tables benefit from direct slot access for sequential scans and TID lookups.
Real-world slot directory management involves numerous edge cases:
The Slot Bloat Problem

Consider a pattern where a page sees many insert-delete cycles:

```
Initial: 100 records, 100 slots
Delete all 100 records: 0 live records, 100 dead slots
Insert 100 new records: 100 live records, 200 total slots
Repeat many times...

After N cycles: ~100 live records, ~N×100 slot entries
```

The slot directory grows without bound because dead slots persist until vacuum. Solutions:
1. Aggressive vacuum — reclaim dead slots promptly
2. Page rewrite — VACUUM FULL rewrites pages with fresh slot directories
3. Slot reuse policy — some systems reuse slot numbers more aggressively
In PostgreSQL, you can query pg_stat_user_tables for dead-tuple counts and bloat estimates. High n_dead_tup values indicate vacuum isn't keeping pace. The pageinspect extension lets you examine actual slot directory contents: `SELECT lp, lp_flags, lp_off, lp_len FROM heap_page_items(get_raw_page('tablename', 0));`
The slot directory is a deceptively simple structure with profound implications: a few bytes of indirection per record buy stable external references, safe in-page reorganization, and flexible variable-length storage.
What's Next:

Now that you understand how records are addressed within pages, we'll examine the records themselves. How are individual column values laid out within a record? What about NULL handling, variable-length fields, and type encoding? The next page explores record layout in detail.
You now understand the slot directory—the indirection layer that makes variable-length record management possible. This knowledge is essential for understanding MVCC, index operations, vacuum processes, and storage optimization. Next, we'll dive into how records themselves are structured.