Database Management SystemsPage Layout

Page Layout and Organization

LevelIntermediate

Duration60 mins

TopicPage Layout

1 / 5

Page Structure

The Fundamental Unit of Database Storage

Every piece of data you've ever stored in a database—every row, every column, every index entry—ultimately lives on a page. The page is the fundamental unit of I/O in virtually all database management systems, serving as the atomic container that bridges the gap between persistent storage and in-memory data manipulation.\n\nUnderstanding page structure isn't merely academic detail; it's the foundation upon which all higher-level database operations are built. Query performance, storage efficiency, concurrency control, and recovery mechanisms all depend intimately on how pages are organized. A Principal Engineer working on database internals or optimizing storage-intensive applications must understand these concepts at a deep, intuitive level.

What You Will Learn

By the end of this page, you will understand the complete anatomy of a database page, including its header, free space management, and how different components interact. You'll grasp why page design decisions have cascading effects throughout the entire database system and how to reason about page-level tradeoffs in real-world scenarios.

Why Pages Exist

Before diving into page structure, we must understand why databases use pages in the first place. This isn't an arbitrary design choice—it's a carefully considered solution to fundamental constraints imposed by hardware and operating systems.\n\nThe Disk I/O Reality\n\nMagnetic disks and even modern SSDs don't read or write individual bytes. They operate in fixed-size blocks, typically 512 bytes or 4KB for disk sectors and larger blocks for SSD pages. Any read or write operation, no matter how small the actual data, transfers an entire block.\n\nConsider what happens without page abstraction:\n- Read 1 byte? Transfer 4KB from disk\n- Write 1 byte? Read 4KB, modify 1 byte, write 4KB back\n- N random single-byte accesses? N × 4KB transfers\n\nThis mismatch between byte-addressable needs and block-addressable reality drove the invention of the page as a management abstraction.

Hardware I/O Characteristics Driving Page Design
Characteristic	Magnetic Disk (HDD)	Solid State Drive (SSD)	Implication for Pages
Minimum Transfer Unit	512B - 4KB sector	4KB - 16KB page	Pages must align with device block boundaries
Random Read Latency	5-10ms (seek + rotation)	50-200μs	Larger pages amortize fixed access overhead
Sequential Bandwidth	100-200 MB/s	500-7000 MB/s	Large sequential page reads are efficient
Write Amplification	N/A (rewrite in place)	Significant (erase blocks)	Page size affects SSD wear and performance
Parallelism	Single head per platter	Multiple channels	Page I/O can be parallelized on SSDs

The Buffer Pool Connection\n\nPages serve as the unit of transfer between disk and the buffer pool (main memory cache). When the database needs to access a row, it:\n\n1. Determines which page contains the row (via page ID)\n2. Checks if that page is in the buffer pool\n3. If not, reads the entire page from disk into a buffer frame\n4. Accesses the row within the in-memory page\n5. Eventually, if the page was modified, writes the entire page back to disk\n\nThis page-at-a-time model has profound implications:\n- Multiple rows on the same page share the I/O cost\n- Rows physically near each other (on the same page) have temporal locality benefits\n- Page size directly affects buffer pool hit rates and memory utilization

The Page Size Tradeoff

Larger pages (e.g., 32KB) transfer more data per I/O but may waste bandwidth if only a small portion is needed. Smaller pages (e.g., 4KB) are more granular but incur more I/O operations for large scans. Most databases default to 4KB-16KB pages, matching OS page size and SSD characteristics. PostgreSQL uses 8KB; Oracle defaults to 8KB but allows 2KB-32KB; SQL Server uses 8KB.

Page Anatomy: The Complete Picture

A database page is a carefully organized data structure, typically fixed at 4KB, 8KB, or 16KB. While implementations vary across database systems, the fundamental components remain consistent:\n\nThe Three-Region Model\n\nMost database pages follow a conceptual model with three primary regions:

Converting Mermaid diagram...

Core Page Components

•Header Region — Fixed-size metadata at the page start containing identification, versioning, and management information. Every page operation reads the header first.
•Slot Directory — Variable-size array of pointers that map logical slot numbers to physical byte offsets within the page. Grows downward from the header.
•Free Space — Unallocated bytes between the slot directory and record region. This is the available capacity for new records and expanded slot entries.
•Record Region — Actual tuple data stored contiguously (or nearly so) from the page end, growing upward toward the center.

Why This Layout?\n\nThe bidirectional growth pattern (slots growing down, records growing up) is intentional:\n\n1. Maximizes contiguous free space — All unused bytes form a single block, simplifying allocation\n2. Eliminates internal fragmentation during initial insertions — New records and their slot entries simply claim space from opposite ends\n3. Enables efficient compaction — When free space becomes fragmented, records can be compacted toward the bottom without invalidating slot numbers\n4. Supports variable-length records — Unlike fixed-offset schemes, records of any size can coexist on the same page

Page Header: The Command Center

The page header is the control structure that makes everything else on the page interpretable. Without the header, the remaining bytes would be meaningless. Let's examine each component in detail:

Typical Page Header Fields (PostgreSQL-style 8KB page)
Field	Size	Purpose	Critical For
Page LSN	8 bytes	Log Sequence Number of last modification	Recovery (WAL), Checkpointing
Checksum	2-4 bytes	CRC or hash for data integrity verification	Detecting corruption, Silent data loss prevention
Flags	2 bytes	Page state bits (leaf/internal, full, torn)	Page type identification, Space management
Lower Pointer	2 bytes	Offset to start of free space (end of slot dir)	Slot directory boundary
Upper Pointer	2 bytes	Offset to end of free space (start of records)	Record region boundary
Special Pointer	2 bytes	Offset to special/index-specific data	Index pages (B-tree, etc.)
Page Version	2 bytes	Schema/format version for upgrade compatibility	Online schema evolution
Prune XID	4-8 bytes	Oldest transaction that may need to see dead tuples	MVCC, Vacuum operations

The LSN: Cornerstone of Recovery\n\nThe Page LSN (Log Sequence Number) deserves special attention. This 8-byte field records the LSN of the most recent WAL (Write-Ahead Log) record that modified this page. During crash recovery:\n\n1. Recovery scans the WAL from the checkpoint\n2. For each WAL record, it reads the target page\n3. It compares the WAL record's LSN to the page's LSN\n4. If WAL LSN > Page LSN, the modification is replayed (redo)\n5. If WAL LSN ≤ Page LSN, the page already has this change (skip)\n\nThis mechanism ensures idempotent redo—replaying the same WAL record multiple times yields the same result. The Page LSN makes this possible by indicating the page's current state in the modification timeline.

Checksum Importance

Checksums detect silent data corruption from failing disks, cosmic rays (bit flips), or firmware bugs. PostgreSQL enables checksums via initdb --data-checksums. MySQL/InnoDB has innodb_checksum_algorithm. Disabling checksums saves ~1-2% CPU but risks undetected corruption. Production systems should ALWAYS enable page checksums.

Lower and Upper Pointers: The Free Space Boundaries\n\nThese two offsets define the free space region:\n\n\n┌─────────────────────────────────────────────────────────┐\n│ Header │ Slot Dir... │ FREE SPACE │ Records... │\n│ (24B) │ │ │ │\n└────────┴──────────────────────┴────────────┴────────────┘\n ↑ ↑ ↑\n 0 LOWER UPPER 8191\n (100) (7800)\n │────── allocated ─────│── free ───│── allocated ──│\n\n\n- Lower points to the first byte after the slot directory\n- Upper points to the first byte of the record region\n- Free Space = Upper - Lower bytes\n\nWhen inserting a record:\n1. Check if (record size + slot entry size) ≤ (Upper - Lower)\n2. If yes: copy record starting at Upper, decrement Upper by record size\n3. Add slot entry at Lower, increment Lower by slot entry size

Page Types in Database Systems

Not all pages are created equal. Databases use different page types for different purposes, each with specialized structures optimized for their role:

Common Database Page Types
Page Type	Purpose	Special Characteristics	Examples
Heap Page	Store table row data	Slot directory, variable-length records, MVCC info	PostgreSQL heap, MySQL InnoDB data pages
Index Page (Leaf)	Store index entries pointing to data	Sorted entries, sibling pointers, high fanout	B-tree leaf nodes
Index Page (Internal)	Navigate index structure	Keys + child page pointers, no data payload	B-tree internal nodes
Overflow Page	Store large values that don't fit on main page	Continuation pointers, blob chunks	TOAST pages (PostgreSQL)
Free Space Map Page	Track free space across all pages	Bitmap or byte-per-page representation	FSM in PostgreSQL
Visibility Map Page	Track all-visible pages for index-only scans	2 bits per heap page	VM in PostgreSQL
Undo Page	Store before-images for transaction rollback	Chain of undo records, transaction linkage	Oracle undo, InnoDB undo logs
System Page	Database metadata (schema, statistics)	Catalog entries, system table rows	pg_class entries, InnoDB data dictionary

Heap Pages vs. Index Pages\n\nThe distinction between heap (data) pages and index pages is fundamental:\n\nHeap Pages:\n- Store actual row data with all column values\n- Records are not necessarily sorted\n- Support in-place updates (with MVCC caveats)\n- Primary concern: space utilization, hot updates\n\nIndex Pages:\n- Store keys and pointers (either to heap or to other index pages)\n- Entries are sorted by key value\n- Leaf pages linked for range scans\n- Primary concern: fanout, balance, split/merge overhead\n\nBoth page types share the basic header + slot directory + record structure, but their internal record formats and management algorithms differ significantly.

Page Type Identification

The page header's flags field typically encodes the page type. Recovery operations, consistency checks, and space management routines read this flag to apply type-appropriate logic. A corrupted type flag can cause catastrophic misinterpretation of page contents.

Page Size: A Critical Configuration Choice

Choosing the right page size is one of the most consequential decisions in database design. While most systems provide sensible defaults, understanding the tradeoffs enables informed tuning for specific workloads.

Advantages of Larger Pages (16KB-64KB)

•Higher sequential read throughput — Fewer I/O operations for full table scans
•Better compression ratios — More context for compression algorithms
•Reduced header overhead — Header bytes amortized over more data
•Fewer page splits in indexes — Larger fanout, shallower trees
•Efficient for large rows — Reduces overflow page chaining

Disadvantages of Larger Pages

•Higher read amplification — Point queries fetch unnecessary data
•More memory consumption — Buffer pool holds fewer pages
•Lock contention — More rows per page means more conflicts
•Longer recovery checkpoint — Larger pages take longer to write
•Wasted space on partial fill — More internal fragmentation

Workload-Specific Recommendations\n\n| Workload Type | Recommended Page Size | Rationale |\n|---------------|----------------------|-----------|\n| OLTP (many point queries) | 4KB - 8KB | Minimize read amplification, reduce lock contention |\n| OLAP (large scans) | 16KB - 32KB | Maximize sequential throughput, better compression |\n| Mixed workload | 8KB - 16KB | Balance between competing concerns |\n| Wide rows (many columns) | 16KB+ | Avoid excessive overflow pages |\n| High concurrency | Smaller pages | Reduce per-page lock conflicts |\n\nModern SSD Considerations\n\nSSDs have changed some traditional assumptions:\n- SSD internal page size is typically 4KB-16KB\n- Write amplification occurs when database page size doesn't align with SSD page size\n- NVMe SSDs can handle smaller random I/Os efficiently, reducing the penalty for smaller pages\n- Some databases (e.g., RocksDB, TiKV) use variable-size blocks rather than fixed pages

Changing Page Size

Most databases require reinitialization or dump/restore to change page size. PostgreSQL's page size (BLCKSZ) is set at compile time—changing it requires rebuilding PostgreSQL and reinitializing the cluster. Plan page size carefully during initial deployment; it's difficult to change later.

Free Space Management Within Pages

Managing free space within a page is a microcosm of the broader memory management problem. The database must track available space, allocate from it efficiently, and reclaim space from deleted or updated records.

States of Space on a Page\n\nSpace on a database page exists in several states:\n\n1. Allocated (Live) — Currently used by live records\n2. Allocated (Dead) — Used by deleted/obsolete records awaiting cleanup\n3. Free (Contiguous) — Between slot directory and record region\n4. Free (Fragmented) — Holes within the record region from deletions\n\nThe Fragmentation Problem\n\nConsider a page with three 1000-byte records, then the middle one is deleted:\n\n\n Before deletion: After deletion:\n ┌───────────────────┐ ┌───────────────────┐\n │ Header + Slots │ │ Header + Slots │\n ├───────────────────┤ ├───────────────────┤\n │ Free: 5000 bytes │ │ Free: 5000 bytes │\n ├───────────────────┤ ├───────────────────┤\n │ Record C (1000B) │ │ Record C (1000B) │\n │ Record B (1000B) │ │ ████ HOLE 1000B ██│\n │ Record A (1000B) │ │ Record A (1000B) │\n └───────────────────┘ └───────────────────┘\n\n\nAfter deletion:\n- Contiguous free space: 5000 bytes\n- Total free space: 6000 bytes (including the 1000-byte hole)\n- Usable for a 5500-byte record? No! The hole fragments the space.

Free Space Reclamation Strategies

•Immediate Compaction — Reorganize records upon every deletion. Expensive but keeps space contiguous. Rarely used in practice.
•Deferred Compaction (Page Pruning) — Compact only when needed (e.g., when insertion requires it). PostgreSQL does this opportunistically.
•Background Vacuum — Separate process periodically reclaims dead space. Essential for MVCC systems where dead tuples accumulate.
•HOT Updates — Reuse space from the same tuple's prior version without updating indexes. PostgreSQL's Heap-Only Tuples optimization.
•Fill Factor — Reserve intentional free space for updates to avoid overflow. Typically configurable per table (e.g., 70-90%).

The Vacuum Imperative

In MVCC systems (PostgreSQL, MySQL InnoDB), dead tuples accumulate as transactions update or delete rows. Without regular vacuum/purge, pages become bloated with dead data, causing table bloat, slower scans, and eventual transaction ID wraparound issues. Monitor bloat and ensure vacuum keeps pace with modification rates.

Core Page Operations

Every database operation ultimately translates to a sequence of page-level operations. Understanding these primitives reveals why certain operations are expensive and how the database maintains consistency.

Fundamental Page Operations
Operation	Steps	Complexity	Logging Required
Insert Record	Find page with space\n2. Check free space sufficient\n3. Copy record to Upper - recordSize\n4. Add slot entry at Lower\n5. Update Lower, Upper pointers	O(1) for the page operation itself	Full-page image or redo log
Delete Record (Mark)	Locate record via slot\n2. Mark slot as dead/deleted\n3. Optionally mark record in place	O(1) lookup + constant work	Slot modification log
Delete Record (Physical)	Mark slot as unused\n2. Shift subsequent records down\n3. Update all affected slot offsets	O(n) where n = records after deleted one	Typically deferred to vacuum
Update Record (In-Place)	Locate via slot\n2. Overwrite if new size ≤ old size\n3. Else: delete + insert (possibly new page)	O(1) if sizes match, else higher	Before/after images or redo
Page Compaction	Copy all live records to temp\n2. Rewrite contiguously from page bottom\n3. Rebuild slot directory\n4. Update Lower/Upper	O(n) where n = live records	Full-page image typically

The Insert Path in Detail\n\n\nINSERT INTO orders (id, customer, amount) VALUES (42, 'Alice', 100.00);\n\n┌──────────────────────────────────────────────────────┐\n│ 1. Executor requests: insert this tuple on table heap │\n└──────────────────────────────────────────────────────┘\n │\n ▼\n┌──────────────────────────────────────────────────────┐\n│ 2. Buffer manager: find page with sufficient space │\n│ - Consult Free Space Map (FSM) │\n│ - FSM says page 47 has ~2KB free │\n│ - Request page 47 into buffer pool │\n└──────────────────────────────────────────────────────┘\n │\n ▼\n┌──────────────────────────────────────────────────────┐\n│ 3. Lock page 47 (exclusive or content lock) │\n│ 4. Verify free space: Upper - Lower ≥ tuple + slot │\n│ 5. Generate WAL record (before modifying page) │\n│ 6. Write tuple at offset = Upper - tupleSize │\n│ 7. Add slot entry: slots[nslots++] = new offset │\n│ 8. Update header: Upper -= tupleSize, Lower += 4 │\n│ 9. Mark page dirty in buffer pool │\n│ 10. Release page lock │\n└──────────────────────────────────────────────────────┘\n\n\nNote how the page insert involves multiple subsystems: executor, buffer manager, lock manager, WAL writer, and the page manipulation itself. Each step is carefully ordered to ensure recoverability and concurrency safety.

Summary: Page Structure Fundamentals

We've explored the fundamental building block of database storage—the page. Let's consolidate the key takeaways:

Key Takeaways

•Pages are the atomic unit of I/O — All disk reads and writes transfer complete pages, making page-level organization critical to performance.
•The three-region model — Header, slot directory growing down, records growing up, with free space in between.
•Page headers enable recovery — The LSN field connects pages to the WAL, enabling crash recovery and idempotent redo.
•Different page types serve different purposes — Heap pages for data, index pages for navigation, overflow pages for large values.
•Page size involves tradeoffs — Larger pages favor scans and compression; smaller pages favor point queries and concurrency.
•Free space management is complex — Fragmentation, compaction, vacuum, and fill factors all play roles in space efficiency.
•Every operation is a page operation — Understanding page-level mechanics reveals the true cost of database operations.

What's Next:\n\nNow that you understand the overall page structure, we'll dive deeper into the slot directory—the indirection layer that enables stable record addressing, efficient updates, and garbage collection. The slot directory is where logical record IDs meet physical byte offsets.

Page Complete

You now understand the fundamental structure of database pages—the atomic containers that make all database operations possible. This knowledge is essential for understanding storage engineering, query optimization, and recovery mechanisms. Next, we'll explore the slot directory in detail.

1 / 5

Loading learning content...

Database Management SystemsPage Layout

Page Layout and Organization

LevelIntermediate

Duration60 mins

TopicPage Layout

1 / 5

Page Structure

The Fundamental Unit of Database Storage

What You Will Learn

Why Pages Exist

Hardware I/O Characteristics Driving Page Design
Characteristic	Magnetic Disk (HDD)	Solid State Drive (SSD)	Implication for Pages
Minimum Transfer Unit	512B - 4KB sector	4KB - 16KB page	Pages must align with device block boundaries
Random Read Latency	5-10ms (seek + rotation)	50-200μs	Larger pages amortize fixed access overhead
Sequential Bandwidth	100-200 MB/s	500-7000 MB/s	Large sequential page reads are efficient
Write Amplification	N/A (rewrite in place)	Significant (erase blocks)	Page size affects SSD wear and performance
Parallelism	Single head per platter	Multiple channels	Page I/O can be parallelized on SSDs

The Page Size Tradeoff

Page Anatomy: The Complete Picture

Converting Mermaid diagram...

Core Page Components

•Header Region — Fixed-size metadata at the page start containing identification, versioning, and management information. Every page operation reads the header first.
•Slot Directory — Variable-size array of pointers that map logical slot numbers to physical byte offsets within the page. Grows downward from the header.
•Free Space — Unallocated bytes between the slot directory and record region. This is the available capacity for new records and expanded slot entries.
•Record Region — Actual tuple data stored contiguously (or nearly so) from the page end, growing upward toward the center.

Page Header: The Command Center

The page header is the control structure that makes everything else on the page interpretable. Without the header, the remaining bytes would be meaningless. Let's examine each component in detail:

Typical Page Header Fields (PostgreSQL-style 8KB page)
Field	Size	Purpose	Critical For
Page LSN	8 bytes	Log Sequence Number of last modification	Recovery (WAL), Checkpointing
Checksum	2-4 bytes	CRC or hash for data integrity verification	Detecting corruption, Silent data loss prevention
Flags	2 bytes	Page state bits (leaf/internal, full, torn)	Page type identification, Space management
Lower Pointer	2 bytes	Offset to start of free space (end of slot dir)	Slot directory boundary
Upper Pointer	2 bytes	Offset to end of free space (start of records)	Record region boundary
Special Pointer	2 bytes	Offset to special/index-specific data	Index pages (B-tree, etc.)
Page Version	2 bytes	Schema/format version for upgrade compatibility	Online schema evolution
Prune XID	4-8 bytes	Oldest transaction that may need to see dead tuples	MVCC, Vacuum operations

Checksum Importance

Page Types in Database Systems

Not all pages are created equal. Databases use different page types for different purposes, each with specialized structures optimized for their role:

Common Database Page Types
Page Type	Purpose	Special Characteristics	Examples
Heap Page	Store table row data	Slot directory, variable-length records, MVCC info	PostgreSQL heap, MySQL InnoDB data pages
Index Page (Leaf)	Store index entries pointing to data	Sorted entries, sibling pointers, high fanout	B-tree leaf nodes
Index Page (Internal)	Navigate index structure	Keys + child page pointers, no data payload	B-tree internal nodes
Overflow Page	Store large values that don't fit on main page	Continuation pointers, blob chunks	TOAST pages (PostgreSQL)
Free Space Map Page	Track free space across all pages	Bitmap or byte-per-page representation	FSM in PostgreSQL
Visibility Map Page	Track all-visible pages for index-only scans	2 bits per heap page	VM in PostgreSQL
Undo Page	Store before-images for transaction rollback	Chain of undo records, transaction linkage	Oracle undo, InnoDB undo logs
System Page	Database metadata (schema, statistics)	Catalog entries, system table rows	pg_class entries, InnoDB data dictionary

Page Type Identification

Page Size: A Critical Configuration Choice

Advantages of Larger Pages (16KB-64KB)

•Higher sequential read throughput — Fewer I/O operations for full table scans
•Better compression ratios — More context for compression algorithms
•Reduced header overhead — Header bytes amortized over more data
•Fewer page splits in indexes — Larger fanout, shallower trees
•Efficient for large rows — Reduces overflow page chaining

Disadvantages of Larger Pages

•Higher read amplification — Point queries fetch unnecessary data
•More memory consumption — Buffer pool holds fewer pages
•Lock contention — More rows per page means more conflicts
•Longer recovery checkpoint — Larger pages take longer to write
•Wasted space on partial fill — More internal fragmentation

Changing Page Size

Free Space Management Within Pages

Free Space Reclamation Strategies

•Immediate Compaction — Reorganize records upon every deletion. Expensive but keeps space contiguous. Rarely used in practice.
•Deferred Compaction (Page Pruning) — Compact only when needed (e.g., when insertion requires it). PostgreSQL does this opportunistically.
•Background Vacuum — Separate process periodically reclaims dead space. Essential for MVCC systems where dead tuples accumulate.
•HOT Updates — Reuse space from the same tuple's prior version without updating indexes. PostgreSQL's Heap-Only Tuples optimization.
•Fill Factor — Reserve intentional free space for updates to avoid overflow. Typically configurable per table (e.g., 70-90%).

The Vacuum Imperative

Core Page Operations

Fundamental Page Operations
Operation	Steps	Complexity	Logging Required
Insert Record	Find page with space\n2. Check free space sufficient\n3. Copy record to Upper - recordSize\n4. Add slot entry at Lower\n5. Update Lower, Upper pointers	O(1) for the page operation itself	Full-page image or redo log
Delete Record (Mark)	Locate record via slot\n2. Mark slot as dead/deleted\n3. Optionally mark record in place	O(1) lookup + constant work	Slot modification log
Delete Record (Physical)	Mark slot as unused\n2. Shift subsequent records down\n3. Update all affected slot offsets	O(n) where n = records after deleted one	Typically deferred to vacuum
Update Record (In-Place)	Locate via slot\n2. Overwrite if new size ≤ old size\n3. Else: delete + insert (possibly new page)	O(1) if sizes match, else higher	Before/after images or redo
Page Compaction	Copy all live records to temp\n2. Rewrite contiguously from page bottom\n3. Rebuild slot directory\n4. Update Lower/Upper	O(n) where n = live records	Full-page image typically

Summary: Page Structure Fundamentals

We've explored the fundamental building block of database storage—the page. Let's consolidate the key takeaways:

Key Takeaways

•Pages are the atomic unit of I/O — All disk reads and writes transfer complete pages, making page-level organization critical to performance.
•The three-region model — Header, slot directory growing down, records growing up, with free space in between.
•Page headers enable recovery — The LSN field connects pages to the WAL, enabling crash recovery and idempotent redo.
•Different page types serve different purposes — Heap pages for data, index pages for navigation, overflow pages for large values.
•Page size involves tradeoffs — Larger pages favor scans and compression; smaller pages favor point queries and concurrency.
•Free space management is complex — Fragmentation, compaction, vacuum, and fill factors all play roles in space efficiency.
•Every operation is a page operation — Understanding page-level mechanics reveals the true cost of database operations.

Page Complete

1 / 5