Page Layout - Learning Module

Loading content...

0/252

Record Layout

Inside the Record: Bytes That Become Data

When you execute SELECT name, salary FROM employees WHERE id = 42, the database ultimately reads a sequence of bytes from a page and transforms them into the structured result you see. But how is that transformation defined? How does the database know where the name field ends and salary begins? How are NULL values represented without consuming space for absent data?

Record layout—the internal structure of a single row stored on disk—is where the abstraction of tables and columns meets the reality of bytes and offsets. Understanding record layout is essential for anyone optimizing storage, debugging data corruption, or implementing database internals.

What You Will Learn

By the end of this page, you will understand the complete anatomy of a database record, including record headers, NULL bitmaps, fixed-length vs variable-length field encoding, alignment requirements, and how different database systems approach record layout. You'll be able to mentally decode raw bytes into structured tuples.

Record Layout: The Big Picture

A database record (also called a tuple or row) must encode several pieces of information:

Record metadata — Length, version, transaction information
NULL indicators — Which columns have no value
Fixed-length field values — Integers, floats, dates, fixed-size strings
Variable-length field values — VARCHARs, BLOBs, JSON
Padding — Alignment bytes for CPU efficiency

The general structure of a record follows this pattern:

Converting Mermaid diagram...

Why This Ordering?

The fixed-then-variable ordering is deliberate:

Fixed fields have predictable offsets — Field n is always at offset HEADER_SIZE + NULL_BITMAP_SIZE + sum(sizes of fields 0..n-1)
Variable fields require iteration — Must traverse length-value pairs to find the nth variable field
Fixed fields accessed more frequently — Primary keys, foreign keys, small integers are often fixed-length
Alignment is easier — Fixed fields can be aligned naturally; variable fields just need length field alignment

No Universal Standard

Every database system has its own record format. PostgreSQL's HeapTupleHeader differs from InnoDB's COMPACT row format, which differs from SQL Server's record structure. The concepts are similar, but byte-level details vary. We'll cover common patterns and then examine specific implementations.

Record Headers: The Control Block

The record header contains metadata needed to interpret the rest of the record. Different systems include different fields, but common elements include:

Common Record Header Fields
Field	Typical Size	Purpose	Used By
Record Length	2-4 bytes	Total size of record including header	All systems (sometimes implicit from slot)
Info Mask / Flags	2-4 bytes	Bit flags for NULL, TOAST, HOT, visibility	PostgreSQL: t_infomask, t_infomask2
Transaction ID (xmin)	4-8 bytes	ID of transaction that created this version	MVCC: PostgreSQL, Oracle
Transaction ID (xmax)	4-8 bytes	ID of transaction that deleted/updated	MVCC: PostgreSQL
Command ID (cmin/cmax)	4 bytes	Position within transaction for visibility	PostgreSQL tuple visibility
Tuple Offset (t_hoff)	1 byte	Offset from record start to actual data	PostgreSQL (alignment)
Next Record Pointer	2-6 bytes	Offset to next record in page (linked list)	InnoDB COMPACT format
Row Format Version	1-2 bytes	Schema version for backward compatibility	Systems supporting online DDL

PostgreSQL HeapTupleHeader

PostgreSQL's tuple header is well-documented and illustrative:

typedef struct HeapTupleHeaderData {
    union {
        HeapTupleFields t_heap;   // For heap tuples
        DatumTupleFields t_datum; // For composite type values
    } t_choice;

    ItemPointerData t_ctid;       // 6 bytes: Current TID of this or newer version

    uint16 t_infomask2;           // Various flags + number of attributes
    uint16 t_infomask;            // Large set of flag bits
    uint8  t_hoff;                // Offset to user data (aligned)

    // Bits8 t_bits[]; follows   // NULL bitmap if HEAP_HASNULL
} HeapTupleHeaderData;

// HeapTupleFields portion:
typedef struct HeapTupleFields {
    TransactionId t_xmin;    // 4 bytes: Inserting transaction
    TransactionId t_xmax;    // 4 bytes: Deleting/updating transaction
    union {
        CommandId t_cid;     // 4 bytes: Command ID
        TransactionId t_xvac;// Used by vacuum
    } t_field3;
} HeapTupleFields;

Total header size: 23 bytes (plus NULL bitmap if needed), rounded up to alignment boundary.

Header Overhead

A PostgreSQL tuple header consumes 23-27 bytes before any user data. For a table with many small rows (e.g., a junction table with two 4-byte integer foreign keys), the header may be larger than the payload! This overhead motivates denormalization and batch inserts in some scenarios.

The NULL Bitmap: Absence Without Space

NULL values present a design challenge. They must be distinguishable from any actual value (including empty strings or zeros), yet ideally shouldn't consume storage when absent.

The NULL Bitmap Solution

Most databases use a bitmap—one bit per column—to indicate which fields are NULL:

Table: users (id INT, name VARCHAR, email VARCHAR, phone VARCHAR)

Record for user with no phone:
  id=1, name='Alice', email='alice@example.com', phone=NULL

NULL bitmap: 0 0 0 1 (bit 3 = 1 means column 3 is NULL)
             │ │ │ └── phone: NULL
             │ │ └── email: present
             │ └── name: present
             └── id: present

Serialized (as byte): 0b00001000 = 0x08

Bitmap Encoding Variations

Convention	0 Means	1 Means	Used By
PostgreSQL	NULL	NOT NULL (present)	HeapTupleHeader t_bits
InnoDB	NOT NULL	NULL	Variable-length columns
SQL Server	NOT NULL	NULL	NULL bitmap in record

Size Calculation

NULL bitmap size = ⌈number_of_columns / 8⌉ bytes

8 columns: 1 byte
9 columns: 2 bytes
64 columns: 8 bytes
100 columns: 13 bytes

When No Bitmap?

Some systems optimize away the NULL bitmap:

If the table has no NULLable columns, no bitmap needed
PostgreSQL sets HEAP_HASNULL flag only when NULLs exist
Reduces header size for NOT NULL constrained tables

NOT NULL for Storage Efficiency

Declaring columns as NOT NULL where appropriate has a dual benefit: data integrity AND potential storage savings. In PostgreSQL, a table where ALL columns are NOT NULL avoids the NULL bitmap entirely. For tables with many rows, this can save significant space.

NULL Field Storage

NULL columns consume no space in the data portion—only the bit in the bitmap. This has implications:

Table: wide_table (col1 TEXT, col2 TEXT, ..., col100 TEXT)

Row with col1='x', all others NULL:
  Header: ~23 bytes
  NULL bitmap: 13 bytes (100 columns)
  Data: 2 bytes (1-byte length + 1-byte 'x')
  Total: ~38 bytes

Row with all 100 columns filled with 'x':
  Header: ~23 bytes
  NULL bitmap: 0 bytes (no NULLs, if system optimizes)
  Data: 200 bytes (100 × 2 bytes each)
  Total: ~223 bytes

The sparse row is dramatically smaller because NULL columns are represented only in the bitmap.

Fixed-Length Fields: Predictable Access

Fixed-length fields store values in a predetermined number of bytes, enabling direct offset calculation for any field.

Common Fixed-Length SQL Types
SQL Type	Size	Range/Precision	Encoding
BOOLEAN	1 byte	true/false/NULL	0x00=false, 0x01=true (varies)
TINYINT	1 byte	-128 to 127 (signed)	Two's complement
SMALLINT	2 bytes	-32,768 to 32,767	Two's complement, endianness varies
INTEGER / INT	4 bytes	±2.1 billion	Two's complement
BIGINT	8 bytes	±9.2 quintillion	Two's complement
REAL / FLOAT4	4 bytes	~7 significant digits	IEEE 754 single-precision
DOUBLE / FLOAT8	8 bytes	~15 significant digits	IEEE 754 double-precision
DECIMAL(p,s) fixed	Varies	Exact decimal, p digits	BCD or integer + scale
DATE	4 bytes	Days since epoch	Integer (epoch varies)
TIMESTAMP	8 bytes	Microseconds since epoch	Integer or IEEE 754
UUID	16 bytes	128-bit identifier	Raw bytes or two int64s
CHAR(n)	n bytes	Fixed string, space-padded	Encoding (UTF-8, etc.) + padding
IPv4 Address (INET)	4 bytes	32-bit address	Network byte order

Direct Offset Calculation

For fixed-length fields (ignoring NULLs and alignment):

Table schema: CREATE TABLE t (
    a INT,      -- 4 bytes, offset 0
    b BIGINT,   -- 8 bytes, offset 4
    c SMALLINT, -- 2 bytes, offset 12
    d INT       -- 4 bytes, offset 14
);

To access field 'c' in a record at address P:
  offset_c = HEADER_SIZE + 0 + 4 + 8 = HEADER_SIZE + 12
  value_c = *(int16*)(P + offset_c)

This O(1) access is why fixed-length fields are positioned first in most record layouts.

Endianness Considerations

Multi-byte integers can be stored as:

Little-endian — Least significant byte first (x86, AMD64)
Big-endian — Most significant byte first (network byte order, older SPARC)

Databases typically use native endianness for performance, meaning data files aren't portable across architectures without conversion.

Value: 0x12345678

Little-endian: 78 56 34 12
Big-endian:    12 34 56 78

PostgreSQL uses native endianness (usually little-endian on Linux/x86).
Network protocols often use big-endian.
Portable formats (like Parquet) specify endianness explicitly.

CHAR vs VARCHAR

CHAR(n) is fixed-length: always n bytes, space-padded for shorter values. VARCHAR(n) is variable-length: stores actual length + characters. For columns that are always or nearly always the same length (like ISO country codes, phone numbers with consistent format), CHAR may be more efficient due to simpler offset calculation.

Memory Alignment: CPU Efficiency at the Byte Level

Modern CPUs access memory most efficiently when data is aligned to its natural boundary—a 4-byte integer at an address divisible by 4, an 8-byte double at an address divisible by 8.

Misalignment Penalties

Accessing misaligned data causes:

x86/AMD64: Performance penalty (2-3x slower, or much worse crossing cache lines)
ARM (older): Hardware exception / crash
ARM (newer): Performance penalty

Databases must either:

Align all fields — Insert padding bytes between fields
Accept misalignment — Simpler layout, potential performance cost
Copy to aligned buffer — Overhead of memory copy

Alignment Example

Schema: (a SMALLINT, b INT, c SMALLINT, d BIGINT)

Without alignment:
  Offset 0: a (2 bytes)
  Offset 2: b (4 bytes) ← MISALIGNED (needs offset 4)
  Offset 6: c (2 bytes)
  Offset 8: d (8 bytes) ← aligned by luck
  Total: 16 bytes

With alignment:
  Offset 0: a (2 bytes)
  Offset 2: padding (2 bytes)
  Offset 4: b (4 bytes) ← aligned
  Offset 8: c (2 bytes)
  Offset 10: padding (6 bytes)
  Offset 16: d (8 bytes) ← aligned
  Total: 24 bytes (50% overhead!)

Reordered for minimal padding:
  Offset 0: d (8 bytes) ← 8-byte aligned
  Offset 8: b (4 bytes) ← 4-byte aligned
  Offset 12: a (2 bytes) ← 2-byte aligned
  Offset 14: c (2 bytes) ← 2-byte aligned
  Total: 16 bytes (no padding!)

PostgreSQL Column Ordering

PostgreSQL stores columns in their declared order and adds alignment padding. For OLTP tables with many rows, reordering columns by decreasing alignment requirement (8-byte columns first, then 4-byte, then 2-byte, then 1-byte) can save significant space. Some ORMs and migration tools support automatic column reordering for this purpose.

PostgreSQL Alignment Rules

Type Alignment	Types	Aligned To
char (c)	boolean, char(1), "char"	1 byte (none)
short (s)	int2, smallint	2 bytes
int (i)	int4, integer, date, float4	4 bytes
double (d)	int8, bigint, timestamp, float8	8 bytes

The t_hoff field in PostgreSQL's tuple header indicates where user data begins (after header and NULL bitmap, aligned to 8 bytes for double types).

Complete Fixed-Length Record Example

Let's trace through a complete record with only fixed-length fields:

Schema:

CREATE TABLE measurements (
    id INTEGER NOT NULL,           -- 4 bytes, alignment 4
    timestamp TIMESTAMP NOT NULL,  -- 8 bytes, alignment 8
    sensor_id SMALLINT NOT NULL,   -- 2 bytes, alignment 2
    temperature REAL,              -- 4 bytes, alignment 4 (nullable)
    humidity REAL                  -- 4 bytes, alignment 4 (nullable)
);

Record Data:

id = 12345
timestamp = 2024-01-15 10:30:00 (encoded as microseconds)
sensor_id = 42
temperature = 23.5
humidity = NULL

Byte-Level Layout (PostgreSQL-style):

Offset  Field                 Value (hex)        Notes
------  --------------------  -----------------  ----------------------
0       t_xmin               00 00 10 5A        Transaction 4186 inserted
4       t_xmax               00 00 00 00        Not deleted
8       t_cid                00 00 00 00        Command 0
12      t_ctid.block         00 00 00 2F        Page 47
16      t_ctid.offset        00 03              Slot 3
18      t_infomask2          00 05              5 columns
20      t_infomask           08 02              HASNULL + ...
22      t_hoff               18                 24 = data offset (aligned)
23      (padding)            00                 Align to 24
24      t_bits               10                 NULL bitmap: 0b00010000
                                                  bit 4 = humidity NULL
25-26   (padding)            00 00              Align to 4 for first field
27      (padding)            00                 Align to 4

28      id                   00 00 30 39        12345 (little-endian)
32      (padding)            00 00 00 00        Align to 8 for timestamp
36-39   (padding)            00 00 00 00        Continue padding
40      timestamp            00 05 FA ... (8B)  Encoded timestamp
48      sensor_id            00 2A              42
50      (padding)            00 00              Align to 4 for temperature
52      temperature          41 BC 00 00        23.5f (IEEE 754)
56      humidity             (not stored)       NULL - no bytes
------  --------------------  -----------------
Total: 56 bytes

Note: The humidity field consumes NO space because it's NULL—only recorded in the NULL bitmap.

Alignment Overhead Visible

This 56-byte record has 8 bytes of padding (14% overhead) due to alignment. The actual user data is only 22 bytes (4+8+2+4+4 minus NULL). Header + NULL bitmap adds 25 bytes. Better column ordering could reduce padding.

Record Layout in Major Database Systems

Each database system has its own record format optimized for its specific needs:

PostgreSQL Heap Tuple Format

┌─────────────────────────────────────────────────┐
│ HeapTupleHeaderData (23 bytes base)             │
│   t_xmin, t_xmax, t_cid, t_ctid                 │
│   t_infomask, t_infomask2, t_hoff               │
├─────────────────────────────────────────────────┤
│ NULL Bitmap (if HEAP_HASNULL flag set)          │
│   Ceil(natts/8) bytes, bit=1 means NOT NULL     │
├─────────────────────────────────────────────────┤
│ OID (4 bytes, deprecated, if HEAP_HASOID)       │
├─────────────────────────────────────────────────┤
│ Padding to MAXALIGN (8-byte boundary)           │
├─────────────────────────────────────────────────┤
│ User Data: columns in declared order            │
│   - Each column aligned per type's typalign     │
│   - varlena types: 1-4 byte length + data       │
│   - NULL columns: no storage (check bitmap)     │
└─────────────────────────────────────────────────┘

Key Features:

MVCC info (xmin/xmax) in every tuple
NULL bitmap only when needed
TOAST for values > ~2KB (stored externally)
HOT chain support via t_ctid

Format Evolution

Each system has evolved its format over time. InnoDB's COMPACT replaced REDUNDANT; DYNAMIC and COMPRESSED formats came later. SQL Server's formats differ between heap and clustered tables. PostgreSQL's format has remained relatively stable, but TOAST thresholds and compression have evolved.

Summary: Record Layout

Record layout is where logical schema meets physical bytes. Let's consolidate the key takeaways:

Key Takeaways

•Records consist of header + NULL bitmap + data — The header contains metadata, the bitmap tracks NULLs, and data holds column values.
•Fixed-length fields have predictable offsets — Direct O(1) access without parsing; positioned first in most formats.
•NULL columns consume no data space — Only 1 bit in the bitmap; a major storage advantage for sparse data.
•Alignment improves CPU access speed — Padding bytes may be inserted; column ordering affects overhead.
•Each database has its own format — PostgreSQL, InnoDB, SQL Server all differ in details while sharing concepts.
•Header overhead is significant for small rows — 23+ bytes in PostgreSQL; consider denormalization for very small records.
•MVCC info lives in record headers — Transaction IDs enable visibility determination without external lookups.

What's Next:

We've covered fixed-length fields, but real databases must handle VARCHARs, TEXT, BLOBs, and other variable-length data. The next page explores variable-length record encoding—the techniques that enable efficient storage and access of dynamically-sized values.

Page Complete

You now understand how records are structured at the byte level—headers, NULL bitmaps, fixed fields, and alignment. This knowledge is essential for storage optimization, debugging data issues, and understanding database performance characteristics. Next, we'll tackle the complexity of variable-length records.