Loading content...
When you execute SELECT name, salary FROM employees WHERE id = 42, the database ultimately reads a sequence of bytes from a page and transforms them into the structured result you see. But how is that transformation defined? How does the database know where the name field ends and salary begins? How are NULL values represented without consuming space for absent data?
Record layout—the internal structure of a single row stored on disk—is where the abstraction of tables and columns meets the reality of bytes and offsets. Understanding record layout is essential for anyone optimizing storage, debugging data corruption, or implementing database internals.
By the end of this page, you will understand the complete anatomy of a database record, including record headers, NULL bitmaps, fixed-length vs variable-length field encoding, alignment requirements, and how different database systems approach record layout. You'll be able to mentally decode raw bytes into structured tuples.
A database record (also called a tuple or row) must encode several pieces of information:
The general structure of a record follows this pattern:
Why This Ordering?
The fixed-then-variable ordering is deliberate:
n is always at offset HEADER_SIZE + NULL_BITMAP_SIZE + sum(sizes of fields 0..n-1)Every database system has its own record format. PostgreSQL's HeapTupleHeader differs from InnoDB's COMPACT row format, which differs from SQL Server's record structure. The concepts are similar, but byte-level details vary. We'll cover common patterns and then examine specific implementations.
The record header contains metadata needed to interpret the rest of the record. Different systems include different fields, but common elements include:
| Field | Typical Size | Purpose | Used By |
|---|---|---|---|
| Record Length | 2-4 bytes | Total size of record including header | All systems (sometimes implicit from slot) |
| Info Mask / Flags | 2-4 bytes | Bit flags for NULL, TOAST, HOT, visibility | PostgreSQL: t_infomask, t_infomask2 |
| Transaction ID (xmin) | 4-8 bytes | ID of transaction that created this version | MVCC: PostgreSQL, Oracle |
| Transaction ID (xmax) | 4-8 bytes | ID of transaction that deleted/updated | MVCC: PostgreSQL |
| Command ID (cmin/cmax) | 4 bytes | Position within transaction for visibility | PostgreSQL tuple visibility |
| Tuple Offset (t_hoff) | 1 byte | Offset from record start to actual data | PostgreSQL (alignment) |
| Next Record Pointer | 2-6 bytes | Offset to next record in page (linked list) | InnoDB COMPACT format |
| Row Format Version | 1-2 bytes | Schema version for backward compatibility | Systems supporting online DDL |
PostgreSQL HeapTupleHeader
PostgreSQL's tuple header is well-documented and illustrative:
typedef struct HeapTupleHeaderData {
union {
HeapTupleFields t_heap; // For heap tuples
DatumTupleFields t_datum; // For composite type values
} t_choice;
ItemPointerData t_ctid; // 6 bytes: Current TID of this or newer version
uint16 t_infomask2; // Various flags + number of attributes
uint16 t_infomask; // Large set of flag bits
uint8 t_hoff; // Offset to user data (aligned)
// Bits8 t_bits[]; follows // NULL bitmap if HEAP_HASNULL
} HeapTupleHeaderData;
// HeapTupleFields portion:
typedef struct HeapTupleFields {
TransactionId t_xmin; // 4 bytes: Inserting transaction
TransactionId t_xmax; // 4 bytes: Deleting/updating transaction
union {
CommandId t_cid; // 4 bytes: Command ID
TransactionId t_xvac;// Used by vacuum
} t_field3;
} HeapTupleFields;
Total header size: 23 bytes (plus NULL bitmap if needed), rounded up to alignment boundary.
A PostgreSQL tuple header consumes 23-27 bytes before any user data. For a table with many small rows (e.g., a junction table with two 4-byte integer foreign keys), the header may be larger than the payload! This overhead motivates denormalization and batch inserts in some scenarios.
NULL values present a design challenge. They must be distinguishable from any actual value (including empty strings or zeros), yet ideally shouldn't consume storage when absent.
The NULL Bitmap Solution
Most databases use a bitmap—one bit per column—to indicate which fields are NULL:
Table: users (id INT, name VARCHAR, email VARCHAR, phone VARCHAR)
Record for user with no phone:
id=1, name='Alice', email='alice@example.com', phone=NULL
NULL bitmap: 0 0 0 1 (bit 3 = 1 means column 3 is NULL)
│ │ │ └── phone: NULL
│ │ └── email: present
│ └── name: present
└── id: present
Serialized (as byte): 0b00001000 = 0x08
Bitmap Encoding Variations
| Convention | 0 Means | 1 Means | Used By |
|---|---|---|---|
| PostgreSQL | NULL | NOT NULL (present) | HeapTupleHeader t_bits |
| InnoDB | NOT NULL | NULL | Variable-length columns |
| SQL Server | NOT NULL | NULL | NULL bitmap in record |
Size Calculation
NULL bitmap size = ⌈number_of_columns / 8⌉ bytes
8 columns: 1 byte
9 columns: 2 bytes
64 columns: 8 bytes
100 columns: 13 bytes
When No Bitmap?
Some systems optimize away the NULL bitmap:
Declaring columns as NOT NULL where appropriate has a dual benefit: data integrity AND potential storage savings. In PostgreSQL, a table where ALL columns are NOT NULL avoids the NULL bitmap entirely. For tables with many rows, this can save significant space.
NULL Field Storage
NULL columns consume no space in the data portion—only the bit in the bitmap. This has implications:
Table: wide_table (col1 TEXT, col2 TEXT, ..., col100 TEXT)
Row with col1='x', all others NULL:
Header: ~23 bytes
NULL bitmap: 13 bytes (100 columns)
Data: 2 bytes (1-byte length + 1-byte 'x')
Total: ~38 bytes
Row with all 100 columns filled with 'x':
Header: ~23 bytes
NULL bitmap: 0 bytes (no NULLs, if system optimizes)
Data: 200 bytes (100 × 2 bytes each)
Total: ~223 bytes
The sparse row is dramatically smaller because NULL columns are represented only in the bitmap.
Fixed-length fields store values in a predetermined number of bytes, enabling direct offset calculation for any field.
| SQL Type | Size | Range/Precision | Encoding |
|---|---|---|---|
| BOOLEAN | 1 byte | true/false/NULL | 0x00=false, 0x01=true (varies) |
| TINYINT | 1 byte | -128 to 127 (signed) | Two's complement |
| SMALLINT | 2 bytes | -32,768 to 32,767 | Two's complement, endianness varies |
| INTEGER / INT | 4 bytes | ±2.1 billion | Two's complement |
| BIGINT | 8 bytes | ±9.2 quintillion | Two's complement |
| REAL / FLOAT4 | 4 bytes | ~7 significant digits | IEEE 754 single-precision |
| DOUBLE / FLOAT8 | 8 bytes | ~15 significant digits | IEEE 754 double-precision |
| DECIMAL(p,s) fixed | Varies | Exact decimal, p digits | BCD or integer + scale |
| DATE | 4 bytes | Days since epoch | Integer (epoch varies) |
| TIMESTAMP | 8 bytes | Microseconds since epoch | Integer or IEEE 754 |
| UUID | 16 bytes | 128-bit identifier | Raw bytes or two int64s |
| CHAR(n) | n bytes | Fixed string, space-padded | Encoding (UTF-8, etc.) + padding |
| IPv4 Address (INET) | 4 bytes | 32-bit address | Network byte order |
Direct Offset Calculation
For fixed-length fields (ignoring NULLs and alignment):
Table schema: CREATE TABLE t (
a INT, -- 4 bytes, offset 0
b BIGINT, -- 8 bytes, offset 4
c SMALLINT, -- 2 bytes, offset 12
d INT -- 4 bytes, offset 14
);
To access field 'c' in a record at address P:
offset_c = HEADER_SIZE + 0 + 4 + 8 = HEADER_SIZE + 12
value_c = *(int16*)(P + offset_c)
This O(1) access is why fixed-length fields are positioned first in most record layouts.
Endianness Considerations
Multi-byte integers can be stored as:
Databases typically use native endianness for performance, meaning data files aren't portable across architectures without conversion.
Value: 0x12345678
Little-endian: 78 56 34 12
Big-endian: 12 34 56 78
PostgreSQL uses native endianness (usually little-endian on Linux/x86).
Network protocols often use big-endian.
Portable formats (like Parquet) specify endianness explicitly.
CHAR(n) is fixed-length: always n bytes, space-padded for shorter values. VARCHAR(n) is variable-length: stores actual length + characters. For columns that are always or nearly always the same length (like ISO country codes, phone numbers with consistent format), CHAR may be more efficient due to simpler offset calculation.
Modern CPUs access memory most efficiently when data is aligned to its natural boundary—a 4-byte integer at an address divisible by 4, an 8-byte double at an address divisible by 8.
Misalignment Penalties
Accessing misaligned data causes:
Databases must either:
Alignment Example
Schema: (a SMALLINT, b INT, c SMALLINT, d BIGINT)
Without alignment:
Offset 0: a (2 bytes)
Offset 2: b (4 bytes) ← MISALIGNED (needs offset 4)
Offset 6: c (2 bytes)
Offset 8: d (8 bytes) ← aligned by luck
Total: 16 bytes
With alignment:
Offset 0: a (2 bytes)
Offset 2: padding (2 bytes)
Offset 4: b (4 bytes) ← aligned
Offset 8: c (2 bytes)
Offset 10: padding (6 bytes)
Offset 16: d (8 bytes) ← aligned
Total: 24 bytes (50% overhead!)
Reordered for minimal padding:
Offset 0: d (8 bytes) ← 8-byte aligned
Offset 8: b (4 bytes) ← 4-byte aligned
Offset 12: a (2 bytes) ← 2-byte aligned
Offset 14: c (2 bytes) ← 2-byte aligned
Total: 16 bytes (no padding!)
PostgreSQL stores columns in their declared order and adds alignment padding. For OLTP tables with many rows, reordering columns by decreasing alignment requirement (8-byte columns first, then 4-byte, then 2-byte, then 1-byte) can save significant space. Some ORMs and migration tools support automatic column reordering for this purpose.
PostgreSQL Alignment Rules
| Type Alignment | Types | Aligned To |
|---|---|---|
| char (c) | boolean, char(1), "char" | 1 byte (none) |
| short (s) | int2, smallint | 2 bytes |
| int (i) | int4, integer, date, float4 | 4 bytes |
| double (d) | int8, bigint, timestamp, float8 | 8 bytes |
The t_hoff field in PostgreSQL's tuple header indicates where user data begins (after header and NULL bitmap, aligned to 8 bytes for double types).
Let's trace through a complete record with only fixed-length fields:
Schema:
CREATE TABLE measurements (
id INTEGER NOT NULL, -- 4 bytes, alignment 4
timestamp TIMESTAMP NOT NULL, -- 8 bytes, alignment 8
sensor_id SMALLINT NOT NULL, -- 2 bytes, alignment 2
temperature REAL, -- 4 bytes, alignment 4 (nullable)
humidity REAL -- 4 bytes, alignment 4 (nullable)
);
Record Data:
id = 12345
timestamp = 2024-01-15 10:30:00 (encoded as microseconds)
sensor_id = 42
temperature = 23.5
humidity = NULL
Byte-Level Layout (PostgreSQL-style):
Offset Field Value (hex) Notes
------ -------------------- ----------------- ----------------------
0 t_xmin 00 00 10 5A Transaction 4186 inserted
4 t_xmax 00 00 00 00 Not deleted
8 t_cid 00 00 00 00 Command 0
12 t_ctid.block 00 00 00 2F Page 47
16 t_ctid.offset 00 03 Slot 3
18 t_infomask2 00 05 5 columns
20 t_infomask 08 02 HASNULL + ...
22 t_hoff 18 24 = data offset (aligned)
23 (padding) 00 Align to 24
24 t_bits 10 NULL bitmap: 0b00010000
bit 4 = humidity NULL
25-26 (padding) 00 00 Align to 4 for first field
27 (padding) 00 Align to 4
28 id 00 00 30 39 12345 (little-endian)
32 (padding) 00 00 00 00 Align to 8 for timestamp
36-39 (padding) 00 00 00 00 Continue padding
40 timestamp 00 05 FA ... (8B) Encoded timestamp
48 sensor_id 00 2A 42
50 (padding) 00 00 Align to 4 for temperature
52 temperature 41 BC 00 00 23.5f (IEEE 754)
56 humidity (not stored) NULL - no bytes
------ -------------------- -----------------
Total: 56 bytes
Note: The humidity field consumes NO space because it's NULL—only recorded in the NULL bitmap.
This 56-byte record has 8 bytes of padding (14% overhead) due to alignment. The actual user data is only 22 bytes (4+8+2+4+4 minus NULL). Header + NULL bitmap adds 25 bytes. Better column ordering could reduce padding.
Each database system has its own record format optimized for its specific needs:
PostgreSQL Heap Tuple Format
┌─────────────────────────────────────────────────┐
│ HeapTupleHeaderData (23 bytes base) │
│ t_xmin, t_xmax, t_cid, t_ctid │
│ t_infomask, t_infomask2, t_hoff │
├─────────────────────────────────────────────────┤
│ NULL Bitmap (if HEAP_HASNULL flag set) │
│ Ceil(natts/8) bytes, bit=1 means NOT NULL │
├─────────────────────────────────────────────────┤
│ OID (4 bytes, deprecated, if HEAP_HASOID) │
├─────────────────────────────────────────────────┤
│ Padding to MAXALIGN (8-byte boundary) │
├─────────────────────────────────────────────────┤
│ User Data: columns in declared order │
│ - Each column aligned per type's typalign │
│ - varlena types: 1-4 byte length + data │
│ - NULL columns: no storage (check bitmap) │
└─────────────────────────────────────────────────┘
Key Features:
Each system has evolved its format over time. InnoDB's COMPACT replaced REDUNDANT; DYNAMIC and COMPRESSED formats came later. SQL Server's formats differ between heap and clustered tables. PostgreSQL's format has remained relatively stable, but TOAST thresholds and compression have evolved.
Record layout is where logical schema meets physical bytes. Let's consolidate the key takeaways:
What's Next:
We've covered fixed-length fields, but real databases must handle VARCHARs, TEXT, BLOBs, and other variable-length data. The next page explores variable-length record encoding—the techniques that enable efficient storage and access of dynamically-sized values.
You now understand how records are structured at the byte level—headers, NULL bitmaps, fixed fields, and alignment. This knowledge is essential for storage optimization, debugging data issues, and understanding database performance characteristics. Next, we'll tackle the complexity of variable-length records.