Fixed-length records are elegant but limited. Real-world data is messy: user bios range from empty to thousands of characters, JSON documents vary from {} to megabytes, and images span bytes to gigabytes. Variable-length data is the norm, not the exception.
Handling variable-length data introduces fundamental challenges: how to encode each value's length, how to locate a given field quickly, and what to do when a value outgrows its page.
This page explores the techniques databases use to handle these challenges efficiently.
By the end of this page, you will understand length-prefixing and offset arrays for variable fields, TOAST and overflow page mechanisms, inline vs out-of-line storage decisions, compression strategies, and how different databases implement variable-length storage. You'll be able to reason about storage tradeoffs for schema design.
There are two primary strategies for encoding variable-length fields: length-prefixing and offset arrays.
Length-Prefixing
Each variable-length value is preceded by its length:
Field format: [length][data bytes]
Example VARCHAR values:
'Hello' → [5][H][e][l][l][o]
'' → [0]
'A longer string...' → [18][A][ ][l]...
Sequential access:
Read length → read that many bytes → repeat
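The read loop can be sketched in a few lines of Python. This is a minimal illustration that assumes 2-byte little-endian length prefixes; it is not any particular database's on-disk format:

```python
import struct

def encode_fields(values):
    """Length-prefix each value: [length][data bytes], repeated."""
    out = bytearray()
    for v in values:
        out += struct.pack("<H", len(v))   # 2-byte length caps a value at 65,535 bytes
        out += v
    return bytes(out)

def decode_fields(buf):
    """Sequential access: read length, read that many bytes, repeat."""
    fields, pos = [], 0
    while pos < len(buf):
        (n,) = struct.unpack_from("<H", buf, pos)
        fields.append(bytes(buf[pos + 2:pos + 2 + n]))
        pos += 2 + n
    return fields

record = encode_fields([b"Hello", b"", b"A longer string..."])
assert decode_fields(record) == [b"Hello", b"", b"A longer string..."]
```

Note that reaching the last field required walking past every earlier one; avoiding that sequential cost is exactly what offset arrays buy.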
Advantages:
- Minimal space overhead: one length field per value
- Simple to encode and decode; a natural fit for sequential scans
Disadvantages:
- Accessing field n requires skipping fields 0 through n-1 (O(n) access)
- A corrupted length field garbles everything after it
Offset Arrays
A separate array stores the ending offset (or starting offset) of each field:
Record structure:
[offset_array][data region]
Offset array: [12][25][30]
Field 0: bytes 0-11 (length 12)
Field 1: bytes 12-24 (length 13)
Field 2: bytes 25-29 (length 5)
Random access:
offset[n-1] to offset[n] gives field n
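The same record can be sketched in Python using 2-byte end offsets (an illustrative layout, not a specific engine's):

```python
import struct

def encode_record(values):
    """[offset array][data region]; each 2-byte entry is a field's end offset."""
    offsets, data, end = bytearray(), bytearray(), 0
    for v in values:
        end += len(v)
        offsets += struct.pack("<H", end)
        data += v
    return len(values), bytes(offsets + data)

def get_field(nfields, record, n):
    """O(1) random access: field n spans offset[n-1]..offset[n]."""
    data_start = nfields * 2                      # data region follows the array
    end = struct.unpack_from("<H", record, n * 2)[0]
    start = struct.unpack_from("<H", record, (n - 1) * 2)[0] if n else 0
    return record[data_start + start:data_start + end]

# Field lengths 12, 13, 5 produce the offset array [12][25][30] shown above
nfields, record = encode_record([b"twelve chars", b"thirteen char", b"five5"])
assert get_field(nfields, record, 2) == b"five5"   # no scan of earlier fields
```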
Advantages:
- O(1) random access to any field
- Field lengths fall out of subtracting adjacent offsets
Disadvantages:
- Fixed per-field overhead, paid even for fields a query never touches
- The offset width bounds record size (2-byte offsets cap records at 64KB)
| Database | Strategy | Implementation Details |
|---|---|---|
| PostgreSQL | Length-prefix (varlena) | 1 or 4 byte length; short varlenas use 1-byte header |
| MySQL InnoDB | Length-prefix + list | Lengths stored before record header in reverse order |
| SQL Server | Offset array | Variable column offset array after NULL bitmap |
| SQLite | Length-prefix (varint) | Variable-length integer encoding for lengths |
| Oracle | Length-prefix | 1-3 byte length depending on value size |
Many systems use hybrid approaches. InnoDB stores variable-field lengths before the record (enabling quick size calculation) but the actual data uses length-prefix encoding. SQL Server's offset array gives O(1) access to any variable column, ideal for SELECT queries accessing specific columns.
The length field itself must balance overhead against maximum representable size.
| Length Field Size | Max Value Size | Overhead for 10-byte value | Overhead for 1KB value |
|---|---|---|---|
| 1 byte | 255 bytes | 10% | 0.1% |
| 2 bytes | 65,535 bytes (~64KB) | 20% | 0.2% |
| 4 bytes | 4,294,967,295 bytes (~4GB) | 40% | 0.4% |
| Variable (1-9 bytes) | Up to 2^63 bytes | 10-40% | 0.1-0.4% |
PostgreSQL's varlena Format
PostgreSQL uses a clever variable-length header (varlena) that adapts to value size:
Varlena header interpretation (first byte):
┌─────────────────────────────────────────────────────────────┐
│ If first byte has high bit = 0: │
│ - 4-byte header: bits 0-29 = length including header │
│ - bits 30-31 = compression/TOAST flags │
│ - Max untoasted size: ~1GB │
├─────────────────────────────────────────────────────────────┤
│ If first byte has high bit = 1: │
│ - 1-byte header (short varlena) │
│ - bits 1-7 = length including header │
│ - Max size: 127 bytes │
│ - Saves 3 bytes per small string! │
├─────────────────────────────────────────────────────────────┤
│ Special encodings for TOAST pointers, compressed data, etc. │
└─────────────────────────────────────────────────────────────┘
Examples:
'Hi' → 0x83 'H' 'i' (short varlena: 0x80 | 3 = header+data length)
100-char string → 0x00000068 ... (4-byte header: 104 total bytes)
This optimization saves 3 bytes per small string—significant for tables with many short VARCHAR columns.
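The header logic can be sketched in Python, using the big-endian bit layout shown in the diagram (an assumption made for readability; PostgreSQL's actual macros depend on platform byte order):

```python
def varlena_size(header):
    """Total stored size in bytes; varlena lengths include the header itself."""
    if header[0] & 0x80:                        # high bit set: 1-byte short header
        return header[0] & 0x7F                 # low 7 bits, max 127
    word = int.from_bytes(header[:4], "big")    # high bit clear: 4-byte header
    return word & 0x3FFFFFFF                    # low 30 bits; top 2 bits are flags

assert varlena_size(bytes([0x83])) == 3                      # 'Hi': 1 header + 2 data
assert varlena_size(bytes([0x00, 0x00, 0x00, 0x68])) == 104  # 100 chars + 4-byte header
```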
Variable-Length Integer Encoding (Varint)
SQLite and Protocol Buffers use varint encoding where smaller numbers use fewer bytes:
Varint encoding (SQLite style):
Values 0-127: 1 byte (high bit = 0)
Values 128-16383: 2 bytes (first byte high bit = 1, continues)
Values 16384+: 3+ bytes
Example: Length = 200
Binary: 11001000 = 0xC8
Varint: 0x81 0x48 (10000001 01001000)
First byte: 0x81 -> high bit set, continue; value bits = 0000001
Second byte: 0x48 -> high bit clear, done; value bits = 1001000
Combined: 0000001_1001000 = 11001000 = 200
Varint is space-optimal when most lengths are small, but parsing is slightly more complex than fixed-size lengths.
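The scheme above in runnable form. This is a sketch of the 7-bit idea only; real SQLite varints additionally let a ninth byte carry all eight of its bits, which is omitted here:

```python
def varint_encode(n):
    """Big-endian base-128: 7 value bits per byte, high bit = 'more bytes follow'."""
    out = bytearray([n & 0x7F])              # final byte: high bit clear
    n >>= 7
    while n:
        out.insert(0, 0x80 | (n & 0x7F))     # earlier bytes carry higher-order bits
        n >>= 7
    return bytes(out)

def varint_decode(buf):
    """Return (value, bytes consumed)."""
    value = 0
    for i, b in enumerate(buf):
        value = (value << 7) | (b & 0x7F)
        if not b & 0x80:                     # high bit clear: final byte
            return value, i + 1
    raise ValueError("truncated varint")

assert varint_encode(200) == bytes([0x81, 0x48])   # matches the worked example
assert varint_decode(varint_encode(200)) == (200, 2)
assert varint_encode(127) == b"\x7f"               # small lengths cost one byte
```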
Length fields may include just the data, or the header + data. PostgreSQL's varlena length includes the header itself. This affects offset calculations and must be handled consistently. Always verify whether documented 'length' values include headers.
What happens when a value is too large to fit on a single page? PostgreSQL's TOAST (The Oversized-Attribute Storage Technique) provides a comprehensive solution.
The Problem
PostgreSQL pages are 8KB, and a single tuple must fit on one page (minus header overhead). But TEXT, BYTEA, and JSONB values can easily run to megabytes, far beyond what any page can hold.
TOAST Solution
When a row exceeds the TOAST threshold (about 2KB by default), TOAST kicks in: the largest variable-length attributes are compressed first; if the row is still too large, they are moved out of line into a separate TOAST table in roughly 2KB chunks, leaving only an 18-byte pointer in the main tuple.
TOAST Strategies
| Strategy | Column Directive | Behavior | Best For |
|---|---|---|---|
| PLAIN | n/a (for fixed types) | Never compress or out-of-line | Fixed-length types |
| EXTENDED | Default for varlena | Compress then out-of-line if needed | Most variable types |
| EXTERNAL | ALTER COLUMN SET STORAGE | Out-of-line but no compression | Pre-compressed data, images |
| MAIN | ALTER COLUMN SET STORAGE | Compress, but prefer keeping inline | Values rarely TOASTed |
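The EXTENDED path can be sketched as a decision function. The threshold, chunk size, and use of zlib here are illustrative stand-ins for PostgreSQL's internal constants and its pglz/LZ4 compressors:

```python
import zlib

TOAST_THRESHOLD = 2000   # stand-in for PostgreSQL's ~2KB tuple threshold
CHUNK_SIZE = 1996        # stand-in for the TOAST table's chunk size

def toast_extended(value):
    """EXTENDED strategy: keep inline if small, else compress, else go out-of-line."""
    if len(value) <= TOAST_THRESHOLD:
        return "inline", value
    compressed = zlib.compress(value)
    if len(compressed) <= TOAST_THRESHOLD:
        return "inline-compressed", compressed
    chunks = [compressed[i:i + CHUNK_SIZE]
              for i in range(0, len(compressed), CHUNK_SIZE)]
    return "out-of-line", chunks     # the inline slot keeps only a small pointer
```

EXTERNAL would skip the compression step entirely; MAIN would raise the effective threshold to keep values inline longer.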
TOAST Pointer Structure
typedef struct varatt_external {
int32 va_rawsize; // Original uncompressed size
int32 va_extsize; // External stored size (after compression)
Oid va_valueid; // OID of TOAST value
Oid va_toastrelid; // OID of TOAST table
} varatt_external;
// Struct: 16 bytes; with the 1-byte varlena header and tag byte, 18 bytes on disk
// Replaces a potentially multi-megabyte value
Accessing TOASTed values requires additional I/O to fetch chunks from the TOAST table. Queries selecting large TEXT/JSONB columns may have hidden costs. Use SELECT with limited columns to avoid fetching unneeded TOASTed data. Monitor pg_stat_user_tables.n_tup_hot_upd and TOAST table sizes.
Every database system has its own mechanism for handling values that exceed page capacity:
InnoDB Row Overflow Handling
InnoDB row formats handle large values differently:
COMPACT/REDUNDANT Format: keeps the first 768 bytes of each long column inline; the remainder spills to overflow pages via a 20-byte pointer.
DYNAMIC Format (default since MySQL 5.7): when a column overflows, the entire value moves off-page and only a 20-byte pointer stays inline.
COMPRESSED Format: same off-page behavior as DYNAMIC, with page contents additionally zlib-compressed (KEY_BLOCK_SIZE sets the compressed page size).
DYNAMIC format overflow:
┌─────────────────────────────────────────┐
│ Main page row: │
│ [header][fixed cols][off-page pointer] │
│ │ │
│ ▼ │
│ ┌───────────────────┐ │
│ │ Overflow page 1 │──────┼──► Overflow page 2 ...
│ │ (up to 16KB data) │ │
│ └───────────────────┘ │
└─────────────────────────────────────────┘
Inline Thresholds: PostgreSQL considers TOASTing once a tuple exceeds roughly 2KB; InnoDB moves columns off-page when a row cannot fit in half a page (about 8KB with the default 16KB pages); SQL Server pushes variable-length columns to row-overflow pages when a row exceeds the 8,060-byte in-row limit.
Monitor for chained/migrated rows: Oracle: SELECT chain_cnt FROM dba_tables. SQL Server: sys.dm_db_index_physical_stats shows avg_record_size. PostgreSQL: pg_stat_user_tables.n_live_tup vs table size indicates bloat. High ratios of overflow to inline can indicate schema design issues.
Variable-length data often contains redundancy that compression can exploit. Databases offer multiple compression strategies:
| Database | Compression Type | Algorithm | Scope |
|---|---|---|---|
| PostgreSQL | TOAST compression | pglz (custom LZ) | Per-value, transparent |
| PostgreSQL 14+ | TOAST compression | LZ4 option | Per-value, faster than pglz |
| InnoDB | Page compression | zlib, LZ4, zstd | Per-page, transparent |
| InnoDB | Table compression | zlib | KEY_BLOCK_SIZE pages |
| SQL Server | Row compression | Dictionary + RLE | Fixed values, NULLs compacted |
| SQL Server | Page compression | Column prefix + dictionary | Page-level patterns |
| Oracle | Basic compression | Deduplicate repeated values | Per-block, read-only friendly |
| Oracle | Advanced/HCC | Columnar + algorithms | Exadata, data warehouse |
Compression Tradeoffs
Compression Decision Matrix:
                High Compression Ratio ──────────────►
          ┌────────────────────────┬────────────────────────┐
   Fast   │ LZ4, Snappy            │ zstd (balanced)        │
Decompress│ Good for OLTP,         │ Good general purpose,  │
    │     │ hot data, real-time    │ configurable levels    │
    │     ├────────────────────────┼────────────────────────┤
    ▼     │ No compression         │ zlib, gzip             │
   Slow   │ Simple, CPU-free       │ Best ratio, slow       │
Decompress│ Max speed needed       │ Cold data, archival    │
          └────────────────────────┴────────────────────────┘
When to Use Compression:
- Large, infrequently updated values: logs, documents, archived JSON
- Highly redundant text, where ratios are best
- Skip already-compressed data (JPEG, gzip archives, encrypted blobs): it rarely shrinks and wastes CPU
- For hot, latency-sensitive data, weigh decompression CPU against the I/O saved
PostgreSQL 14 added LZ4 as an alternative TOAST compression algorithm. LZ4 decompresses significantly faster than pglz (often 10x) with slightly lower compression ratio. Set default_toast_compression = 'lz4' or specify per-column: ALTER TABLE t ALTER COLUMN c SET COMPRESSION lz4;
Index pages must also handle variable-length keys, but under different constraints than heap pages:
Why Index Keys Are Special
Index keys are compared on every tree traversal, and many keys must fit on each internal page to keep fanout high. Long variable-length keys shrink fanout, deepen the tree, and add I/O to every lookup, so indexes work harder than heap pages to keep keys short.
Key Truncation and Prefix Compression
B-tree indexes can use key abbreviation in internal pages:
Leaf level: Full keys for exact matching
['Alice', 'Bob', 'Charlie', 'David', 'Eve']
Internal level: Truncated separator keys
['C', 'E'] // Minimal keys that separate children
Prefix compression (InnoDB):
Page stores common prefix once
Individual entries store suffix only
Original: ['product_12345', 'product_12346', 'product_12347']
Prefix: 'product_1234'
Entries: ['5', '6', '7']
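The transformation above is nearly a one-liner with Python's standard library (illustrative only; InnoDB's on-page encoding is more involved):

```python
import os

def compress_page_keys(keys):
    """Store the shared prefix once; each entry keeps only its suffix."""
    prefix = os.path.commonprefix(keys)
    return prefix, [k[len(prefix):] for k in keys]

prefix, entries = compress_page_keys(
    ["product_12345", "product_12346", "product_12347"])
assert (prefix, entries) == ("product_1234", ["5", "6", "7"])
```

Here 39 characters of key data shrink to 12 (prefix) + 3 (suffixes) = 15.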
Maximum Key Length
| Database | Max Key Size | Notes |
|---|---|---|
| PostgreSQL | ~2712 bytes | 1/3 of page, practical limit lower due to overhead |
| MySQL InnoDB | 3072 bytes | With innodb_large_prefix=ON and DYNAMIC/COMPRESSED |
| SQL Server | 900 bytes | Clustered index; 1700 for nonclustered with INCLUDE |
| SQLite | ~1/4 page | Default 1024 bytes for 4KB pages |
Strategies When Keys Are Too Long:
- Index a fixed-size hash of the value and recheck the full value on match
- Use prefix indexes (MySQL: INDEX (col(20)) indexes only the first 20 bytes)
- Index an expression over a shortened form (PostgreSQL expression indexes)
- Restructure the schema so the long value is data, not a key
Long keys reduce index fanout, increasing tree height and I/O. A B-tree with 100-entry pages has height log₁₀₀(N). With only 10-entry pages (due to large keys), height is log₁₀(N)—can double required I/O for lookups. Monitor index sizes relative to data sizes.
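The fanout arithmetic is easy to check: the height needed is the smallest h with fanout^h ≥ N (integer math avoids floating-point edge cases):

```python
def btree_height(n_entries, fanout):
    """Smallest number of levels h such that fanout**h >= n_entries."""
    height = 1
    while fanout ** height < n_entries:
        height += 1
    return height

# 100 million entries: small keys (fanout 100) vs. large keys (fanout 10)
assert btree_height(100_000_000, 100) == 4
assert btree_height(100_000_000, 10) == 8   # twice the levels, roughly twice the I/O
```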
Updates to variable-length fields present unique challenges because the new value may differ in size from the old value.
Update Scenarios
Scenario 1: New value is SMALLER
Old: 'The quick brown fox jumps over the lazy dog' (45 bytes)
New: 'Hello' (5 bytes)
Options:
a) In-place update, leaving 40 bytes as internal fragmentation
b) Delete old + insert new at different location
c) Compact page after update
Scenario 2: New value is SAME SIZE
Old: 'Hello' (5 bytes)
New: 'World' (5 bytes)
Simple in-place overwrite
Scenario 3: New value is LARGER
Old: 'Hi' (2 bytes)
New: 'Hello World' (11 bytes)
Options:
a) Extend in-place if contiguous free space available
b) Relocate within same page if total space available
c) Move to different page entirely
d) Original slot becomes forward pointer (Oracle migration)
e) MVCC: Keep old version, insert new version (PostgreSQL)
Scenario 4: Value exceeds TOAST threshold
Old: 100-byte string (inline)
New: 50KB document (must TOAST)
Old inline space freed (or kept for MVCC)
New value goes to TOAST table
Pointer replaces inline value
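The four scenarios collapse into a small decision function (the threshold and strategy names here are illustrative, not any engine's actual code path):

```python
TOAST_THRESHOLD = 2000   # illustrative stand-in for the out-of-line cutoff

def plan_update(old_len, new_len, free_after_slot):
    """Choose an update strategy based on the new value's size."""
    if new_len > TOAST_THRESHOLD:
        return "out-of-line"           # scenario 4: value moves to TOAST/overflow
    if new_len <= old_len:
        return "in-place"              # scenarios 1-2: overwrite, maybe fragment
    if new_len - old_len <= free_after_slot:
        return "extend-in-place"       # scenario 3a: grow into adjacent free space
    return "relocate"                  # scenarios 3b-3e: move, forward, or version

assert plan_update(45, 5, 0) == "in-place"          # smaller: simple overwrite
assert plan_update(2, 11, 64) == "extend-in-place"
assert plan_update(2, 11, 0) == "relocate"
assert plan_update(100, 50_000, 0) == "out-of-line"
```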
Setting a fill factor (e.g., 70%) leaves free space on each page for in-place growth. UPDATE-heavy tables with growing variable columns benefit from lower fill factors. OLAP/append-only tables can use 100% fill factor. PostgreSQL: CREATE TABLE ... WITH (fillfactor=70)
Variable-length data is the norm in real-world databases, requiring sophisticated encoding and overflow mechanisms. Key takeaways:
- Length-prefixing is compact and scan-friendly; offset arrays buy O(1) field access for a small fixed overhead
- Values that outgrow a page move out of line (TOAST, overflow pages), leaving a small pointer behind
- Compression trades CPU for I/O; match the algorithm to the data's temperature (LZ4 for hot, zlib/zstd for cold)
- Long index keys shrink fanout and deepen B-trees; truncation and prefix compression push back
- Updates that grow a value may force in-page relocation, forwarding pointers, or new MVCC row versions
What's Next:
We've explored page structure, slot directories, record layouts, and variable-length handling. The final page brings everything together with page management—the policies and mechanisms that coordinate page allocation, free space tracking, and lifecycle management across the database.
You now understand how databases handle the fundamental challenge of variable-length data—from encoding strategies through TOAST/overflow to compression. This knowledge is crucial for schema design, storage optimization, and understanding query performance characteristics. Next, we'll complete our exploration with page management.