When you have a collection of records to store, the most intuitive approach is often the simplest: just append each new record to the end of the file. This deceptively straightforward strategy—known as heap file organization—forms the foundation upon which all other file organization methods build.
Despite its simplicity, heap file organization is far from obsolete. It remains the default organization in most database systems and is often the optimal choice for specific workloads. Understanding heap files deeply is essential for making informed decisions about data organization in production systems.
By the end of this page, you will understand the internal structure of heap files, analyze the asymptotic performance of fundamental operations, appreciate the trade-offs that make heap files suitable for certain workloads, and recognize when alternative organizations become necessary.
A heap file (also called an unordered file or pile file) is a file organization method where records are stored in no particular order. New records are inserted at the end of the file (or in the first available space if deletions have occurred), without any consideration for the values they contain.
The term "heap" comes from the data structure concept of a heap—not the balanced tree structure, but rather the general notion of an unstructured pile of items. Just as you might toss items onto a heap without organizing them, records in a heap file accumulate without internal ordering.
Formal Definition:
A heap file H is a sequence of pages P₁, P₂, ..., Pₙ where each page holds a set of records, the assignment of records to pages is independent of the records' attribute values, and a new record may be placed in any page with sufficient free space (by default, the last page Pₙ).
Don't confuse heap files with heap data structures (priority queues). The heap file 'heap' refers to an unorganized collection, not the tree-based structure that maintains partial ordering. This naming confusion is historical and unfortunate, but the context (file organization vs. data structures) usually makes the meaning clear.
Key Characteristics:
No Ordering Constraint: Records are not sorted by any attribute, nor are they clustered by value. The physical position of a record has no relationship to its content.
Append-Oriented Insertion: The default insertion strategy adds records at the end of the file, making inserts very fast.
Space Reclamation: When records are deleted, space may be reclaimed immediately, marked for reuse, or left as fragmentation (implementation-dependent).
Full Scan Requirement: Finding a specific record by value requires scanning the entire file (in the worst case) because there's no ordering to exploit.
Minimal Metadata: Heap files require very little auxiliary structure—just enough to track pages and free space.
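These characteristics can be demonstrated with a deliberately naive in-memory sketch (names like `NaiveHeapFile` are hypothetical; a real heap file operates on disk pages):

```python
# Naive in-memory heap file: O(1) append, O(n) search.
# Hypothetical sketch; real heap files operate on fixed-size disk pages.

RECORDS_PER_PAGE = 4  # tiny pages, for illustration

class NaiveHeapFile:
    def __init__(self):
        self.pages = [[]]  # list of pages; each page is a list of records

    def insert(self, record):
        """Append to the last page; O(1) -- no ordering is maintained."""
        if len(self.pages[-1]) >= RECORDS_PER_PAGE:
            self.pages.append([])
        self.pages[-1].append(record)
        return (len(self.pages) - 1, len(self.pages[-1]) - 1)  # RID

    def search(self, predicate):
        """Scan every page; O(n) -- there is no ordering to exploit."""
        pages_read, hits = 0, []
        for page in self.pages:
            pages_read += 1
            hits.extend(r for r in page if predicate(r))
        return hits, pages_read

f = NaiveHeapFile()
for i in [7, 3, 9, 1, 5, 2]:
    f.insert({"id": i})
hits, pages_read = f.search(lambda r: r["id"] == 1)
print(hits, pages_read)  # every page is read, even after the match is found
```

Note that `search` reads every page even when the predicate could match at most one record; that is exactly the "full scan requirement" above.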
Understanding the physical layout of heap files is crucial for analyzing their performance characteristics. A heap file consists of several key components:
1. File Header
Every heap file begins with a header page (or header block) containing metadata:
```
// Heap File Header Structure
struct HeapFileHeader {
    int32 magic_number;        // File type identifier (e.g., 0x48454150, "HEAP" in ASCII)
    int32 version;             // File format version
    int64 num_pages;           // Total pages allocated
    int64 num_records;         // Total records (may be approximate)
    int64 first_data_page;     // Page number of first data page
    int64 last_data_page;      // Page number of last data page
    int64 free_space_map_page; // Page number of free space map (if used)
    int32 page_size;           // Size of each page in bytes
    byte  reserved[200];       // Reserved for future use
};

// Example: 8KB pages, 4-byte magic number, remaining fields as shown.
// The header typically occupies exactly one page (8192 bytes).
```

2. Data Pages
The bulk of a heap file consists of data pages, each containing actual records. Pages are the unit of I/O transfer between disk and memory—they match the disk block size for efficiency (typically 4KB, 8KB, or 16KB).
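Before diving into the byte-level layout, the slot-directory idea that the structures below formalize can be mimicked in a few lines of Python (a hypothetical sketch; a real page is a single byte array with the directory packed at its tail):

```python
# Minimal slotted page: record bytes packed from the front, slot
# directory entries (offset, length) kept separately. Hypothetical
# sketch; real pages interleave both inside one fixed-size byte array.

PAGE_SIZE = 64
DELETED = -1

class SlottedPage:
    def __init__(self):
        self.data = bytearray(PAGE_SIZE)
        self.slots = []        # (offset, length); one entry per insert
        self.free_start = 0    # where the next record's bytes go

    def free_space(self):
        # charge 4 bytes of bookkeeping per directory entry
        return PAGE_SIZE - self.free_start - 4 * len(self.slots)

    def add_record(self, rec: bytes):
        if self.free_space() < len(rec) + 4:
            return None  # page full
        off = self.free_start
        self.data[off:off + len(rec)] = rec
        self.slots.append((off, len(rec)))
        self.free_start += len(rec)
        return len(self.slots) - 1  # slot number: second half of a RID

    def get_record(self, slot_num):
        off, length = self.slots[slot_num]
        return None if off == DELETED else bytes(self.data[off:off + length])

    def delete_record(self, slot_num):
        self.slots[slot_num] = (DELETED, 0)  # slot stays, so RIDs stay stable
```

The key design point the sketch preserves: deleting a record tombstones its slot rather than removing it, so the slot numbers of all other records (and therefore their RIDs) never change.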
Each data page has its own internal structure:
```
// Heap Page Layout (Slotted Page)
struct HeapPage {
    // Header section (fixed position at page start)
    struct PageHeader {
        int32 page_id;          // Unique page identifier
        int32 num_slots;        // Number of slots in directory
        int32 free_space_start; // Offset where free space begins
        int32 free_space_end;   // Offset where slot directory starts
        int16 flags;            // Page flags (e.g., is_full, is_compacted)
        int16 reserved;
    } header;

    // Free space region:
    // record data grows from just after the header toward the page end;
    // the slot directory grows from the page end back toward the header.
    byte free_space[];

    // Slot directory (grows backward from page end).
    // Each slot stores (offset, length).
    struct Slot {
        int16 offset; // Offset of record from page start (-1 if deleted)
        int16 length; // Length of record
    } slots[];        // slots[0] is closest to the page end
};

// Record Identifier (RID)
struct RID {
    int64 page_id;  // Which page contains the record
    int32 slot_num; // Which slot within the page
};

/*
 * Visual layout of a slotted page:
 *
 * ┌─────────────────────────────────────────────┐
 * │ Page Header (page_id, num_slots, etc.)      │
 * ├─────────────────────────────────────────────┤
 * │ Record 3 Data                               │
 * │ Record 1 Data                               │
 * │ (Deleted - now free space)                  │
 * │ Record 2 Data                               │
 * │                                             │
 * │ ═══════════ FREE SPACE ═══════════════════  │
 * │                                             │
 * ├─────────────────────────────────────────────┤
 * │ Slot 3 │ Slot 2 │ Slot 1 │ Slot 0 │← Dir    │
 * └─────────────────────────────────────────────┘
 */
```

3. Free Space Management
Managing free space efficiently is one of the main challenges in heap file implementation. Three common approaches exist:
Approach A: Linked List of Free Pages — the file header points to the first page with free space, and each such page points to the next; an insert walks the list until it finds room.
Approach B: Page Directory (Free Space Map) — a dedicated structure records (approximately) how much free space each page has, so an insert can jump straight to a suitable page.
Approach C: Free Space Bitmap — a few bits per page encode a coarse free-space category; compact enough to cache, at the cost of precision.
| Approach | Space Overhead | Insert Performance | Suitable For |
|---|---|---|---|
| Linked List | O(1) per page | O(n) worst case | Small files, uniform records |
| Page Directory | O(n/k) pages | O(1) to O(n/k) | Large files, variable records |
| Free Space Bitmap | O(n/8) bytes | O(n/block) scan | Very large files, approximate |
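Approach B can be sketched as a simple per-page map (hypothetical names; real systems store coarse free-space categories rather than exact byte counts):

```python
# Free space map sketch: tracks approximate free bytes per page and
# answers "which page can hold k bytes?" Hypothetical in-memory version.

PAGE_CAPACITY = 8192

class FreeSpaceMap:
    def __init__(self):
        self.free = {}  # page_id -> free bytes (approximate)

    def add_page(self, page_id, free_bytes=PAGE_CAPACITY):
        self.free[page_id] = free_bytes

    def update(self, page_id, free_bytes):
        self.free[page_id] = free_bytes

    def find_page_with_space(self, needed):
        """Return any page with enough room, or None (caller allocates)."""
        for page_id, avail in self.free.items():
            if avail >= needed:
                return page_id
        return None

fsm = FreeSpaceMap()
fsm.add_page(0, 100)
fsm.add_page(1, 5000)
print(fsm.find_page_with_space(1200))  # page 1 is the first that qualifies
```

A linear scan over the dict is O(n) in the number of pages; PostgreSQL's free space map instead arranges the entries in a tree so the lookup cost grows only logarithmically with file size.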
Let's analyze the fundamental operations on heap files with rigorous complexity analysis. Understanding these operations is essential for evaluating when heap organization is appropriate.
INSERT Operation
Inserting a record into a heap file is straightforward:
Complexity Analysis: O(1). Locating a page with free space is constant time with a free space map, and appending the record to that page is constant time.
Cost in I/O: typically 1-2 page I/Os — one read of the target page (possibly preceded by a free space map lookup) plus one write when the dirty page is flushed.
```
function insert(HeapFile file, Record record) -> RID {
    // Step 1: Calculate space needed
    int space_needed = sizeof(record) + sizeof(Slot);

    // Step 2: Find page with sufficient space
    PageID page_id = file.free_space_map.find_page_with_space(space_needed);
    if (page_id == NONE) {
        // No existing page has space; allocate new page
        page_id = file.allocate_new_page();
        file.free_space_map.add_page(page_id, PAGE_SIZE - HEADER_SIZE);
    }

    // Step 3: Load the page into buffer pool
    Page page = buffer_pool.get_page(file.file_id, page_id);

    // Step 4: Insert record into page
    int slot_num = page.add_record(record);

    // Step 5: Update free space map
    file.free_space_map.update(page_id, page.free_space());

    // Step 6: Mark page dirty and return RID
    buffer_pool.mark_dirty(page);
    return RID(page_id, slot_num);
}

function Page.add_record(Record record) -> int {
    // Find next available slot
    int slot_num = this.header.num_slots;

    // Record data is placed at the start of the free space region
    int record_offset = this.header.free_space_start;

    // Write record data
    memcpy(this.data + record_offset, record.data, record.length);

    // Update slot directory (grows from the page end toward the header)
    this.slots[slot_num] = Slot(record_offset, record.length);

    // Update header
    this.header.num_slots++;
    this.header.free_space_start += record.length;
    this.header.free_space_end -= sizeof(Slot);

    return slot_num;
}
```

SEARCH Operation
Searching in a heap file—finding records that match a given predicate—is the primary weakness of this organization:
Equality Search (e.g., WHERE id = 42): scan pages in order, checking each record against the predicate. If the attribute is unique, the scan can stop at the first match, touching B/2 pages on average; if duplicates are possible, or no match exists, all B pages must be read.
Range Search (e.g., WHERE price BETWEEN 10 AND 100): any record anywhere in the file could qualify, so all B pages must always be scanned.
Cost in I/O: B/2 on average for a unique-key equality match; B in the worst case and for every range search.
This O(n) search cost is the fundamental limitation of heap files. For any table where you frequently search by a specific attribute, you must either: (1) accept full table scans, (2) build an index on that attribute, or (3) use a different file organization.
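The scan-versus-index decision can be made quantitative with a back-of-the-envelope model. The latencies below (10 ms per random page read, 0.1 ms sequential) are the illustrative figures used elsewhere on this page, and the B-tree height of 3 is an assumed typical value, not a measurement:

```python
# Back-of-the-envelope: full heap scan vs. index lookups.
# Latency figures are illustrative, not measured.

SEQ_MS = 0.1    # sequential page read
RAND_MS = 10.0  # random page read

def scan_time_ms(num_pages: int) -> float:
    """Full table scan: read every page sequentially."""
    return num_pages * SEQ_MS

def index_time_ms(matches: int, btree_height: int = 3) -> float:
    """Index lookup: traverse the index, then one random heap I/O per match."""
    return (btree_height + matches) * RAND_MS

pages = 250_000  # e.g., 10M records at 40 records per page
for matches in (1, 100, 10_000):
    scan, idx = scan_time_ms(pages), index_time_ms(matches)
    winner = "index" if idx < scan else "scan"
    print(f"{matches:>6} matches: scan {scan:,.0f} ms vs index {idx:,.0f} ms -> {winner}")
```

The crossover is the point to notice: for highly selective predicates the index wins by orders of magnitude, but once a query matches thousands of rows, the sequential scan's cheap per-page cost makes the "dumb" heap scan the faster plan.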
DELETE Operation
Deleting a record identified by its RID proceeds as follows: load the page named by the RID, mark the record's slot as deleted, and update the free space map to reflect the reclaimed bytes.
Complexity: O(1) CPU work and a single page read (plus the eventual write-back), since the RID identifies the page and slot directly — no search is needed.
Space Reclamation Strategies:
Immediate Compaction: Shift remaining records to eliminate gap. Pro: No fragmentation. Con: Expensive for each delete.
Lazy Reclamation: Mark slot as deleted, reuse space later. Pro: Fast deletes. Con: Internal fragmentation.
Periodic Vacuum: Batch compaction during maintenance windows. Pro: Amortized cost. Con: Temporary space waste.
```
function delete_by_rid(HeapFile file, RID rid) -> bool {
    // Step 1: Load the page
    Page page = buffer_pool.get_page(file.file_id, rid.page_id);

    // Step 2: Validate slot exists and is not already deleted
    if (rid.slot_num >= page.header.num_slots) {
        return false; // Invalid slot number
    }
    Slot slot = page.slots[rid.slot_num];
    if (slot.offset == DELETED_MARKER) {
        return false; // Already deleted
    }

    // Step 3: Mark slot as deleted
    int freed_space = slot.length;
    page.slots[rid.slot_num].offset = DELETED_MARKER;
    page.slots[rid.slot_num].length = 0;

    // Step 4: Update page statistics
    page.header.deleted_count++;
    page.header.fragmented_bytes += freed_space;

    // Step 5: Optionally compact if fragmentation exceeds threshold
    if (page.fragmentation_ratio() > COMPACTION_THRESHOLD) {
        page.compact(); // Shift records to eliminate gaps
    }

    // Step 6: Update free space map
    file.free_space_map.update(rid.page_id, page.total_free_space());
    buffer_pool.mark_dirty(page);
    return true;
}

function Page.compact() {
    // Create temporary buffer
    byte temp[PAGE_SIZE];
    int write_offset = sizeof(PageHeader);

    // Copy non-deleted records contiguously
    for (int i = 0; i < this.header.num_slots; i++) {
        if (this.slots[i].offset != DELETED_MARKER) {
            int len = this.slots[i].length;
            memcpy(temp + write_offset, this.data + this.slots[i].offset, len);
            this.slots[i].offset = write_offset;
            write_offset += len;
        }
    }

    // Copy compacted data back
    memcpy(this.data + sizeof(PageHeader),
           temp + sizeof(PageHeader),
           write_offset - sizeof(PageHeader));
    this.header.free_space_start = write_offset;
    this.header.fragmented_bytes = 0;
}
```

UPDATE Operation
Updating a record has two cases:
Case 1: Update in Place (record size unchanged or smaller): overwrite the record's bytes within its existing slot. Cost is one page read plus one write, and the RID is unchanged.
Case 2: Record Size Changes: if the record grows but still fits on its page (possibly after compaction), it is rewritten locally and the RID survives. If it no longer fits, the record must be deleted and reinserted on another page, which changes its RID.
The potential for RID changes when records grow is a significant complication, because indexes and other references store RIDs. Systems handle this through: forwarding pointers (a small stub left at the old RID points to the new location), updating every index entry that references the moved record, or reserving slack space on each page for in-place growth (e.g., Oracle's PCTFREE).
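The forwarding-pointer technique can be sketched in Python. Names (`HeapSim`, etc.) are hypothetical, and real systems work on byte-addressed pages rather than Python lists:

```python
# Sketch of forwarding pointers for records that outgrow their page.
# When an updated record no longer fits, the old slot becomes a "fwd"
# stub, so the original RID stays valid for indexes. Hypothetical names.

PAGE_CAPACITY = 100  # bytes of record data per page (tiny, for illustration)

class HeapSim:
    def __init__(self):
        # Each page is a list of slots; a slot is ("rec", data) or ("fwd", rid).
        self.pages = [[]]

    def _used(self, pid):
        return sum(len(s[1]) for s in self.pages[pid] if s[0] == "rec")

    def insert(self, data):
        for pid in range(len(self.pages)):
            if self._used(pid) + len(data) <= PAGE_CAPACITY:
                self.pages[pid].append(("rec", data))
                return (pid, len(self.pages[pid]) - 1)
        self.pages.append([("rec", data)])       # no room anywhere: new page
        return (len(self.pages) - 1, 0)

    def update(self, rid, new_data):
        pid, slot = rid
        kind, payload = self.pages[pid][slot]
        if kind == "fwd":
            self.update(payload, new_data)       # follow the stub
            return
        if self._used(pid) - len(payload) + len(new_data) <= PAGE_CAPACITY:
            self.pages[pid][slot] = ("rec", new_data)  # update in place
        else:
            new_rid = self.insert(new_data)            # relocate the record
            self.pages[pid][slot] = ("fwd", new_rid)   # leave a stub behind

    def fetch(self, rid):
        kind, payload = self.pages[rid[0]][rid[1]]
        return self.fetch(payload) if kind == "fwd" else payload
```

Because the stub stays at the original RID, index entries never need to change when a record moves; the price is one extra page access when fetching a forwarded record. SQL Server's heap "forwarding rows", mentioned later in this page, work on this principle.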
Let's establish a rigorous performance model for heap files. This analysis forms the baseline against which we'll compare other file organizations.
Notation: let n be the number of records in the file, R the number of records that fit on one page, and B the number of pages the file occupies.
Derived Relationships: B = ⌈n / R⌉, so a full scan costs B page I/Os and an average equality search on a unique attribute costs B/2.
| Operation | I/O Cost (Pages) | CPU Cost | Notes |
|---|---|---|---|
| Insert | 1-2 | O(1) | Find page + write page |
| Delete (by RID) | 1 | O(1) | Direct page access |
| Delete (by value) | B/2 avg, B worst | O(n) | Must search first |
| Search (equality) | B/2 avg, B worst | O(n) | Linear scan |
| Search (range) | B | O(n) | Must scan all |
| Full scan | B | O(n) | Sequential I/O |
Concrete Example:
Consider a customer table with the following characteristics: 10 million records of roughly 200 bytes each, stored on 8 KB pages, giving about R = 40 records per page.
Calculations: B = ⌈10,000,000 / 40⌉ = 250,000 pages. A full scan reads all 250,000 pages; an average equality search on a unique key reads B/2 = 125,000 pages.
Search Cost: at 10 ms per random page I/O, the average equality search costs 125,000 × 10 ms = 1,250 seconds — roughly 21 minutes for a single lookup.
This analysis demonstrates why heap files alone are impractical for search-heavy workloads.
The silver lining of heap file scans is that they're sequential I/O—reading contiguous pages from disk. Sequential I/O can be 100x faster than random I/O (10 ms random vs 0.1 ms sequential per page). Thus, full table scans on heap files are much faster than the naive calculation suggests when pages are physically contiguous.
Revised Analysis with Sequential I/O:
With sequential access optimization, the same 250,000-page file scans at 0.1 ms per page: 250,000 × 0.1 ms = 25 seconds for a full scan, or about 12.5 seconds for an average equality search.
This is still slow, but far better than 21 minutes. Modern systems exploit: read-ahead prefetching (fetching upcoming pages before they are requested), multi-page I/O requests that amortize per-call overhead, parallel scans split across worker threads, and buffer pool caching of frequently accessed pages.
With all optimizations: a full scan of 10M records might take 2-5 seconds on modern hardware.
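The arithmetic above can be packaged into a small cost model. The record size, page size, and latency constants are the illustrative figures from this page, not measurements:

```python
# Tiny I/O cost model for heap file scans, using this page's
# illustrative latencies: 10 ms per random page read, 0.1 ms sequential.

import math

def heap_scan_cost_s(num_records, record_bytes=200, page_bytes=8192,
                     ms_per_page=0.1, fraction=1.0):
    """Seconds to read `fraction` of the heap's pages at `ms_per_page` each."""
    records_per_page = page_bytes // record_bytes
    pages = math.ceil(num_records / records_per_page)
    return pages * fraction * ms_per_page / 1000

n = 10_000_000
full_seq = heap_scan_cost_s(n)                                  # sequential full scan
avg_rand = heap_scan_cost_s(n, ms_per_page=10, fraction=0.5)    # random, avg equality search

print(f"pages:                {math.ceil(n / (8192 // 200)):,}")
print(f"sequential full scan: {full_seq:.0f} s")
print(f"random avg search:    {avg_rand / 60:.0f} min")
```

Running it reproduces the numbers in the text: 250,000 pages, a 25-second sequential full scan, and about 21 minutes for the average random-I/O search.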
Despite the search performance limitations, heap files are the optimal choice in several scenarios: bulk loading, where append-only inserts run at maximum speed; analytics and ETL staging tables that are always read in full anyway; small tables where even a full scan touches only a handful of pages; and tables accessed exclusively through secondary indexes, where the heap serves purely as the record store.
PostgreSQL uses heap organization as the default table storage (called 'heap tables'). Every table without explicit organization is a heap file. B-tree indexes point to heap tuple locations (TID = tuple identifier). This separation of data storage (heap) and access paths (indexes) provides flexibility: you can have zero, one, or many indexes on the same heap file.
Production implementations of heap files must address several practical challenges that the theoretical model glosses over:
1. Concurrency Control
Multiple transactions may attempt to access the same page simultaneously. Implementation must ensure: that two concurrent inserts never claim the same slot, that readers never observe a half-written record or inconsistent slot directory, and that page latches are held only for the few instructions needed to modify the page.
```
function concurrent_insert(HeapFile file, Record record) -> RID {
    while (true) {
        // Step 1: Find candidate page (optimistic)
        PageID candidate = file.free_space_map.find_candidate(record.size);

        // Step 2: Pin and latch the page
        Page page = buffer_pool.pin_page(file.file_id, candidate);
        page.acquire_exclusive_latch(); // Short-term lock

        try {
            // Step 3: Verify there's still space (could have filled since check)
            if (page.free_space() >= record.size + SLOT_SIZE) {
                int slot_num = page.add_record(record);
                buffer_pool.mark_dirty(page);
                return RID(candidate, slot_num);
            }
            // Not enough space; fall through and retry with a different page
        } finally {
            page.release_exclusive_latch();
            buffer_pool.unpin_page(page);
        }

        // Update free space map and retry
        file.free_space_map.mark_full(candidate);
    }
}
```

2. Recovery and Logging
For durability, all modifications must be logged: every insert, delete, and update writes a record to the write-ahead log (WAL) before the modified page is allowed to reach disk; each page records the LSN of the last log record that touched it; and recovery uses these log records to redo committed changes and undo uncommitted ones after a crash.
3. Page Splits and Overflow
When a record doesn't fit in any existing page: the file allocates a new page at the end, inserts the record there, and registers the new page in the free space map. A record larger than a single page cannot be handled this way and requires the large-record mechanisms described below.
4. Variable-Length Records
Handling variable-length records adds complexity: the slot directory must store both an offset and a length for every record, deletions leave variable-sized holes that cause internal fragmentation, and pages need periodic compaction to coalesce scattered free bytes into one contiguous region.
5. Large Records (TOAST)
Records exceeding a threshold (e.g., 2KB in PostgreSQL) require special handling: oversized attribute values are compressed and/or moved out of line into separate overflow storage, while the main record keeps only a small pointer to the relocated data.
PostgreSQL's TOAST system automatically handles large attribute values. When a row would exceed the page size, large columns are compressed and/or moved to a separate TOAST table, with the main row storing just a pointer. This is transparent to SQL queries—you simply SELECT the column, and PostgreSQL retrieves and reconstructs the value automatically.
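The pointer-indirection idea behind TOAST can be illustrated with a toy model. The names (`ToastSim`) and the threshold are arbitrary choices for this sketch; real TOAST additionally compresses values and splits them into chunks:

```python
# Toy out-of-line storage in the spirit of TOAST (hypothetical names).
# Values over a threshold move to a side store; the main row keeps a
# small pointer. Real TOAST also compresses and chunks large values.

THRESHOLD = 32  # bytes, chosen arbitrarily; PostgreSQL's is ~2 KB

class ToastSim:
    def __init__(self):
        self.main_rows = {}   # row_id -> {column: value or pointer}
        self.side_store = {}  # oid -> large value
        self._next_oid = 1

    def store(self, row_id, **columns):
        row = {}
        for col, val in columns.items():
            if isinstance(val, (bytes, str)) and len(val) > THRESHOLD:
                oid = self._next_oid
                self._next_oid += 1
                self.side_store[oid] = val
                row[col] = ("toast_ptr", oid)  # small pointer in the main row
            else:
                row[col] = val
        self.main_rows[row_id] = row

    def fetch(self, row_id, col):
        """Transparently follow the pointer, the way SELECT does."""
        val = self.main_rows[row_id][col]
        if isinstance(val, tuple) and val[0] == "toast_ptr":
            return self.side_store[val[1]]
        return val

t = ToastSim()
t.store(1, name="Ada", bio="x" * 1000)
print(t.main_rows[1]["bio"])    # a small pointer, not the 1000-byte value
print(len(t.fetch(1, "bio")))   # the full value, reconstructed on fetch
```

The payoff mirrors the real system: the main row stays small enough to keep many rows per heap page, and the cost of the indirection is paid only by queries that actually read the large column.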
Let's examine how major database systems implement and optimize heap file organization:
| System | Heap Implementation | Notable Features |
|---|---|---|
| PostgreSQL | Heap Tables (default) | MVCC with tuple versioning, TOAST for large values, Visibility Map for vacuum optimization |
| MySQL InnoDB | Primary key clustered only | Tables are always clustered by PK; no pure heap option |
| SQL Server | Heap Tables (explicit) | Optional; rowstore without clustered index, forwarding rows for updates |
| Oracle | Heap-Organized Tables (default) | Rowid-based access, ITL for concurrency, PCT_FREE for updates |
| SQLite | B-tree always | Even 'heap' tables are stored in B-tree by rowid |
PostgreSQL Deep Dive:
PostgreSQL's heap implementation is particularly instructive:
Tuple Headers: Each tuple (record) has a 23-byte header containing: t_xmin and t_xmax (the IDs of the transactions that inserted and deleted the tuple, used for MVCC visibility checks), a command ID field, t_ctid (the tuple's current location, which also links HOT chains), two infomask fields of status hint bits, and t_hoff (the offset to the user data).
Visibility Map: One bit per page indicating whether all tuples on that page are visible to all transactions. Speeds up index-only scans—if all tuples visible, skip heap fetch.
Free Space Map: Tracks free space per page, stored in a separate fork of the relation file. Enables O(1) lookup for pages with sufficient space.
HOT Updates: Heap-Only Tuples optimization. When an update doesn't change indexed columns and new version fits on same page, create a HOT chain without updating indexes.
```sql
-- Analyze heap file structure in PostgreSQL

-- View heap file size and page count
SELECT relname AS table_name,
       pg_size_pretty(pg_relation_size(oid)) AS heap_size,
       relpages AS page_count,
       reltuples AS estimated_rows,
       reltuples / NULLIF(relpages, 0) AS rows_per_page
FROM pg_class
WHERE relname = 'customers';

-- Inspect individual page contents using the pageinspect extension
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- View page header information
SELECT * FROM page_header(get_raw_page('customers', 0));

-- Result columns include:
--   lsn:       Log Sequence Number (for recovery)
--   checksum:  Page checksum (if enabled)
--   flags:     Page flags
--   lower:     Offset to start of free space
--   upper:     Offset to end of free space
--   special:   Offset to special space (used by indexes)
--   pagesize:  Page size (usually 8192)
--   version:   Page version
--   prune_xid: Oldest XID requiring cleanup

-- View heap tuple headers
SELECT lp AS line_pointer,
       lp_off AS offset,
       lp_len AS length,
       t_xmin AS insert_xid,
       t_xmax AS delete_xid,
       t_ctid AS current_tuple_id,
       t_infomask::bit(16) AS flags
FROM heap_page_items(get_raw_page('customers', 0))
LIMIT 10;
```

We've conducted a thorough examination of heap file organization—the foundational storage method in database systems. Let's consolidate the key insights: inserts are O(1) and fast because no ordering is maintained; searches are O(n) and require full scans unless an index is added; sequential I/O makes those scans far cheaper than a naive random-I/O estimate suggests; and heap organization remains the default in systems like PostgreSQL and Oracle precisely because it pairs cleanly with secondary indexes.
What's Next:
Now that we understand heap files—the baseline organization—we'll explore sorted file organization. By maintaining records in sorted order, we can dramatically improve search performance, at the cost of more expensive insertions. This trade-off is the essence of file organization design.
You now have a comprehensive understanding of heap file organization—its structure, operations, performance characteristics, and practical applications. This foundational knowledge will serve as the baseline for comparing other file organizations in subsequent pages.