In the previous page, we saw that heap files sacrifice search efficiency for insertion speed. Every lookup requires scanning the entire file—a prohibitive cost for large datasets. The natural question arises: What if we kept records sorted?
Sorted file organization (also called sequential file or ordered file) maintains records in a specified order based on a designated sort key. This simple change fundamentally transforms the search equation: instead of O(n) linear scan, we can use binary search to find any record in O(log n) time.
But this improvement comes at a cost. Understanding the trade-off between search efficiency and maintenance cost is essential for choosing the right file organization for your workload.
By the end of this page, you will understand how sorted files are structured, master the binary search algorithm for page-based access, analyze the cost of maintaining sort order during insertions and deletions, and evaluate when sorted organization is worth the maintenance overhead.
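To make the contrast concrete, here is a minimal Python sketch (illustrative only, using plain integer keys rather than on-disk records): the heap-style lookup touches every record, while the sorted-style lookup probes only about log₂(n) positions via `bisect`.

```python
from bisect import bisect_left

def linear_search(records, key):
    # Heap-file style: examine records one by one, O(n)
    for rec in records:
        if rec == key:
            return rec
    return None

def binary_search(records, key):
    # Sorted-file style: O(log n) probes on an ordered sequence
    i = bisect_left(records, key)
    if i < len(records) and records[i] == key:
        return records[i]
    return None

records = list(range(0, 1_000_000, 2))  # 500,000 sorted keys
assert binary_search(records, 123456) == 123456
assert binary_search(records, 123457) is None  # odd keys are absent
```

Both functions return the same answers; the difference is how many elements they inspect along the way.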
A sorted file (also called an ordered file or sequential file) maintains all records in sorted order according to a designated attribute called the sort key (or ordering key). The sort key determines the physical arrangement of records on disk.
Formal Definition:
A sorted file S on attribute A is a sequence of pages P₁, P₂, ..., Pₙ where:
1. Within each page Pᵢ, records appear in non-decreasing order of A.
2. Across pages, every key in Pᵢ is less than or equal to every key in Pᵢ₊₁ (that is, max-key(Pᵢ) ≤ min-key(Pᵢ₊₁)).
Key Properties:
Single Sort Key: A file can only be sorted by one attribute (or a composite of attributes). You cannot have a file sorted by both customer_id and order_date independently.
Physical Ordering: Unlike an index which provides logical ordering, sorted files are physically ordered. The record for key=100 literally appears before the record for key=200 on disk.
Deterministic Placement: Given a new record, there is exactly one correct position for it in the file, determined by the sort key value.
Exploitable Locality: Records with similar key values are stored in adjacent pages, benefiting range queries.
The term 'sequential file' can be confusing. Historically, it meant reading records in sequence (like tape storage). Today, 'sorted file' is more precise—it emphasizes that records are maintained in key order, not just that they're accessed sequentially. Both heap and sorted files support sequential access; only sorted files guarantee ordering.
Choosing the Sort Key:
The choice of sort key is critical and depends on the workload:
| Sort Key Choice | Benefits | Typical Use Case |
|---|---|---|
| Primary key | Unique lookups are optimized | OLTP systems with key-based access |
| Foreign key | Join performance improved | Tables frequently joined on that key |
| Date/timestamp | Time-range queries efficient | Log tables, time-series data |
| Frequently filtered column | WHERE clause performance | Reports filtering on that column |
Important: You can only sort by one key. Choose the one that benefits the most common or most expensive queries. Other access patterns will require indexes or full scans.
The physical structure of a sorted file is similar to a heap file, with one crucial difference: records must maintain their sorted order across all pages.
File Layout:
```
/*
 * Sorted File Layout
 *
 * ┌─────────────────────────────────────────────────────────────────┐
 * │ FILE HEADER                                                     │
 * │ - Sort key attribute                                            │
 * │ - Sort order (ASC/DESC)                                         │
 * │ - Page count, record count                                      │
 * │ - Pointer to overflow area (if used)                            │
 * └─────────────────────────────────────────────────────────────────┘
 *
 * ┌─────────────────────────────────────────────────────────────────┐
 * │ PAGE 1                                                          │
 * │ Records with keys: 1, 5, 12, 23, 45, 67                         │
 * │ Min key: 1, Max key: 67                                         │
 * └─────────────────────────────────────────────────────────────────┘
 * ┌─────────────────────────────────────────────────────────────────┐
 * │ PAGE 2                                                          │
 * │ Records with keys: 89, 101, 134, 156, 178, 199                  │
 * │ Min key: 89, Max key: 199                                       │
 * └─────────────────────────────────────────────────────────────────┘
 * ┌─────────────────────────────────────────────────────────────────┐
 * │ PAGE 3                                                          │
 * │ Records with keys: 234, 267, 289, 312, 345, 378                 │
 * │ Min key: 234, Max key: 378                                      │
 * └─────────────────────────────────────────────────────────────────┘
 *
 *   ... more pages ...
 *
 * ┌─────────────────────────────────────────────────────────────────┐
 * │ OVERFLOW AREA (Optional)                                        │
 * │ Records inserted after initial load, waiting for reorganization │
 * └─────────────────────────────────────────────────────────────────┘
 */

struct SortedFileHeader {
    int32 magic_number;         // File type identifier
    int32 version;              // File format version
    int64 num_pages;            // Total pages in main area
    int64 num_records;          // Total records
    int64 overflow_pages;       // Pages in overflow area
    int32 sort_key_offset;      // Offset of sort key in record
    int32 sort_key_length;      // Length of sort key
    int16 sort_key_type;        // Type (INT, VARCHAR, etc.)
    int16 sort_order;           // ASC = 1, DESC = 2
    int64 last_reorganization;  // Timestamp of last reorg
    byte  reserved[200];        // Room for future fields
};
```

Page-Level Organization:
Within each page, records are also stored in sorted order: slot 0 holds the smallest key on the page and slot order equals key order, so binary search works within a page as well as across pages.
The Key Boundary Property
This property is crucial for binary search: for consecutive pages, max-key(Pᵢ) ≤ min-key(Pᵢ₊₁). Every page therefore owns a disjoint key range, and any given key can live in at most one page of the main file.
To find key=150, we can immediately identify that it must be in Page 2 (if it exists), without examining Pages 1 or 3.
Many implementations maintain a sparse directory in memory: a sorted array of (first_key, page_id) pairs. Binary search on this small directory identifies the target page in O(log B) comparisons (B = number of pages), requiring zero disk I/O for the page lookup step.
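A sketch of such a directory in Python (the `(first_key, page_id)` pairs below are taken from the three example pages shown earlier; names are illustrative): one in-memory `bisect` call identifies the only page that can hold a key.

```python
from bisect import bisect_right

# One (first_key, page_id) entry per page -- small enough to stay in memory
directory = [(1, 0), (89, 1), (234, 2)]
first_keys = [k for k, _ in directory]

def page_for_key(key):
    # Rightmost page whose first_key <= key; by the key boundary
    # property, that page is the only possible home for the key.
    i = bisect_right(first_keys, key) - 1
    return directory[i][1] if i >= 0 else None

assert page_for_key(150) == 1   # key 150 can only live in Page 2 (id 1)
assert page_for_key(0) is None  # below the smallest key in the file
```

The directory lookup costs zero disk I/O; only the one candidate page is then read and searched.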
The primary advantage of sorted files is the ability to use binary search to locate records. This reduces search complexity from O(n) to O(log n)—a dramatic improvement for large files.
Two-Level Binary Search:
Searching a sorted file involves two levels of binary search: first, a binary search over pages, using each page's min/max key boundaries, to identify the single candidate page; then a binary search over the slots within that page to locate the record.
Complexity Analysis:
```
function search_sorted_file(SortedFile file, Key target) -> Record? {
    // Phase 1: Binary search to find the correct page
    PageID page_id = binary_search_pages(file, target);
    if (page_id == NOT_FOUND) {
        return null;  // Key is outside file's key range
    }

    // Phase 2: Load the page and search within it
    Page page = buffer_pool.get_page(file.file_id, page_id);
    Record record = binary_search_within_page(page, target);
    return record;  // null if not found within page
}

function binary_search_pages(SortedFile file, Key target) -> PageID {
    int low = 0;
    int high = file.header.num_pages - 1;

    while (low <= high) {
        int mid = (low + high) / 2;

        // Read page header to get key boundaries
        PageBoundary bounds = get_page_boundaries(file, mid);

        if (target < bounds.min_key) {
            high = mid - 1;   // Target is in earlier page
        } else if (target > bounds.max_key) {
            low = mid + 1;    // Target is in later page
        } else {
            return mid;       // Target is within this page's range
        }
    }
    return NOT_FOUND;         // Key doesn't exist in file
}

function binary_search_within_page(Page page, Key target) -> Record? {
    int low = 0;
    int high = page.header.num_records - 1;

    while (low <= high) {
        int mid = (low + high) / 2;
        Record record = page.get_record(mid);
        Key record_key = extract_key(record);

        if (record_key == target) {
            return record;    // Found!
        } else if (record_key < target) {
            low = mid + 1;    // Search upper half
        } else {
            high = mid - 1;   // Search lower half
        }
    }
    return null;              // Key not found in this page
}

/*
 * I/O Cost Analysis:
 *
 * Without page boundary cache:
 * - Each step of page binary search: 1 I/O to read page header
 * - Total page search: O(log B) I/Os
 * - Final page read: 1 I/O
 * - Total: O(log B) I/Os
 *
 * With sparse page directory in memory:
 * - Page binary search: 0 I/Os (all in memory)
 * - Final page read: 1 I/O
 * - Total: O(1) I/Os for equality search!
 *
 * This is why maintaining a sparse directory is critical.
 */
```

Concrete Example:
Consider a sorted file with:
- 10 million records at 40 records per page, so B = 250,000 pages
- An in-memory sparse directory of (first_key, page_id) pairs

Heap File Search (Comparison): an equality lookup reads pages until it finds the key—125,000 page reads on average, and all 250,000 for a key that isn't there.

Sorted File Search (Binary Search): the in-memory directory identifies the target page without any disk I/O, and a single page read completes the lookup. Even without the directory, binary search needs only about log₂(250,000) ≈ 18 header probes plus one page read.

The improvement: on the order of 125,000x fewer page reads for an equality search with a cached directory.
The dramatic improvement assumes the sparse directory fits in memory. For a 250,000-page file with 8-byte keys and 4-byte page IDs, the directory needs ~3MB—easily fitting in the buffer pool. But for billions of pages, even the directory may require disk I/O, reducing but not eliminating the advantage.
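The arithmetic behind that estimate, worked in Python (assuming the stated 8-byte keys and 4-byte page ids):

```python
num_pages = 250_000
entry_bytes = 8 + 4                       # 8-byte key + 4-byte page id
directory_bytes = num_pages * entry_bytes
assert directory_bytes == 3_000_000       # ~3 MB, easily buffer-pool resident
```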
Beyond equality searches, sorted files excel at range queries—queries that retrieve all records with keys within a specified range.
Range Query Algorithm:
1. Binary search for the first page that could contain a key ≥ low.
2. Scan forward sequentially, page by page, collecting matching records.
3. Stop as soon as a key greater than high is seen—sorted order guarantees no later matches.

Complexity: O(log B) to find the starting page, plus ⌈k/r⌉ sequential page reads, where k is the number of matching records and r is the number of records per page—O(log B + k/r) overall.
```
function range_query(SortedFile file, Key low, Key high) -> List<Record> {
    List<Record> results = new List();

    // Step 1: Binary search for starting page
    PageID start_page = find_first_page_geq(file, low);
    if (start_page == NOT_FOUND) {
        return results;  // No records >= low exist
    }

    // Step 2: Sequential scan through pages until we exceed high
    PageID current_page = start_page;
    while (current_page < file.header.num_pages) {
        Page page = buffer_pool.get_page(file.file_id, current_page);

        // Check if this page could contain any matching records
        if (page.min_key > high) {
            break;  // All remaining pages have keys > high
        }

        // Scan records in this page
        for (int i = 0; i < page.header.num_records; i++) {
            Record record = page.get_record(i);
            Key key = extract_key(record);

            if (key > high) {
                // Records are sorted, so no more matches in this page
                break;
            }
            if (key >= low) {
                results.add(record);
            }
        }
        current_page++;
    }
    return results;
}

/*
 * Example: Range query for orders with dates in [2024-01-01, 2024-01-31]
 *
 * File: 10 million orders, 40 per page = 250,000 pages
 * Selectivity: 1 month out of 10 years ≈ 0.83%
 * Matching records: ~83,000 records over ~2,100 pages
 *
 * Heap file: 250,000 page reads (full scan)
 * Sorted file: 1 (find start) + 2,100 (scan range) = 2,101 page reads
 *
 * Improvement: 119x fewer I/Os
 */
```

Range Query Performance Characteristics:
| Query Selectivity | Heap File I/O | Sorted File I/O | Speedup |
|---|---|---|---|
| 0.01% (very selective) | B pages | ~B/10000 pages | ~10,000x |
| 1% (selective) | B pages | ~B/100 pages | ~100x |
| 10% (moderate) | B pages | ~B/10 pages | ~10x |
| 50% (broad) | B pages | ~B/2 pages | ~2x |
| 100% (full scan) | B pages | B pages | 1x (same) |
Key Insight: The more selective the range query, the greater the benefit of sorted organization. For full table scans, there's no advantage—both organizations read all pages.
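The table's speedups follow from a simple cost model—sketched below in Python as an approximation that charges one I/O to find the start page and ignores per-page boundary effects:

```python
import math

def range_scan_ios_approx(B, selectivity, sorted_file=True):
    """Approximate page reads for a range query touching selectivity * B pages."""
    if not sorted_file:
        return B                             # heap file: always a full scan
    return 1 + math.ceil(selectivity * B)    # find start + scan the range

B = 250_000
heap_ios = range_scan_ios_approx(B, 0.01, sorted_file=False)
sorted_ios = range_scan_ios_approx(B, 0.01)
assert heap_ios == 250_000 and sorted_ios == 2_501
assert round(heap_ios / sorted_ios) == 100   # the ~100x row of the table
```

As selectivity approaches 100%, `selectivity * B` approaches `B` and the speedup collapses toward 1x, exactly as the last table row shows.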
Range queries on sorted files benefit from sequential I/O. The pages within the range are physically contiguous, enabling prefetching and maximizing disk throughput. A range query reading 1,000 consecutive pages is far faster than 1,000 random page reads—potentially 100x faster on HDDs, and 5-10x faster even on SSDs.
The search advantages of sorted files come with a significant trade-off: maintaining sort order during insertions is expensive.
The Insertion Problem:
When inserting a new record with key K:
1. Binary search to locate the target page—the page whose key range should contain K.
2. Insert the record into that page at its correct sorted position.
Step 2 is the problem. If the target page is full, we cannot simply add the record. Options include:
Option A: Shift All Subsequent Records
```
function insert_with_shift(SortedFile file, Record new_record) -> RID {
    Key key = extract_key(new_record);

    // Find target page
    PageID target_page = find_page_for_key(file, key);
    Page page = buffer_pool.get_page(file.file_id, target_page);

    if (page.has_space_for(new_record)) {
        // Simple case: room in the target page
        return insert_into_page_sorted(page, new_record);
    }

    // Complex case: page is full, need to shift.
    // Create new page at end of file to receive the spillover
    PageID new_page_id = file.allocate_new_page();
    Page new_page = buffer_pool.get_page(file.file_id, new_page_id);

    // Shift the last record of each full page into the next page,
    // working backwards from the end of the file to the target page
    for (PageID p = file.header.num_pages - 2; p >= target_page; p--) {
        Page current = buffer_pool.get_page(file.file_id, p);
        Page next = buffer_pool.get_page(file.file_id, p + 1);

        // Move last record from current to first position of next
        Record last = current.remove_last_record();
        next.insert_at_beginning(last);

        buffer_pool.mark_dirty(current);
        buffer_pool.mark_dirty(next);
    }

    // Now target page has room
    return insert_into_page_sorted(page, new_record);
}

/*
 * Cost Analysis:
 *
 * For a file with B pages, inserting into the first page:
 * - Read B pages
 * - Write B pages
 * - Total: 2B I/O operations
 *
 * For 10 million records (250,000 pages) at 10ms per I/O:
 * - 500,000 I/O operations
 * - 5,000 seconds ≈ 1.4 hours per insert!
 *
 * This is clearly unacceptable for transactional workloads.
 */
```

Option B: Overflow Chain
Instead of immediately shifting, append new records to an overflow area and periodically reorganize:
This trades the immediate O(B) shifting cost for an O(1) insert plus a reorganization cost amortized over many inserts:
```
/*
 * Sorted File with Overflow Area
 *
 * Main File (Sorted):
 * ┌────────────────┐
 * │ Page 1         │──────┐
 * │ Keys: 1-100    │      │ Overflow pointer
 * └────────────────┘      ▼
 *                  ┌────────────────┐
 *                  │ Overflow Block │
 *                  │ Keys: 47, 83   │
 *                  └────────────────┘
 *
 * ┌────────────────┐
 * │ Page 2         │──────┐
 * │ Keys: 101-200  │      │
 * └────────────────┘      ▼
 *                  ┌────────────────┐
 *                  │ Overflow Block │
 *                  │ Key: 156       │
 *                  └────────────────┘
 */

function insert_with_overflow(SortedFile file, Record new_record) -> RID {
    Key key = extract_key(new_record);

    // Find target page
    PageID target_page = find_page_for_key(file, key);
    Page page = buffer_pool.get_page(file.file_id, target_page);

    if (page.has_space_for(new_record)) {
        return insert_into_page_sorted(page, new_record);
    }

    // Insert into overflow area
    OverflowBlock overflow = get_or_create_overflow(page);
    int slot = overflow.append(new_record);

    // Update statistics
    file.header.overflow_record_count++;

    // Check if reorganization is needed
    if (should_reorganize(file)) {
        schedule_background_reorganization(file);
    }

    return RID(target_page, slot, IS_OVERFLOW);
}

function search_with_overflow(SortedFile file, Key target) -> Record? {
    // First, binary search in main file
    PageID page_id = binary_search_pages(file, target);
    if (page_id == NOT_FOUND) {
        // Key falls outside every page's [min, max] range, but it may
        // still sit in the overflow chain of the page that would own it
        page_id = find_page_for_key(file, target);
        if (page_id == NOT_FOUND) {
            return null;
        }
    }
    Page page = buffer_pool.get_page(file.file_id, page_id);

    // Search within page
    Record record = binary_search_within_page(page, target);
    if (record != null) {
        return record;
    }

    // Not in main page - check overflow chain
    OverflowBlock overflow = page.overflow_pointer;
    while (overflow != null) {
        record = linear_search(overflow, target);
        if (record != null) {
            return record;
        }
        overflow = overflow.next;
    }
    return null;  // Not found anywhere
}
```

As overflow chains grow, search performance degrades toward heap-file levels: each overflow block in a chain costs a separate I/O and a linear search.
Production systems must balance insertion efficiency against search degradation by triggering reorganization at appropriate thresholds (e.g., when overflow > 10% of main file).
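A back-of-the-envelope amortization in Python (hypothetical policy, not a prescription: each insert costs about 2 I/Os, and a full rewrite of roughly 2B page I/Os fires once the overflow area reaches a fixed fraction of the main file):

```python
import math

def amortized_insert_ios(B, records_per_page, overflow_fraction=0.10,
                         per_insert_ios=2):
    """Average I/Os per insert under a threshold-triggered reorganization policy.

    Assumptions: each insert reads the target page and appends to overflow
    (~2 I/Os); a reorganization rewrites the whole file (~2*B I/Os) and is
    triggered once overflow holds overflow_fraction * B pages of records.
    """
    inserts_per_cycle = overflow_fraction * B * records_per_page
    reorg_share = (2 * B) / inserts_per_cycle   # rewrite cost spread per insert
    return per_insert_ios + reorg_share

cost = amortized_insert_ios(B=250_000, records_per_page=40)
assert math.isclose(cost, 2.5)   # the rewrite adds only ~0.5 I/Os per insert
```

With a 10% threshold and 40 records per page, the periodic rewrite adds only half an I/O per insert—a far cry from the 2B I/Os of shifting on every insert.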
Deletion in Sorted Files:
Deletion faces a similar challenge: removing a record from the middle of a sorted file creates a gap that either remains as fragmentation or requires shifting records.
Deletion Strategies:
1. Mark as Deleted (Tombstone)
```
function delete_sorted(SortedFile file, Key key) -> bool {
    // Binary search to find the record
    PageID page_id = binary_search_pages(file, key);
    Page page = buffer_pool.get_page(file.file_id, page_id);

    int slot = binary_search_slot(page, key);
    if (slot == NOT_FOUND) {
        // Check overflow chain
        OverflowBlock overflow = page.overflow_pointer;
        while (overflow != null) {
            slot = linear_search_slot(overflow, key);
            if (slot != NOT_FOUND) {
                overflow.mark_deleted(slot);
                file.header.deleted_count++;
                return true;
            }
            overflow = overflow.next;
        }
        return false;  // Key not found
    }

    // Option A: Mark as deleted (fast)
    page.mark_deleted(slot);
    file.header.deleted_count++;

    // Option B: Shift records to fill gap (maintains density)
    // page.shift_records_after(slot);
    // file.adjust_subsequent_pages_if_needed();

    buffer_pool.mark_dirty(page);
    return true;
}

/*
 * Fragmentation Impact:
 *
 * After many deletions, pages may have significant unused space:
 *
 * Page before deletions: [R1][R2][R3][R4][R5][R6]  - 0% fragmentation
 * Page after deletions:  [R1][XX][R3][XX][R5][XX]  - 50% fragmentation
 *
 * Effects:
 * - Range queries read more pages for same result count
 * - Memory usage increases (caching partially empty pages)
 * - Binary search may need to skip tombstones
 */
```

File Reorganization:
To reclaim space and restore performance, sorted files require periodic reorganization (also called compaction or rebuild): the file is read sequentially, live records from the main area and overflow chains are merged in sort order, tombstones are dropped, and a fresh, densely packed file is written out to replace the old one.

Reorganization Cost: the entire file is read and rewritten—about 2B page I/Os for a B-page file—so reorganization is typically scheduled in the background or during low-traffic windows.

Reorganization Triggers: common policies rebuild when the overflow area exceeds a threshold (e.g., 10% of the main file), when the tombstone ratio grows too high, or on a fixed maintenance schedule. Choosing the frequency is itself a trade-off:
| Factor | Frequent Reorg | Infrequent Reorg |
|---|---|---|
| Search Performance | Optimal (no overflow, no fragmentation) | Degraded (long overflow chains, tombstones) |
| Write Amplification | High (frequent file rewrites) | Low (writes deferred) |
| Space Utilization | High (deleted space reclaimed quickly) | Lower (fragmentation accumulates) |
| Availability Impact | Frequent brief locks | Rare but longer downtime |
| Best For | Read-heavy, few writes | Write-heavy, tolerates variability |
Let's consolidate the performance characteristics of sorted files and compare directly with heap files.
Operation Complexity Summary:
| Operation | Heap File | Sorted File | Winner |
|---|---|---|---|
| Insert | O(1) — 1-2 I/Os | O(B) worst, O(1) with overflow | Heap |
| Delete by RID | O(1) — 1 I/O | O(1) — 1 I/O | Tie |
| Delete by Key | O(n) — B I/Os avg | O(log B) — log B I/Os | Sorted |
| Equality Search | O(n) — B/2 I/Os avg | O(log B) — ~1 I/O with cache | Sorted |
| Range Search | O(n) — B I/Os | O(log B + k/r) I/Os | Sorted |
| Full Scan | O(n) — B I/Os | O(n) — B I/Os | Tie |
| Sorted Output | O(n log n) — External sort | O(n) — Sequential read | Sorted |
Detailed I/O Cost Formulas:
Let:
- B = number of pages in the file
- r = number of records per page
- k = number of records matching a range query
- O = number of blocks in the relevant overflow chain
Sorted File Costs:
| Operation | I/O Formula | With Cached Directory |
|---|---|---|
| Equality search | log₂(B) + 1 | 1 (plus O if in overflow) |
| Range query | log₂(B) + ⌈k/r⌉ | 1 + ⌈k/r⌉ |
| Insert (no overflow) | log₂(B) + 1 | 2 |
| Insert (with overflow) | log₂(B) + 1 | 2 |
| Reorganization | 2B | 2B |
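The formulas above can be turned into a small calculator (a sketch using the same symbols; overflow chains and duplicate keys are ignored):

```python
import math

def equality_search_ios(B, cached_directory=False):
    # log2(B) header probes to find the page, plus 1 page read --
    # or a single page read when the directory lives in memory
    return 1 if cached_directory else math.ceil(math.log2(B)) + 1

def range_query_ios_model(B, k, r, cached_directory=False):
    # Locate the start page, then scan ceil(k / r) data pages
    start = 1 if cached_directory else math.ceil(math.log2(B))
    return start + math.ceil(k / r)

B, r = 250_000, 40
assert equality_search_ios(B, cached_directory=True) == 1
assert equality_search_ios(B) == 19                  # 18 probes + 1 page read
assert range_query_ios_model(B, k=83_000, r=r, cached_directory=True) == 2_076
```

Plugging in the running example (B = 250,000, r = 40) reproduces the orders-of-magnitude gap between directory-cached and cold searches.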
When Sorted Files Win:
- Read-heavy workloads dominated by equality and range lookups on the sort key
- Range queries and sorted output (reports, merges, time windows)
- Data that is bulk-loaded once and updated rarely

When Heap Files Win:
- Write-heavy workloads with frequent inserts
- Lookups on attributes other than the sort key
- Workloads that mostly do full scans, where ordering buys nothing
In practice, most systems use heap files for data storage and B+-tree indexes for search acceleration. This provides O(1) inserts (heap file) with O(log n) searches (via index). The sorted file organization is primarily used for: (1) index leaf levels (which are sorted by key), (2) ISAM systems, and (3) data warehouses with infrequent updates.
Sorted file organization appears in several important contexts in modern database systems:
1. ISAM (Indexed Sequential Access Method)
Historically important, ISAM systems stored data in sorted files with a sparse index: a small index maps each key range to its page, inserts go to overflow chains, and periodic reorganization merges the chains back in. ISAM's weaknesses under heavy updates directly motivated the self-balancing B+-tree.
2. LSM Trees (Log-Structured Merge Trees)
Modern write-optimized systems like RocksDB and LevelDB use sorted files as components:
```
/*
 * LSM Tree Structure (uses sorted files extensively)
 *
 * Level 0: MemTable (in-memory, sorted)
 *            ↓ (flush when full)
 *
 * Level 1: SSTable 1 (sorted file on disk)
 *          SSTable 2 (sorted file on disk)
 *            ↓ (merge when too many)
 *
 * Level 2: SSTable A (larger sorted file)
 *          SSTable B (larger sorted file)
 *            ↓ (merge when too many)
 *
 * Level 3: SSTable X (even larger sorted file)
 *          ...
 *
 * Each level is sorted, and levels grow exponentially in size.
 * Writes: O(1) amortized (append to MemTable)
 * Reads: O(L * log N) where L = levels, N = records per level
 * Write amplification: each record is rewritten O(L) times as
 * merges migrate it down the levels.
 */

function lsm_read(key) {
    // Check MemTable first (newest data)
    if (memtable.contains(key)) {
        return memtable.get(key);
    }

    // Check each level's sorted files, newest level first
    for (level in [1, 2, 3, ...]) {
        for (sstable in level.files) {
            // Binary search within sorted file
            result = sstable.binary_search(key);
            if (result != null) {
                return result;
            }
        }
    }
    return null;  // Key not found
}
```

3. Data Warehouse Column Stores
Column stores such as Amazon Redshift, and columnar file formats like Apache Parquet, rely on sorted data: sorting on a chosen column improves compression and enables zone maps (per-block min/max metadata) that let queries skip blocks that cannot match.
4. Clustered Indexes in Databases
When you create a clustered index, the table data is physically sorted: the leaf level of the B+-tree is the table itself, kept in clustering-key order, so range scans on that key read contiguous pages.
5. External Sort Output
When sorting large datasets that don't fit in memory, external merge sort writes sorted runs to disk and merges them; its output is itself a sorted file, ready for sequential consumption or bulk-loading into an index.
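A compact Python sketch of that pipeline, with in-memory lists standing in for on-disk runs: sort bounded runs, then stream them through a k-way merge.

```python
import heapq

def external_sort(records, run_size):
    """Simplified external merge sort: lists stand in for on-disk runs."""
    # Pass 1: produce sorted runs of at most run_size records each
    runs = [sorted(records[i:i + run_size])
            for i in range(0, len(records), run_size)]
    # Pass 2: k-way merge streams the runs into one sorted output
    return list(heapq.merge(*runs))

data = [312, 5, 199, 67, 1, 134, 89, 23]
assert external_sort(data, run_size=3) == sorted(data)
```

In a real system, each run and the merged output would be written page by page, so the final pass produces exactly the sorted file organization described in this chapter.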
| System | Usage of Sorted Files | Key Benefit |
|---|---|---|
| RocksDB/LevelDB | SSTables at every level | Write optimization + range queries |
| Apache Cassandra | SSTables for persistent storage | Fast reads with bloom filters |
| Amazon Redshift | Sort keys for tables | Zone maps, compression, range scans |
| Apache Parquet | Sorted row groups | Predicate pushdown, better compression |
| SQL Server Clustered | B+-tree leaf = sorted data | Range scans on clustering key |
We've conducted a thorough analysis of sorted file organization—understanding both its powerful search capabilities and its maintenance costs. Let's consolidate the key insights:
- Binary search cuts equality lookups from O(n) to O(log B) I/Os—effectively O(1) with an in-memory sparse directory.
- Range queries cost O(log B + k/r) I/Os and benefit further from sequential page layout.
- Maintaining order is the weakness: a naive insert can cost O(B) I/Os; overflow areas restore O(1) inserts at the price of gradual search degradation.
- Periodic reorganization (~2B I/Os) reclaims space and restores search performance.
What's Next:
We've seen that sorted files trade insertion efficiency for search speed. In the next page, we'll examine hash file organization—an approach that provides O(1) equality searches at the cost of abandoning ordering entirely. Where sorted files excel at range queries, hash files are unmatched for point lookups.
You now have a comprehensive understanding of sorted file organization—how it enables efficient search through binary search, the maintenance challenges it creates, and the real-world systems that employ it. This knowledge is essential for understanding B+-trees, LSM trees, and clustered indexing strategies.