Every design choice involves tradeoffs. Linked allocation's elimination of external fragmentation comes at a severe cost: random access performance degrades linearly with file position. To read byte 1,000,000 of a file, you must first traverse every block that comes before it.
This isn't a minor inconvenience—it's a fundamental algorithmic limitation that makes pure linked allocation unsuitable for databases, media players, spreadsheets, and virtually any application that needs to jump to arbitrary positions within files. Understanding this limitation deeply is essential for appreciating both why linked allocation found its niche and why more sophisticated schemes eventually replaced it.
By the end of this page, you will understand the mathematical analysis behind O(n) random access complexity, see concrete performance calculations demonstrating the problem, analyze which workloads are affected most severely, and appreciate why this limitation drove the development of alternatives like indexed allocation and the FAT optimization.
To understand linked allocation's random access problem, we must compare it with the ideal case.
Contiguous Allocation: O(1) Random Access
With contiguous allocation, accessing any byte is direct arithmetic:
Position of byte B in file starting at block S:
block_number = S + (B / block_size)
offset_in_block = B % block_size
disk_address = block_number × block_size + offset_in_block
No matter how large the file, computing the disk location of byte 1,000,000 or byte 1 takes the same constant time—a few arithmetic operations.
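As a minimal sketch of that arithmetic (the function and parameter names are illustrative, not taken from any particular filesystem):

```c
#include <stdint.h>

/* Contiguous allocation: map a byte offset to a disk address in O(1). */
uint64_t contiguous_disk_address(uint64_t start_block,
                                 uint64_t byte_offset,
                                 uint32_t block_size) {
    uint64_t block_number    = start_block + byte_offset / block_size;
    uint64_t offset_in_block = byte_offset % block_size;
    return block_number * block_size + offset_in_block;
}
```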
Linked Allocation: O(n) Random Access
With linked allocation, the blocks are scattered. The only way to find block N is to follow the chain from block 0:
To read byte B:
1. Calculate target_block_index = B / data_per_block
2. Start at first_block (from directory)
3. Read the current block and extract its next_pointer
4. Repeat step 3 until target_block_index pointers have been followed
5. Finally read the target block
Disk reads required: target_block_index + 1 = O(n)
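A minimal sketch of this traversal, using the same illustrative constants (BLOCK_SIZE, EOF_MARKER) and hypothetical disk_read() helper that appear in the database example later on this page:

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE   4096u
#define POINTER_SIZE 4u
#define EOF_MARKER   0xFFFFFFFFu

/* Hypothetical device primitive: read one physical block into buf. */
void disk_read(uint32_t block, uint8_t buf[BLOCK_SIZE]);

/* Find the physical block holding byte_offset by walking the chain.
 * Costs (target_index + 1) disk reads in total: O(n) in file position. */
uint32_t linked_block_for_offset(uint32_t first_block, uint64_t byte_offset) {
    uint32_t data_per_block = BLOCK_SIZE - POINTER_SIZE;
    uint64_t target_index   = byte_offset / data_per_block;

    uint32_t current = first_block;
    uint8_t  buf[BLOCK_SIZE];

    for (uint64_t i = 0; i < target_index && current != EOF_MARKER; i++) {
        disk_read(current, buf);                  /* one disk I/O per hop    */
        memcpy(&current, buf + data_per_block,    /* pointer sits after data */
               POINTER_SIZE);
    }
    return current;   /* caller still needs one more read for the data itself */
}
```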
The Critical Difference:
| Access Pattern | Contiguous | Linked |
|---|---|---|
| Read block 0 | 1 disk I/O | 1 disk I/O |
| Read block 10 | 1 disk I/O | 11 disk I/Os |
| Read block 100 | 1 disk I/O | 101 disk I/Os |
| Read block 1000 | 1 disk I/O | 1001 disk I/Os |
| Read block 10000 | 1 disk I/O | 10001 disk I/Os |
The disparity grows without bound.
Each disk I/O on an HDD typically involves a seek operation (5-15ms). For block 1000 in a linked file, that's potentially 1001 seeks = 5-15 SECONDS just to reach the data. Compare this to ~5ms for contiguous access.
Let's rigorously analyze the complexity of various operations in linked allocation.
Definitions:
- n: number of blocks in the file
- t_read: time to transfer one block (≈ 0.1 ms)
- t_seek: time for one HDD seek (≈ 8 ms)
- scatter_factor: fraction of block-to-block transitions that require a seek (≈ 0 for a contiguous layout, ≈ 1 for a fully scattered one)
Sequential Read (Complete File):
Time to read entire file:
T_seq = n × t_read + (scatter_factor × n × t_seek)
Where scatter_factor ≈ 0 for contiguous, ≈ 1 for fully scattered
For linked blocks that happen to be laid out contiguously (scatter_factor ≈ 0): T_seq ≈ n × 0.1 ms
For fully scattered linked blocks (scatter_factor ≈ 1): T_seq ≈ n × 8.1 ms (81× slower)
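The same model expressed in code, with the 0.1 ms read and 8 ms seek figures used above (illustrative HDD numbers):

```c
#include <stdint.h>

/* T_seq = n * t_read + scatter_factor * n * t_seek */
double sequential_read_ms(uint64_t n_blocks, double read_ms,
                          double seek_ms, double scatter_factor) {
    return (double)n_blocks * read_ms
         + scatter_factor * (double)n_blocks * seek_ms;
}

/* Example: a 10,000-block file with 0.1 ms reads and 8 ms seeks
 *   sequential_read_ms(10000, 0.1, 8.0, 0.0) ≈  1,000 ms  (contiguous layout)
 *   sequential_read_ms(10000, 0.1, 8.0, 1.0) ≈ 81,000 ms  (fully scattered, 81x slower) */
```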
Random Access (Single Read at Position p):
```c
#include <stdint.h>

// Analysis of linked allocation access complexity
// Time to access byte at position 'offset' in a linked file
double time_to_access(uint64_t offset, uint32_t block_size, uint32_t pointer_size,
                      double seek_time_ms, double read_time_ms) {
    uint32_t data_per_block = block_size - pointer_size;
    uint64_t target_block = offset / data_per_block;

    // Must traverse all blocks from 0 to target_block
    // Each traversal step requires:
    //   - Reading the current block (to get data and next pointer)
    //   - Seeking to the next block (if scattered)

    // Best case: blocks are contiguous
    // Traversal: (target_block + 1) reads, no seeks
    double best_case = (target_block + 1) * read_time_ms;

    // Average case: blocks have some locality
    // Assume 50% of transitions require seeks
    double avg_seeks = target_block * 0.5;
    double avg_case = (target_block + 1) * read_time_ms + avg_seeks * seek_time_ms;

    // Worst case: every block requires a seek
    double worst_case = (target_block + 1) * (read_time_ms + seek_time_ms);

    (void)best_case;       // kept for illustration of the bounds
    (void)worst_case;
    return avg_case;       // Return the average-case estimate
}

// Example calculations:
// 4KB blocks, 4-byte pointers, 8ms seek, 0.1ms read

// Access byte 1,000,000:
//   target_block = 1,000,000 / 4092 = 244
//   avg_case = 245 * 0.1 + 122 * 8 = 24.5 + 976 = 1000.5ms ≈ 1 second

// Access byte 100,000,000 (100MB into file):
//   target_block = 100,000,000 / 4092 = 24,437
//   avg_case = 24,438 * 0.1 + 12,219 * 8 = 2444 + 97,752 = 100,196ms ≈ 100 seconds

// This is completely impractical!
```

Complexity Summary:
| Operation | Contiguous | Linked (Pure) | Linked (FAT cached) |
|---|---|---|---|
| Read byte at random offset | O(1) | O(n) | O(1)* |
| Read entire file sequentially | O(n) | O(n) | O(n) |
| Append data | O(1) amortized | O(n) or O(1)** | O(1)** |
| Truncate to size k | O(1) | O(k) | O(k) |
| Delete file | O(1) | O(n) | O(n) |
* With the FAT in memory, each pointer lookup is O(1), but the initial FAT load is O(n)
** With a cached pointer to the last block
The Mounting Cost of Deep Access:
The O(n) complexity means access time grows linearly with position:
| File Offset | Block Number | Disk I/Os | Time (8ms seek) |
|---|---|---|---|
| 40 KB | 10 | 11 | ~88ms |
| 400 KB | 100 | 101 | ~0.8s |
| 4 MB | 1,000 | 1,001 | ~8s |
| 40 MB | 10,000 | 10,001 | ~80s |
| 400 MB | 100,000 | 100,001 | ~13 min |
| 4 GB | 1,000,000 | 1,000,001 | ~2.2 hours |
Seeking to the end of a 4GB file could take over 2 HOURS with pure linked allocation. Even with SSD speeds (0.1ms per access instead of 8ms), it would still take ~100 seconds. No production system could tolerate this.
Different applications have different access patterns. Let's analyze how linked allocation affects various workloads.
Workload Categories:
Sequential Access Workloads:
Applications that read or write files from beginning to end, such as log appenders, backup tools, and streaming capture (see the table below).
Linked Allocation Performance: ✅ ACCEPTABLE
Sequential access only requires following pointers in order—no wasted traversals. The penalty is the seek between scattered blocks, but each block is visited exactly once.
Sequential read time = blocks × (read_time + scatter_penalty)
≈ Same as contiguous if blocks are near each other
| Application Type | Access Pattern | Linked Allocation Verdict |
|---|---|---|
| Log files | Append-only | ✅ Excellent |
| Streaming video capture | Sequential write | ✅ Good |
| File backup | Sequential read | ✅ Good |
| Email (read new) | Sequential | ✅ Acceptable |
| Text editor (small files) | Mixed | ⚠️ Tolerable |
| Text editor (large files) | Mixed | ❌ Poor |
| Database | Random | ❌ Unusable |
| Video player (seek) | Random | ❌ Unusable |
| Virtual machines | Random | ❌ Unusable |
Let's examine a concrete scenario that illustrates why linked allocation is fundamentally incompatible with database workloads.
Scenario:
A simple customer database: roughly one million 1,024-byte customer records (about four per 4 KB block), stored in a single file of roughly 262,144 blocks laid out with pure linked allocation on an HDD (≈8 ms per seek).
```c
// Database query: Find customer by ID
// Customer IDs are not sorted - random distribution
// (Assumes BLOCK_SIZE, EOF_MARKER, disk_read(), and the open 'file' handle
//  are defined elsewhere; this is an illustration of cost, not a full program.)

typedef struct {
    uint32_t id;
    char name[100];
    char email[200];
    // ... other fields
    uint8_t padding[724];   // Total: 1024 bytes per record
} CustomerRecord;

// QUERY 1: Lookup customer ID 500,000
// This customer is approximately in the middle of the file

CustomerRecord* find_customer(uint32_t target_id) {
    /*
     * With linked allocation, we must:
     *   1. Start at block 0
     *   2. Traverse through blocks until we find the record
     *   3. On average, the record is at block ~125,000
     *
     * Time calculation:
     *   - 125,000 blocks to traverse
     *   - Each block: 8ms seek + 0.1ms read = 8.1ms
     *   - Total: 125,000 × 8.1ms = 1,012,500 ms = 16.9 MINUTES
     *
     * That's 17 MINUTES to look up ONE customer!
     */
    uint32_t current_block = file->first_block;
    uint8_t buffer[BLOCK_SIZE];

    while (current_block != EOF_MARKER) {
        // Read block (triggers seek + read)
        disk_read(current_block, buffer);

        // Check each record in block
        for (int i = 0; i < 4; i++) {
            CustomerRecord *rec = (CustomerRecord*)(buffer + i * 1024);
            if (rec->id == target_id) {
                // NOTE: real code would copy the record out of this stack buffer
                return rec;   // Found!
            }
        }

        // Follow pointer to next block
        current_block = *(uint32_t*)(buffer + 4092);
    }
    return NULL;   // Not found
}

// QUERY 2: Find all customers in zip code 94102
// Must scan entire file - guaranteed worst case

int count_by_zipcode(const char *zipcode) {
    /*
     * Full table scan required: 262,144 blocks
     * Time: 262,144 × 8.1ms = 2,123,366 ms = 35 MINUTES
     *
     * Every single query against non-indexed fields
     * takes 35 minutes!
     */
    (void)zipcode;
    return 0;   // Scan logic omitted; only the cost matters here
}

// QUERY 3: 100 random customer lookups
// Perhaps for a customer list page

void batch_lookup(uint32_t ids[], int count) {
    /*
     * 100 lookups × 17 minutes average = 1,700 minutes = 28 HOURS
     *
     * A single page load could take over a DAY!
     */
    (void)ids;
    (void)count;
}
```

With pure linked allocation, a 1-million record database would take 17 minutes per lookup, 35 minutes for a table scan, and 28 hours for 100 random lookups. This is not merely slow—it's completely non-functional. This is why no database storage engine uses pure linked allocation.
Comparison with Indexed/Contiguous:
| Operation | Linked | Contiguous | B-tree Indexed |
|---|---|---|---|
| Random lookup | 17 min | 8ms | 24ms (3 levels) |
| Table scan | 35 min | 35 min | 35 min |
| 100 lookups | 28 hours | 800ms | 2.4s |
| Range query (via index) | N/A | N/A | ~100ms |
The 100,000× to 1,000,000× performance difference makes linked allocation effectively unusable for random-access workloads.
Over the years, various approaches have been tried to improve linked allocation's random access performance. Let's examine them:
1. Position Caching:
Cache recently accessed block positions:
```c
// Maintain a sparse cache of block positions
typedef struct {
    uint32_t file_block_index;   // Logical block number
    uint32_t disk_block;         // Physical block address
} CacheEntry;

#define CACHE_SIZE 128

CacheEntry position_cache[CACHE_SIZE];
uint32_t current_position;   // Last accessed block

uint32_t seek_with_cache(FileHandle *file, uint32_t target_block) {
    // Check cache for exact hit or nearest lower entry
    uint32_t best_start = 0;
    uint32_t best_block = file->first_block;

    for (int i = 0; i < CACHE_SIZE; i++) {
        if (position_cache[i].file_block_index <= target_block &&
            position_cache[i].file_block_index > best_start) {
            best_start = position_cache[i].file_block_index;
            best_block = position_cache[i].disk_block;
        }
    }

    // Continue from best cached position
    // Still O(n) worst case, but often faster in practice
    uint32_t current = best_block;
    for (uint32_t i = best_start; i < target_block; i++) {
        current = follow_pointer(current);
    }

    // Update cache
    add_to_cache(target_block, current);
    return current;
}

// Limitation: Cache helps repeated access patterns
// but can't help first access to any position
```

2. Sequential Detection and Prefetching:
If sequential access is detected, prefetch upcoming blocks:
```c
// Detect sequential pattern and prefetch block pointers
typedef struct {
    uint32_t last_block;
    uint32_t sequential_count;
    uint32_t prefetch_buffer[PREFETCH_SIZE];
    int prefetch_valid;
} PrefetchState;

void read_with_prefetch(FileHandle *file, uint32_t block) {
    if (block == file->prefetch.last_block + 1) {
        // Sequential access detected
        file->prefetch.sequential_count++;

        // After 3 sequential reads, start prefetching
        if (file->prefetch.sequential_count > 3) {
            // Follow pointers ahead and cache them
            uint32_t current = block;
            for (int i = 0; i < PREFETCH_SIZE; i++) {
                current = follow_pointer(current);
                file->prefetch.prefetch_buffer[i] = current;
            }
            file->prefetch.prefetch_valid = 1;   // Prefetched pointers are now usable
        }
    } else {
        // Random access - reset prefetch
        file->prefetch.sequential_count = 0;
        file->prefetch.prefetch_valid = 0;
    }
    file->prefetch.last_block = block;
}

// Limitation: Only helps sequential access
// Random access still requires full traversal
```

3. Skip Pointers (Multi-Level Links):
Add forward pointers that skip multiple blocks:
Regular linked list:
B0 → B1 → B2 → B3 → B4 → B5 → B6 → B7
With skip pointers (every 4 blocks):
B0 ──────────────→ B4 ──────────────→ B8 ──→ ...
│                  │                  │
└→ B1 → B2 → B3    └→ B5 → B6 → B7    └→ B9 → ...
This reduces traversal from O(n) to O(n/k), where k is the skip distance.
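A sketch of the traversal under an assumed block layout in which every k-th block also carries a skip pointer; the struct and the read_pointers() helper are illustrative:

```c
#include <stdint.h>

#define SKIP_DISTANCE 4u   /* k: one skip pointer every 4 blocks, as in the diagram */

/* Illustrative pointer area of a block: a next pointer, plus a skip
 * pointer that is only valid on every k-th block of the file. */
typedef struct {
    uint32_t next;   /* block immediately after this one    */
    uint32_t skip;   /* block SKIP_DISTANCE positions ahead */
} BlockPointers;

/* Hypothetical helper: read just the pointer area of one block. */
BlockPointers read_pointers(uint32_t block);

/* Reach logical block target_index in roughly target_index / k skip hops
 * plus at most k - 1 regular hops: O(n/k) instead of O(n). */
uint32_t seek_with_skips(uint32_t first_block, uint32_t target_index) {
    uint32_t current = first_block;   /* logical index 0 */
    uint32_t index   = 0;

    while (index + SKIP_DISTANCE <= target_index) {   /* stride while possible */
        current = read_pointers(current).skip;
        index  += SKIP_DISTANCE;
    }
    while (index < target_index) {                    /* finish block by block */
        current = read_pointers(current).next;
        index++;
    }
    return current;
}
```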
| Technique | Complexity (worst) | Improvement | Overhead |
|---|---|---|---|
| None (baseline) | O(n) | 1× | None |
| Position cache | O(n) | ~2-10× for hot paths | Memory for cache |
| Sequential prefetch | O(n) | ~2× for sequential access | Prefetch I/O |
| Skip pointers | O(n/k) | k× improvement | Extra skip pointer every k blocks |
| Full pointer table (FAT) | O(1) | n× improvement | Table in memory |
Notice that none of the mitigations achieve true O(1) random access while staying pure linked allocation. The only way to achieve O(1) is to cache all pointers in memory (like FAT) or use a fundamentally different structure (like indexed allocation with B-trees).
The obvious solution to linked allocation's random access problem is to cache all the pointers in memory. This is exactly what the File Allocation Table (FAT) does. But this approach has significant implications.
FAT Solution:
Instead of storing pointers in each block, store ALL pointers in a centralized table:
Block 0: Data only (no embedded pointer)
Block 1: Data only
...
Block N: Data only
FAT table: FAT[0]=next_of_0, FAT[1]=next_of_1, ..., FAT[N]=next_of_N
With the FAT loaded in memory, following pointers is O(1)—just an array lookup.
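In code, resolving a file position against an in-memory FAT could look like the sketch below (the array name and end-of-chain marker are assumptions). The chain is still walked entry by entry, but each hop is now a RAM array lookup rather than a disk read:

```c
#include <stdint.h>

#define FAT_EOF 0xFFFFFFFFu   /* illustrative end-of-chain marker */

/* In-memory copy of the table: fat[b] is the block that follows block b. */
extern uint32_t *fat;

/* Resolve logical block 'index' of a file to its physical block number.
 * Still one step per block, but every step is a memory access, not an I/O. */
uint32_t fat_block_for_index(uint32_t first_block, uint32_t index) {
    uint32_t current = first_block;
    while (index-- > 0 && current != FAT_EOF) {
        current = fat[current];   /* O(1) array lookup per hop */
    }
    return current;
}
```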
Memory Requirements:
| Disk Size | Block Size | Total Blocks | 4-byte FAT Size |
|---|---|---|---|
| 1 GB | 4 KB | 262,144 | 1 MB |
| 10 GB | 4 KB | 2,621,440 | 10 MB |
| 100 GB | 4 KB | 26,214,400 | 100 MB |
| 1 TB | 4 KB | 268,435,456 | 1 GB |
| 10 TB | 4 KB | 2,684,354,560 | 10 GB |
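The table values follow from a one-line calculation, sketched here under the assumption of one table entry per block:

```c
#include <stdint.h>

/* FAT size = (number of blocks on the volume) × (bytes per FAT entry) */
uint64_t fat_size_bytes(uint64_t disk_bytes, uint32_t block_size, uint32_t entry_size) {
    uint64_t total_blocks = disk_bytes / block_size;
    return total_blocks * entry_size;
}

/* Example: a 1 TB volume with 4 KB blocks and 4-byte entries
 *   total_blocks = 268,435,456  ->  FAT occupies ~1 GB of memory */
```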
The Memory Tradeoff:
For filesystems designed in the 1980s and 1990s, this was a fine trade: disks held tens of megabytes to a few gigabytes, so the entire FAT was only a few kilobytes to a few megabytes and fit comfortably in memory.
But as disk sizes grew into the hundreds of gigabytes and then terabytes, the table itself swelled to hundreds of megabytes or more, far too much memory to dedicate to block pointers.
This is why FAT32 has a maximum volume size of ~2TB (with 512-byte sectors) or ~16TB (with 4KB sectors)—the FAT itself becomes unwieldy.
Modern systems can load FAT portions on demand rather than the entire table. This reduces memory requirements but reintroduces some disk I/O for traversal. The kernel maintains a cache of frequently-accessed FAT regions.
Alternative: Indexed Allocation
For truly large files, a better solution is indexed allocation: keep all of a file's block pointers in a dedicated per-file index structure (an index block or inode) rather than scattering them through a chain, so any logical block can be located with a bounded number of lookups.
This is the approach used by modern file systems: ext4, XFS, and NTFS all describe a file's blocks through per-file index structures (inodes with extent trees or run lists) instead of linked chains.
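A deliberately simplified sketch of the idea, using a single-level index of direct block pointers (real inodes use multiple levels or extent trees, but the key property, locating any block without walking a chain, is the same):

```c
#include <stdint.h>

#define DIRECT_POINTERS 1024u   /* illustrative index capacity */

/* Per-file index node: one entry per logical block of the file. */
typedef struct {
    uint64_t file_size;
    uint32_t block[DIRECT_POINTERS];   /* block[i] = physical block of logical block i */
} IndexNode;

/* Any byte's block is found with one array lookup: the cost does not
 * depend on how deep into the file the byte lies. */
uint32_t indexed_block_for_offset(const IndexNode *inode,
                                  uint64_t byte_offset,
                                  uint32_t block_size) {
    uint64_t logical_block = byte_offset / block_size;
    return inode->block[logical_block];
}
```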
SSDs have dramatically different performance characteristics than HDDs. Does this change linked allocation's viability for random access?
SSD vs HDD Access Times:
| Metric | HDD | SSD (SATA) | SSD (NVMe) |
|---|---|---|---|
| Random read latency | 8-15ms | 0.1-0.2ms | 0.02-0.1ms |
| Sequential read | 150 MB/s | 500 MB/s | 3500 MB/s |
| Random read (IOPS) | 100-200 | 50,000-100,000 | 500,000+ |
Recalculating the Database Scenario:
With an NVMe SSD (0.05 ms per access), the average lookup becomes roughly 125,000 blocks × 0.05 ms ≈ 6 seconds.
Improvement: 17 minutes → 6 seconds (163× faster)
But a six-second customer lookup is still far too slow for any interactive application, and the improvement comes entirely from faster hardware, not from a better algorithm.
SSDs make linked allocation tolerable for small-to-medium files, but fundamental O(n) complexity remains. A 100GB file with 25 million blocks still requires 25 million lookups. Even at 0.02ms each, that's still 500 seconds (8+ minutes) to seek to the end.
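To make that concrete, here is a small usage sketch that reuses the time_to_access() estimator from earlier on this page. The 0.05 ms per hop and the zero seek penalty for the SSD case are assumptions, not measurements:

```c
#include <stdio.h>
#include <stdint.h>

/* Defined in the earlier complexity-analysis sketch. */
double time_to_access(uint64_t offset, uint32_t block_size, uint32_t pointer_size,
                      double seek_time_ms, double read_time_ms);

int main(void) {
    uint64_t offset = 100ULL * 1000 * 1000 * 1000;   /* ~100 GB into the file */

    double hdd = time_to_access(offset, 4096, 4, 8.0, 0.1);    /* HDD: 8 ms seeks, 0.1 ms reads */
    double ssd = time_to_access(offset, 4096, 4, 0.0, 0.05);   /* NVMe: ~0.05 ms per hop        */

    printf("HDD estimate: %.0f s (~28 hours)\n", hdd / 1000.0);
    printf("SSD estimate: %.0f s (~20 minutes)\n", ssd / 1000.0);
    return 0;
}
```

Faster hardware shrinks the constant, but the number of hops still scales with file position.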
We've thoroughly examined linked allocation's most significant weakness—O(n) random access complexity. Let's consolidate our understanding:
- Contiguous allocation locates any byte with constant-time arithmetic; pure linked allocation must follow the chain, costing one disk read per block before the target.
- The cost grows linearly with file position: reaching deep into a multi-gigabyte file can take minutes to hours on an HDD.
- Sequential workloads (logs, backups, streaming capture) remain acceptable; random-access workloads (databases, media seeking, virtual machines) are effectively unusable.
- Position caches, prefetching, and skip pointers reduce the constant factor but never eliminate the O(n) traversal.
- Caching every pointer in memory (the FAT approach) restores fast lookups at the cost of table memory; indexed allocation avoids the chain entirely.
- SSDs shrink the per-hop cost dramatically but leave the linear complexity intact.
What's next:
The final page explores the FAT optimization—the genius modification that made linked allocation practical for decades. We'll examine how moving pointers to a centralized table transformed random access performance while preserving linked allocation's fragmentation benefits.
You now understand why pure linked allocation is fundamentally unsuitable for random-access workloads. The O(n) traversal requirement makes it impractical for databases, media, and most modern applications. This understanding is crucial for appreciating both the FAT optimization and the move toward indexed allocation in modern file systems.