Imagine storing a book not as a continuous manuscript, but as individual pages scattered across a library, where each page contains a note pointing to the location of the next page. This analogy captures the essence of linked allocation—one of the fundamental methods for organizing file data on disk storage.
In earlier exploration of file system allocation strategies, we examined contiguous allocation, which required files to occupy consecutive disk blocks. While elegant in its simplicity and offering superb sequential read performance, contiguous allocation suffered from a fatal flaw: external fragmentation. As files were created and deleted, the disk became a patchwork of free space fragments, potentially leaving us unable to allocate space for new files even when total free space was abundant.
Linked allocation emerged as a direct solution to this fragmentation problem, introducing a fundamentally different approach to organizing file blocks on disk.
By the end of this page, you will understand how linked allocation works at a fundamental level, including how files are represented as chains of blocks, how the operating system traverses these chains, and why this approach elegantly solves the external fragmentation problem. You'll gain insight into the data structures underlying linked allocation and appreciate both its advantages and the challenges it introduces.
Linked allocation represents files as linked lists of disk blocks. Unlike contiguous allocation, where a file's blocks must be adjacent, linked allocation allows blocks to be scattered anywhere on the disk. Each block contains not only file data but also a pointer to the next block in the chain, creating a logical sequence from physically dispersed locations.
The core principle:
In linked allocation, the directory entry for a file contains the address of the first block (head) of the file. Each block then contains:
- A data portion holding part of the file's contents
- A pointer to the next block in the chain
The last block of the file contains a special null pointer (typically -1, 0, or a sentinel value) indicating the end of the file.
Anatomy of a linked block:
Each disk block in a linked allocation scheme is divided into two logical sections:
+------------------------------------------+
| |
| FILE DATA |
| (Block Size - Pointer Size) |
| |
+------------------------------------------+
| NEXT BLOCK POINTER |
| (typically 4 bytes) |
+------------------------------------------+
For a standard 4KB (4096 byte) block with a 4-byte pointer:
- Data capacity per block: 4096 − 4 = 4092 bytes
- Pointer overhead: 4 / 4096 ≈ 0.1% of each block
While this overhead might seem negligible, it has profound implications for file organization and access patterns that we'll explore throughout this module.
The non-power-of-two data capacity (4092 bytes instead of 4096) complicates address calculations compared to contiguous allocation. Programs expecting exactly 4096 bytes per block must account for this discrepancy, or the file system must introduce additional abstraction layers.
Let's trace through the complete lifecycle of file operations under linked allocation to understand the mechanism in depth.
File Creation:
When creating a new file:
1. Allocate a single free block from the free list
2. Create a directory entry containing the filename and the address of that first block
3. Set the block's next pointer to NULL, marking it as the end of the chain
Writing to a File:
When writing data that exceeds the current file size:
```c
// Simplified pseudocode for appending data with linked allocation
typedef struct {
    char data[BLOCK_SIZE - sizeof(uint32_t)];  // Data portion
    uint32_t next_block;                       // Pointer to next block
} LinkedBlock;

int append_to_file(FileEntry *file, void *data, size_t length) {
    // Find the last block by traversing the chain
    uint32_t current_block = file->start_block;
    LinkedBlock *block = read_block(current_block);

    // Traverse to end of chain
    while (block->next_block != NULL_BLOCK) {
        current_block = block->next_block;
        block = read_block(current_block);
    }

    size_t bytes_written = 0;
    size_t space_in_block = BLOCK_SIZE - sizeof(uint32_t) - file->size_in_last_block;

    while (bytes_written < length) {
        // Write as much as fits into the current block
        size_t to_write = min(space_in_block, length - bytes_written);
        memcpy(block->data + file->size_in_last_block,
               (char *)data + bytes_written, to_write);
        bytes_written += to_write;
        file->size_in_last_block += to_write;

        // If the block is full and more data remains, allocate a new block
        if (bytes_written < length) {
            uint32_t new_block = allocate_free_block();
            if (new_block == INVALID_BLOCK) {
                return -1;  // Disk full
            }
            block->next_block = new_block;
            write_block(current_block, block);

            current_block = new_block;
            block = read_block(current_block);
            block->next_block = NULL_BLOCK;
            space_in_block = BLOCK_SIZE - sizeof(uint32_t);
            file->size_in_last_block = 0;
        }
    }

    // Write final block and update file size
    write_block(current_block, block);
    file->size += length;
    return 0;
}
```

Reading from a File:
Sequential reading is straightforward:
1. Look up the first block address in the directory entry
2. Read the block and deliver its data portion to the application
3. Follow the next pointer to the subsequent block
4. Repeat until the NULL pointer marks the end of the file
File Deletion:
Deleting a file involves:
1. Traversing the chain from the first block
2. Returning each block to the free list
3. Removing the file's directory entry
Importantly, the blocks don't need to be zeroed or physically moved—only the free list metadata needs updating.
| Operation | Algorithm | Complexity | Disk I/O Operations |
|---|---|---|---|
| Create File | Allocate one block, create directory entry | O(1) | 2-3 writes |
| Append Data | Traverse to end, allocate new blocks as needed | O(n) where n = current blocks | n reads + m writes (m = new blocks) |
| Sequential Read | Follow pointer chain | O(blocks read) | 1 read per block |
| Random Access (block k) | Traverse from start to block k | O(k) | k reads |
| Delete File | Traverse chain, return blocks to free list | O(n) | n reads + n writes to free list |
Understanding linked allocation requires appreciating how the familiar linked list data structure translates from memory to persistent storage.
In-Memory vs On-Disk Linked Lists:
In memory, a linked list node might contain:
```c
struct Node {
    void *data;
    struct Node *next;   // Memory address
};
```
On disk, the "pointer" becomes a block address—an integer identifying the physical or logical block number:
```c
struct DiskBlock {
    char data[BLOCK_DATA_SIZE];
    uint32_t next_block_number;   // Block address, not memory address
};
```
Key differences:
| Aspect | Memory Linked List | Disk Linked List |
|---|---|---|
| Pointer type | Memory address | Block number |
| Traversal cost | Nanoseconds | Milliseconds (HDD) to microseconds (SSD) |
| Node size | Flexible | Fixed (block size) |
| Cache behavior | CPU cache friendly | Disk cache dependent |
| Failure impact | Process crash | Data loss risk |
On a traditional hard disk, each pointer traversal potentially triggers a disk seek—moving the read/write head to a different cylinder. A single seek takes 5-15ms. For a file with 1000 blocks scattered across the disk, random access could take 5-15 SECONDS, compared to milliseconds for contiguous allocation.
Block Address Formats:
Different file systems represent block addresses differently:
Absolute Block Number: A global index into the disk's block array
disk_offset = block_number × block_size

Logical Block Address (LBA): Abstracts physical disk geometry
Relative Block Number: Offset within a partition or volume
Pointer Size Considerations:
The size of the next-block pointer determines the maximum disk size:
| Pointer Size | Max Blocks Addressable | Max Disk Size (4KB blocks) |
|---|---|---|
| 16-bit | 65,536 | 256 MB |
| 24-bit | 16,777,216 | 64 GB |
| 32-bit | 4,294,967,296 | 16 TB |
| 48-bit | 281 trillion | 1 EB (Exabyte) |
Modern file systems typically use 32-bit or larger block pointers to support large storage devices.
```c
// Block address calculation and traversal
#define BLOCK_SIZE 4096
#define POINTER_SIZE 4
#define DATA_PER_BLOCK (BLOCK_SIZE - POINTER_SIZE)

// Structure representing directory entry
typedef struct {
    char filename[256];
    uint32_t first_block;
    uint64_t file_size;
    uint32_t permissions;
    time_t created;
    time_t modified;
} DirectoryEntry;

// Read a specific byte offset within a file
int read_file_at_offset(DirectoryEntry *entry, uint64_t offset,
                        void *buffer, size_t length) {
    // Calculate which block contains the offset
    uint32_t block_index = offset / DATA_PER_BLOCK;
    uint32_t offset_in_block = offset % DATA_PER_BLOCK;

    // Traverse to the target block
    uint32_t current_block = entry->first_block;
    for (uint32_t i = 0; i < block_index; i++) {
        if (current_block == NULL_BLOCK) {
            return -1;  // Offset beyond file end
        }
        // Read block to get next pointer
        LinkedBlock *block = read_block(current_block);
        current_block = block->next_block;
    }

    // Now read data from current_block onwards
    size_t bytes_read = 0;
    while (bytes_read < length && current_block != NULL_BLOCK) {
        LinkedBlock *block = read_block(current_block);
        size_t available = DATA_PER_BLOCK - offset_in_block;
        size_t to_read = min(available, length - bytes_read);
        memcpy((char *)buffer + bytes_read, block->data + offset_in_block, to_read);
        bytes_read += to_read;

        offset_in_block = 0;  // After the first block, start at the beginning
        current_block = block->next_block;
    }

    return bytes_read;
}
```

The directory entry in linked allocation is notably simpler than in contiguous allocation because we don't need to track contiguous extent lengths.
Minimal Linked Allocation Directory Entry:
+------------------+------------------+
| Filename | First Block |
| (variable/fixed) | (4 bytes) |
+------------------+------------------+
The file size can be:
- Stored explicitly in the directory entry (the common approach)
- Derived by traversing the chain, counting full blocks plus the bytes used in the last block (slower, but needs no extra metadata)
Extended Directory Entry:
Modern implementations typically include additional metadata:
```c
// Extended directory entry for linked allocation
typedef struct {
    // File identification
    char name[255];            // Null-terminated filename
    uint8_t name_length;       // Actual length of filename

    // Linked allocation specific
    uint32_t first_block;      // Starting block of file chain
    uint32_t last_block;       // Cached pointer to last block (optimization)
    uint64_t file_size;        // Total file size in bytes
    uint32_t block_count;      // Number of blocks allocated

    // Standard metadata
    uint16_t permissions;      // rwx for owner/group/other
    uint32_t owner_uid;        // Owner user ID
    uint32_t owner_gid;        // Owner group ID

    // Timestamps
    int64_t created_time;      // Creation timestamp
    int64_t modified_time;     // Last modification timestamp
    int64_t accessed_time;     // Last access timestamp

    // File type
    uint8_t file_type;         // Regular, directory, symlink, etc.
} LinkedDirectoryEntry;
// Size: approximately 320 bytes per entry
```

The Last Block Optimization:
Notice the last_block field above. This is a crucial optimization for append operations:
- Without it, every append must traverse the entire chain to find the tail
- With it, the file system jumps directly to the last block in a single read
This transforms append from O(n) to O(1), making linked allocation practical for log files and other append-heavy workloads.
Caching the last block pointer creates a consistency challenge: if a crash occurs after updating the chain but before updating the directory entry, the cache becomes stale. File systems must either recalculate on mount or use journaling to ensure consistency.
Linked allocation offers several compelling advantages that made it a popular choice, particularly in early personal computer file systems:
1. Complete Elimination of External Fragmentation
Because files don't require contiguous blocks, any free block can satisfy any allocation request. There's no concept of "holes" being too small—every free block is equally useful.
2. Dynamic File Growth
Files can grow indefinitely without pre-allocation or reservation:
- No need to estimate final file size at creation time
- Growth simply allocates another free block and links it onto the chain
- No costly relocation when a file outgrows its initial space
3. Simple Free Space Management
Free blocks form a simple pool:
- Any free block can satisfy any allocation request
- A simple free list or bitmap suffices; no best-fit or first-fit hole searching is required
4. Efficient Deletion
Deleting a file only requires:
- Walking the chain and returning each block to the free list
- Removing the directory entry
No data movement or space reclamation needed.
5. Natural Append Support
With the last-block optimization, appending is extremely efficient:
- Read the cached last block address from the directory entry
- Write the new data, linking a fresh block only when the current one fills
This makes linked allocation excellent for log files, audit trails, and streaming data.
The original FAT file system, which used linked allocation (with the FAT optimization), powered MS-DOS, early Windows, and remains in use today on USB drives, SD cards, and UEFI system partitions. Its success demonstrates that linked allocation, despite its limitations, fills an important niche.
Let's visualize a concrete example of how linked allocation organizes files on a disk. Consider a small disk with 16 blocks (numbered 0-15) containing three files:

+------+------+------+------+------+------+------+------+
|  0   |  1   |  2   |  3   |  4   |  5   |  6   |  7   |
| free | free |  R   | free | free |  P   | free |  P   |
+------+------+------+------+------+------+------+------+
|  8   |  9   |  10  |  11  |  12  |  13  |  14  |  15  |
| free |  R   | free |  C   |  R   | free |  P   | free |
+------+------+------+------+------+------+------+------+
  R = report.txt    P = photo.jpg    C = config.ini
File chains:
| File | Block Chain | Total Blocks |
|---|---|---|
| report.txt | 2 → 9 → 12 → NULL | 3 blocks |
| photo.jpg | 5 → 7 → 14 → NULL | 3 blocks |
| config.ini | 11 → NULL | 1 block |
Key observations:
- Blocks belonging to different files are interleaved across the disk
- Each chain's order is defined purely by pointers, not by physical position
- Free blocks can sit anywhere, and any of them can extend any file
To read report.txt sequentially:
1. Directory lookup: report.txt starts at block 2
2. Read block 2; its pointer names block 9
3. Read block 9; its pointer names block 12
4. Read block 12; its NULL pointer marks the end of the file
Total: 3 disk reads for 3 blocks of data. Compare this to contiguous allocation where a single multi-block read could fetch all data.
Linked allocation has a rich history in computing, emerging as a practical solution to the limitations of early storage systems.
The Problem Space of the 1970s-1980s:
Contiguous allocation, while offering excellent read performance, became increasingly impractical as:
- Disks grew larger and held many more files
- Files were created, deleted, and resized far more frequently
- External fragmentation left usable free space scattered in unusably small holes
| System/Year | Approach | Notable Features |
|---|---|---|
| CTSS (1961) | Modified linked | One of first time-sharing systems |
| FAT (1977) | Centralized table | MS-DOS, Windows 3.x/9x |
| HFS (1985) | Extent-based variation | Macintosh file system |
| Amiga OFS (1985) | Pure linked blocks | Amiga Original File System |
The MS-DOS Revolution:
When Microsoft and Seattle Computer Products designed the File Allocation Table (FAT) file system for 86-DOS (later MS-DOS), they chose linked allocation with a crucial optimization: moving all the pointers into a centralized table (the FAT). This preserved linked allocation's fragmentation advantages while improving random access—we'll explore this in depth in a later page.
Modern Relevance:
While modern file systems like ext4, NTFS, and APFS use more sophisticated allocation schemes (extent-based, copy-on-write), understanding linked allocation remains valuable because:
- FAT variants still dominate removable media and firmware boot partitions
- Linked structures still underpin free-space management in many modern designs
- It illustrates the fundamental trade-off between allocation flexibility and access performance
Every computer you use today likely has at least one FAT partition—the UEFI System Partition (ESP) that enables your computer to boot. Understanding linked allocation means understanding a technology that literally helps start every modern PC.
We've established the foundational understanding of linked allocation. Let's consolidate the key concepts:
- Files are chains of blocks linked by per-block next pointers; the directory entry stores the head
- External fragmentation is eliminated, since any free block satisfies any request
- Sequential access is natural, but random access to block k costs O(k) reads
- Each block sacrifices a few bytes to its pointer, leaving a non-power-of-two data capacity
- Caching the last block address turns appends from O(n) into O(1)
What's next:
The next page examines the pointer structure in detail—how each block contains a pointer to its successor, the mechanics of pointer updates during file operations, and the implications of pointer corruption. Understanding these mechanics is essential for grasping both the strengths and vulnerabilities of linked allocation.
You now understand the fundamental concept of linked allocation—files as chains of blocks scattered across the disk, connected by pointers. This mental model will serve as the foundation for understanding pointer mechanics, fragmentation behavior, random access performance, and the revolutionary FAT optimization in the following pages.