The logical file system thinks in terms of files, names, and permissions. The disk hardware thinks in terms of sectors, cylinders, and platters. Somewhere between these two vastly different worldviews sits the file organization module—the translator that converts abstract file concepts into concrete storage locations.
When you request to read bytes 4096-8191 of a file, the logical file system knows this corresponds to 'logical block 1' of the file. But which physical block on the disk contains this data? The file might be stored contiguously, scattered across the disk, or organized through complex indexing structures. The file organization module knows the answer.
This layer is where the core design decisions of a file system manifest. The choice of allocation strategy—contiguous, linked, or indexed—fundamentally shapes the file system's performance characteristics, fragmentation behavior, and recovery capabilities. Understanding this layer is essential for choosing the right file system for your workload and for diagnosing performance problems.
By the end of this page, you will understand the file organization module's role in the file system stack, how it manages the logical-to-physical block mapping, and how it works with the free space manager to allocate and deallocate storage. You'll grasp the tradeoffs between different approaches and why hybrid strategies dominate modern file systems.
The file organization module sits at layer 4 in our five-layer file system stack, directly below the logical file system and above the basic file system. Its position is strategic: it receives abstract requests from above and translates them into block-level operations below.
This layer has two fundamental responsibilities:
Logical-to-physical block mapping: Given a file identifier and a logical block number within that file, determine the physical block number on the storage device.
Free space management: Track which blocks on the storage device are available for allocation, and manage the allocation/deallocation of blocks as files grow, shrink, or are deleted.
To appreciate the file organization module's complexity, consider what it must accomplish:
Logical View (per file) Physical Reality (disk)
────────────────────────── ─────────────────────────────
File A: [Block 0][Block 1][Block 2] Block 0: belongs to File C
File B: [Block 0][Block 1] Block 1: belongs to File A, block 2
File C: [Block 0][Block 1][Block 2][Block 3] Block 2: belongs to File A, block 0
Block 3: FREE
Block 4: belongs to File B, block 1
Block 5: belongs to File C, block 0
Block 6: belongs to File C, block 1
Block 7: belongs to File B, block 0
Block 8: FREE
Block 9: belongs to File A, block 1
...
The mapping is arbitrary—logical block 0 of a file could be at any physical location. The file organization module maintains data structures that encode these mappings efficiently, supporting fast lookup, growth and shrinkage of files, and compact metadata.
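The arbitrary mapping in the diagram above can be modeled as a per-file translation table. This is only a sketch of the concept (real file systems encode the mapping in the on-disk structures discussed below); the file names and placements mirror the diagram:

```python
# Per-file logical-to-physical block maps, mirroring the diagram above.
# Keys are logical block numbers within the file; values are physical blocks.
# (File C's remaining placements are elided in the diagram, so they are
# omitted here as well.)
block_map = {
    "File A": {0: 2, 1: 9, 2: 1},
    "File B": {0: 7, 1: 4},
    "File C": {0: 5, 1: 6},
}

def logical_to_physical(file_name, logical_block):
    """Translate a (file, logical block) pair to a physical block number."""
    return block_map[file_name][logical_block]

print(logical_to_physical("File A", 0))  # → 2 (physical block 2 holds File A, block 0)
```

The dictionary form makes the core point visible: nothing about a logical block number predicts where the data physically lives.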
The obvious solution—store each file in contiguous blocks—makes mapping trivial (just store start block and length). But files grow and shrink. What happens when a file needs to expand but another file occupies the adjacent blocks? Contiguous allocation leads to fragmentation, compaction overhead, and complex space management. We'll explore this tradeoff in depth later.
Before diving into specific allocation strategies, we need to understand the basic concepts that all strategies share.
A block (also called a cluster in some file systems) is the smallest unit of allocation. The file system allocates storage in whole blocks—even a 1-byte file occupies at least one full block.
The block size trade-off:
| Block Size | Advantages | Disadvantages |
|---|---|---|
| Small (1-2 KB) | Less internal fragmentation, efficient for small files | More blocks = more metadata, more seeks |
| Large (64-128 KB) | Fewer blocks = less metadata, better sequential performance | Wastes space for small files |
| Typical (4 KB) | Balanced for general workloads | Compromise isn't optimal for any extreme |
Modern file systems typically use 4 KB blocks by default, matching the common page size for virtual memory. This alignment provides significant performance benefits—pages can be transferred directly between disk and memory without splitting.
Some file systems support variable block sizes or block suballocation to reduce waste for small files. XFS, for example, can use different block sizes for different parts of the file system.
Internal fragmentation occurs within allocated blocks. A 5 KB file in a file system with 4 KB blocks uses 2 blocks (8 KB total), wasting 3 KB.
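The waste is easy to quantify. A quick sketch, using the 4 KB block size from the example:

```python
BLOCK_SIZE = 4096  # 4 KB blocks, as in the example above

def internal_fragmentation(file_size):
    """Return (blocks allocated, bytes wasted) for a file of the given size."""
    blocks = -(-file_size // BLOCK_SIZE)  # ceiling division
    return blocks, blocks * BLOCK_SIZE - file_size

print(internal_fragmentation(5 * 1024))  # 5 KB file → (2, 3072): 3 KB wasted
```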
External fragmentation occurs in the free space. Even if there's enough total free space, it may be scattered in pieces too small to satisfy an allocation request.
External Fragmentation Example:
───────────────────────────────────────────────
[USED][FREE][USED][FREE][USED][FREE][USED][FREE]
Total free: 4 blocks
Largest contiguous: 1 block
Request for 3 contiguous blocks: FAILS
On HDDs, external fragmentation causes performance degradation because reading scattered blocks requires seek time between each block. On SSDs, fragmentation causes less performance impact but can still affect write amplification and TRIM efficiency. Modern file systems work hard to prevent fragmentation through intelligent allocation strategies.
Contiguous allocation is the simplest allocation strategy: each file occupies a set of contiguous blocks on disk. The file organization module only needs to store two values for each file: the starting block address and the length in blocks.
Directory Entry: report.txt → (start: 100, length: 5)
Disk Layout:
[97][98][99][100 101 102 103 104 ][105][106]
↑──────────────────────────────────↑
report.txt (blocks 0-4)
Mapping is trivial: To read logical block n of the file, access physical block (start + n).
Random access is immediate: O(1) time to calculate any block's physical location.
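The trivial mapping can be written out directly. A sketch using the directory entry from the example above:

```python
def contiguous_map(start, length, logical_block):
    """O(1) mapping for contiguous allocation: physical = start + logical."""
    if not 0 <= logical_block < length:
        raise ValueError("logical block out of range")
    return start + logical_block

# report.txt → (start: 100, length: 5)
print(contiguous_map(100, 5, 3))  # → 103
```

No table, no pointers, no extra I/O: the mapping is pure arithmetic, which is exactly why contiguous allocation is so attractive when it works.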
The fatal weakness is external fragmentation. Consider what happens as files are created and deleted over time:

Initial state:
[Apple][Banana][Cherry][Date][Elderberry]
After deleting Banana and Date:
[Apple][ FREE ][Cherry][ FREE ][Elderberry]
Try to create Fig (needs 3 blocks):
FAILS - no contiguous space of size 3
But total free space = 4 blocks
The only solution is compaction: moving files to create contiguous free space. This is extremely expensive—potentially requiring reading and writing every block on the disk. Historically, this was done offline ("defragmenting" the disk). For this reason, contiguous allocation is rarely used for general-purpose file systems.
Despite its limitations, contiguous allocation is used when files are written once and never resized, as on read-only media such as ISO 9660 CD-ROMs, or when predictable sequential performance matters more than flexibility, as with swap partitions and preallocated database files.
Modern file systems like ext4 and XFS use extent-based allocation—a hybrid that provides contiguous blocks where possible but falls back to non-contiguous allocation when necessary. We'll explore this later.
Linked allocation solves the external fragmentation problem by allowing files to use any available blocks, connected via pointers. Each block contains a pointer to the next block in the file.
Directory Entry: report.txt → (start: 100, end: 73)
Block 100: [data...][ptr:205] → Block 205: [data...][ptr:73] → Block 73: [data...][NULL]
Physical layout (blocks scattered):
[...][73 ][...][100 ][...][205 ][...]
↑ ↑ ↑
3rd block 1st block 2nd block
The file's directory entry only needs to store the first block address (and optionally the last for efficient appending). Each block dedicates a portion of its space to the next-block pointer.
To read logical block n of a file, the file organization module must start at the first block and follow the chain of pointers, performing one disk read per block traversed.
This is O(n) time complexity—unacceptable for large files or random access patterns.
Reading block 1000 of a file:
- Linked allocation: 1000 disk reads (following pointers)
- Contiguous allocation: 1 disk read
For a 1 GB file with 4 KB blocks, accessing the last block requires traversing 262,144 pointers. This could take several minutes on an HDD.
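The traversal can be sketched as follows, with each pointer dereference standing in for one disk read. The block numbers and contents are hypothetical, matching the earlier diagram:

```python
# Simulated disk: each block stores (data, next_block); None marks end of file.
disk = {
    100: (b"first", 205),
    205: (b"second", 73),
    73:  (b"third", None),
}

def read_logical_block(start, n):
    """Follow n next-pointers from the start block: n + 1 'disk reads' total."""
    block = start
    for _ in range(n):
        _, block = disk[block]  # one simulated disk read per hop
        if block is None:
            raise IndexError("logical block past end of file")
    data, _ = disk[block]
    return data

print(read_logical_block(100, 2))  # → b'third', after reading blocks 100 and 205
```

The loop makes the cost structure explicit: reaching logical block n requires reading every block before it.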
The File Allocation Table (FAT) file system addresses the random access problem by moving all pointers out of the data blocks and into a dedicated table in memory:
File Allocation Table (in memory):
┌───────┬────────────┐
│ Block │ Next Block │
├───────┼────────────┤
│ 0 │ FREE │
│ 1 │ 5 │ ← File A starts at block 1
│ 2 │ FREE │
│ 3 │ EOF │ ← End of File B
│ 4 │ 3 │ ← File B starts at block 4
│ 5 │ EOF │ ← End of File A
│ 6 │ FREE │
└───────┴────────────┘
Now traversing the chain only requires memory access—no disk I/O. Random access becomes much faster, though still O(n) in terms of table entries traversed.
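A sketch of the table above as an in-memory structure, with the chain walk that replaces per-block disk reads:

```python
EOF, FREE = -1, -2  # sentinel values standing in for the table's markers

# File Allocation Table from the figure: key = block, value = next block.
fat = {0: FREE, 1: 5, 2: FREE, 3: EOF, 4: 3, 5: EOF, 6: FREE}

def chain(start):
    """Return the list of physical blocks for a file starting at 'start'."""
    blocks = []
    block = start
    while block != EOF:
        blocks.append(block)
        block = fat[block]  # a memory access, not a disk read
    return blocks

print(chain(1))  # File A → [1, 5]
print(chain(4))  # File B → [4, 3]
```

The traversal is the same O(n) walk as linked allocation; the difference is that every hop is a memory lookup instead of a disk seek.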
FAT limitations: the entire table must be cached for good performance, and the table grows linearly with disk size (a 1 TB volume with 4 KB clusters needs about 268 million entries, roughly 1 GB for 32-bit entries). Chain traversal is still O(n), and damage to the table can sever every file's chain at once, which is why FAT keeps a second copy of the table.
This is why FAT remains useful for small removable media but is inadequate for modern disk sizes.
Despite its limitations, FAT remains the most widely compatible file system. It's used for SD cards, USB drives, and the EFI System Partition on UEFI systems. exFAT (FAT64) extends the design for larger devices while maintaining the linked allocation model with an in-memory table.
Indexed allocation combines the random access capability of contiguous allocation with the flexibility of linked allocation. Each file has an index block (also called an indirect block) that contains pointers to all the file's data blocks.
Directory Entry: report.txt → (index block: 500)
Index Block 500:
┌───────────────────┐
│ Block 0: 105 │
│ Block 1: 237 │
│ Block 2: 89 │
│ Block 3: 412 │
│ Block 4: NULL │ ← File has 4 blocks
│ ... │
└───────────────────┘
To read logical block 2: Look up index[2] = 89, read block 89
Random access is O(1): Reading any block requires exactly two I/O operations—one to read the index block, one to read the data block. The index block can be cached, reducing this to one I/O in practice.
With a 4 KB block and 4-byte block pointers, an index block can hold:
4096 bytes / 4 bytes per pointer = 1024 pointers
1024 pointers × 4 KB per block = 4 MB maximum file size
4 MB is far too small for modern files. Three strategies address this:
1. Linked Index Blocks
Index Block 1 → Block 2 → Block 3 → NULL
↓ ↓ ↓
[1024 ptrs] [1024 ptrs] [512 ptrs]
Limitation: Random access degrades to O(n/1024) for blocks beyond the first index block.
2. Multi-Level Index (Hierarchical)
Primary Index Block:
┌─────────────────────┐
│ → Secondary Index 1 │ → [1024 data block pointers]
│ → Secondary Index 2 │ → [1024 data block pointers]
│ → Secondary Index 3 │ → [1024 data block pointers]
│ ... │
└─────────────────────┘
Supports: 1024 × 1024 × 4 KB = 4 GB with two levels. Adding a third level multiplies by another 1024, reaching 4 TB.
3. Combined Scheme (Unix inode style)
Inode contains:
├─ 12 direct pointers → 48 KB directly accessible
├─ 1 single-indirect ptr → 4 MB more
├─ 1 double-indirect ptr → 4 GB more
└─ 1 triple-indirect ptr → 4 TB more
The combined scheme optimizes for the common case (small files) while supporting huge files. We'll examine this in detail when studying Unix inodes.
Studies show most files are small. With 12 direct pointers and 4KB blocks, files up to 48KB need no indirect blocks—covering the vast majority of files. Larger files incur the overhead of indirect blocks only when necessary. This is a classic example of optimizing for the common case.
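The reach of each pointer tier can be checked arithmetically. A sketch assuming the 4 KB blocks and 4-byte pointers used above:

```python
BLOCK_SIZE = 4096
PTRS_PER_BLOCK = BLOCK_SIZE // 4  # 1024 pointers per indirect block
DIRECT = 12                       # direct pointers in the inode

def classify(logical_block):
    """Say which inode pointer tier covers a given logical block number."""
    n = logical_block
    if n < DIRECT:
        return "direct"                      # first 48 KB
    n -= DIRECT
    if n < PTRS_PER_BLOCK:
        return "single-indirect"             # next 4 MB
    n -= PTRS_PER_BLOCK
    if n < PTRS_PER_BLOCK ** 2:
        return "double-indirect"             # next 4 GB
    n -= PTRS_PER_BLOCK ** 2
    if n < PTRS_PER_BLOCK ** 3:
        return "triple-indirect"             # next 4 TB
    raise ValueError("beyond maximum file size")

print(classify(5))          # → direct
print(classify(500))        # → single-indirect
print(classify(1_000_000))  # → double-indirect (a ~4 GB file's later blocks)
```

Small files never leave the `direct` branch, which is the whole point of the combined design.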
The file organization module must track which blocks are free for allocation. When a file needs more space or when a new file is created, the free space manager efficiently locates and allocates available blocks. When files are deleted or truncated, blocks are returned to the free pool.
Free space management involves two key operations: allocation, which finds and claims free blocks when a file grows or is created, and deallocation, which returns blocks to the free pool when a file shrinks or is deleted.
Several data structures can implement free space tracking, each with different trade-offs.
The most common approach uses a bit vector where each bit represents one block:
Bit position: 0 1 2 3 4 5 6 7 8 9 ...
Bit value: 1 1 0 1 1 0 0 1 0 0 ...
1 = allocated, 0 = free
Free blocks: 2, 5, 6, 8, 9, ...
Storage requirement: For a 1 TB disk with 4 KB blocks, the bitmap needs one bit for each of 2^28 blocks—2^28 bits, or 32 MB. That is small enough to cache largely in memory.
Operations: To allocate, scan for a 0 bit and set it to 1; to deallocate, clear the block's bit back to 0. Scanning a word (32 or 64 bits) at a time makes the search fast in practice.
Advantages: Compact, fast with proper caching, naturally finds contiguous blocks.
Disadvantages: Must scan to find free blocks; entire bitmap may not fit in memory for huge disks.
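A minimal bitmap allocator sketch, using one bit per block (1 = allocated) as in the diagram above:

```python
class Bitmap:
    """Toy free-space bitmap: one bit per block, 1 = allocated, 0 = free."""

    def __init__(self, n_blocks):
        self.bits = bytearray((n_blocks + 7) // 8)
        self.n = n_blocks

    def _get(self, i):
        return (self.bits[i // 8] >> (i % 8)) & 1

    def allocate(self):
        """Scan for the first free block, mark it allocated, return its number."""
        for i in range(self.n):
            if not self._get(i):
                self.bits[i // 8] |= 1 << (i % 8)
                return i
        raise RuntimeError("no free blocks")

    def free(self, i):
        """Clear the block's bit, returning it to the free pool."""
        self.bits[i // 8] &= ~(1 << (i % 8))

bm = Bitmap(16)
print(bm.allocate(), bm.allocate())  # → 0 1
bm.free(0)
print(bm.allocate())                 # → 0 (the freed block is reused)
```

A real implementation would scan whole words at a time and track a hint of where the last allocation succeeded, but the bit operations are the same.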
Each free block contains a pointer to the next free block:
Free list head: block 5
Block 5: [ptr: 12] → Block 12: [ptr: 37] → Block 37: [ptr: NULL]
Allocation: Remove head, update list
Deallocation: Insert at head
Advantages: No extra space needed (pointers stored in free blocks themselves). O(1) allocation/deallocation of single blocks.
Disadvantages: Poor locality; can't efficiently find contiguous blocks; must traverse list.
Variation where first free block stores addresses of n free blocks, with the last entry pointing to another such block:
Block 5: [12, 37, 89, 102, 155, →Block 200]
Block 200: [201, 203, 205, 207, 209, →Block 300]
Reduces traversal overhead by grouping multiple free block addresses together.
Stores (start, count) pairs for contiguous free regions:
Free Space Table:
┌─────────┬───────┐
│ Start │ Count │
├─────────┼───────┤
│ 5 │ 3 │ ← Blocks 5, 6, 7 are free
│ 15 │ 12 │ ← Blocks 15-26 are free
│ 100 │ 7 │ ← Blocks 100-106 are free
└─────────┴───────┘
Advantages: Very compact when free space is contiguous. Efficient for allocating contiguous blocks.
Disadvantages: Requires merging adjacent regions on deallocation. List grows when disk becomes fragmented.
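The merge step on deallocation can be sketched as follows, using the free space table from the figure above:

```python
def free_run(table, start, count):
    """Insert a freed (start, count) run, merging adjacent free runs."""
    table.append((start, count))
    table.sort()
    merged = [table[0]]
    for s, c in table[1:]:
        prev_start, prev_count = merged[-1]
        if prev_start + prev_count == s:          # adjacent: coalesce
            merged[-1] = (prev_start, prev_count + c)
        else:
            merged.append((s, c))
    return merged

# Free space table from the figure: blocks 5-7, 15-26, 100-106 free.
table = [(5, 3), (15, 12), (100, 7)]
print(free_run(table, 8, 7))  # freeing blocks 8-14 → [(5, 22), (100, 7)]
```

Freeing blocks 8-14 bridges the gap between the first two runs, so three entries collapse into one covering blocks 5-26.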
Modern file systems like ZFS and Btrfs use sophisticated free space management combining B-trees, space maps, and other structures. They optimize for both scattered allocations (many random writes) and contiguous allocations (streaming writes), adapting to workload patterns.
Extent-based allocation is the dominant strategy in modern file systems (ext4, XFS, NTFS, Btrfs, APFS). An extent is a contiguous run of blocks described by a single (start, length) pair.
Traditional indexed allocation (per-block pointers):
[ptr 0][ptr 1][ptr 2][ptr 3][ptr 4][ptr 5][ptr 6][ptr 7]...
↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓
100 101 102 103 104 105 106 107
Extent-based allocation:
[start: 100, length: 8] ← One extent describes all 8 blocks
The insight is brilliant: why store 8 pointers when blocks are contiguous? An extent reduces metadata overhead dramatically while preserving the ability to handle non-contiguous allocation when necessary.
A typical extent entry contains:
Extent:
┌────────────────────────────────────────────────────────────────┐
│ Logical Block Start │ Physical Block Start │ Length │ Flags │
│ (32-48 bits) │ (32-48 bits) │ (15-24)│ (1-8) │
└────────────────────────────────────────────────────────────────┘
Logical block start: First logical block in this extent (the file's view)
Physical block start: First physical block on disk
Length: Number of contiguous blocks
Flags: Pre-allocated, unwritten, encrypted, etc.
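Translation through a flat extent list is a range check plus an offset. A sketch with hypothetical extents:

```python
from dataclasses import dataclass

@dataclass
class Extent:
    logical_start: int   # first logical block covered (the file's view)
    physical_start: int  # first physical block on disk
    length: int          # number of contiguous blocks

def translate(extents, logical_block):
    """Map a logical block to its physical block via the extent list."""
    for e in extents:
        if e.logical_start <= logical_block < e.logical_start + e.length:
            return e.physical_start + (logical_block - e.logical_start)
    raise KeyError("hole in file or past end of file")

# Hypothetical layout: two extents describing a 12-block file.
extents = [Extent(0, 100, 8), Extent(8, 500, 4)]
print(translate(extents, 3))   # → 103 (inside the first extent)
print(translate(extents, 10))  # → 502 (inside the second extent)
```

Two entries here replace twelve per-block pointers, which is the metadata saving the section describes.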
For very large files with many extents, a flat list becomes inefficient. Modern file systems organize extents in B-trees or similar structures:
┌─────────────────────────┐
│ Extent Tree Root │
│ (stored in inode) │
└───────────┬─────────────┘
│
┌──────────────┼──────────────┐
↓ ↓ ↓
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Leaf Node│ │ Leaf Node│ │ Leaf Node│
│ Extents │ │ Extents │ │ Extents │
│ 0-1000 │ │ 1001- │ │ 2001- │
│ │ │ 2000 │ │ 3000 │
└──────────┘ └──────────┘ └──────────┘
Lookup is O(log n) where n is the number of extents, not the number of blocks. For most files with few extents, the entire tree fits in the inode itself.
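The O(log n) lookup can be sketched with a binary search over extents sorted by logical start, as they would be in an extent tree leaf (the extents themselves are hypothetical):

```python
import bisect

# (logical_start, physical_start, length), sorted by logical_start.
extents = [(0, 5000, 11), (11, 9000, 20), (31, 200, 5)]
starts = [e[0] for e in extents]

def lookup(logical_block):
    """Binary-search for the extent whose range covers logical_block."""
    i = bisect.bisect_right(starts, logical_block) - 1
    log_start, phys_start, length = extents[i]
    if logical_block >= log_start + length:
        raise KeyError("hole in file")
    return phys_start + (logical_block - log_start)

print(lookup(24))  # → 9013 (covered by the extent starting at logical 11)
```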
The fallocate() system call lets applications reserve contiguous extents before writing. This is crucial for databases and virtual machine images—ensuring data is contiguous even when written incrementally. Without pre-allocation, files written over time become fragmented as other files interleave with them.
The file organization module serves as a bridge, receiving requests from the logical file system (layer 5) and issuing requests to the basic file system (layer 3). Understanding this interface clarifies each layer's responsibility.
Input: (inode, logical_block_number, operation, buffer)
Examples:
- Read block 5 of inode 12345 into buffer
- Write buffer to block 10 of inode 12345
- Allocate 3 more blocks for inode 12345
- Truncate inode 12345 to 7 blocks
The logical file system deals in file identifiers (inodes) and logical block numbers. It has no knowledge of physical disk layout.
Output: (physical_block_number, operation, buffer)
Examples:
- Read physical block 1042 into buffer
- Write buffer to physical block 2561
- Read physical blocks 1042-1046 into buffer (prefetch)
The basic file system deals in physical block numbers. It doesn't know about files—only that certain blocks need to be read or written.
Logical FS Request: "Read byte 100000 of file inode 789"
│
▼
┌─────────────────────────────────────────────────────────────┐
│ FILE ORGANIZATION MODULE │
├─────────────────────────────────────────────────────────────┤
│ 1. Calculate: logical_block = 100000 / 4096 = 24 │
│ │
│ 2. Look up inode 789's extent tree │
│ Found: Extent(logical: 20-30, physical: 5000-5010) │
│ │
│ 3. Calculate: physical_block = 5000 + (24 - 20) = 5004 │
│ │
│ 4. Issue: "Read physical block 5004" │
└─────────────────────────────────────────────────────────────┘
│
▼
Basic FS Request: "Read physical block 5004"
This translation happens on every file I/O operation. Efficient caching of extent trees and allocation metadata is essential for performance.
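The worked example above can be traced end to end. A sketch whose numbers match the diagram (the extent for inode 789 is taken directly from step 2):

```python
BLOCK_SIZE = 4096

# Extent from the diagram: logical blocks 20-30 map to physical 5000-5010.
extent = {"logical": 20, "physical": 5000, "length": 11}

def byte_to_physical_block(offset):
    """Steps 1-4 from the diagram: byte offset → physical block number."""
    logical = offset // BLOCK_SIZE                   # step 1
    delta = logical - extent["logical"]              # steps 2-3
    if not 0 <= delta < extent["length"]:
        raise KeyError("offset not covered by this extent")
    return extent["physical"] + delta                # step 4

print(byte_to_physical_block(100_000))  # → 5004, matching the diagram
```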
The file organization module is the last layer that understands 'files'. The basic file system below it only sees generic blocks. This separation allows the same basic file system code to work with different allocation strategies—swap them in the file organization module without changing anything below.
We've explored the file organization module—the layer that bridges the logical file abstraction and physical storage reality. The key concepts: the module performs logical-to-physical block mapping and free space management; contiguous, linked, and indexed allocation each trade off mapping speed, fragmentation, and growth flexibility; and modern file systems converge on extent-based allocation with tree-structured metadata.
What's Next:
The file organization module translates file requests into physical block requests. But how do those block requests actually reach the storage device? That's the job of the basic file system—our next topic. We'll explore buffer management, I/O scheduling, and the critical role of caching in file system performance.
You now understand the file organization module—its allocation strategies, free space management techniques, and the critical logical-to-physical mapping that makes file systems work. Next, we descend another layer to understand how block-level I/O is actually performed.