The logical file system thinks in terms of files, names, and permissions. The disk hardware thinks in terms of sectors, cylinders, and platters. Somewhere between these two vastly different worldviews sits the file organization module—the translator that converts abstract file concepts into concrete storage locations.
When you request to read bytes 4096-8191 of a file, the logical file system knows this corresponds to 'logical block 1' of the file. But which physical block on the disk contains this data? The file might be stored contiguously, scattered across the disk, or organized through complex indexing structures. The file organization module knows the answer.
This layer is where the core design decisions of a file system manifest. The choice of allocation strategy—contiguous, linked, or indexed—fundamentally shapes the file system's performance characteristics, fragmentation behavior, and recovery capabilities. Understanding this layer is essential for choosing the right file system for your workload and for diagnosing performance problems.
By the end of this page, you will understand the file organization module's role in the file system stack, how it manages the logical-to-physical block mapping, and how it works with the free space manager to allocate and deallocate storage. You'll grasp the tradeoffs between different approaches and why hybrid strategies dominate modern file systems.
The file organization module sits at layer 4 in our five-layer file system stack, directly below the logical file system and above the basic file system. Its position is strategic: it receives abstract requests from above and translates them into block-level operations below.
This layer has two fundamental responsibilities:
Logical-to-physical block mapping: Given a file identifier and a logical block number within that file, determine the physical block number on the storage device.
Free space management: Track which blocks on the storage device are available for allocation, and manage the allocation/deallocation of blocks as files grow, shrink, or are deleted.
To appreciate the file organization module's complexity, consider what it must accomplish:
Logical View (per file) Physical Reality (disk)
────────────────────────── ─────────────────────────────
File A: [Block 0][Block 1][Block 2] Block 0: belongs to File C
File B: [Block 0][Block 1] Block 1: belongs to File A, block 2
File C: [Block 0][Block 1][Block 2][Block 3] Block 2: belongs to File A, block 0
Block 3: FREE
Block 4: belongs to File B, block 1
Block 5: belongs to File C, block 0
Block 6: belongs to File C, block 1
Block 7: belongs to File B, block 0
Block 8: FREE
Block 9: belongs to File A, block 1
...
The mapping is arbitrary—logical block 0 of a file could be at any physical location. The file organization module maintains data structures that encode these mappings efficiently, supporting fast lookup, growth and shrinkage of files, and compact metadata.
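The arbitrary mapping in the diagram above can be modeled as a per-file translation table. This is only a sketch of the concept (real file systems encode the mapping in the on-disk structures discussed below); the file names and placements mirror the diagram:

```python
# Per-file logical-to-physical block maps, mirroring the diagram above.
# Keys are logical block numbers within the file; values are physical blocks.
# (File C's remaining placements are elided in the diagram, so they are
# omitted here as well.)
block_map = {
    "File A": {0: 2, 1: 9, 2: 1},
    "File B": {0: 7, 1: 4},
    "File C": {0: 5, 1: 6},
}

def logical_to_physical(file_name, logical_block):
    """Translate a (file, logical block) pair to a physical block number."""
    return block_map[file_name][logical_block]

print(logical_to_physical("File A", 0))  # → 2 (physical block 2 holds File A, block 0)
```

The dictionary form makes the core point visible: nothing about a logical block number predicts where the data physically lives.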
The obvious solution—store each file in contiguous blocks—makes mapping trivial (just store start block and length). But files grow and shrink. What happens when a file needs to expand but another file occupies the adjacent blocks? Contiguous allocation leads to fragmentation, compaction overhead, and complex space management. We'll explore this tradeoff in depth later.
Before diving into specific allocation strategies, we need to understand the basic concepts that all strategies share.
A block (also called a cluster in some file systems) is the smallest unit of allocation. The file system allocates storage in whole blocks—even a 1-byte file occupies at least one full block.
The block size trade-off:
| Block Size | Advantages | Disadvantages |
|---|---|---|
| Small (1-2 KB) | Less internal fragmentation, efficient for small files | More blocks = more metadata, more seeks |
| Large (64-128 KB) | Fewer blocks = less metadata, better sequential performance | Wastes space for small files |
| Typical (4 KB) | Balanced for general workloads | Compromise isn't optimal for any extreme |
Modern file systems typically use 4 KB blocks by default, matching the common page size for virtual memory. This alignment provides significant performance benefits—pages can be transferred directly between disk and memory without splitting.
Some file systems support variable block sizes or block suballocation to reduce waste for small files. XFS, for example, can use different block sizes for different parts of the file system.
Internal fragmentation occurs within allocated blocks. A 5 KB file in a file system with 4 KB blocks uses 2 blocks (8 KB total), wasting 3 KB.
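The waste is easy to quantify. A quick sketch, using the 4 KB block size from the example:

```python
BLOCK_SIZE = 4096  # 4 KB blocks, as in the example above

def internal_fragmentation(file_size):
    """Return (blocks allocated, bytes wasted) for a file of the given size."""
    blocks = -(-file_size // BLOCK_SIZE)  # ceiling division
    return blocks, blocks * BLOCK_SIZE - file_size

print(internal_fragmentation(5 * 1024))  # 5 KB file → (2, 3072): 3 KB wasted
```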
External fragmentation occurs in the free space. Even if there's enough total free space, it may be scattered in pieces too small to satisfy an allocation request.
External Fragmentation Example:
───────────────────────────────────────────────
[USED][FREE][USED][FREE][USED][FREE][USED][FREE]
Total free: 4 blocks
Largest contiguous: 1 block
Request for 3 contiguous blocks: FAILS
On HDDs, external fragmentation causes performance degradation because reading scattered blocks requires seek time between each block. On SSDs, fragmentation causes less performance impact but can still affect write amplification and TRIM efficiency. Modern file systems work hard to prevent fragmentation through intelligent allocation strategies.
Contiguous allocation is the simplest allocation strategy: each file occupies a set of contiguous blocks on disk. The file organization module only needs to store two values for each file: the starting block address and the length in blocks.
Directory Entry: report.txt → (start: 100, length: 5)
Disk Layout:
[97][98][99][100 101 102 103 104 ][105][106]
↑──────────────────────────────────↑
report.txt (blocks 0-4)
Mapping is trivial: To read logical block n of the file, access physical block (start + n).
Random access is immediate: O(1) time to calculate any block's physical location.
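The trivial mapping can be written out directly. A sketch using the directory entry from the example above:

```python
def contiguous_map(start, length, logical_block):
    """O(1) mapping for contiguous allocation: physical = start + logical."""
    if not 0 <= logical_block < length:
        raise ValueError("logical block out of range")
    return start + logical_block

# report.txt → (start: 100, length: 5)
print(contiguous_map(100, 5, 3))  # → 103
```

No table, no pointers, no extra I/O: the mapping is pure arithmetic, which is exactly why contiguous allocation is so attractive when it works.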
The fatal weakness is external fragmentation. Consider what happens as files are created and deleted over time:

Initial state:
[Apple][Banana][Cherry][Date][Elderberry]
After deleting Banana and Date:
[Apple][ FREE ][Cherry][ FREE ][Elderberry]
Try to create Fig (needs 3 blocks):
FAILS - no contiguous space of size 3
But total free space = 4 blocks
The only solution is compaction: moving files to create contiguous free space. This is extremely expensive—potentially requiring reading and writing every block on the disk. Historically, this was done offline ("defragmenting" the disk). For this reason, contiguous allocation is rarely used for general-purpose file systems.
Despite its limitations, contiguous allocation is used when files are written once and never resized, as on read-only media such as ISO 9660 CD-ROMs, or when predictable sequential performance matters more than flexibility, as with swap partitions and preallocated database files.
Modern file systems like ext4 and XFS use extent-based allocation—a hybrid that provides contiguous blocks where possible but falls back to non-contiguous allocation when necessary. We'll explore this later.
Linked allocation solves the external fragmentation problem by allowing files to use any available blocks, connected via pointers. Each block contains a pointer to the next block in the file.
Directory Entry: report.txt → (start: 100, end: 73)
Block 100: [data...][ptr:205] → Block 205: [data...][ptr:73] → Block 73: [data...][NULL]
Physical layout (blocks scattered):
[...][73 ][...][100 ][...][205 ][...]
↑ ↑ ↑
3rd block 1st block 2nd block
The file's directory entry only needs to store the first block address (and optionally the last for efficient appending). Each block dedicates a portion of its space to the next-block pointer.
To read logical block n of a file, the file organization module must start at the first block and follow the chain of pointers, performing one disk read per block traversed.
This is O(n) time complexity—unacceptable for large files or random access patterns.
Reading block 1000 of a file:
- Linked allocation: 1000 disk reads (following pointers)
- Contiguous allocation: 1 disk read
For a 1 GB file with 4 KB blocks, accessing the last block requires traversing 262,144 pointers. This could take several minutes on an HDD.
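The traversal can be sketched as follows, with each pointer dereference standing in for one disk read. The block numbers and contents are hypothetical, matching the earlier diagram:

```python
# Simulated disk: each block stores (data, next_block); None marks end of file.
disk = {
    100: (b"first", 205),
    205: (b"second", 73),
    73:  (b"third", None),
}

def read_logical_block(start, n):
    """Follow n next-pointers from the start block: n + 1 'disk reads' total."""
    block = start
    for _ in range(n):
        _, block = disk[block]  # one simulated disk read per hop
        if block is None:
            raise IndexError("logical block past end of file")
    data, _ = disk[block]
    return data

print(read_logical_block(100, 2))  # → b'third', after reading blocks 100 and 205
```

The loop makes the cost structure explicit: reaching logical block n requires reading every block before it.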
The File Allocation Table (FAT) file system addresses the random access problem by moving all pointers out of the data blocks and into a dedicated table in memory:
File Allocation Table (in memory):
┌───────┬────────────┐
│ Block │ Next Block │
├───────┼────────────┤
│ 0 │ FREE │
│ 1 │ 5 │ ← File A starts at block 1
│ 2 │ FREE │
│ 3 │ EOF │ ← End of File B
│ 4 │ 3 │ ← File B starts at block 4
│ 5 │ EOF │ ← End of File A
│ 6 │ FREE │
└───────┴────────────┘
Now traversing the chain only requires memory access—no disk I/O. Random access becomes much faster, though still O(n) in terms of table entries traversed.
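A sketch of the table above as an in-memory structure, with the chain walk that replaces per-block disk reads:

```python
EOF, FREE = -1, -2  # sentinel values standing in for the table's markers

# File Allocation Table from the figure: key = block, value = next block.
fat = {0: FREE, 1: 5, 2: FREE, 3: EOF, 4: 3, 5: EOF, 6: FREE}

def chain(start):
    """Return the list of physical blocks for a file starting at 'start'."""
    blocks = []
    block = start
    while block != EOF:
        blocks.append(block)
        block = fat[block]  # a memory access, not a disk read
    return blocks

print(chain(1))  # File A → [1, 5]
print(chain(4))  # File B → [4, 3]
```

The traversal is the same O(n) walk as linked allocation; the difference is that every hop is a memory lookup instead of a disk seek.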
FAT limitations: the entire table must be cached for good performance, and the table grows linearly with disk size (a 1 TB volume with 4 KB clusters needs about 268 million entries, roughly 1 GB for 32-bit entries). Chain traversal is still O(n), and damage to the table can sever every file's chain at once, which is why FAT keeps a second copy of the table.
This is why FAT remains useful for small removable media but is inadequate for modern disk sizes.
Despite its limitations, FAT remains the most widely compatible file system. It's used for SD cards, USB drives, and the EFI System Partition on UEFI systems. exFAT (FAT64) extends the design for larger devices while maintaining the linked allocation model with an in-memory table.
Indexed allocation combines the random access capability of contiguous allocation with the flexibility of linked allocation. Each file has an index block (also called an indirect block) that contains pointers to all the file's data blocks.
Directory Entry: report.txt → (index block: 500)
Index Block 500:
┌───────────────────┐
│ Block 0: 105 │
│ Block 1: 237 │
│ Block 2: 89 │
│ Block 3: 412 │
│ Block 4: NULL │ ← File has 4 blocks
│ ... │
└───────────────────┘
To read logical block 2: Look up index[2] = 89, read block 89
Random access is O(1): Reading any block requires exactly two I/O operations—one to read the index block, one to read the data block. The index block can be cached, reducing this to one I/O in practice.
With a 4 KB block and 4-byte block pointers, an index block can hold:
4096 bytes / 4 bytes per pointer = 1024 pointers
1024 pointers × 4 KB per block = 4 MB maximum file size
4 MB is far too small for modern files. Three strategies address this:
1. Linked Index Blocks
Index Block 1 → Block 2 → Block 3 → NULL
↓ ↓ ↓
[1024 ptrs] [1024 ptrs] [512 ptrs]
Limitation: Random access degrades to O(n/1024) for blocks beyond the first index block.
2. Multi-Level Index (Hierarchical)
Primary Index Block:
┌─────────────────────┐
│ → Secondary Index 1 │ → [1024 data block pointers]
│ → Secondary Index 2 │ → [1024 data block pointers]
│ → Secondary Index 3 │ → [1024 data block pointers]
│ ... │
└─────────────────────┘
Supports: 1024 × 1024 × 4 KB = 4 GB with two levels. Adding a third level multiplies by another 1024, reaching 4 TB.
3. Combined Scheme (Unix inode style)
Inode contains:
├─ 12 direct pointers → 48 KB directly accessible
├─ 1 single-indirect ptr → 4 MB more
├─ 1 double-indirect ptr → 4 GB more
└─ 1 triple-indirect ptr → 4 TB more
The combined scheme optimizes for the common case (small files) while supporting huge files. We'll examine this in detail when studying Unix inodes.
Studies show most files are small. With 12 direct pointers and 4KB blocks, files up to 48KB need no indirect blocks—covering the vast majority of files. Larger files incur the overhead of indirect blocks only when necessary. This is a classic example of optimizing for the common case.
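The reach of each pointer tier can be checked arithmetically. A sketch assuming the 4 KB blocks and 4-byte pointers used above:

```python
BLOCK_SIZE = 4096
PTRS_PER_BLOCK = BLOCK_SIZE // 4  # 1024 pointers per indirect block
DIRECT = 12                       # direct pointers in the inode

def classify(logical_block):
    """Say which inode pointer tier covers a given logical block number."""
    n = logical_block
    if n < DIRECT:
        return "direct"                      # first 48 KB
    n -= DIRECT
    if n < PTRS_PER_BLOCK:
        return "single-indirect"             # next 4 MB
    n -= PTRS_PER_BLOCK
    if n < PTRS_PER_BLOCK ** 2:
        return "double-indirect"             # next 4 GB
    n -= PTRS_PER_BLOCK ** 2
    if n < PTRS_PER_BLOCK ** 3:
        return "triple-indirect"             # next 4 TB
    raise ValueError("beyond maximum file size")

print(classify(5))          # → direct
print(classify(500))        # → single-indirect
print(classify(1_000_000))  # → double-indirect (a ~4 GB file's later blocks)
```

Small files never leave the `direct` branch, which is the whole point of the combined design.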
The file organization module must track which blocks are free for allocation. When a file needs more space or when a new file is created, the free space manager efficiently locates and allocates available blocks. When files are deleted or truncated, blocks are returned to the free pool.
Free space management involves two key operations: allocation, which finds and claims free blocks when a file grows or is created, and deallocation, which returns blocks to the free pool when a file shrinks or is deleted.
Several data structures can implement free space tracking, each with different trade-offs.
The most common approach uses a bit vector where each bit represents one block:
Bit position: 0 1 2 3 4 5 6 7 8 9 ...
Bit value: 1 1 0 1 1 0 0 1 0 0 ...
1 = allocated, 0 = free
Free blocks: 2, 5, 6, 8, 9, ...
Storage requirement: For a 1 TB disk with 4 KB blocks, the bitmap needs one bit for each of 2^28 blocks—2^28 bits, or 32 MB. That is small enough to cache largely in memory.
Operations: To allocate, scan for a 0 bit and set it to 1; to deallocate, clear the block's bit back to 0. Scanning a word (32 or 64 bits) at a time makes the search fast in practice.
Advantages: Compact, fast with proper caching, naturally finds contiguous blocks.
Disadvantages: Must scan to find free blocks; entire bitmap may not fit in memory for huge disks.
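A minimal bitmap allocator sketch, using one bit per block (1 = allocated) as in the diagram above:

```python
class Bitmap:
    """Toy free-space bitmap: one bit per block, 1 = allocated, 0 = free."""

    def __init__(self, n_blocks):
        self.bits = bytearray((n_blocks + 7) // 8)
        self.n = n_blocks

    def _get(self, i):
        return (self.bits[i // 8] >> (i % 8)) & 1

    def allocate(self):
        """Scan for the first free block, mark it allocated, return its number."""
        for i in range(self.n):
            if not self._get(i):
                self.bits[i // 8] |= 1 << (i % 8)
                return i
        raise RuntimeError("no free blocks")

    def free(self, i):
        """Clear the block's bit, returning it to the free pool."""
        self.bits[i // 8] &= ~(1 << (i % 8))

bm = Bitmap(16)
print(bm.allocate(), bm.allocate())  # → 0 1
bm.free(0)
print(bm.allocate())                 # → 0 (the freed block is reused)
```

A real implementation would scan whole words at a time and track a hint of where the last allocation succeeded, but the bit operations are the same.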
Each free block contains a pointer to the next free block:
Free list head: block 5
Block 5: [ptr: 12] → Block 12: [ptr: 37] → Block 37: [ptr: NULL]
Allocation: Remove head, update list
Deallocation: Insert at head
Advantages: No extra space needed (pointers stored in free blocks themselves). O(1) allocation/deallocation of single blocks.
Disadvantages: Poor locality; can't efficiently find contiguous blocks; must traverse list.
Variation where first free block stores addresses of n free blocks, with the last entry pointing to another such block:
Block 5: [12, 37, 89, 102, 155, →Block 200]
Block 200: [201, 203, 205, 207, 209, →Block 300]
Reduces traversal overhead by grouping multiple free block addresses together.
Stores (start, count) pairs for contiguous free regions:
Free Space Table:
┌─────────┬───────┐
│ Start │ Count │
├─────────┼───────┤
│ 5 │ 3 │ ← Blocks 5, 6, 7 are free
│ 15 │ 12 │ ← Blocks 15-26 are free
│ 100 │ 7 │ ← Blocks 100-106 are free
└─────────┴───────┘
Advantages: Very compact when free space is contiguous. Efficient for allocating contiguous blocks.
Disadvantages: Requires merging adjacent regions on deallocation. List grows when disk becomes fragmented.
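The merge step on deallocation can be sketched as follows, using the free space table from the figure above:

```python
def free_run(table, start, count):
    """Insert a freed (start, count) run, merging adjacent free runs."""
    table.append((start, count))
    table.sort()
    merged = [table[0]]
    for s, c in table[1:]:
        prev_start, prev_count = merged[-1]
        if prev_start + prev_count == s:          # adjacent: coalesce
            merged[-1] = (prev_start, prev_count + c)
        else:
            merged.append((s, c))
    return merged

# Free space table from the figure: blocks 5-7, 15-26, 100-106 free.
table = [(5, 3), (15, 12), (100, 7)]
print(free_run(table, 8, 7))  # freeing blocks 8-14 → [(5, 22), (100, 7)]
```

Freeing blocks 8-14 bridges the gap between the first two runs, so three entries collapse into one covering blocks 5-26.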
Modern file systems like ZFS and Btrfs use sophisticated free space management combining B-trees, space maps, and other structures. They optimize for both scattered allocations (many random writes) and contiguous allocations (streaming writes), adapting to workload patterns.
Extent-based allocation is the dominant strategy in modern file systems (ext4, XFS, NTFS, Btrfs, APFS). An extent is a contiguous run of blocks described by a single (start, length) pair.
Traditional indexed allocation (per-block pointers):
[ptr 0][ptr 1][ptr 2][ptr 3][ptr 4][ptr 5][ptr 6][ptr 7]...
↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓
100 101 102 103 104 105 106 107
Extent-based allocation:
[start: 100, length: 8] ← One extent describes all 8 blocks
The insight is brilliant: why store 8 pointers when blocks are contiguous? An extent reduces metadata overhead dramatically while preserving the ability to handle non-contiguous allocation when necessary.
A typical extent entry contains:
Extent:
┌────────────────────────────────────────────────────────────────┐
│ Logical Block Start │ Physical Block Start │ Length │ Flags │
│ (32-48 bits) │ (32-48 bits) │ (15-24)│ (1-8) │
└────────────────────────────────────────────────────────────────┘
Logical block start: First logical block in this extent (the file's view)
Physical block start: First physical block on disk
Length: Number of contiguous blocks
Flags: Pre-allocated, unwritten, encrypted, etc.
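Translation through a flat extent list is a range check plus an offset. A sketch with hypothetical extents:

```python
from dataclasses import dataclass

@dataclass
class Extent:
    logical_start: int   # first logical block covered (the file's view)
    physical_start: int  # first physical block on disk
    length: int          # number of contiguous blocks

def translate(extents, logical_block):
    """Map a logical block to its physical block via the extent list."""
    for e in extents:
        if e.logical_start <= logical_block < e.logical_start + e.length:
            return e.physical_start + (logical_block - e.logical_start)
    raise KeyError("hole in file or past end of file")

# Hypothetical layout: two extents describing a 12-block file.
extents = [Extent(0, 100, 8), Extent(8, 500, 4)]
print(translate(extents, 3))   # → 103 (inside the first extent)
print(translate(extents, 10))  # → 502 (inside the second extent)
```

Two entries here replace twelve per-block pointers, which is the metadata saving the section describes.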
For very large files with many extents, a flat list becomes inefficient. Modern file systems organize extents in B-trees or similar structures:
┌─────────────────────────┐
│ Extent Tree Root │
│ (stored in inode) │
└───────────┬─────────────┘
│
┌──────────────┼──────────────┐
↓ ↓ ↓
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Leaf Node│ │ Leaf Node│ │ Leaf Node│
│ Extents │ │ Extents │ │ Extents │
│ 0-1000 │ │ 1001- │ │ 2001- │
│ │ │ 2000 │ │ 3000 │
└──────────┘ └──────────┘ └──────────┘
Lookup is O(log n) where n is the number of extents, not the number of blocks. For most files with few extents, the entire tree fits in the inode itself.
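The O(log n) lookup can be sketched with a binary search over extents sorted by logical start, as they would be in an extent tree leaf (the extents themselves are hypothetical):

```python
import bisect

# (logical_start, physical_start, length), sorted by logical_start.
extents = [(0, 5000, 11), (11, 9000, 20), (31, 200, 5)]
starts = [e[0] for e in extents]

def lookup(logical_block):
    """Binary-search for the extent whose range covers logical_block."""
    i = bisect.bisect_right(starts, logical_block) - 1
    log_start, phys_start, length = extents[i]
    if logical_block >= log_start + length:
        raise KeyError("hole in file")
    return phys_start + (logical_block - log_start)

print(lookup(24))  # → 9013 (covered by the extent starting at logical 11)
```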
The fallocate() system call lets applications reserve contiguous extents before writing. This is crucial for databases and virtual machine images—ensuring data is contiguous even when written incrementally. Without pre-allocation, files written over time become fragmented as other files interleave with them.
The file organization module serves as a bridge, receiving requests from the logical file system (layer 5) and issuing requests to the basic file system (layer 3). Understanding this interface clarifies each layer's responsibility.
Input: (inode, logical_block_number, operation, buffer)
Examples:
- Read block 5 of inode 12345 into buffer
- Write buffer to block 10 of inode 12345
- Allocate 3 more blocks for inode 12345
- Truncate inode 12345 to 7 blocks
The logical file system deals in file identifiers (inodes) and logical block numbers. It has no knowledge of physical disk layout.
Output: (physical_block_number, operation, buffer)
Examples:
- Read physical block 1042 into buffer
- Write buffer to physical block 2561
- Read physical blocks 1042-1046 into buffer (prefetch)
The basic file system deals in physical block numbers. It doesn't know about files—only that certain blocks need to be read or written.
Logical FS Request: "Read byte 100000 of file inode 789"
│
▼
┌─────────────────────────────────────────────────────────────┐
│ FILE ORGANIZATION MODULE │
├─────────────────────────────────────────────────────────────┤
│ 1. Calculate: logical_block = 100000 / 4096 = 24 │
│ │
│ 2. Look up inode 789's extent tree │
│ Found: Extent(logical: 20-30, physical: 5000-5010) │
│ │
│ 3. Calculate: physical_block = 5000 + (24 - 20) = 5004 │
│ │
│ 4. Issue: "Read physical block 5004" │
└─────────────────────────────────────────────────────────────┘
│
▼
Basic FS Request: "Read physical block 5004"
This translation happens on every file I/O operation. Efficient caching of extent trees and allocation metadata is essential for performance.
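The worked example above can be traced end to end. A sketch whose numbers match the diagram (the extent for inode 789 is taken directly from step 2):

```python
BLOCK_SIZE = 4096

# Extent from the diagram: logical blocks 20-30 map to physical 5000-5010.
extent = {"logical": 20, "physical": 5000, "length": 11}

def byte_to_physical_block(offset):
    """Steps 1-4 from the diagram: byte offset → physical block number."""
    logical = offset // BLOCK_SIZE                   # step 1
    delta = logical - extent["logical"]              # steps 2-3
    if not 0 <= delta < extent["length"]:
        raise KeyError("offset not covered by this extent")
    return extent["physical"] + delta                # step 4

print(byte_to_physical_block(100_000))  # → 5004, matching the diagram
```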
The file organization module is the last layer that understands 'files'. The basic file system below it only sees generic blocks. This separation allows the same basic file system code to work with different allocation strategies—swap them in the file organization module without changing anything below.
We've explored the file organization module—the layer that bridges the logical file abstraction and physical storage reality. The key concepts: the module performs logical-to-physical block mapping and free space management; contiguous, linked, and indexed allocation each trade off mapping speed, fragmentation, and growth flexibility; and modern file systems converge on extent-based allocation with tree-structured metadata.
What's Next:
The file organization module translates file requests into physical block requests. But how do those block requests actually reach the storage device? That's the job of the basic file system—our next topic. We'll explore buffer management, I/O scheduling, and the critical role of caching in file system performance.
You now understand the file organization module—its allocation strategies, free space management techniques, and the critical logical-to-physical mapping that makes file systems work. Next, we descend another layer to understand how block-level I/O is actually performed.