Direct block pointers are elegant and fast, but they have an inherent limitation: a fixed number of pointers in a fixed-size inode can only address a limited amount of data. With 12 direct pointers and 4KB blocks, we can address only 48KB—adequate for most configuration files and source code, but hopelessly insufficient for documents, images, databases, or media files.
The Unix designers needed a solution that supported much larger files without enlarging the fixed-size inode, and without penalizing the small files that direct pointers already serve well.
Their solution was indirect blocks—a technique that repurposes data blocks to hold additional pointers, essentially creating a tree of block addresses. This page focuses on single indirect blocks, the first and most commonly encountered level of indirection.
By the end of this page, you will understand: how the single indirect pointer creates an extra level of addressing; the mathematics of calculating indirect block capacity; the I/O cost of indirection—one extra read per access; how filesystems cache indirect blocks to minimize overhead; and when files transition from direct to indirect addressing.
Indirection is a fundamental computer science technique: instead of storing a value directly, you store a pointer to where the value is. In the context of file block addressing:
The 13th pointer in the inode's block array (index 12) is the single indirect pointer. Instead of pointing to file data, it points to a block filled entirely with direct pointers. Each of those pointers then points to an actual data block.
The key insight: A single indirect block converts one inode pointer slot into (block_size / pointer_size) pointers. With 4KB blocks and 4-byte pointers, that's 1024 additional block addresses from a single inode entry.
Pointers per indirect block = Block Size / Pointer Size
= 4096 bytes / 4 bytes
= 1024 pointers
Each of these 1024 pointers addresses one data block, so a single indirect block provides access to:
Single Indirect Capacity = 1024 × 4KB = 4MB
Combined with 12 direct blocks (48KB), a file using direct + single indirect addressing can be up to 48KB + 4MB = ~4.05MB.
Let's formalize the addressing calculations. Define B as the block size in bytes, P as the pointer size in bytes, and k = B / P as the number of pointers per indirect block.
Address ranges with various configurations:
| Block Size (B) | Pointer Size (P) | Pointers/Block (k) | Direct Capacity | Single Indirect Adds | Total with Single Indirect |
|---|---|---|---|---|---|
| 1 KB | 4 bytes | 256 | 12 KB | 256 KB | 268 KB |
| 2 KB | 4 bytes | 512 | 24 KB | 1 MB | ~1.02 MB |
| 4 KB | 4 bytes | 1024 | 48 KB | 4 MB | ~4.05 MB |
| 4 KB | 8 bytes (64-bit) | 512 | 48 KB | 2 MB | ~2.05 MB |
| 8 KB | 4 bytes | 2048 | 96 KB | 16 MB | ~16.1 MB |
| 64 KB | 8 bytes | 8192 | 768 KB | 512 MB | ~513 MB |
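The arithmetic behind this table can be checked directly. Below is a minimal standalone sketch (plain C, nothing filesystem-specific) that recomputes each row from B and P:
#include <stdio.h>
#include <stdint.h>
/* Recompute the capacity table: for each (block size, pointer size) pair,
 * derive k = B / P, the 12-pointer direct capacity, and the capacity added
 * by one single indirect block. */
int main(void) {
    const uint64_t DIRECT_PTRS = 12;
    uint64_t block_sizes[]   = { 1024, 2048, 4096, 4096, 8192, 65536 };
    uint64_t pointer_sizes[] = {    4,    4,    4,    8,    4,     8 };
    for (int i = 0; i < 6; i++) {
        uint64_t B = block_sizes[i], P = pointer_sizes[i];
        uint64_t k        = B / P;            /* pointers per indirect block */
        uint64_t direct   = DIRECT_PTRS * B;  /* 12 direct blocks            */
        uint64_t indirect = k * B;            /* one single indirect block   */
        printf("B=%6llu P=%llu k=%5llu direct=%8llu B  single-indirect=%10llu B  total=%10llu B\n",
               (unsigned long long)B, (unsigned long long)P, (unsigned long long)k,
               (unsigned long long)direct, (unsigned long long)indirect,
               (unsigned long long)(direct + indirect));
    }
    return 0;
}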
Logical block number mapping:
When accessing byte offset O in a file:
Logical block number L = O / B
If L < 12: → Use direct pointer block[L]
If 12 ≤ L < 12 + k: → Use single indirect
Offset within indirect: L - 12
Read indirect block, get pointer[L - 12]
Example with 4KB blocks:
Accessing byte 200,000:
Logical block = 200,000 / 4096 = 48 (with remainder 3392)
48 ≥ 12, so we use single indirect
Offset within indirect = 48 - 12 = 36
Steps:
1. Read inode->block[12] → get indirect block number
2. Read indirect block, extract pointer[36]
3. Read data from that pointer's block at offset 3392
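To make the mapping concrete, here is a small self-contained sketch of the lookup arithmetic; the constants and function name are illustrative, not kernel API:
#include <stdint.h>
#define BLOCK_SIZE      4096u
#define POINTER_SIZE    4u
#define DIRECT_BLOCKS   12u
#define PTRS_PER_BLOCK  (BLOCK_SIZE / POINTER_SIZE)   /* 1024 */
/* Where does byte offset 'off' live, and which pointer slot addresses it?
 * Returns 0 for a direct block, 1 for single indirect, -1 if beyond range. */
int map_offset(uint64_t off, uint32_t *slot, uint32_t *within_block) {
    uint64_t logical = off / BLOCK_SIZE;       /* logical block number L   */
    *within_block    = off % BLOCK_SIZE;       /* byte offset inside block */
    if (logical < DIRECT_BLOCKS) {
        *slot = (uint32_t)logical;             /* index into inode->block[] */
        return 0;
    }
    if (logical < DIRECT_BLOCKS + PTRS_PER_BLOCK) {
        *slot = (uint32_t)(logical - DIRECT_BLOCKS);  /* index into the indirect block */
        return 1;
    }
    return -1;                                 /* would need double indirect */
}
/* Example: byte 200,000 maps to logical block 48, single indirect slot 36,
 * offset 3392 within the data block—matching the worked example above. */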
The transition from direct to single indirect addressing happens exactly at byte 49152 (with 4KB blocks). Files that grow past 48KB incur the cost of indirect block allocation and extra I/O for subsequent accesses to the extended portion.
Indirection adds I/O cost. Let's carefully analyze what happens when accessing data through single indirect blocks:
Reading byte 100,000 from a 1MB file (first access, no caching):
Step 1: Open file
→ Read inode from disk (1 I/O)
→ Inode now cached in memory
Step 2: Calculate location
→ Byte 100,000 / 4096 = block 24
→ 24 ≥ 12, so single indirect
→ Offset in indirect: 24 - 12 = 12
Step 3: Read indirect block
→ inode->block[12] = 5000
→ Read block 5000 from disk (1 I/O)
→ Indirect block now cached
Step 4: Get data block address
→ indirect_block[12] = 7500
Step 5: Read data block
→ Read block 7500 at offset 100000 % 4096 = 1696 (1 I/O)
→ Return data
Total I/O: 3 reads (inode + indirect + data)
Compare to direct block: 2 reads (inode + data)
Overhead: +1 read (50% more I/O)
An indirect block is simply a data block repurposed to hold pointers. It has no special header or metadata—just a contiguous array of block numbers:
// Conceptual structure of a single indirect block
// (not an explicit struct in the filesystem—it's just an array of integers)
#define BLOCK_SIZE 4096
#define POINTER_SIZE 4
#define PTRS_PER_BLOCK (BLOCK_SIZE / POINTER_SIZE) // 1024
// An indirect block is just:
u32 indirect_block[PTRS_PER_BLOCK];
// where each entry is a physical block number, or 0 for sparse holes
/**
 * Read data from a file using single indirect block addressing
 * This demonstrates the core algorithm for indirect block traversal
 */
ssize_t read_single_indirect(struct inode *inode, u32 logical_block,
                             void *buffer, size_t bytes)
{
    const u32 PTRS_PER_BLOCK = inode->sb->block_size / sizeof(u32);
    // Validate logical block is in single indirect range
    if (logical_block < DIRECT_BLOCKS ||
        logical_block >= DIRECT_BLOCKS + PTRS_PER_BLOCK) {
        return -EINVAL;  // Not in single indirect range
    }
    // Get the single indirect block pointer from inode
    blkcnt_t indirect_block_num = inode->block[12];
    if (indirect_block_num == 0) {
        // Sparse file: no indirect block allocated
        // The entire single-indirect region is a hole
        memset(buffer, 0, bytes);
        return bytes;
    }
    // Read the indirect block (may be cached)
    u32 *indirect_block = read_block_cached(indirect_block_num);
    if (!indirect_block) {
        return -EIO;
    }
    // Calculate offset within indirect block
    u32 index = logical_block - DIRECT_BLOCKS;
    // Get the actual data block number
    blkcnt_t data_block_num = indirect_block[index];
    if (data_block_num == 0) {
        // Sparse: this specific block is a hole
        memset(buffer, 0, bytes);
        return bytes;
    }
    // Read the data block
    return read_data_block(data_block_num, buffer, bytes);
}
Critical implementation details:
Block caching is essential — Without caching, every access through single indirect would require 2 disk reads. The buffer cache stores the indirect block after first access.
Sparse file support — A zero entry in the indirect block means that logical block is a hole. No I/O needed—return zeros.
Atomic consistency — The indirect block is a single unit that can be written atomically. This simplifies crash recovery.
Allocation on demand — The indirect block itself is allocated only when first needed (when file grows past 48KB).
You might wonder: why use indirection at all? Why not just make the inode bigger with more direct pointers? The answer: fixed inode size enables O(1) inode access. If inodes varied in size based on file size, finding inode N would require O(n) scanning. The indirection scheme keeps inodes fixed while supporting arbitrarily large files.
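A quick sketch shows the arithmetic that fixed-size inodes enable: inode N is located with two divisions, no scanning. The layout constants here are illustrative (128 bytes is the classic ext2 on-disk inode size); real filesystems add block groups and per-group inode tables.
#include <stdint.h>
#define INODE_SIZE        128u
#define BLOCK_SIZE        4096u
#define INODES_PER_BLOCK  (BLOCK_SIZE / INODE_SIZE)
/* With every inode the same size, finding inode N is pure arithmetic—
 * variable-size inodes would force a linear scan instead. */
void locate_inode(uint32_t inode_num, uint32_t inode_table_start_block,
                  uint32_t *block, uint32_t *offset_in_block) {
    uint32_t index = inode_num - 1;   /* inode numbers start at 1 */
    *block = inode_table_start_block + index / INODES_PER_BLOCK;
    *offset_in_block = (index % INODES_PER_BLOCK) * INODE_SIZE;
}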
Indirect blocks consume disk space themselves. This affects how we account for file storage:
Example: A 1MB file
Logical view: 1MB = 1,048,576 bytes
Actual block allocation:
- Data blocks: 256 × 4KB = 1,048,576 bytes
- Single indirect block: 1 × 4KB = 4,096 bytes
- Total disk usage: 1,052,672 bytes
Overhead: 4KB / 1MB = 0.39%
| File Size | Data Blocks | Indirect Blocks | Total Blocks | Overhead |
|---|---|---|---|---|
| 48 KB (no indirect) | 12 | 0 | 12 | 0% |
| 52 KB (just past direct) | 13 | 1 | 14 | 7.7% |
| 100 KB | 25 | 1 | 26 | 4.0% |
| 500 KB | 125 | 1 | 126 | 0.8% |
| 1 MB | 256 | 1 | 257 | 0.39% |
| 4 MB | 1024 | 1 | 1025 | 0.10% |
| 4.05 MB (max single ind.) | 1036 | 1 | 1037 | 0.10% |
Observations:
Overhead is proportionally small — For files fully utilizing single indirect (4MB), the overhead is under 0.1%.
Worst overhead at threshold — A 52KB file (just past direct blocks) wastes the most proportionally—the entire 4KB indirect block holds just one pointer.
One indirect per level — Single indirect always uses exactly one block, regardless of file size (up to its limit).
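The overhead table can be regenerated with a few lines. This sketch assumes 4KB blocks and the classic 12 direct pointers:
#include <stdio.h>
#include <stdint.h>
#define BLOCK_SIZE     4096ull
#define DIRECT_BLOCKS  12ull
/* Recompute the overhead table: data blocks, indirect blocks, and the
 * fraction of allocated space consumed by the indirect block. */
int main(void) {
    uint64_t sizes[] = { 48*1024, 52*1024, 100*1024, 500*1024,
                         1024*1024, 4*1024*1024 };
    for (int i = 0; i < 6; i++) {
        uint64_t size = sizes[i];
        uint64_t data_blocks = (size + BLOCK_SIZE - 1) / BLOCK_SIZE;      /* round up */
        uint64_t indirect_blocks = data_blocks > DIRECT_BLOCKS ? 1 : 0;   /* single indirect */
        double overhead = 100.0 * indirect_blocks / data_blocks;
        printf("%8llu bytes: %5llu data + %llu indirect = %5llu blocks (%.2f%% overhead)\n",
               (unsigned long long)size, (unsigned long long)data_blocks,
               (unsigned long long)indirect_blocks,
               (unsigned long long)(data_blocks + indirect_blocks), overhead);
    }
    return 0;
}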
The inode's block count field:
$ stat 1mb_file.bin
Size: 1048576 Blocks: 2056 IO Block: 4096 regular file
^^^^
2056 × 512 = 1,052,672 bytes
= 1MB data + overhead
The Blocks count includes indirect blocks, so it accurately reflects actual disk usage.
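You can verify this on any file with the standard stat(2) interface—st_blocks is reported in 512-byte units and includes metadata blocks. (Sparse files or extent-based filesystems may report different numbers; this is a minimal check, not a full accounting tool.)
#include <stdio.h>
#include <sys/stat.h>
/* Compare logical size with allocated space: for the 1MB example above,
 * st_blocks reports 2056 (1MB of data plus the 4KB indirect block). */
int main(int argc, char **argv) {
    struct stat st;
    if (argc < 2 || stat(argv[1], &st) != 0) {
        perror("stat");
        return 1;
    }
    printf("size      : %lld bytes\n", (long long)st.st_size);
    printf("allocated : %lld bytes (%lld x 512)\n",
           (long long)st.st_blocks * 512, (long long)st.st_blocks);
    return 0;
}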
Files just past 48KB pay a disproportionate overhead. A 52KB file uses 14 blocks total (13 data + 1 indirect), whereas a 48KB file uses only 12. This 17% jump in storage for 8% more data is worth considering for cache sizing or record formats where you control file sizes.
The kernel's buffer cache is crucial for maintaining performance with indirect blocks. Let's examine how caching transforms the I/O pattern:
Without Caching (Theoretical Worst Case):
Access pattern: Random reads in 1MB file
Each read:
1. Read inode (maybe cached)
2. Read indirect block
3. Read data block
100 random reads = 300 I/O operations
Actual data retrieved: 100 × 4KB
Effective bandwidth: 33% of theoretical max
With Buffer Cache (Actual Behavior):
Access pattern: Random reads in 1MB file
First read:
- Read inode (1 I/O, then cached)
- Read indirect block (1 I/O, then cached)
- Read data block (1 I/O)
Subsequent 99 reads:
- Inode cached: 0 I/O
- Indirect block cached: 0 I/O
- Read data block (1 I/O each)
100 random reads = 102 I/O operations (1 inode + 1 indirect + 100 data)
With read-ahead: potentially much less
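The two scenarios reduce to simple counting; a back-of-the-envelope sketch under the same assumptions:
#include <stdio.h>
/* Rough I/O counts for N random reads within the single indirect region. */
int main(void) {
    int n_reads = 100;
    int no_cache   = n_reads * 3;       /* inode + indirect + data every time      */
    int with_cache = 1 + 1 + n_reads;   /* inode once, indirect once, data per read */
    printf("no caching : %d I/Os\n", no_cache);    /* 300 */
    printf("with cache : %d I/Os\n", with_cache);  /* 102 */
    return 0;
}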
Cache hierarchy for file access:
┌─────────────────────────────────────────────────────────────┐
│ Application Buffer (user space) │
└─────────────────────────────────────────────────────────────┘
↑
┌─────────────────────────────────────────────────────────────┐
│ Page Cache (data pages, very large, persists across calls) │
│ - Holds file data blocks │
│ - Primary cache for file I/O │
└─────────────────────────────────────────────────────────────┘
↑
┌─────────────────────────────────────────────────────────────┐
│ Buffer Cache (metadata blocks, integrated with page cache) │
│ - Holds indirect blocks │
│ - Holds inode table blocks │
│ - Holds directory blocks │
└─────────────────────────────────────────────────────────────┘
↑
┌─────────────────────────────────────────────────────────────┐
│ Inode Cache (parsed inode structures) │
│ - Holds struct inode for open files │
│ - Avoids re-parsing inode from disk blocks │
└─────────────────────────────────────────────────────────────┘
↑
┌─────────────────────────────────────────────────────────────┐
│ Disk (actual storage) │
└─────────────────────────────────────────────────────────────┘
The combined effect: after a file is opened and accessed once, subsequent random access anywhere in the file involves only data block reads—the indirect blocks remain cached.
Under memory pressure, the kernel may evict cached blocks. Indirect blocks are good eviction candidates when files are closed and unlikely to be accessed soon. If evicted, the next access will incur the extra I/O. Long-running processes with stable file access patterns benefit most from cached indirect blocks.
When a file grows past the direct block limit, the filesystem must allocate an indirect block. Let's trace this process:
/**
 * Extend a file that needs its first indirect block
 * Simplified flow showing key operations
 */
int allocate_single_indirect(struct inode *inode, u32 target_logical_block)
{
    if (target_logical_block < DIRECT_BLOCKS) {
        return -EINVAL;  // Should use direct blocks
    }
    // Check if indirect block already exists
    if (inode->block[12] == 0) {
        // Need to allocate the indirect block itself
        blkcnt_t new_indirect = allocate_block(inode->sb);
        if (new_indirect == 0) {
            return -ENOSPC;  // No space for indirect block
        }
        // Initialize indirect block with zeros
        void *block_data = get_block_buffer(new_indirect);
        memset(block_data, 0, inode->sb->block_size);
        mark_buffer_dirty(block_data);
        // Store indirect block number in inode
        inode->block[12] = new_indirect;
        mark_inode_dirty(inode);
    }
    // Now allocate the actual data block
    blkcnt_t new_data_block = allocate_block(inode->sb);
    if (new_data_block == 0) {
        // Complex: may need to free the indirect block if
        // it was just allocated and has no entries
        return -ENOSPC;
    }
    // Read/get indirect block
    u32 *indirect = get_block_buffer(inode->block[12]);
    // Store data block pointer in indirect block
    u32 index = target_logical_block - DIRECT_BLOCKS;
    indirect[index] = new_data_block;
    mark_buffer_dirty(indirect);
    // Update inode metadata
    inode->blocks += inode->sb->block_size / 512;
    if ((target_logical_block + 1) * inode->sb->block_size > inode->size) {
        inode->size = (target_logical_block + 1) * inode->sb->block_size;
    }
    mark_inode_dirty(inode);
    return 0;
}
Key allocation principles:
Lazy allocation — The indirect block is created only when first needed, not preemptively.
Two allocations required — Growing past the threshold requires allocating both the indirect block AND the data block.
Multiple dirty blocks — A single write past the boundary dirties: the inode, the new indirect block, and the new data block.
Transaction semantics — Journaling filesystems wrap these operations in a transaction for crash consistency.
Corner case: truncation
When a file is truncated below 48KB, the single indirect block becomes unused:
// Pseudocode for truncation
if (new_size <= DIRECT_BLOCKS * block_size) {
if (inode->block[12] != 0) {
// Free all blocks pointed to by indirect block
u32 *indirect = read_block(inode->block[12]);
for (int i = 0; i < PTRS_PER_BLOCK; i++) {
if (indirect[i] != 0) {
free_block(indirect[i]);
}
}
// Free indirect block itself
free_block(inode->block[12]);
inode->block[12] = 0;
}
}
Let's quantify the performance difference between direct and single indirect access:
| Metric | Direct Block | Single Indirect | Difference |
|---|---|---|---|
| Disk reads for first byte | 2 (inode + data) | 3 (inode + indirect + data) | +50% |
| Memory lookups per access (all cached) | 2 | 3 | +50% |
| Blocks traversed | 0 | 1 | +1 |
| Pointer dereferences | 1 | 2 | +100% |
| Write amplification (appending) | 2 blocks touched | 3 blocks touched | +50% |
Real-world impact (latency estimates):
Assume:
- inode cached after open(): 0 latency for inode
- SSD random read: 100μs
- HDD random read: 10ms
- Memory access: 100ns
Reading from direct block (cached):
- Memory: 100ns (get block pointer)
- Disk: 100μs (SSD) or 10ms (HDD)
- Total: ~100μs (SSD) or ~10ms (HDD)
Reading from single indirect (cold):
- Memory: 100ns (get indirect pointer)
- Disk: 100μs or 10ms (read indirect)
- Memory: 100ns (get data pointer from indirect)
- Disk: 100μs or 10ms (read data)
- Total: ~200μs (SSD) or ~20ms (HDD)
Reading from single indirect (indirect cached):
- Same as direct block: ~100μs or ~10ms
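A tiny calculation under the stated latency assumptions makes the comparison explicit (illustrative numbers only):
#include <stdio.h>
/* Rough latency model: memory ~100ns, SSD random read ~100us, HDD ~10ms. */
int main(void) {
    double mem_ns = 100, ssd_us = 100, hdd_ms = 10;
    /* Direct (inode cached): one pointer lookup + one data read */
    printf("direct, SSD        : ~%.0f us\n", ssd_us + mem_ns / 1000.0);
    printf("direct, HDD        : ~%.1f ms\n", hdd_ms + mem_ns / 1e6);
    /* Single indirect, cold: indirect read + data read */
    printf("indirect cold, SSD : ~%.0f us\n", 2 * ssd_us + 2 * mem_ns / 1000.0);
    printf("indirect cold, HDD : ~%.1f ms\n", 2 * hdd_ms + 2 * mem_ns / 1e6);
    /* Single indirect, warm (indirect cached): effectively same as direct */
    printf("indirect warm, SSD : ~%.0f us\n", ssd_us + 2 * mem_ns / 1000.0);
    return 0;
}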
Key insight: The extra read roughly doubles latency in both cases, but the absolute penalty differs sharply—about 10ms extra on an HDD versus about 100μs on an SSD. Once the indirect block is cached—which happens quickly for active files—the difference disappears.
Smart filesystem implementations prefetch the indirect block when opening files larger than 48KB. By the time you actually read past the direct blocks, the indirect block is already cached. This hides the latency for sequential read patterns.
We've thoroughly examined single indirect blocks—the first level of indirection in Unix file addressing. To consolidate: the 13th inode pointer (index 12) references a block holding 1024 block addresses, extending the maximum file size from 48KB to roughly 4.05MB; the cost is one extra read per access, which the buffer cache largely hides once the indirect block is resident.
What's next:
Single indirect blocks extend capacity to ~4MB—enough for many files, but far from sufficient for databases, media files, or disk images. The next page explores double and triple indirect blocks, which apply the same indirection concept recursively to support files up to 4TB (with 32-bit block pointers) or effectively unlimited sizes (with 64-bit pointers).
You now understand single indirect blocks—how one inode pointer multiplies into 1024 block addresses through indirection. This concept forms the foundation for the deeper indirection levels we'll explore next, where recursive application of this pattern enables support for multi-terabyte files.