Direct block pointers are elegant and fast, but they have an inherent limitation: a fixed number of pointers in a fixed-size inode can only address a limited amount of data. With 12 direct pointers and 4KB blocks, we can address only 48KB—adequate for most configuration files and source code, but hopelessly insufficient for documents, images, databases, or media files.
The Unix designers needed a solution that supported much larger files without enlarging the fixed-size inode, and without penalizing the small files that direct pointers already serve well.
Their solution was indirect blocks—a technique that repurposes data blocks to hold additional pointers, essentially creating a tree of block addresses. This page focuses on single indirect blocks, the first and most commonly encountered level of indirection.
By the end of this page, you will understand: how the single indirect pointer creates an extra level of addressing; the mathematics of calculating indirect block capacity; the I/O cost of indirection—one extra read per access; how filesystems cache indirect blocks to minimize overhead; and when files transition from direct to indirect addressing.
Indirection is a fundamental computer science technique: instead of storing a value directly, you store a pointer to where the value is. In the context of file block addressing:
The 13th pointer in the inode's block array (index 12) is the single indirect pointer. Instead of pointing to file data, it points to a block filled entirely with direct pointers. Each of those pointers then points to an actual data block.
The key insight: A single indirect block converts one inode pointer slot into (block_size / pointer_size) pointers. With 4KB blocks and 4-byte pointers, that's 1024 additional block addresses from a single inode entry.
Pointers per indirect block = Block Size / Pointer Size
= 4096 bytes / 4 bytes
= 1024 pointers
Each of these 1024 pointers addresses one data block, so a single indirect block provides access to:
Single Indirect Capacity = 1024 × 4KB = 4MB
Combined with 12 direct blocks (48KB), a file using direct + single indirect addressing can be up to 48KB + 4MB = ~4.05MB.
Let's formalize the addressing calculations. Define B as the block size in bytes, P as the pointer size in bytes, and k = B / P as the number of pointers per indirect block.
Address ranges with various configurations:
| Block Size (B) | Pointer Size (P) | Pointers/Block (k) | Direct Capacity | Single Indirect Adds | Total with Single Indirect |
|---|---|---|---|---|---|
| 1 KB | 4 bytes | 256 | 12 KB | 256 KB | 268 KB |
| 2 KB | 4 bytes | 512 | 24 KB | 1 MB | ~1.02 MB |
| 4 KB | 4 bytes | 1024 | 48 KB | 4 MB | ~4.05 MB |
| 4 KB | 8 bytes (64-bit) | 512 | 48 KB | 2 MB | ~2.05 MB |
| 8 KB | 4 bytes | 2048 | 96 KB | 16 MB | ~16.1 MB |
| 64 KB | 8 bytes | 8192 | 768 KB | 512 MB | ~513 MB |
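The arithmetic behind this table can be checked directly. Below is a minimal standalone sketch (plain C, nothing filesystem-specific) that recomputes each row from B and P:
#include <stdio.h>
#include <stdint.h>
/* Recompute the capacity table: for each (block size, pointer size) pair,
 * derive k = B / P, the 12-pointer direct capacity, and the capacity added
 * by one single indirect block. */
int main(void) {
    const uint64_t DIRECT_PTRS = 12;
    uint64_t block_sizes[]   = { 1024, 2048, 4096, 4096, 8192, 65536 };
    uint64_t pointer_sizes[] = {    4,    4,    4,    8,    4,     8 };
    for (int i = 0; i < 6; i++) {
        uint64_t B = block_sizes[i], P = pointer_sizes[i];
        uint64_t k        = B / P;            /* pointers per indirect block */
        uint64_t direct   = DIRECT_PTRS * B;  /* 12 direct blocks            */
        uint64_t indirect = k * B;            /* one single indirect block   */
        printf("B=%6llu P=%llu k=%5llu direct=%8llu B  single-indirect=%10llu B  total=%10llu B\n",
               (unsigned long long)B, (unsigned long long)P, (unsigned long long)k,
               (unsigned long long)direct, (unsigned long long)indirect,
               (unsigned long long)(direct + indirect));
    }
    return 0;
}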
Logical block number mapping:
When accessing byte offset O in a file:
Logical block number L = O / B
If L < 12: → Use direct pointer block[L]
If 12 ≤ L < 12 + k: → Use single indirect
Offset within indirect: L - 12
Read indirect block, get pointer[L - 12]
Example with 4KB blocks:
Accessing byte 200,000:
Logical block = 200,000 / 4096 = 48 (with remainder 3392)
48 ≥ 12, so we use single indirect
Offset within indirect = 48 - 12 = 36
Steps:
1. Read inode->block[12] → get indirect block number
2. Read indirect block, extract pointer[36]
3. Read data from that pointer's block at offset 3392
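To make the mapping concrete, here is a small self-contained sketch of the lookup arithmetic; the constants and function name are illustrative, not kernel API:
#include <stdint.h>
#define BLOCK_SIZE      4096u
#define POINTER_SIZE    4u
#define DIRECT_BLOCKS   12u
#define PTRS_PER_BLOCK  (BLOCK_SIZE / POINTER_SIZE)   /* 1024 */
/* Where does byte offset 'off' live, and which pointer slot addresses it?
 * Returns 0 for a direct block, 1 for single indirect, -1 if beyond range. */
int map_offset(uint64_t off, uint32_t *slot, uint32_t *within_block) {
    uint64_t logical = off / BLOCK_SIZE;       /* logical block number L   */
    *within_block    = off % BLOCK_SIZE;       /* byte offset inside block */
    if (logical < DIRECT_BLOCKS) {
        *slot = (uint32_t)logical;             /* index into inode->block[] */
        return 0;
    }
    if (logical < DIRECT_BLOCKS + PTRS_PER_BLOCK) {
        *slot = (uint32_t)(logical - DIRECT_BLOCKS);  /* index into the indirect block */
        return 1;
    }
    return -1;                                 /* would need double indirect */
}
/* Example: byte 200,000 maps to logical block 48, single indirect slot 36,
 * offset 3392 within the data block—matching the worked example above. */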
The transition from direct to single indirect addressing happens exactly at byte 49152 (with 4KB blocks). Files that grow past 48KB incur the cost of indirect block allocation and extra I/O for subsequent accesses to the extended portion.
Indirection adds I/O cost. Let's carefully analyze what happens when accessing data through single indirect blocks:
Reading byte 100,000 from a 1MB file (first access, no caching):
Step 1: Open file
→ Read inode from disk (1 I/O)
→ Inode now cached in memory
Step 2: Calculate location
→ Byte 100,000 / 4096 = block 24
→ 24 ≥ 12, so single indirect
→ Offset in indirect: 24 - 12 = 12
Step 3: Read indirect block
→ inode->block[12] = 5000
→ Read block 5000 from disk (1 I/O)
→ Indirect block now cached
Step 4: Get data block address
→ indirect_block[12] = 7500
Step 5: Read data block
→ Read block 7500 at offset 100000 % 4096 = 1696 (1 I/O)
→ Return data
Total I/O: 3 reads (inode + indirect + data)
Compare to direct block: 2 reads (inode + data)
Overhead: +1 read (50% more I/O)
An indirect block is simply a data block repurposed to hold pointers. It has no special header or metadata—just a contiguous array of block numbers:
// Conceptual structure of a single indirect block
// (not an explicit struct in the filesystem—it's just an array of integers)
#define BLOCK_SIZE 4096
#define POINTER_SIZE 4
#define PTRS_PER_BLOCK (BLOCK_SIZE / POINTER_SIZE) // 1024
// An indirect block is just:
u32 indirect_block[PTRS_PER_BLOCK];
// where each entry is a physical block number, or 0 for sparse holes
/**
 * Read data from a file using single indirect block addressing
 * This demonstrates the core algorithm for indirect block traversal
 */
ssize_t read_single_indirect(struct inode *inode, u32 logical_block,
                             void *buffer, size_t bytes)
{
    const u32 PTRS_PER_BLOCK = inode->sb->block_size / sizeof(u32);
    // Validate logical block is in single indirect range
    if (logical_block < DIRECT_BLOCKS ||
        logical_block >= DIRECT_BLOCKS + PTRS_PER_BLOCK) {
        return -EINVAL;  // Not in single indirect range
    }
    // Get the single indirect block pointer from inode
    blkcnt_t indirect_block_num = inode->block[12];
    if (indirect_block_num == 0) {
        // Sparse file: no indirect block allocated
        // The entire single-indirect region is a hole
        memset(buffer, 0, bytes);
        return bytes;
    }
    // Read the indirect block (may be cached)
    u32 *indirect_block = read_block_cached(indirect_block_num);
    if (!indirect_block) {
        return -EIO;
    }
    // Calculate offset within indirect block
    u32 index = logical_block - DIRECT_BLOCKS;
    // Get the actual data block number
    blkcnt_t data_block_num = indirect_block[index];
    if (data_block_num == 0) {
        // Sparse: this specific block is a hole
        memset(buffer, 0, bytes);
        return bytes;
    }
    // Read the data block
    return read_data_block(data_block_num, buffer, bytes);
}
Critical implementation details:
Block caching is essential — Without caching, every access through single indirect would require 2 disk reads. The buffer cache stores the indirect block after first access.
Sparse file support — A zero entry in the indirect block means that logical block is a hole. No I/O needed—return zeros.
Atomic consistency — The indirect block is a single unit that can be written atomically. This simplifies crash recovery.
Allocation on demand — The indirect block itself is allocated only when first needed (when file grows past 48KB).
You might wonder: why use indirection at all? Why not just make the inode bigger with more direct pointers? The answer: fixed inode size enables O(1) inode access. If inodes varied in size based on file size, finding inode N would require O(n) scanning. The indirection scheme keeps inodes fixed while supporting arbitrarily large files.
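A quick sketch shows the arithmetic that fixed-size inodes enable: inode N is located with two divisions, no scanning. The layout constants here are illustrative (128 bytes is the classic ext2 on-disk inode size); real filesystems add block groups and per-group inode tables.
#include <stdint.h>
#define INODE_SIZE        128u
#define BLOCK_SIZE        4096u
#define INODES_PER_BLOCK  (BLOCK_SIZE / INODE_SIZE)
/* With every inode the same size, finding inode N is pure arithmetic—
 * variable-size inodes would force a linear scan instead. */
void locate_inode(uint32_t inode_num, uint32_t inode_table_start_block,
                  uint32_t *block, uint32_t *offset_in_block) {
    uint32_t index = inode_num - 1;   /* inode numbers start at 1 */
    *block = inode_table_start_block + index / INODES_PER_BLOCK;
    *offset_in_block = (index % INODES_PER_BLOCK) * INODE_SIZE;
}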
Indirect blocks consume disk space themselves. This affects how we account for file storage:
Example: A 1MB file
Logical view: 1MB = 1,048,576 bytes
Actual block allocation:
- Data blocks: 256 × 4KB = 1,048,576 bytes
- Single indirect block: 1 × 4KB = 4,096 bytes
- Total disk usage: 1,052,672 bytes
Overhead: 4KB / 1MB = 0.39%
| File Size | Data Blocks | Indirect Blocks | Total Blocks | Overhead |
|---|---|---|---|---|
| 48 KB (no indirect) | 12 | 0 | 12 | 0% |
| 52 KB (just past direct) | 13 | 1 | 14 | 7.7% |
| 100 KB | 25 | 1 | 26 | 4.0% |
| 500 KB | 125 | 1 | 126 | 0.8% |
| 1 MB | 256 | 1 | 257 | 0.39% |
| 4 MB | 1024 | 1 | 1025 | 0.10% |
| 4.05 MB (max single ind.) | 1036 | 1 | 1037 | 0.10% |
Observations:
Overhead is proportionally small — For files fully utilizing single indirect (4MB), the overhead is under 0.1%.
Worst overhead at threshold — A 52KB file (just past direct blocks) wastes the most proportionally—the entire 4KB indirect block holds just one pointer.
One indirect per level — Single indirect always uses exactly one block, regardless of file size (up to its limit).
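The overhead table can be regenerated with a few lines. This sketch assumes 4KB blocks and the classic 12 direct pointers:
#include <stdio.h>
#include <stdint.h>
#define BLOCK_SIZE     4096ull
#define DIRECT_BLOCKS  12ull
/* Recompute the overhead table: data blocks, indirect blocks, and the
 * fraction of allocated space consumed by the indirect block. */
int main(void) {
    uint64_t sizes[] = { 48*1024, 52*1024, 100*1024, 500*1024,
                         1024*1024, 4*1024*1024 };
    for (int i = 0; i < 6; i++) {
        uint64_t size = sizes[i];
        uint64_t data_blocks = (size + BLOCK_SIZE - 1) / BLOCK_SIZE;      /* round up */
        uint64_t indirect_blocks = data_blocks > DIRECT_BLOCKS ? 1 : 0;   /* single indirect */
        double overhead = 100.0 * indirect_blocks / data_blocks;
        printf("%8llu bytes: %5llu data + %llu indirect = %5llu blocks (%.2f%% overhead)\n",
               (unsigned long long)size, (unsigned long long)data_blocks,
               (unsigned long long)indirect_blocks,
               (unsigned long long)(data_blocks + indirect_blocks), overhead);
    }
    return 0;
}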
The inode's block count field:
$ stat 1mb_file.bin
Size: 1048576 Blocks: 2056 IO Block: 4096 regular file
^^^^
2056 × 512 = 1,052,672 bytes
= 1MB data + overhead
The Blocks count includes indirect blocks, so it accurately reflects actual disk usage.
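You can verify this on any file with the standard stat(2) interface—st_blocks is reported in 512-byte units and includes metadata blocks. (Sparse files or extent-based filesystems may report different numbers; this is a minimal check, not a full accounting tool.)
#include <stdio.h>
#include <sys/stat.h>
/* Compare logical size with allocated space: for the 1MB example above,
 * st_blocks reports 2056 (1MB of data plus the 4KB indirect block). */
int main(int argc, char **argv) {
    struct stat st;
    if (argc < 2 || stat(argv[1], &st) != 0) {
        perror("stat");
        return 1;
    }
    printf("size      : %lld bytes\n", (long long)st.st_size);
    printf("allocated : %lld bytes (%lld x 512)\n",
           (long long)st.st_blocks * 512, (long long)st.st_blocks);
    return 0;
}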
Files just past 48KB pay a disproportionate overhead. A 52KB file uses 14 blocks total (13 data + 1 indirect), whereas a 48KB file uses only 12. This 17% jump in storage for 8% more data is worth considering for cache sizing or record formats where you control file sizes.
The kernel's buffer cache is crucial for maintaining performance with indirect blocks. Let's examine how caching transforms the I/O pattern:
Without Caching (Theoretical Worst Case):
Access pattern: Random reads in 1MB file
Each read:
1. Read inode (maybe cached)
2. Read indirect block
3. Read data block
100 random reads = 300 I/O operations
Actual data retrieved: 100 × 4KB
Effective bandwidth: 33% of theoretical max
With Buffer Cache (Actual Behavior):
Access pattern: Random reads in 1MB file
First read:
- Read inode (1 I/O, then cached)
- Read indirect block (1 I/O, then cached)
- Read data block (1 I/O)
Subsequent 99 reads:
- Inode cached: 0 I/O
- Indirect block cached: 0 I/O
- Read data block (1 I/O each)
100 random reads = 102 I/O operations (1 inode + 1 indirect + 100 data)
With read-ahead: potentially much less
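The two scenarios reduce to simple counting; a back-of-the-envelope sketch under the same assumptions:
#include <stdio.h>
/* Rough I/O counts for N random reads within the single indirect region. */
int main(void) {
    int n_reads = 100;
    int no_cache   = n_reads * 3;       /* inode + indirect + data every time      */
    int with_cache = 1 + 1 + n_reads;   /* inode once, indirect once, data per read */
    printf("no caching : %d I/Os\n", no_cache);    /* 300 */
    printf("with cache : %d I/Os\n", with_cache);  /* 102 */
    return 0;
}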
Cache hierarchy for file access:
┌─────────────────────────────────────────────────────────────┐
│ Application Buffer (user space) │
└─────────────────────────────────────────────────────────────┘
↑
┌─────────────────────────────────────────────────────────────┐
│ Page Cache (data pages, very large, persists across calls) │
│ - Holds file data blocks │
│ - Primary cache for file I/O │
└─────────────────────────────────────────────────────────────┘
↑
┌─────────────────────────────────────────────────────────────┐
│ Buffer Cache (metadata blocks, integrated with page cache) │
│ - Holds indirect blocks │
│ - Holds inode table blocks │
│ - Holds directory blocks │
└─────────────────────────────────────────────────────────────┘
↑
┌─────────────────────────────────────────────────────────────┐
│ Inode Cache (parsed inode structures) │
│ - Holds struct inode for open files │
│ - Avoids re-parsing inode from disk blocks │
└─────────────────────────────────────────────────────────────┘
↑
┌─────────────────────────────────────────────────────────────┐
│ Disk (actual storage) │
└─────────────────────────────────────────────────────────────┘
The combined effect: after a file is opened and accessed once, subsequent random access anywhere in the file involves only data block reads—the indirect blocks remain cached.
Under memory pressure, the kernel may evict cached blocks. Indirect blocks are good eviction candidates when files are closed and unlikely to be accessed soon. If evicted, the next access will incur the extra I/O. Long-running processes with stable file access patterns benefit most from cached indirect blocks.
When a file grows past the direct block limit, the filesystem must allocate an indirect block. Let's trace this process:
/**
 * Extend a file that needs its first indirect block
 * Simplified flow showing key operations
 */
int allocate_single_indirect(struct inode *inode, u32 target_logical_block)
{
    if (target_logical_block < DIRECT_BLOCKS) {
        return -EINVAL;  // Should use direct blocks
    }
    // Check if indirect block already exists
    if (inode->block[12] == 0) {
        // Need to allocate the indirect block itself
        blkcnt_t new_indirect = allocate_block(inode->sb);
        if (new_indirect == 0) {
            return -ENOSPC;  // No space for indirect block
        }
        // Initialize indirect block with zeros
        void *block_data = get_block_buffer(new_indirect);
        memset(block_data, 0, inode->sb->block_size);
        mark_buffer_dirty(block_data);
        // Store indirect block number in inode
        inode->block[12] = new_indirect;
        mark_inode_dirty(inode);
    }
    // Now allocate the actual data block
    blkcnt_t new_data_block = allocate_block(inode->sb);
    if (new_data_block == 0) {
        // Complex: may need to free the indirect block if
        // it was just allocated and has no entries
        return -ENOSPC;
    }
    // Read/get indirect block
    u32 *indirect = get_block_buffer(inode->block[12]);
    // Store data block pointer in indirect block
    u32 index = target_logical_block - DIRECT_BLOCKS;
    indirect[index] = new_data_block;
    mark_buffer_dirty(indirect);
    // Update inode metadata
    inode->blocks += inode->sb->block_size / 512;
    if ((target_logical_block + 1) * inode->sb->block_size > inode->size) {
        inode->size = (target_logical_block + 1) * inode->sb->block_size;
    }
    mark_inode_dirty(inode);
    return 0;
}
Key allocation principles:
Lazy allocation — The indirect block is created only when first needed, not preemptively.
Two allocations required — Growing past the threshold requires allocating both the indirect block AND the data block.
Multiple dirty blocks — A single write past the boundary dirties: the inode, the new indirect block, and the new data block.
Transaction semantics — Journaling filesystems wrap these operations in a transaction for crash consistency.
Corner case: truncation
When a file is truncated below 48KB, the single indirect block becomes unused:
// Pseudocode for truncation
if (new_size <= DIRECT_BLOCKS * block_size) {
if (inode->block[12] != 0) {
// Free all blocks pointed to by indirect block
u32 *indirect = read_block(inode->block[12]);
for (int i = 0; i < PTRS_PER_BLOCK; i++) {
if (indirect[i] != 0) {
free_block(indirect[i]);
}
}
// Free indirect block itself
free_block(inode->block[12]);
inode->block[12] = 0;
}
}
Let's quantify the performance difference between direct and single indirect access:
| Metric | Direct Block | Single Indirect | Difference |
|---|---|---|---|
| Disk reads for first byte | 2 (inode + data) | 3 (inode + indirect + data) | +50% |
| Memory lookups per access (all cached) | 2 | 3 | +50% |
| Blocks traversed | 0 | 1 | +1 |
| Pointer dereferences | 1 | 2 | +100% |
| Write amplification (appending) | 2 blocks touched | 3 blocks touched | +50% |
Real-world impact (latency estimates):
Assume:
- inode cached after open(): 0 latency for inode
- SSD random read: 100μs
- HDD random read: 10ms
- Memory access: 100ns
Reading from direct block (cached):
- Memory: 100ns (get block pointer)
- Disk: 100μs (SSD) or 10ms (HDD)
- Total: ~100μs (SSD) or ~10ms (HDD)
Reading from single indirect (cold):
- Memory: 100ns (get indirect pointer)
- Disk: 100μs or 10ms (read indirect)
- Memory: 100ns (get data pointer from indirect)
- Disk: 100μs or 10ms (read data)
- Total: ~200μs (SSD) or ~20ms (HDD)
Reading from single indirect (indirect cached):
- Same as direct block: ~100μs or ~10ms
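A tiny calculation under the stated latency assumptions makes the comparison explicit (illustrative numbers only):
#include <stdio.h>
/* Rough latency model: memory ~100ns, SSD random read ~100us, HDD ~10ms. */
int main(void) {
    double mem_ns = 100, ssd_us = 100, hdd_ms = 10;
    /* Direct (inode cached): one pointer lookup + one data read */
    printf("direct, SSD        : ~%.0f us\n", ssd_us + mem_ns / 1000.0);
    printf("direct, HDD        : ~%.1f ms\n", hdd_ms + mem_ns / 1e6);
    /* Single indirect, cold: indirect read + data read */
    printf("indirect cold, SSD : ~%.0f us\n", 2 * ssd_us + 2 * mem_ns / 1000.0);
    printf("indirect cold, HDD : ~%.1f ms\n", 2 * hdd_ms + 2 * mem_ns / 1e6);
    /* Single indirect, warm (indirect cached): effectively same as direct */
    printf("indirect warm, SSD : ~%.0f us\n", ssd_us + 2 * mem_ns / 1000.0);
    return 0;
}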
Key insight: The extra read roughly doubles latency in both cases, but the absolute penalty differs sharply—about 10ms extra on an HDD versus about 100μs on an SSD. Once the indirect block is cached—which happens quickly for active files—the difference disappears.
Smart filesystem implementations prefetch the indirect block when opening files larger than 48KB. By the time you actually read past the direct blocks, the indirect block is already cached. This hides the latency for sequential read patterns.
We've thoroughly examined single indirect blocks—the first level of indirection in Unix file addressing. To consolidate: the 13th inode pointer (index 12) references a block holding 1024 block addresses, extending the maximum file size from 48KB to roughly 4.05MB; the cost is one extra read per access, which the buffer cache largely hides once the indirect block is resident.
What's next:
Single indirect blocks extend capacity to ~4MB—enough for many files, but far from sufficient for databases, media files, or disk images. The next page explores double and triple indirect blocks, which apply the same indirection concept recursively to support files up to 4TB (with 32-bit block pointers) or effectively unlimited sizes (with 64-bit pointers).
You now understand single indirect blocks—how one inode pointer multiplies into 1024 block addresses through indirection. This concept forms the foundation for the deeper indirection levels we'll explore next, where recursive application of this pattern enables support for multi-terabyte files.