In the previous page, we introduced index blocks as the central metadata structures that enable efficient file access. But index blocks are really just arrays of pointers—addresses that tell the file system where to find each piece of a file's data.
The simplest and most common type of pointer in an index block is the direct pointer. Direct pointers are the workhorses of file system performance, providing the fastest possible path from metadata to data.
In this page, we'll dissect direct pointers at the bit level, understand their addressing mechanics, calculate their capacity limits, and see why real-world file systems optimize extensively for the common case where files are small enough to be fully addressed by direct pointers alone.
By the end of this page, you will understand the structure and purpose of direct block pointers, calculate maximum file sizes supported by direct pointers, analyze the performance characteristics that make direct pointers efficient, and recognize why file systems allocate the majority of index block space to direct pointers.
A direct pointer is a block address stored in the index block that points directly to a data block—no intermediate levels, no indirection. When you follow a direct pointer, you immediately arrive at actual file data.
The Term "Direct":
The word "direct" is significant. It contrasts with indirect pointers (which we'll explore later), where following the pointer takes you to another index block rather than data. Direct pointers are the zero-indirection case:
Index Block → Data Block (Direct: 1 hop)
Index Block → Index Block → Data Block (Single Indirect: 2 hops)
Index Block → Index Block → Index Block → Data Block (Double Indirect: 3 hops)
Why Direct Pointers Matter:
Every additional level of indirection requires an extra disk read whenever the intermediate index block is not already cached. For a file system serving millions of I/O operations per second, those extra reads are enormously expensive. Direct pointers avoid this cost for the most common access patterns.
| Pointer Type | Disk Reads for Data | Use Case |
|---|---|---|
| Direct Pointer | 1 (just the data block) | First 12 blocks of file (typical) |
| Single Indirect | 2 (indirect block + data block) | Medium files |
| Double Indirect | 3 (two indirect blocks + data) | Large files |
| Triple Indirect | 4 (three indirect blocks + data) | Massive files |
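The table above can be sketched as a small classification routine: given a logical block number, it reports which pointer level would address that block and how many disk reads an uncached lookup would cost. This is only an illustration. The names `pointer_level` and `disk_reads_for_block` are hypothetical, and the layout (12 direct pointers, 512 pointers per indirect block, inode already in memory) is an assumption chosen to match the 4KB-block, 8-byte-pointer configuration used on this page.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_DIRECT     12u   /* assumed: classic Unix inode layout */
#define PTRS_PER_BLOCK 512u  /* assumed: 4KB block / 8-byte pointers */

/*
 * Returns 0 for a direct pointer, 1/2/3 for single/double/triple
 * indirect, or -1 if the block lies beyond what the inode can address.
 */
int pointer_level(uint64_t logical_block)
{
    uint64_t span = PTRS_PER_BLOCK;  /* data blocks covered by the current level */

    if (logical_block < NUM_DIRECT)
        return 0;
    logical_block -= NUM_DIRECT;

    for (int level = 1; level <= 3; level++) {
        if (logical_block < span)
            return level;
        logical_block -= span;       /* skip the range this level covers */
        span *= PTRS_PER_BLOCK;      /* next level covers 512x more blocks */
    }
    return -1;
}

/* Disk reads for one uncached access: one per indirect level plus the data block. */
int disk_reads_for_block(uint64_t logical_block)
{
    int level = pointer_level(logical_block);

    assert(level <= 3);
    return level < 0 ? -1 : level + 1;
}
```

Under these assumptions, `disk_reads_for_block(5)` evaluates to 1 (direct), while `disk_reads_for_block(524)` evaluates to 3, because block 524 is the first block past the single indirect range.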
At its core, a direct pointer is simply a binary number representing a block address. However, the specific size and format of this number have profound implications for the file system's capabilities.
```c
/*
 * Direct Pointer: Anatomy and Addressing
 *
 * A direct pointer is a disk block address that points
 * immediately to file data (no intermediate index blocks).
 */
#include <stdint.h>

/* Pointer size determines addressable space */
typedef uint64_t block_ptr_t;    /* 64-bit for modern systems */
typedef uint32_t block_ptr32_t;  /* 32-bit for legacy systems */

#define BLOCK_SIZE     4096      /* 4KB blocks */
#define NULL_BLOCK_PTR 0         /* Indicates unallocated/hole */

/*
 * Addressing Calculation:
 *
 *   disk_offset = block_pointer × block_size
 *
 * Example: Pointer value 1000 with 4KB blocks
 *   disk_offset = 1000 × 4096 = 4,096,000 bytes
 *   This is where the data block starts on disk.
 */
uint64_t block_ptr_to_disk_offset(block_ptr_t ptr) {
    return (uint64_t)ptr * BLOCK_SIZE;
}

/*
 * Memory Layout of Direct Pointers in an Inode
 *
 * Traditional Unix: 12 direct pointers, each 4 or 8 bytes
 * (8-byte pointers shown here)
 *
 * Byte Offset    Contents
 * -----------    --------
 * [0-7]          direct_block[0]   → File bytes 0-4095
 * [8-15]         direct_block[1]   → File bytes 4096-8191
 * [16-23]        direct_block[2]   → File bytes 8192-12287
 * ...
 * [88-95]        direct_block[11]  → File bytes 45056-49151
 * [96-103]       indirect_block    → (for larger files)
 * [104-111]      double_indirect   → (for even larger files)
 * [112-119]      triple_indirect   → (for massive files)
 */
#define NUM_DIRECT_POINTERS 12

struct inode_pointers {
    block_ptr_t direct[NUM_DIRECT_POINTERS];  /* 12 × 8 = 96 bytes */
    block_ptr_t indirect;                     /* 8 bytes */
    block_ptr_t double_indirect;              /* 8 bytes */
    block_ptr_t triple_indirect;              /* 8 bytes */
};
/* Total: 120 bytes for block pointers */

/*
 * Maximum file size addressable by direct pointers only:
 *
 *   12 pointers × 4096 bytes/block = 49,152 bytes = 48 KB
 *
 * This covers the vast majority of files on typical systems!
 */
#define MAX_DIRECT_FILE_SIZE (NUM_DIRECT_POINTERS * BLOCK_SIZE)
```

One of the most important skills in file system design is understanding the mathematical relationships between pointer sizes, block sizes, and addressable capacity. Let's work through the calculations that determine what direct pointers can and cannot address.
Addressable Disk Space = (2^pointer_bits) × block_size
Direct Pointer File Capacity = num_direct_pointers × block_size
| Pointer Size | Block Size | Addressable Blocks | Max Disk Size | Direct Pointers (12) | Max Direct File |
|---|---|---|---|---|---|
| 32-bit | 1 KB | 4.29 billion | 4 TB | 12 × 1KB | 12 KB |
| 32-bit | 4 KB | 4.29 billion | 16 TB | 12 × 4KB | 48 KB |
| 32-bit | 8 KB | 4.29 billion | 32 TB | 12 × 8KB | 96 KB |
| 64-bit | 4 KB | 18.4 quintillion | 64 ZB | 12 × 4KB | 48 KB |
| 64-bit | 8 KB | 18.4 quintillion | 128 ZB | 12 × 8KB | 96 KB |
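The two formulas behind this table can be checked with a few lines of C. The sketch below is illustrative only: the function names are hypothetical, and block sizes are assumed to be powers of two. Because 2^64 blocks of 4KB each overflow a 64-bit byte count, addressable space is returned as a power-of-two exponent rather than a byte value.

```c
#include <assert.h>
#include <stdint.h>

/* log2 of a power-of-two value, e.g. 4096 -> 12. */
static unsigned log2_pow2(uint64_t v)
{
    unsigned n = 0;

    assert(v != 0 && (v & (v - 1)) == 0);  /* must be a power of two */
    while (v > 1) {
        v >>= 1;
        n++;
    }
    return n;
}

/*
 * Addressable disk space = 2^pointer_bits × block_size.
 * Returned as an exponent e, meaning 2^e bytes.
 */
unsigned addressable_space_log2(unsigned pointer_bits, uint64_t block_size)
{
    return pointer_bits + log2_pow2(block_size);
}

/* Direct pointer file capacity = num_direct_pointers × block_size (bytes). */
uint64_t direct_capacity(unsigned num_direct, uint64_t block_size)
{
    return (uint64_t)num_direct * block_size;
}
```

For the 32-bit, 4KB row, `addressable_space_log2(32, 4096)` gives 44, and 2^44 bytes is 16 TB. For the 64-bit, 4KB row it gives 76, and 2^76 bytes is 64 ZB. `direct_capacity(12, 4096)` gives 49,152 bytes, the 48 KB shown in the last column.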
Critical Observation:
Notice that the maximum file size addressable by direct pointers alone is quite modest—typically 48KB to 96KB depending on configuration. This seems limiting until you consider file size statistics.
Real-World File Size Distribution:
Studies of production file systems consistently show that the vast majority of files are small.
This means direct pointers alone can efficiently serve the overwhelming majority of files. The indirect pointer schemes (which add overhead) are only needed for the rare large files.
```c
/*
 * File System Capacity Calculations
 *
 * Understanding the math behind indexed allocation capacity
 */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Configuration parameters */
#define BLOCK_SIZE_BYTES   4096
#define POINTER_SIZE_BYTES 8
#define NUM_DIRECT_PTRS    12

/* Derived values */
#define PTRS_PER_INDIRECT_BLOCK (BLOCK_SIZE_BYTES / POINTER_SIZE_BYTES)

void calculate_file_system_limits(void) {
    /* Direct pointer capacity */
    uint64_t direct_capacity = (uint64_t)NUM_DIRECT_PTRS * BLOCK_SIZE_BYTES;
    printf("Direct pointers only: %" PRIu64 " bytes (%.1f KB)\n",
           direct_capacity, (double)direct_capacity / 1024.0);

    /* Single indirect adds another level */
    uint64_t single_indirect = (uint64_t)PTRS_PER_INDIRECT_BLOCK * BLOCK_SIZE_BYTES;
    printf("Single indirect block: %" PRIu64 " bytes (%.1f MB)\n",
           single_indirect, (double)single_indirect / (1024.0 * 1024));

    /* Double indirect: pointer to block of pointers to data */
    uint64_t double_indirect = (uint64_t)PTRS_PER_INDIRECT_BLOCK *
                               PTRS_PER_INDIRECT_BLOCK * BLOCK_SIZE_BYTES;
    printf("Double indirect block: %" PRIu64 " bytes (%.1f GB)\n",
           double_indirect, (double)double_indirect / (1024.0 * 1024 * 1024));

    /* Triple indirect: one more level of indirection */
    uint64_t triple_indirect = (uint64_t)PTRS_PER_INDIRECT_BLOCK *
                               PTRS_PER_INDIRECT_BLOCK *
                               PTRS_PER_INDIRECT_BLOCK * BLOCK_SIZE_BYTES;
    printf("Triple indirect block: %" PRIu64 " bytes (%.1f GB)\n",
           triple_indirect, (double)triple_indirect / (1024.0 * 1024 * 1024));

    /* Total maximum file size */
    uint64_t max_file_size = direct_capacity + single_indirect +
                             double_indirect + triple_indirect;
    printf("Maximum file size: ~%.1f GB\n",
           (double)max_file_size / (1024.0 * 1024 * 1024));
}

/*
 * Output with typical 4KB blocks, 8-byte pointers
 * (512 pointers per indirect block):
 *
 * Direct pointers only: 49152 bytes (48.0 KB)
 * Single indirect block: 2097152 bytes (2.0 MB)
 * Double indirect block: 1073741824 bytes (1.0 GB)
 * Triple indirect block: 549755813888 bytes (512.0 GB)
 * Maximum file size: ~513.0 GB
 */
```

Understanding exactly how the file system translates a file offset into a disk location using direct pointers is essential for grasping both the elegance and the performance of indexed allocation.
```c
/*
 * Direct Pointer Access: Step-by-Step Implementation
 *
 * This function reads data from a file using only direct pointers.
 * It demonstrates the efficient O(1) mapping from file offset to disk location.
 *
 * struct inode (with size, direct[] and atime fields), block_ptr_t,
 * disk_read() and current_time() are assumed to be defined elsewhere.
 */
#include <stdint.h>
#include <string.h>     /* memset */
#include <sys/types.h>  /* ssize_t */

#define BLOCK_SIZE 4096
#define NUM_DIRECT_POINTERS 12
#define MIN(a, b) ((a) < (b) ? (a) : (b))

struct file_context {
    struct inode *inode;  /* Cached inode */
    uint64_t position;    /* Current file position */
    int flags;            /* Open flags */
};

/*
 * Read bytes from a file at the current position.
 *
 * This version handles only direct pointers (first 48KB of file).
 * Real implementations extend this to handle indirect pointers.
 */
ssize_t read_via_direct_pointers(struct file_context *ctx,
                                 void *buffer, size_t count) {
    struct inode *inode = ctx->inode;
    char *buf = (char *)buffer;
    size_t total_read = 0;

    /* Don't read past end of file */
    if (ctx->position >= inode->size) {
        return 0;  /* EOF */
    }
    if (ctx->position + count > inode->size) {
        count = inode->size - ctx->position;
    }

    while (total_read < count) {
        /*
         * Step 1: Calculate which logical block contains this position
         *
         * With 4KB blocks:
         *   Position 0-4095    → Block 0
         *   Position 4096-8191 → Block 1
         *   Position 5000      → Block 1 (5000 / 4096 = 1)
         */
        uint64_t logical_block = ctx->position / BLOCK_SIZE;

        /* Check if we've exceeded direct pointer range */
        if (logical_block >= NUM_DIRECT_POINTERS) {
            /* Would need indirect pointer handling here */
            break;
        }

        /*
         * Step 2: Calculate the offset within the block and how many
         * bytes this block can contribute
         *
         * Position 5000 in block 1:
         *   offset_in_block = 5000 % 4096 = 904
         *
         * We might not use the entire block if:
         *   - We started in the middle (offset_in_block > 0)
         *   - We only need a few more bytes to complete the request
         */
        uint64_t offset_in_block = ctx->position % BLOCK_SIZE;
        size_t bytes_available_in_block = BLOCK_SIZE - offset_in_block;
        size_t bytes_remaining = count - total_read;
        size_t bytes_to_read = MIN(bytes_available_in_block, bytes_remaining);

        /*
         * Step 3: Look up the physical block in the inode
         *
         * This is O(1) - just an array access!
         * The direct pointer tells us exactly where on disk.
         */
        block_ptr_t physical_block = inode->direct[logical_block];

        if (physical_block == 0) {
            /*
             * Hole in sparse file - return zeros, but only up to the
             * end of this block so the next block gets its own lookup.
             */
            memset(buf + total_read, 0, bytes_to_read);
            total_read += bytes_to_read;
            ctx->position += bytes_to_read;
            continue;
        }

        /*
         * Step 4: Calculate disk location
         *
         * disk_offset = physical_block × block_size + offset_in_block
         *             = 2847 × 4096 + 904
         *             = 11,662,216
         */
        uint64_t disk_offset = ((uint64_t)physical_block * BLOCK_SIZE) + offset_in_block;

        /*
         * Step 5: Perform the actual disk read
         *
         * Most systems use block caching here - if the block is in
         * the buffer cache, no disk I/O is needed at all!
         */
        disk_read(disk_offset, buf + total_read, bytes_to_read);

        total_read += bytes_to_read;
        ctx->position += bytes_to_read;
    }

    /* Update access time (if not mounted with 'noatime') */
    inode->atime = current_time();

    return total_read;
}
```

Direct pointers are optimized for the common case of small, frequently accessed files. Let's analyze their performance characteristics in depth.
| Operation | Direct Pointer | Single Indirect | Double Indirect |
|---|---|---|---|
| Read first block | 1 disk read | 2 disk reads | 3 disk reads |
| Read random block | 1 disk read | 2 disk reads | 3 disk reads |
| Sequential read (cached inode) | 1 read/block | 1 read/block* | 1 read/block* |
| Write to existing block | 1 disk write | 1-2 disk writes | 1-3 disk writes |
| Append block to file | 1 write + inode update | 2-3 writes | 3-4 writes |
The asterisks indicate that indirect block caching reduces subsequent accesses. Once an indirect block is cached, reading through it costs only one disk read per data block—the same as direct pointers. However, the first access pays the full indirection cost.
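The caching effect described above can be modeled with a small counter. The sketch below is a deliberately simplified model, not file system code: the function name is hypothetical, and it assumes the inode is already in memory, nothing else is cached at the start, and a file that needs at most one single indirect block (12 direct pointers plus 512 indirect entries).

```c
#include <assert.h>
#include <stdint.h>

#define NUM_DIRECT     12u   /* assumed inode layout */
#define PTRS_PER_BLOCK 512u  /* assumed: 4KB block / 8-byte pointers */

/*
 * Count disk reads needed to read logical blocks [0, nblocks) in order.
 * The single indirect block is fetched once and then treated as cached.
 */
uint64_t sequential_disk_reads(uint64_t nblocks)
{
    uint64_t reads = 0;
    int indirect_cached = 0;

    /* This model stops at the end of the single indirect range. */
    assert(nblocks <= NUM_DIRECT + PTRS_PER_BLOCK);

    for (uint64_t b = 0; b < nblocks; b++) {
        if (b >= NUM_DIRECT && !indirect_cached) {
            reads++;              /* first indirect access: fetch the index block */
            indirect_cached = 1;
        }
        reads++;                  /* the data block itself */
    }
    return reads;
}
```

In this model, reading the first 12 blocks costs 12 reads, reading 13 blocks costs 14 (the thirteenth block pays for the indirect block), and reading all 524 blocks costs 525. The indirection overhead is a single extra read spread over the whole indirect range, which is why the sequential row in the table shows one read per block.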
Why 12 Direct Pointers?
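The traditional Unix choice of 12 direct pointers is not arbitrary. With 4KB blocks, 12 direct pointers cover files up to 48 KB, which includes the majority of files, while adding only 48 to 96 bytes of pointers to each inode.
Some modern file systems have increased this number, recognizing that SSDs and larger RAM caches make inode bloat less of a concern.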
When a file grows, the file system allocates new blocks and updates the direct pointers. This process is straightforward for small files but requires transitioning to indirect pointers once direct pointer capacity is exceeded.
```c
/*
 * Managing Direct Pointers During File Growth
 *
 * This demonstrates how direct pointers are populated as a file
 * grows from 0 bytes to beyond the direct pointer limit.
 *
 * block_ptr_t and allocate_block() are assumed to be defined elsewhere.
 */
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 4096
#define NUM_DIRECT_POINTERS 12

struct inode {
    uint64_t size;
    block_ptr_t direct[NUM_DIRECT_POINTERS];
    block_ptr_t indirect;
    block_ptr_t double_indirect;
    block_ptr_t triple_indirect;
    /* ... other metadata ... */
};

/*
 * Allocate blocks to extend a file.
 *
 * Returns 0 on success or a negative errno value on failure.
 * This function handles only the direct pointer range.
 */
int extend_file_direct(struct inode *inode, size_t new_size) {
    /* Calculate current and needed logical blocks */
    uint64_t current_blocks = (inode->size + BLOCK_SIZE - 1) / BLOCK_SIZE;
    uint64_t needed_blocks = (new_size + BLOCK_SIZE - 1) / BLOCK_SIZE;

    if (needed_blocks <= current_blocks) {
        /* File isn't actually growing in block count */
        inode->size = new_size;
        return 0;
    }

    /* Allocate each new block needed */
    for (uint64_t block = current_blocks; block < needed_blocks; block++) {
        if (block >= NUM_DIRECT_POINTERS) {
            /*
             * Exceeded direct pointer range!
             * Would need to allocate indirect block here.
             *
             * Real implementation would:
             * 1. Allocate a block for the indirect pointer
             * 2. Set inode->indirect to point to it
             * 3. Continue allocating via indirect block
             */
            return -EFBIG;  /* "File too big" for this simple impl */
        }

        /* Allocate a new data block from free space */
        block_ptr_t new_block = allocate_block();
        if (new_block == 0) {
            return -ENOSPC;  /* No space left */
        }

        /* Update direct pointer */
        inode->direct[block] = new_block;
    }

    /*
     * Visualization of inode after growing from 0 to 20KB:
     *
     * direct[0]  = Block 500   ← 0-4095 bytes
     * direct[1]  = Block 501   ← 4096-8191 bytes
     * direct[2]  = Block 502   ← 8192-12287 bytes
     * direct[3]  = Block 503   ← 12288-16383 bytes
     * direct[4]  = Block 504   ← 16384-20479 bytes
     * direct[5]  = 0           ← Not yet allocated
     * ...
     * direct[11] = 0           ← Not yet allocated
     * indirect   = 0           ← Not needed yet
     */
    inode->size = new_size;
    return 0;
}

/*
 * Example: Writing to a new file
 *
 * File starts with size = 0, all direct pointers = 0.
 *
 * write(fd, "Hello World", 11);
 *   → allocate block (say, block 100)
 *   → set direct[0] = 100
 *   → write "Hello World" to block 100, offset 0
 *   → set size = 11
 *
 * write(fd, data, 5000);   (now at offset 11, writing 5000 bytes)
 *   → direct[0] block 100 has room for 4096 - 11 = 4085 more bytes
 *   → Write 4085 bytes to block 100
 *   → Need 5000 - 4085 = 915 more bytes
 *   → Allocate block (say, block 101)
 *   → Set direct[1] = 101
 *   → Write remaining 915 bytes to block 101
 *   → Set size = 5011
 */
```

Direct pointers enable an elegant feature: sparse files. A sparse file has "holes": regions that logically contain zeros but consume no physical disk space.
How Holes Work:
When a direct pointer contains NULL (typically 0), it indicates that the corresponding logical block has never been written. Reading from this block returns zeros, but no disk space was ever allocated.
This is tremendously useful for files that are mostly empty when created, such as virtual machine disk images and core dumps.
```c
/*
 * Sparse Files: Physical vs Logical Size
 *
 * A sparse file's logical size (what stat reports) can be much
 * larger than its physical size (actual disk blocks used).
 *
 * struct inode, block_ptr_t and disk_read() (assumed here to return
 * the number of bytes read) are defined elsewhere.
 */
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 4096
#define NUM_DIRECT_POINTERS 12
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/*
 * Creating a sparse file:
 *
 * 1. Create file (size = 0)
 * 2. Seek to offset 1,000,000
 * 3. Write "X" (1 byte)
 * 4. File is now 1,000,001 bytes logical but holds only 1 data block!
 */
void create_sparse_file(const char *path) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        return;  /* Could not create the file */
    }

    /* Seek to the 1,000,000-byte offset without writing intervening bytes */
    lseek(fd, 1000000, SEEK_SET);

    /* Write a single byte */
    write(fd, "X", 1);
    close(fd);

    /*
     * Resulting inode state (with 4KB blocks):
     *
     * Logical file size: 1,000,001 bytes
     * Data blocks used: 1 (logical block 244, bytes 999,424 - 1,003,519)
     *   1,000,000 / 4096 = 244, and 1,000,000 % 4096 = 576
     * Plus 1 indirect block, because block 244 is past the 12 direct pointers.
     *
     * direct[0] through direct[11] = 0 (holes)
     * indirect points to block with:
     *   pointer[0] through pointer[231] = 0 (holes)
     *   pointer[232] = some_block_number (244 - 12 = 232; "X" at offset 576)
     *   pointer[233+] = 0 (beyond EOF)
     */
}

/*
 * Reading from a sparse file hole (direct pointer range only):
 */
ssize_t read_with_hole_support(struct inode *inode, uint64_t offset,
                               void *buf, size_t count) {
    uint64_t logical_block = offset / BLOCK_SIZE;
    uint64_t block_offset = offset % BLOCK_SIZE;

    if (logical_block >= NUM_DIRECT_POINTERS) {
        return -EFBIG;  /* Indirect lookup not shown here */
    }

    /* Never read across a block boundary in a single call */
    size_t chunk = MIN(count, BLOCK_SIZE - block_offset);
    block_ptr_t physical = inode->direct[logical_block];

    if (physical == 0) {
        /*
         * This is a hole - return zeros.
         * No disk I/O needed!
         */
        memset(buf, 0, chunk);
        return chunk;
    }

    /* Normal read from physical block */
    return disk_read((uint64_t)physical * BLOCK_SIZE + block_offset, buf, chunk);
}

/*
 * Detecting sparse files:
 *
 * $ ls -ls sparse_file
 * 8 -rw-r--r-- 1 user user 1000001 Jan 15 10:00 sparse_file
 * ^                        ^^^^^^^
 * |                        Logical size: ~1MB
 * Physical usage: 8KB (1 data block + 1 indirect block)
 *
 * st.st_blocks counts 512-byte units (16 units = two 4KB blocks).
 * On an extent-based file system no indirect block is needed, so the
 * first column would typically show 4 instead.
 */
```

Copying a sparse file with a naive copy program will materialize all the holes, potentially consuming massive disk space. Tools like 'cp --sparse=always' or 'rsync --sparse' handle sparse files correctly by detecting and preserving holes.
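A program can make the same logical-versus-physical comparison that `ls -ls` shows by calling `stat()`. The sketch below is a heuristic under stated assumptions, not a definitive test: the function names are hypothetical, and it assumes `st_blocks` is counted in 512-byte units, which is the usual convention on Linux and most Unix systems but is not guaranteed everywhere. File systems that compress data or store tiny files inside the inode can also report fewer blocks than the size suggests.

```c
#include <stdint.h>
#include <sys/stat.h>

/*
 * Pure comparison: does the allocated space (in 512-byte units)
 * fall short of the logical size? Returns 1 if so, else 0.
 */
int size_indicates_sparse(uint64_t logical_bytes, uint64_t blocks_512)
{
    return blocks_512 * 512u < logical_bytes;
}

/*
 * Returns 1 if the file at `path` appears sparse, 0 if it does not,
 * and -1 if stat() fails.
 */
int looks_sparse(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;

    return size_indicates_sparse((uint64_t)st.st_size,
                                 (uint64_t)st.st_blocks);
}
```

For the 1,000,001-byte file created earlier, 16 units of 512 bytes (8 KB) are far less than the logical size, so `size_indicates_sparse(1000001, 16)` evaluates to 1. A fully allocated 4 KB file with 8 units evaluates to 0.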
Direct pointers are the foundation of efficient file system I/O. By providing immediate, single-hop access to data blocks, they minimize latency for the vast majority of file accesses.
What's Next:
Direct pointers are fast but limited: they can only address tens of kilobytes of file data. When files grow larger, we need a mechanism that scales to far larger sizes while still providing reasonable random access performance. In the next page, we'll explore how indexed allocation achieves this through its random access support mechanisms, examining the full addressing hierarchy including indirect and multi-level pointers.
You now have a thorough understanding of direct pointers—their structure, addressing mechanics, performance characteristics, and role in file system efficiency. This knowledge is essential for understanding how file systems balance performance and capacity in the multi-level indexing schemes we'll explore next.