Every modern computing system faces a fundamental architectural challenge: the processor operates at billions of cycles per second, while storage devices—even the fastest NVMe SSDs—deliver data tens of thousands of times slower. This speed disparity, spanning roughly four to seven orders of magnitude depending on the device, would cripple system performance entirely if not for one of computing's most elegant solutions: caching.
I/O caching represents far more than a simple performance optimization. It is the architectural foundation that makes interactive computing possible, transforming what would be unbearable waits into imperceptible delays. Without caching, even the fastest modern computers would feel slower than machines from decades past, spending the vast majority of their time idle while waiting for data from storage devices.
By the end of this page, you will understand the fundamental principles of I/O caching, why it is architecturally necessary, how operating systems implement caching layers, and the profound impact caching has on every aspect of system performance. You'll develop the mental models that distinguish engineers who truly understand system behavior from those who merely use the system.
To truly appreciate I/O caching, we must first understand why it exists—not as an optional enhancement, but as an architectural necessity born from the fundamental physics of how different system components operate.
Modern computer systems exhibit a dramatic hierarchy of component speeds, each level separated by orders of magnitude:
| Component | Typical Latency | Relative Speed | Capacity |
|---|---|---|---|
| CPU Registers | ~0.3 ns | 1x (baseline) | ~1 KB |
| L1 Cache | ~1 ns | 3-4x slower | 32-64 KB |
| L2 Cache | ~3-10 ns | 10-30x slower | 256 KB - 1 MB |
| L3 Cache | ~10-20 ns | 30-70x slower | 8-128 MB |
| Main Memory (DRAM) | ~50-100 ns | 150-300x slower | 8-512 GB |
| NVMe SSD | ~10-50 µs | 30,000-150,000x slower | 256 GB - 8 TB |
| SATA SSD | ~50-150 µs | 150,000-500,000x slower | 256 GB - 8 TB |
| Hard Disk Drive | ~5-10 ms | 15,000,000-30,000,000x slower | 1-20 TB |
| Network Storage | ~1-100 ms | 3,000,000-300,000,000x slower | Unlimited |
Consider what happens when a process needs to read a file from disk without caching:

1. The process issues a read() system call and blocks.
2. The kernel translates the request into a block I/O operation and hands it to the device driver.
3. The disk seeks to the target track.
4. The platter rotates until the requested sectors pass under the head and the data is transferred.
5. The controller raises a completion interrupt; the kernel copies the data to the process and wakes it.
During steps 3-5, the CPU executes zero useful instructions for that process. At 3 GHz, a 5ms disk seek represents 15 million wasted CPU cycles—enough to execute the entirety of many programs multiple times over.
This isn't merely inefficient; it fundamentally changes the nature of computer system design. Without mitigation, I/O-bound operations would dominate all system behavior, reducing expensive processors to expensive heat generators that spend 99%+ of their time waiting.
Unlike CPU and memory speeds, which have improved exponentially over decades following Moore's Law, storage latency has improved much more slowly. Disk seek times have improved perhaps 10x over 30 years, while CPU speeds improved 10,000x. This 'latency wall' makes caching increasingly important, not less, as systems evolve.
A cache is a smaller, faster storage layer that temporarily holds copies of data from a larger, slower storage layer. The fundamental insight behind caching—applicable from CPU caches to CDN networks—rests on two empirical observations about how programs actually behave:
Temporal Locality: Data accessed recently is likely to be accessed again soon. When you read a configuration file, you're likely to read it again. When you access a database record, you'll probably access it again within the same session.
Spatial Locality: Data near recently accessed data is also likely to be accessed soon. When you read byte N of a file, you'll probably read bytes N+1, N+2, etc. When you access one field of a structure, you'll likely access adjacent fields.
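To make the two forms of locality concrete, here is a minimal user-space sketch using ordinary POSIX calls (the file path is chosen only for illustration, and error handling is kept minimal):

```c
#include <fcntl.h>
#include <unistd.h>

/* Read an entire file front to back, discarding the contents. */
static void slurp(const char *path)
{
    char buf[4096];
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return;

    /*
     * Spatial locality: each read() touches bytes adjacent to the last,
     * so one block brought into the cache (or prefetched by read-ahead)
     * satisfies many subsequent requests.
     */
    while (read(fd, buf, sizeof(buf)) > 0)
        ;
    close(fd);
}

int main(void)
{
    /*
     * Temporal locality: the same file is read again shortly after the
     * first pass.  The second slurp() is typically served entirely from
     * the page cache and triggers no disk I/O at all.
     */
    slurp("/etc/hosts");
    slurp("/etc/hosts");
    return 0;
}
```

On a warm cache the second pass completes in microseconds, because every request is satisfied from memory rather than the device.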
These aren't theoretical abstractions—they emerge from the fundamental nature of computation: programs spend most of their time in loops that revisit the same code and data, files are typically processed from beginning to end, and related values are laid out adjacently in arrays, structures, and records.
Every caching system, regardless of level, shares fundamental architectural components:
Cache Memory: Fast storage holding cached data. For I/O caching, this is typically main memory (RAM). The cache memory is orders of magnitude faster than the backing store but smaller in capacity.
Cache Lines/Blocks: The unit of data transfer between cache and backing store. Operating systems typically use pages (4KB) as the block size for I/O caching, though this varies. Block sizes balance overhead (smaller blocks mean more metadata) against waste (larger blocks may cache unused data).
Tags and Metadata: Information identifying what backing store data each cache entry holds, plus state information like modification status and access history. This metadata enables the cache to answer: "Is this data cached?" and "What should be evicted?"
Replacement Policy: Algorithm determining which cached data to evict when space is needed. This is perhaps the most critical design decision, as it determines cache effectiveness under real workloads.
```c
/*
 * Conceptual structure of an I/O cache entry
 * Real implementations vary by OS and filesystem
 */
struct cache_block {
    /* Identification */
    dev_t device;                  /* Device containing this block */
    blkcnt_t block_number;         /* Block number on device */

    /* Data */
    void *data;                    /* Pointer to cached data (page-aligned) */
    size_t size;                   /* Size of cached data */

    /* State flags */
    unsigned int valid  : 1;       /* Data is valid */
    unsigned int dirty  : 1;       /* Data modified, not yet written back */
    unsigned int locked : 1;       /* Block is locked (I/O in progress) */
    unsigned int error  : 1;       /* I/O error occurred */

    /* Replacement policy support */
    time_t access_time;            /* Last access timestamp (LRU) */
    unsigned int access_count;     /* Access frequency (LFU) */
    struct list_head lru_list;     /* Position in LRU list */

    /* Concurrency control */
    rwlock_t lock;                 /* Reader-writer lock */
    wait_queue_head_t waiters;     /* Threads waiting on this block */

    /* Hash chain for fast lookup */
    struct hlist_node hash_node;   /* Position in hash table */
};

/*
 * Cache lookup: O(1) average via hash table
 */
struct cache_block *cache_lookup(dev_t dev, blkcnt_t block)
{
    unsigned int hash = hash_block(dev, block);
    struct cache_block *entry;

    rcu_read_lock();
    hlist_for_each_entry_rcu(entry, &cache_hash[hash], hash_node) {
        if (entry->device == dev && entry->block_number == block) {
            /* Found - update access statistics */
            entry->access_time = current_time();
            entry->access_count++;
            rcu_read_unlock();
            return entry;
        }
    }
    rcu_read_unlock();

    return NULL;    /* Cache miss */
}
```

The buffer cache (also called the block cache) operates at the block device level, caching raw disk blocks regardless of their contents. This is the oldest and most fundamental form of I/O caching, dating back to early Unix systems.
In traditional Unix systems (and preserved in some modern implementations), the buffer cache served as the single point of caching between filesystems and block devices. Every read from disk first checked the buffer cache; every write went through it. This unified approach had elegance: one caching layer served all filesystem types.
The buffer cache presents a simple abstraction: given a device and block number, return the block data. Behind this simplicity lies sophisticated machinery:
```c
/*
 * Buffer cache: Traditional Unix block caching layer
 *
 * This implements the classic buffer cache algorithm with
 * hash table lookup and LRU replacement
 */

#define BUFFER_HASH_SIZE 1024
#define BUFFER_SIZE      4096    /* Typically matches page size */

struct buffer_head {
    dev_t b_dev;                 /* Device identifier */
    blkcnt_t b_blocknr;          /* Block number */
    char *b_data;                /* Pointer to data */

    /* State management */
    unsigned long b_state;       /* State flags */
    atomic_t b_count;            /* Reference count */

    /* List management */
    struct list_head b_lru;      /* LRU list position */
    struct hlist_node b_hash;    /* Hash chain */

    /* I/O completion */
    void (*b_end_io)(struct buffer_head *, int);
    wait_queue_head_t b_wait;
};

/* State flags */
#define BH_Uptodate 0    /* Contains valid data */
#define BH_Dirty    1    /* Data modified */
#define BH_Lock     2    /* Locked for I/O */
#define BH_Req      3    /* Has been submitted for I/O */
#define BH_Mapped   4    /* Has disk mapping */

/* Hash table for O(1) lookup */
static struct hlist_head buffer_hash[BUFFER_HASH_SIZE];
static DEFINE_SPINLOCK(buffer_hash_lock);

/* LRU list for replacement */
static LIST_HEAD(lru_list);
static DEFINE_SPINLOCK(lru_lock);

/*
 * Get a buffer for the specified block
 * Returns cached buffer if present, or allocates new one
 */
struct buffer_head *getblk(dev_t dev, blkcnt_t block, int size)
{
    struct buffer_head *bh;
    unsigned int hash = hash_buffer(dev, block);

    /* First, try to find in cache */
    spin_lock(&buffer_hash_lock);
    bh = find_buffer(dev, block, hash);
    if (bh) {
        /* Cache hit */
        atomic_inc(&bh->b_count);
        spin_unlock(&buffer_hash_lock);

        /* Move to end of LRU (most recently used) */
        spin_lock(&lru_lock);
        list_move_tail(&bh->b_lru, &lru_list);
        spin_unlock(&lru_lock);

        wait_on_buffer(bh);    /* Wait if I/O in progress */
        return bh;
    }
    spin_unlock(&buffer_hash_lock);

    /* Cache miss: allocate new buffer */
    bh = allocate_buffer(dev, block, size);
    if (!bh) {
        /* Memory pressure: reclaim from LRU */
        bh = reclaim_buffer();
        if (!bh)
            return NULL;    /* Out of memory */

        /* Reinitialize for new block */
        init_buffer(bh, dev, block, size);
    }

    /* Insert into hash table */
    insert_buffer_hash(bh, hash);

    return bh;
}

/*
 * Read a block, using cache if possible
 */
struct buffer_head *bread(dev_t dev, blkcnt_t block, int size)
{
    struct buffer_head *bh;

    bh = getblk(dev, block, size);
    if (!bh)
        return NULL;

    /* Check if already has valid data */
    if (test_bit(BH_Uptodate, &bh->b_state))
        return bh;    /* Data already in cache */

    /* Need to read from disk */
    lock_buffer(bh);

    if (test_bit(BH_Uptodate, &bh->b_state)) {
        /* Someone else loaded it while we waited */
        unlock_buffer(bh);
        return bh;
    }

    /* Submit I/O request */
    bh->b_end_io = end_buffer_read;
    submit_bh(READ, bh);

    /* Wait for completion */
    wait_on_buffer(bh);

    if (!test_bit(BH_Uptodate, &bh->b_state)) {
        brelse(bh);
        return NULL;    /* Read failed */
    }

    return bh;
}

/*
 * Release a buffer (decrement reference count)
 */
void brelse(struct buffer_head *bh)
{
    if (!bh)
        return;

    if (atomic_dec_and_test(&bh->b_count)) {
        /* Buffer now unreferenced - eligible for reclaim */
        if (test_bit(BH_Dirty, &bh->b_state)) {
            /* Schedule deferred writeback */
            mark_buffer_dirty(bh);
        }
    }
}
```

The buffer cache possesses several important characteristics that shape system behavior:
Block-Oriented: The buffer cache operates on fixed-size blocks aligned to device block boundaries. This matches how block devices actually work but can be inefficient for small reads or non-aligned access patterns.
Device-Independent: The buffer cache doesn't know or care what filesystem format is stored on the device. This enables caching for any block device, including raw device access.
Write Aggregation: Multiple writes to the same block are coalesced in the cache. Only the final state needs to be written to disk, dramatically reducing I/O traffic for frequently-modified data.
Synchronous Semantics Option: Applications can request synchronous writes (O_SYNC) that bypass caching benefits in exchange for durability guarantees.
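As an illustration of the last two points, the following sketch contrasts an ordinary buffered write with an O_SYNC write using standard POSIX flags (the file name is a placeholder and error checking is omitted for brevity):

```c
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    const char msg[] = "transaction committed\n";

    /*
     * Buffered write: the data lands in the cache and the call returns
     * immediately; the kernel writes the block back later, and repeated
     * writes to the same block may be coalesced into one disk write.
     */
    int fd_fast = open("journal.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    write(fd_fast, msg, sizeof(msg) - 1);
    close(fd_fast);

    /*
     * Synchronous write: O_SYNC makes write() block until the data (and
     * the metadata needed to retrieve it) has reached stable storage,
     * trading throughput for a durability guarantee.
     */
    int fd_durable = open("journal.log",
                          O_WRONLY | O_CREAT | O_APPEND | O_SYNC, 0644);
    write(fd_durable, msg, sizeof(msg) - 1);
    close(fd_durable);

    return 0;
}
```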
Modern Linux unifies the buffer cache with the page cache—they share the same memory. Buffer heads now serve primarily as metadata describing how page cache pages map to disk blocks. This unification eliminated the double-caching problem where data could exist in both caches simultaneously, wasting memory.
The page cache represents a more sophisticated approach to I/O caching, operating at the file level rather than the block level. Instead of caching arbitrary disk blocks, the page cache caches file contents indexed by file identity and offset.
The page cache organizes cached data by file rather than by device location. This enables several powerful optimizations: cached pages survive across opens of the same file, every process reading or mapping a file shares the same physical pages, and read-ahead can reason in terms of logical file offsets rather than physical block addresses.
```c
/*
 * Page Cache Implementation Concepts
 *
 * The page cache maps (inode, offset) -> page
 * Real Linux implementation uses radix trees and XArrays
 */

struct address_space {
    struct inode *host;        /* Owner inode */
    struct xarray i_pages;     /* Cached pages (radix tree) */
    atomic_t i_nrpages;        /* Number of cached pages */
    rwlock_t tree_lock;        /* Lock for page tree */

    /* Address space operations */
    const struct address_space_operations *a_ops;

    /* Writeback state */
    unsigned long flags;
    spinlock_t private_lock;
    struct list_head private_list;
};

/*
 * Find a page in the page cache
 * Returns NULL if not cached
 */
struct page *find_get_page(struct address_space *mapping, pgoff_t index)
{
    struct page *page;

    rcu_read_lock();
    page = xa_load(&mapping->i_pages, index);
    if (page && !page_cache_get_speculative(page)) {
        page = NULL;    /* Page was being reclaimed */
    }
    rcu_read_unlock();

    return page;
}

/*
 * Find page or create if not present
 * This is the workhorse function for file reads
 */
struct page *find_or_create_page(struct address_space *mapping,
                                 pgoff_t index, gfp_t gfp_mask)
{
    struct page *page;
    int error;

    /* First, try simple lookup */
    page = find_get_page(mapping, index);
    if (page)
        return page;

    /* Not in cache: allocate new page */
    page = alloc_page(gfp_mask);
    if (!page)
        return NULL;

    /* Try to add to cache */
    error = add_to_page_cache_locked(page, mapping, index, gfp_mask);
    if (error) {
        /* Someone else added it first - use theirs */
        put_page(page);
        page = find_get_page(mapping, index);
    }

    return page;
}

/*
 * Generic file read implementation
 * Shows how page cache integrates with file I/O
 */
ssize_t generic_file_read(struct file *filp, char __user *buf,
                          size_t count, loff_t *ppos)
{
    struct inode *inode = filp->f_inode;
    struct address_space *mapping = inode->i_mapping;
    loff_t pos = *ppos;
    ssize_t read = 0;

    while (count > 0) {
        pgoff_t index = pos >> PAGE_SHIFT;
        size_t offset = pos & ~PAGE_MASK;
        size_t bytes = min(PAGE_SIZE - offset, count);
        struct page *page;

        /* Try to get page from cache */
        page = find_get_page(mapping, index);

        if (!page) {
            /* Cache miss - need to read from disk */
            page = page_cache_alloc(mapping);
            if (!page)
                return read ? read : -ENOMEM;

            /* Add to cache and initiate read */
            add_to_page_cache(page, mapping, index);

            /* Trigger read-ahead for sequential access */
            if (mapping->a_ops->readahead)
                trigger_readahead(mapping, filp, index);

            /* Read the page from disk */
            read_page_from_disk(mapping, page);
        }

        /* Wait for page to be uptodate */
        wait_on_page_locked(page);

        if (!PageUptodate(page)) {
            put_page(page);
            return read ? read : -EIO;
        }

        /* Copy data to user buffer */
        if (copy_to_user(buf, page_address(page) + offset, bytes)) {
            put_page(page);
            return read ? read : -EFAULT;
        }

        put_page(page);

        buf += bytes;
        count -= bytes;
        pos += bytes;
        read += bytes;
    }

    *ppos = pos;
    return read;
}
```

The page cache uses sophisticated data structures to enable efficient operations:
XArray (Radix Tree): Each file's cached pages are stored in an XArray (formerly radix tree) indexed by page offset. This enables O(log n) lookup, insertion, and range queries—essential for operations like 'invalidate pages 100-200'.
LRU Lists: Pages are maintained on LRU-style lists for replacement decisions. Modern Linux uses multiple lists (active/inactive, file-backed/anonymous) to implement sophisticated replacement policies.
Page Flags: Each page structure contains flags indicating state: PG_locked (under I/O), PG_uptodate (contains valid data), PG_dirty (modified), PG_referenced (recently accessed), etc.
Address Space: Each cached file has an address_space structure containing its page tree and operations. The address_space_operations define how to read/write pages for that file type.
Read-ahead (or prefetching) is one of the most impactful I/O optimizations, exploiting spatial locality to initiate I/O before data is actually requested. By reading ahead of the access pattern, the kernel hides I/O latency entirely—when the application requests the next block, it's already in memory.
Modern Linux uses an adaptive read-ahead algorithm that adjusts dynamically based on access patterns:
```c
/*
 * Simplified adaptive read-ahead algorithm
 *
 * The kernel tracks read patterns and adjusts
 * read-ahead window size dynamically
 */

struct file_ra_state {
    pgoff_t start;              /* Window start */
    unsigned int size;          /* Window size */
    unsigned int async_size;    /* Async readahead trigger */
    unsigned int ra_pages;      /* Maximum readahead */

    /* Pattern detection */
    pgoff_t prev_pos;           /* Previous read position */
    unsigned int prev_count;    /* Previous read count */
    unsigned int pattern;       /* Detected pattern flags */
};

/* Readahead pattern flags */
#define RA_SEQUENTIAL 1    /* Sequential access detected */
#define RA_RANDOM     2    /* Random access detected */
#define RA_MMAP       4    /* Memory-mapped access */

/*
 * Main readahead decision function
 */
void page_cache_readahead(struct address_space *mapping,
                          struct file_ra_state *ra,
                          struct file *filp,
                          pgoff_t offset,
                          unsigned long req_size)
{
    unsigned int max = ra->ra_pages;    /* Typically 32 pages (128KB) */
    pgoff_t expected, prev_index;

    prev_index = ra->prev_pos >> PAGE_SHIFT;
    expected = ra->start + ra->size;

    /* Case 1: Sequential read continues where expected */
    if (offset == expected || offset == prev_index + 1) {
        /* Sequential pattern confirmed - expand window */
        ra->pattern = RA_SEQUENTIAL;

        if (offset == ra->start + ra->async_size) {
            /* Hit async trigger - time to prefetch more */
            ra->start = offset;
            ra->size = calc_readahead_size(ra, max);
            ra->async_size = ra->size * 3 / 4;

            /* Submit async readahead */
            do_async_readahead(mapping, filp, offset, ra->size);
        }
    }
    /* Case 2: Start of sequential read or new sequential stream */
    else if (offset == 0 || offset == prev_index) {
        /* Possible new sequential stream - start conservatively */
        ra->start = offset;
        ra->size = initial_readahead_size(max);
        ra->async_size = ra->size / 2;

        do_sync_readahead(mapping, filp, offset, ra->size);
    }
    /* Case 3: Random access pattern */
    else if (abs(offset - prev_index) > ra->ra_pages) {
        ra->pattern = RA_RANDOM;
        /* Disable readahead for random access */
        ra->size = 0;
        ra->async_size = 0;
    }
    /* Case 4: Stride pattern detection */
    else if (detect_stride(ra, offset)) {
        /* Read stride pattern (e.g., every Nth page) */
        handle_stride_readahead(mapping, ra, offset);
    }

    /* Update history */
    ra->prev_pos = (loff_t)offset << PAGE_SHIFT;
}

/*
 * Calculate optimal readahead window size
 */
static unsigned int calc_readahead_size(struct file_ra_state *ra,
                                        unsigned int max)
{
    unsigned int size = ra->size;

    /* Exponential growth up to maximum */
    if (ra->pattern == RA_SEQUENTIAL) {
        size = size * 2;
        if (size > max)
            size = max;
    }

    /* Consider memory pressure */
    if (memory_pressure_high())
        size = size / 2;

    return max(size, 4U);    /* Minimum 4 pages */
}

/*
 * Submit asynchronous readahead
 * Returns immediately; I/O completes in background
 */
static void do_async_readahead(struct address_space *mapping,
                               struct file *filp,
                               pgoff_t offset,
                               unsigned long nr_to_read)
{
    struct blk_plug plug;
    unsigned long i;

    /* Plug block layer to batch requests */
    blk_start_plug(&plug);

    for (i = 0; i < nr_to_read; i++) {
        struct page *page;
        pgoff_t page_offset = offset + i;

        /* Skip if already cached */
        page = find_get_page(mapping, page_offset);
        if (page) {
            put_page(page);
            continue;
        }

        /* Allocate and add new page */
        page = page_cache_alloc_readahead(mapping);
        if (!page)
            break;

        if (add_to_page_cache_lru(page, mapping, page_offset)) {
            put_page(page);
            continue;
        }

        /* Mark as readahead page */
        SetPageReadahead(page);

        /* Submit read I/O */
        mapping->a_ops->readpage(filp, page);
    }

    /* Unplug - submit batched requests */
    blk_finish_plug(&plug);
}
```

Read-ahead provides substantial benefits but isn't free:
Benefits:

- Sequential reads hit the cache almost every time, because the next pages are already in memory by the time the application asks for them; device latency is hidden entirely.
- Fewer, larger I/O requests use device bandwidth far more efficiently than many small ones.

Costs:

- Mispredicted read-ahead wastes device bandwidth on data that is never used.
- Prefetched pages consume memory and can evict genuinely useful cached data (cache pollution).
The key is adaptation: aggressive read-ahead for sequential access, conservative or disabled for random access. The kernel tracks patterns and adjusts dynamically.
Applications can influence read-ahead behavior via posix_fadvise() with POSIX_FADV_SEQUENTIAL (hint for aggressive read-ahead), POSIX_FADV_RANDOM (disable read-ahead), or POSIX_FADV_WILLNEED (explicit prefetch request). Database systems often disable kernel read-ahead in favor of application-controlled prefetching.
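A minimal sketch of these hints in use—the file name and prefetch length are placeholders, and return values should be checked in real code:

```c
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("large_dataset.bin", O_RDONLY);
    if (fd < 0)
        return 1;

    /* Whole-file hint: expect sequential access, so read ahead aggressively. */
    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    /* Alternative hint for index-style lookups: disable read-ahead. */
    /* posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM); */

    /* Explicit prefetch: ask the kernel to start loading the first 1 MiB now. */
    posix_fadvise(fd, 0, 1024 * 1024, POSIX_FADV_WILLNEED);

    /* ... read the file ... */

    close(fd);
    return 0;
}
```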
The page cache enables one of the most powerful I/O mechanisms: memory-mapped files. With mmap(), file contents appear directly in process address space, allowing file access with simple memory operations instead of explicit read/write system calls.
When a process maps a file, the kernel creates page table entries that point to the page cache:
```c
/*
 * Memory-mapped file I/O through page cache
 */

#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Example: Processing a file via memory mapping
 *
 * The kernel handles all caching automatically:
 * - Pages are loaded on-demand (page fault)
 * - Pages are shared with other mappers
 * - Modified pages are marked dirty
 * - Writeback occurs automatically or on msync()
 */
void process_file_via_mmap(const char *filename)
{
    int fd;
    struct stat sb;
    char *mapped;

    fd = open(filename, O_RDWR);
    fstat(fd, &sb);

    /* Map file into address space */
    mapped = mmap(NULL,          /* Let kernel choose address */
                  sb.st_size,    /* Map entire file */
                  PROT_READ | PROT_WRITE,
                  MAP_SHARED,    /* Modifications affect file */
                  fd, 0);

    close(fd);    /* File stays mapped; fd can be closed */

    /*
     * Now file contents are accessible as memory.
     *
     * Reading: byte = mapped[offset]
     *   - If page not in cache: page fault
     *   - Kernel loads page from disk to page cache
     *   - Page table updated to map page cache page
     *   - Access resumes, now using cached page
     *
     * Writing: mapped[offset] = byte
     *   - If page not in cache: same as read
     *   - Mark page dirty
     *   - Eventually written back by kernel or msync()
     */

    /* Example: capitalize first 1000 characters */
    for (size_t i = 0; i < 1000 && i < sb.st_size; i++) {
        if (mapped[i] >= 'a' && mapped[i] <= 'z')
            mapped[i] -= 32;    /* Modifies page cache directly */
    }

    /* Ensure changes are written to disk */
    msync(mapped, sb.st_size, MS_SYNC);

    munmap(mapped, sb.st_size);
}

/*
 * Kernel page fault handler for memory-mapped files
 * (Conceptual implementation)
 */
int filemap_fault(struct vm_fault *vmf)
{
    struct vm_area_struct *vma = vmf->vma;
    struct file *file = vma->vm_file;
    struct address_space *mapping = file->f_mapping;
    pgoff_t offset = vmf->pgoff;
    struct page *page;

    /* Try to find page in cache */
    page = find_get_page(mapping, offset);

    if (!page) {
        /* Not cached - need to read from file */
        page = find_or_create_page(mapping, offset, GFP_KERNEL);
        if (!page)
            return VM_FAULT_OOM;

        /* Read page from disk if not uptodate */
        if (!PageUptodate(page)) {
            mapping->a_ops->readpage(file, page);
            wait_on_page_locked(page);
        }
    }

    /* Lock page for mapping */
    lock_page(page);

    /*
     * Now install page in process page tables.
     * Multiple processes can map same page cache page.
     */
    vmf->page = page;
    return VM_FAULT_LOCKED;
}
```

Memory mapping through the page cache provides several advantages:

Zero-Copy Access: Data never needs to be copied from kernel buffers to user buffers. The process directly accesses the page cache through its page tables.
Automatic Caching: The kernel manages all caching decisions. Pages are loaded on-demand, shared across processes, and written back automatically.
Simplified Programming: File access becomes simple memory operations. No read()/write() calls, no buffer management, no positioning.
Huge File Support: Even files larger than physical memory can be mapped—only accessed regions actually consume memory.
These benefits come with corresponding costs:

Page Fault Overhead: Every access to a page not yet loaded incurs a page fault. For highly random access patterns, explicit read() with proper buffering may outperform mmap().
Signal Handling: I/O errors become SIGBUS signals, which are harder to handle than read() error returns.
TLB Pressure: Large mappings consume TLB entries, potentially impacting other memory accesses.
Writeback Semantics: Modifications may be written back at unpredictable times. Applications requiring durability must explicitly call msync().
Understanding cache behavior requires systematic measurement. Several key metrics reveal cache effectiveness:
| Metric | Definition | Implications |
|---|---|---|
| Hit Rate | Fraction of accesses served from cache | Higher is better; target >90% for file-intensive workloads |
| Miss Rate | Fraction of accesses requiring I/O (1 - hit rate) | Lower is better; indicates cache sizing and replacement effectiveness |
| Fill Rate | Rate at which cache fills with new data | Indicates I/O pressure and miss frequency |
| Eviction Rate | Rate at which data is removed from cache | High rate suggests cache is too small or access pattern is pathological |
| Dirty Ratio | Fraction of cached data modified but not written | Risk indicator: high dirty ratio risks data loss on failure |
| Average Miss Penalty | Time cost of a cache miss | Depends on backing store; SSDs have 100x lower miss penalty than HDDs |
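System-wide counters tell only part of the story. Whether a particular file is resident can be probed with mincore(2)—essentially what the vmtouch tool used in the script below does. A simplified sketch, with minimal error handling and an example path:

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : "/var/log/syslog"; /* example path */
    struct stat sb;
    int fd = open(path, O_RDONLY);
    if (fd < 0 || fstat(fd, &sb) < 0 || sb.st_size == 0)
        return 1;

    /* Map the file read-only so we can ask about its pages. */
    void *map = mmap(NULL, sb.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return 1;

    long page = sysconf(_SC_PAGESIZE);
    size_t pages = (sb.st_size + page - 1) / page;
    unsigned char *vec = malloc(pages);

    /* mincore() fills one byte per page; bit 0 set means the page is resident. */
    if (vec && mincore(map, sb.st_size, vec) == 0) {
        size_t resident = 0;
        for (size_t i = 0; i < pages; i++)
            resident += vec[i] & 1;
        printf("%s: %zu of %zu pages in page cache (%.1f%%)\n",
               path, resident, pages, 100.0 * resident / pages);
    }

    free(vec);
    munmap(map, sb.st_size);
    close(fd);
    return 0;
}
```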
```bash
#!/bin/bash
# Monitor page cache performance on Linux

# Current cache size
echo "=== Page Cache Size ==="
grep -E "^(Cached|Buffers|Active\(file\)|Inactive\(file\)):" /proc/meminfo

# Cache hit/miss statistics (requires perf)
echo -e "\n=== Cache Hit/Miss (10 second sample) ==="
perf stat -e cache-references,cache-misses,page-faults -a sleep 10

# Per-file cache status (requires vmtouch)
echo -e "\n=== File Cache Status ==="
vmtouch -v /var/log/syslog

# Detailed I/O statistics
echo -e "\n=== Block I/O Statistics ==="
cat /proc/diskstats | awk 'NF >= 14 {
    dev = $3
    reads = $4
    read_sectors = $6
    writes = $8
    write_sectors = $10
    if (reads + writes > 0)
        printf "%-10s: %10d reads, %10d writes\n", dev, reads, writes
}'

# Page cache activity (per second)
echo -e "\n=== Page Cache Activity (5 seconds) ==="
sar -B 1 5 2>/dev/null || echo "Install sysstat for detailed stats"

# Watch dirty page writeback
echo -e "\n=== Dirty Pages ==="
grep -E "^(Dirty|Writeback):" /proc/meminfo
```

The effective access time (EAT) quantifies the average time to access data considering cache behavior:
EAT = (hit_rate × cache_access_time) + (miss_rate × miss_penalty)
Where:
miss_penalty = cache_access_time + backing_store_access_time
Example Calculation (assume a 95% hit rate, a 100 ns cache access time, 50 µs SSD access, and 8 ms HDD access):
With SSD backing:
EAT = 0.95 × 100ns + 0.05 × 50,100ns = 95ns + 2,505ns = 2,600ns
With HDD backing:
EAT = 0.95 × 100ns + 0.05 × 8,000,100ns = 95ns + 400,005ns = 400,100ns
This illustrates both the power of caching (reducing HDD access from 8ms to 400µs average) and why the backing store still matters (SSD provides 150x lower EAT than HDD even with identical caching).
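For readers who want to replay this arithmetic with their own numbers, here is a small self-contained program implementing the EAT formula above (the constants mirror the worked example's assumptions):

```c
#include <stdio.h>

/* EAT = hit_rate * cache_time + miss_rate * (cache_time + backing_time) */
static double eat_ns(double hit_rate, double cache_ns, double backing_ns)
{
    double miss_rate = 1.0 - hit_rate;
    return hit_rate * cache_ns + miss_rate * (cache_ns + backing_ns);
}

int main(void)
{
    double hit_rate = 0.95;
    double cache_ns = 100.0;                  /* DRAM-backed cache access */
    double ssd_ns   = 50.0 * 1000.0;          /* 50 microseconds */
    double hdd_ns   = 8.0 * 1000.0 * 1000.0;  /* 8 milliseconds */

    printf("EAT with SSD backing: %.0f ns\n", eat_ns(hit_rate, cache_ns, ssd_ns));
    printf("EAT with HDD backing: %.0f ns\n", eat_ns(hit_rate, cache_ns, hdd_ns));
    return 0;
}
```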
Improving cache hit rate has diminishing returns. Going from 80% to 90% hit rate halves the miss rate (huge benefit). Going from 98% to 99% also halves the miss rate (same relative improvement, smaller absolute benefit). Going from 99.9% to 99.95% barely matters. Focus optimization effort where it has the most impact.
We've explored the fundamental concepts of I/O caching, establishing the foundation for understanding more advanced caching topics that follow. Let's consolidate the key insights:

- The speed gap between processors and storage spans many orders of magnitude; without caching, systems would spend nearly all their time waiting on I/O.
- Caching works because real programs exhibit temporal and spatial locality.
- The buffer cache caches raw disk blocks; the page cache caches file contents indexed by file and offset, and modern Linux unifies the two.
- Read-ahead exploits sequential access patterns to hide I/O latency, and memory mapping exposes the page cache directly to processes.
- Hit rate, miss penalty, and effective access time quantify how well a cache is actually working.
What's Next:
The next page examines write caching—how operating systems handle write operations through the cache, the tradeoffs between performance and durability, and the mechanisms that ensure data reaches persistent storage despite the intervening cache layer. Understanding write caching is essential for building systems that are both fast and reliable.
You now understand the fundamental principles of I/O caching: why it exists, how it works at both the buffer and page cache levels, how read-ahead and memory mapping enhance it, and how to measure cache effectiveness. This foundation prepares you to explore the more complex topics of write caching, cache policies, coherence, and optimization.