A cache can exist without being effective. If the hit rate is 50% and the backing store is 1000x slower, performance improves only 2x—far from the theoretical potential. Cache effectiveness is the measure of how well a cache actually improves system performance.
Maximizing cache effectiveness requires understanding workload characteristics, measuring cache behavior, identifying bottlenecks, and applying targeted optimizations. This page brings together everything we've learned to enable practical cache optimization.
By the end of this page, you will understand how to measure cache effectiveness, analyze cache behavior under real workloads, identify and resolve cache performance problems, and apply optimization techniques to maximize cache benefits.
Cache effectiveness is quantified through several interconnected metrics:
| Metric | Formula | Ideal Value | Interpretation |
|---|---|---|---|
| Hit Rate | Hits / Total Accesses | >90% | Fraction of accesses served from cache |
| Miss Rate | Misses / Total Accesses | <10% | Fraction requiring backing store (1 - hit rate) |
| Effective Access Time | Hit% × CacheTime + Miss% × MissTime | Close to CacheTime | Average access time considering hits/misses |
| Speedup | BackingStoreTime / EAT | High | Performance improvement from caching |
| Cache Efficiency | Useful Data / Cache Size | High | How much cached data is actually used |
```python
"""Cache effectiveness analysis"""
from dataclasses import dataclass

@dataclass
class CacheMetrics:
    hits: int
    misses: int
    cache_access_time_ns: float  # Nanoseconds
    backing_access_time_ns: float

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total > 0 else 0

    @property
    def miss_rate(self) -> float:
        return 1 - self.hit_rate

    @property
    def effective_access_time(self) -> float:
        """Average access time considering hit rate."""
        miss_penalty = self.cache_access_time_ns + self.backing_access_time_ns
        return (self.hit_rate * self.cache_access_time_ns +
                self.miss_rate * miss_penalty)

    @property
    def speedup(self) -> float:
        """How much faster than uncached access."""
        return self.backing_access_time_ns / self.effective_access_time

    def print_analysis(self):
        print(f"Hit Rate: {self.hit_rate:.1%}")
        print(f"Miss Rate: {self.miss_rate:.1%}")
        print(f"Effective Access Time: {self.effective_access_time/1000:.1f} µs")
        print(f"Speedup: {self.speedup:.1f}x")

# Example: Page cache with SSD backing
ssd_cache = CacheMetrics(
    hits=950000,
    misses=50000,
    cache_access_time_ns=100,      # 100 ns for memory
    backing_access_time_ns=50000   # 50 µs for SSD
)
ssd_cache.print_analysis()
# Hit Rate: 95.0%
# Effective Access Time: 2.6 µs
# Speedup: 19.2x

# Example: Same hit rate with HDD backing
hdd_cache = CacheMetrics(
    hits=950000,
    misses=50000,
    cache_access_time_ns=100,
    backing_access_time_ns=8000000  # 8 ms for HDD
)
hdd_cache.print_analysis()
# Hit Rate: 95.0%
# Effective Access Time: 400.1 µs
# Speedup: 20.0x
```

Cache improvement is bounded by Amdahl's Law:
Speedup_max = 1 / (1 - CacheableFraction + CacheableFraction/SpeedImprovement)
If only 80% of accesses can be cached, maximum speedup is limited regardless of cache performance. Identifying and increasing the cacheable fraction is often more valuable than perfecting cache hit rate.
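Plugging numbers into the bound makes this concrete. The helper below evaluates the formula; the 100x figure for cache speed improvement is an illustrative assumption, not a measured value:

```python
def max_speedup(cacheable_fraction: float, speed_improvement: float) -> float:
    """Amdahl's Law bound on overall speedup from caching."""
    return 1 / ((1 - cacheable_fraction) +
                cacheable_fraction / speed_improvement)

# 80% cacheable, cache 100x faster than the backing store
print(f"{max_speedup(0.80, 100):.2f}x")   # ~4.81x
# Even an infinitely fast cache cannot exceed 1 / 0.2 = 5x
print(f"{max_speedup(0.80, 1e12):.2f}x")  # ~5.00x
# Raising the cacheable fraction to 95% helps far more
print(f"{max_speedup(0.95, 100):.2f}x")   # ~16.81x
```

Notice that a faster cache barely moves the 80%-cacheable number, while a higher cacheable fraction more than triples it.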
Effective caching requires understanding the workload. Different access patterns respond differently to caching strategies.
| Pattern | Characteristics | Cache Behavior | Optimization |
|---|---|---|---|
| Working Set | Repeated access to fixed data set | Excellent if cache > working set | Size cache to fit working set |
| Sequential Scan | One-time linear traversal | Poor (pollutes cache) | Bypass cache or use low priority |
| Random Access | Uniform random across large space | Poor (no locality) | Increase cache size or accept misses |
| Zipf/Hot Spots | Few items get most accesses | Good (hot items stay cached) | LFU or frequency-aware policy |
| Temporal Burst | Heavy access then idle | Good during burst | Size for peak, age out when idle |
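A small simulation makes the first two rows of the table tangible. The trace sizes below are illustrative; the cache is a plain LRU:

```python
from collections import OrderedDict

def lru_hit_rate(trace, capacity):
    """Replay a trace through a simple LRU cache and return the hit rate."""
    cache, hits = OrderedDict(), 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # Mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # Evict least recently used
            cache[key] = True
    return hits / len(trace)

CACHE_SIZE = 100

# Working set: 50 hot pages revisited 20 times - fits in the cache
working_set = list(range(50)) * 20
print(f"Working set:     {lru_hit_rate(working_set, CACHE_SIZE):.0%}")  # 95%

# Sequential scan: 1000 distinct pages, each touched once - never a hit
scan = list(range(1000))
print(f"Sequential scan: {lru_hit_rate(scan, CACHE_SIZE):.0%}")  # 0%
```

The working-set trace pays only its 50 cold misses; the scan pays a miss on every access no matter how large the cache is.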
```bash
#!/bin/bash
# Analyze I/O workload patterns on Linux

echo "=== Page Cache Statistics ==="
grep -E "^(Cached|Buffers|Active|Inactive):" /proc/meminfo

echo -e "\n=== Cache Hit/Miss Ratios (requires perf) ==="
# Sample cache behavior for 10 seconds
sudo perf stat -e cache-references,cache-misses,page-faults,major-faults \
    -a sleep 10 2>&1 | grep -E "(cache|fault)"

echo -e "\n=== File Access Pattern (requires blktrace) ==="
# Trace block I/O for specific device
# sudo blktrace -d /dev/sda -o trace &
# sleep 10
# sudo blkparse trace.blktrace.* | head -50

echo -e "\n=== Working Set Estimation ==="
# Monitor page cache pressure
sar -B 1 5 2>/dev/null || echo "Install sysstat"

echo -e "\n=== Hot Files (via inotify or opensnoop) ==="
# Which files accessed most frequently
# sudo opensnoop-bpfcc -d 10 2>/dev/null | \
#     awk '{print $NF}' | sort | uniq -c | sort -rn | head

echo -e "\n=== Sequential vs Random Pattern ==="
# Analyze with iostat
iostat -x 1 5 | grep -E "Device|sda|nvme"
```

Cache size directly impacts effectiveness. Too small and hit rate suffers; too large and resources are wasted or other subsystems are starved.
Denning's working set model defines the working set W(t, τ) as the set of pages accessed in the time interval (t - τ, t). For optimal caching, the cache must hold at least |W(t, τ)| items; below that threshold, the workload thrashes.
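The definition translates directly into code. A minimal sketch, treating trace positions as virtual time (the trace below is an illustrative example):

```python
def working_set(trace, t, tau):
    """W(t, tau): distinct pages referenced in the window (t - tau, t]."""
    window = trace[max(0, t - tau):t]
    return set(window)

# trace[i] = page referenced at virtual time i
trace = [1, 2, 1, 3, 1, 2, 4, 4, 1, 2]
ws = working_set(trace, t=10, tau=5)
print(sorted(ws), len(ws))  # [1, 2, 4] 3 -> 3 pages cover this phase
```

Sliding the window along the trace shows how the working set grows and shrinks across program phases, which is what the stack-distance analysis below exploits.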
```python
"""Cache sizing analysis using stack distance"""
from collections import OrderedDict

class StackDistanceAnalyzer:
    """
    Compute stack distance distribution.
    Stack distance = number of unique items accessed since last access.
    If stack distance <= cache size, the access would be a hit.
    """
    def __init__(self):
        self.access_history = OrderedDict()
        self.distances = []  # Stack distances for each access
        self.access_count = 0

    def access(self, item):
        self.access_count += 1
        if item in self.access_history:
            # Calculate stack distance (position from most recent)
            keys = list(self.access_history.keys())
            position = len(keys) - keys.index(item)
            self.distances.append(position)
            # Move to most recent
            del self.access_history[item]
        else:
            # First access - infinite stack distance (cold miss)
            self.distances.append(float('inf'))
        self.access_history[item] = True

    def simulate_hit_rate(self, cache_size: int) -> float:
        """What would hit rate be with given cache size?"""
        hits = sum(1 for d in self.distances if d <= cache_size)
        return hits / len(self.distances) if self.distances else 0

    def find_optimal_size(self, target_hit_rate: float) -> int:
        """Find minimum cache size for target hit rate."""
        sorted_distances = sorted(d for d in self.distances
                                  if d != float('inf'))
        if not sorted_distances:
            return 0
        target_hits = int(target_hit_rate * len(self.distances))
        if target_hits <= 0:
            return 0
        if target_hits > len(sorted_distances):
            return sorted_distances[-1]
        return sorted_distances[target_hits - 1]

    def print_size_recommendations(self):
        """Print cache size recommendations."""
        print("Cache Size -> Hit Rate")
        for size in [10, 50, 100, 500, 1000, 5000]:
            hr = self.simulate_hit_rate(size)
            print(f"  {size:5} items: {hr:.1%}")
        print("Target Hit Rate -> Required Size")
        for target in [0.80, 0.90, 0.95, 0.99]:
            size = self.find_optimal_size(target)
            print(f"  {target:.0%}: {size} items")
```

Several patterns cause cache ineffectiveness. Recognizing and addressing them is essential for optimization.
| Problem | Symptom | Cause | Solution |
|---|---|---|---|
| Cache Thrashing | Near-zero hit rate | Working set > cache | Increase cache or reduce working set |
| Cache Pollution | Poor hit rate for hot data | Cold data evicts hot data | Scan resistance, admission policy |
| Cold Start | Poor hit rate after restart | Cache empty after clear/restart | Warm-up strategy, persistent cache |
| Write Amplification | High I/O despite caching | Dirty pages force writeback | Tune dirty limits, better batching |
| False Sharing | Unnecessary invalidation | Unrelated data in same cache line | Align data structures |
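To see pollution, and one fix, in action, the sketch below replays a hot working set interrupted by bursts of never-reused scan pages, then compares a plain LRU against one that simply declines to cache scan pages, a crude form of scan resistance. Sizes are illustrative:

```python
from collections import OrderedDict

def hot_hit_rate(trace, capacity, bypass=frozenset()):
    """LRU replay; keys in `bypass` are read through but never cached.
    Returns the hit rate measured over hot accesses only."""
    cache, hot_hits, hot_refs = OrderedDict(), 0, 0
    for key, is_hot in trace:
        hot_refs += is_hot
        if key in cache:
            hot_hits += is_hot
            cache.move_to_end(key)
        elif key not in bypass:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # Evict least recently used
            cache[key] = True
    return hot_hits / hot_refs

# 10 rounds: 50 hot pages, then a burst of 200 never-reused scan pages
hot = [(k, True) for k in range(50)]
trace, scan_keys = [], set()
for r in range(10):
    trace += hot
    burst = [(1000 + 200 * r + i, False) for i in range(200)]
    scan_keys.update(k for k, _ in burst)
    trace += burst

print(f"Plain LRU:   {hot_hit_rate(trace, 100):.0%}")                    # 0%
print(f"Scan bypass: {hot_hit_rate(trace, 100, bypass=scan_keys):.0%}")  # 90%
```

Each scan burst flushes the hot set out of the plain LRU, so every round starts cold; with the bypass, the hot set pays only its initial 50 misses.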
```c
/*
 * Detecting and resolving common cache problems
 */
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* PROBLEM: Cache pollution from sequential scan */

// Bad: Full table scan pollutes cache
void process_all_records_bad(Database *db) {
    for (int i = 0; i < db->record_count; i++) {
        Record *r = db->read(i);  // Caches each record
        process(r);  // Records evict hot data, never accessed again
    }
}

// Good: Use O_DIRECT or advise kernel
void process_all_records_good(Database *db) {
    posix_fadvise(db->fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    for (int i = 0; i < db->record_count; i++) {
        Record *r = db->read(i);
        process(r);
        // Tell kernel we're done with these pages
        posix_fadvise(db->fd, i * RECORD_SIZE, RECORD_SIZE,
                      POSIX_FADV_DONTNEED);
    }
}

/* PROBLEM: Cache cold start */

// Solution: Warm cache on startup
void warm_cache(const char *hot_files[], int count) {
    for (int i = 0; i < count; i++) {
        int fd = open(hot_files[i], O_RDONLY);
        if (fd < 0) continue;
        // Advise kernel to read ahead
        posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);
        // Actually fault in pages
        struct stat st;
        fstat(fd, &st);
        char *map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (map != MAP_FAILED) {
            // Touch each page to ensure it's loaded
            volatile char c;
            for (off_t off = 0; off < st.st_size; off += 4096) {
                c = map[off];
            }
            (void)c;
            munmap(map, st.st_size);
        }
        close(fd);
    }
}

/* PROBLEM: False sharing in application cache */

// Bad: Counter and data in same cache line
struct bad_cache_entry {
    int access_count;  // Updated frequently
    char data[60];     // Read frequently
};  // 64 bytes = 1 cache line - updates invalidate readers

// Good: Separate hot and cold fields
struct good_cache_entry {
    char data[64];     // Read-only (mostly)
};

struct cache_metadata {
    int access_count;  // Updated separately
    // Padding to own cache line
    char padding[60];
};
```

Beyond avoiding problems, active optimization improves cache effectiveness.
```python
"""Cache optimization patterns"""
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache used by the tiers below."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # Mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        elif len(self.data) >= self.capacity:
            self.data.popitem(last=False)  # Evict least recently used
        self.data[key] = value

class AdmissionControlledCache:
    """
    Don't cache items unlikely to be reused.
    Tracks recent non-cached accesses; only caches on second access.
    """
    def __init__(self, capacity):
        self.cache = {}
        self.first_access = set()  # Seen once, not cached
        self.capacity = capacity

    def get(self, key):
        if key in self.cache:
            return self.cache[key]  # Hit
        return None

    def put(self, key, value):
        if key in self.cache:
            self.cache[key] = value
            return
        if key in self.first_access:
            # Second access - worth caching
            self.first_access.remove(key)
            self._insert(key, value)
        else:
            # First access - just remember it
            self.first_access.add(key)
            # Bound first_access size
            if len(self.first_access) > self.capacity:
                self.first_access.pop()

    def _insert(self, key, value):
        if len(self.cache) >= self.capacity:
            self.cache.pop(next(iter(self.cache)))  # Evict oldest insertion
        self.cache[key] = value

class TieredCache:
    """
    L1 (small/fast) + L2 (large/slower) architecture.
    Frequently accessed data promotes to L1.
    """
    def __init__(self, l1_capacity, l2_capacity):
        self.l1 = LRUCache(l1_capacity)  # Hot cache
        self.l2 = LRUCache(l2_capacity)  # Warm cache
        self.access_count = {}
        self.promotion_threshold = 3

    def get(self, key):
        # Try L1 first
        value = self.l1.get(key)
        if value is not None:
            return value
        # Try L2
        value = self.l2.get(key)
        if value is not None:
            # Track accesses for promotion
            self.access_count[key] = self.access_count.get(key, 0) + 1
            if self.access_count[key] >= self.promotion_threshold:
                # Promote to L1
                self.l1.put(key, value)
                del self.access_count[key]
            return value
        return None  # Miss

    def put(self, key, value):
        # New data goes to L2
        self.l2.put(key, value)
        self.access_count[key] = 0
```

Linux provides numerous parameters to tune page cache behavior:
```bash
#!/bin/bash
# Linux page cache tuning

# === View current settings ===
echo "=== Memory Pressure Settings ==="
cat /proc/sys/vm/vfs_cache_pressure  # Higher = reclaim caches faster

echo "=== Swappiness ==="
cat /proc/sys/vm/swappiness  # Lower = prefer reclaiming page cache

echo "=== Dirty Page Settings ==="
sysctl vm.dirty_ratio
sysctl vm.dirty_background_ratio
sysctl vm.dirty_expire_centisecs

# === Tuning for different workloads ===

# Database server (own cache management)
# - Minimize page cache, let DB manage caching
# sysctl -w vm.swappiness=10
# sysctl -w vm.vfs_cache_pressure=200
# sysctl -w vm.dirty_ratio=5
# sysctl -w vm.dirty_background_ratio=2

# File server (maximize cache)
# - Keep as much data cached as possible
# sysctl -w vm.swappiness=10
# sysctl -w vm.vfs_cache_pressure=50
# sysctl -w vm.dirty_ratio=40
# sysctl -w vm.dirty_background_ratio=10

# Low-latency application
# - Minimize writeback pauses
# sysctl -w vm.dirty_bytes=16777216            # 16MB max dirty
# sysctl -w vm.dirty_background_bytes=4194304  # 4MB start writeback

echo "=== Drop Caches (for testing) ==="
# Sync and drop page cache
# sync; echo 1 > /proc/sys/vm/drop_caches  # Page cache
# sync; echo 2 > /proc/sys/vm/drop_caches  # dentries+inodes
# sync; echo 3 > /proc/sys/vm/drop_caches  # All
```

Always test cache tuning under realistic load. What works in benchmarks may hurt production workloads with different access patterns. Use gradual changes and monitor metrics to understand impact.
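A rough worked example shows why the low-latency profile uses `vm.dirty_bytes` rather than a ratio: on a large-memory host, a percentage permits an enormous writeback backlog. The memory size and device throughput below are illustrative assumptions, and note that `vm.dirty_ratio` is actually applied to reclaimable memory, not total RAM, so this overstates the real limit somewhat:

```python
GiB = 1024 ** 3
MiB = 1024 ** 2

ram = 64 * GiB
dirty_ratio = 0.20       # A vm.dirty_ratio of 20%
device_bps = 500 * MiB   # Assumed sustained writeback throughput (SSD-class)

backlog = ram * dirty_ratio
print(f"Max dirty with ratio:  {backlog / GiB:.1f} GiB")        # 12.8 GiB
print(f"Worst-case flush time: {backlog / device_bps:.1f} s")   # ~26 s stall budget

dirty_bytes = 16 * MiB   # The low-latency setting from the script above
print(f"Max dirty with bytes:  {dirty_bytes / MiB:.0f} MiB")
print(f"Worst-case flush time: {dirty_bytes / device_bps * 1000:.0f} ms")  # ~32 ms
```

A bounded byte limit turns a potential multi-second writeback stall into tens of milliseconds, at the cost of less write batching.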
You've mastered I/O caching from foundational concepts through practical optimization. You understand how caching bridges the processor-storage speed gap, the tradeoffs in write caching, policy selection, coherence maintenance, and effectiveness optimization. This knowledge enables you to design, analyze, and tune caching systems for any workload.