Imagine trying to manage a library where patrons could request any book at any time with equal probability. You'd need every book instantly accessible—an impossibly expensive proposition. Fortunately, real library usage follows predictable patterns: a researcher studying Roman history requests books on ancient Rome, not random volumes on quantum physics or Renaissance art. This predictability transforms library management from an impossible task to a practical one.
Programs behave exactly like library patrons. They don't access memory randomly; they exhibit predictable patterns. This observation—the principle of locality—is perhaps the most important empirical fact in computer science. It explains why caches work, why virtual memory is practical, and why the working set model can effectively prevent thrashing.
By the end of this page, you will understand the principle of locality at a deep, intuitive level. You'll learn to distinguish temporal and spatial locality, recognize locality patterns in real code, quantify locality strength, and understand why locality is the foundation of all modern memory hierarchy design—from CPU caches to virtual memory to distributed systems.
The principle of locality states that programs tend to access a relatively small portion of their address space at any given time. This isn't a designed feature of programs—it's an emergent property of how programs are written and how algorithms work.
Definition (Principle of Locality):
Programs tend to reuse data and instructions they have used recently, or to access data near recently accessed locations. This tendency manifests as two distinct patterns: temporal locality and spatial locality.
This principle was first formally articulated by Peter Denning in 1968, though its effects had been observed and exploited in computer design for years before. Denning's insight was recognizing that this behavior could be formalized and used as the basis for memory management policies.
Locality isn't accidental—it emerges from fundamental aspects of computation: loops iterate over the same instructions, functions operate on local variables, data structures are accessed sequentially, and algorithms typically work on small 'active' portions of their data. The very structure of programs—with functions, loops, and data structures—creates locality.
The Two Forms of Locality:
Temporal Locality (Locality in Time): If a memory location is accessed, it is likely to be accessed again soon. This occurs because of loops that re-execute the same instructions, functions that are called repeatedly, and variables that are read and written many times.
Spatial Locality (Locality in Space): If a memory location is accessed, nearby locations are likely to be accessed soon. This occurs because of sequential instruction execution, array traversal, and related fields of a struct being used together.
| Aspect | Temporal Locality | Spatial Locality |
|---|---|---|
| Definition | Same location accessed repeatedly | Nearby locations accessed together |
| Time relationship | Reuse over time | Access pattern in address space |
| Primary cause | Loops, function calls, variable reuse | Sequential execution, arrays, structs |
| Cache exploitation | Keep recently used data in cache | Fetch entire cache lines |
| VM exploitation | Keep recently accessed pages resident | Prefetch adjacent pages |
| Example | Loop counter variable i | Array elements a[0], a[1], a[2]... |
Temporal locality is the phenomenon where recently accessed memory locations are likely to be accessed again in the near future. This behavior is so pervasive in programs that it forms the theoretical basis for most cache replacement policies (LRU, for instance, directly exploits temporal locality).
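The connection to LRU can be made concrete in a few lines. The sketch below is illustrative only (the function name and traces are invented for this example): it counts cache hits for a reuse-heavy trace versus a pure scan, showing that LRU pays off exactly when temporal locality exists.

```python
from collections import OrderedDict

def simulate_lru(trace, capacity):
    """Count hits for an LRU cache of `capacity` entries over an access trace."""
    cache = OrderedDict()  # keys kept in LRU order: oldest first
    hits = 0
    for page in trace:
        if page in cache:
            hits += 1
            cache.move_to_end(page)        # refresh on reuse: exploits temporal locality
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used entry
            cache[page] = True
    return hits

# A loop-like trace (strong temporal locality) vs. a scan (no reuse at all)
loopy = [0, 1, 2, 3] * 250     # 1000 accesses, only 4 distinct pages
scan = list(range(1000))       # 1000 accesses, all distinct

print(simulate_lru(loopy, 8))  # 996: everything after the 4 cold misses hits
print(simulate_lru(scan, 8))   # 0: with no reuse, LRU cannot help
```

The scan trace is the classic worst case for LRU: without temporal locality, no replacement policy based on recency can do better than random.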
Mathematical Formalization:
Let r(t) be the memory reference made at time t. Temporal locality can be expressed as:
P(r(t+Δ) = r(t)) > P(r(t+Δ) = x) for any random location x
In other words, the probability that we'll access the same location shortly after accessing it is higher than the probability of accessing any random location. This probability typically follows a roughly exponential decay curve—the longer since we accessed a location, the less likely we are to access it again.
```c
// Example 1: Loop variables exhibit extreme temporal locality
// The variable 'i' is accessed on every iteration
void loop_temporal_locality() {
    int sum = 0;  // Accessed 1001 times (initialization + all iterations)
    for (int i = 0; i < 1000; i++) {  // 'i' accessed ~3000 times
        sum += i;  // sum accessed every iteration
    }
    // During execution:
    // - Instruction at 'for' statement: accessed 1000+ times
    // - 'i' variable: accessed ~3000 times (init, compare, increment)
    // - 'sum' variable: accessed ~1001 times
    // This creates EXTREME temporal locality for these locations
}

// Example 2: Function calls create temporal locality in instruction space
void frequently_called_function(int x) {
    // These instructions are accessed every time the function is called
    static int call_count = 0;  // Strong temporal locality
    call_count++;
    // Function code exhibits temporal locality through repeated calls
}

void caller() {
    for (int i = 0; i < 10000; i++) {
        frequently_called_function(i);  // Code of this function accessed 10000 times
    }
}

// Example 3: Working with a small active set of variables
void matrix_multiply(int n, double *A, double *B, double *C) {
    // 'n' is accessed in every loop iteration (temporal locality)
    // Loop indices i, j, k have extreme temporal locality
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;  // sum accessed n times for each (i,j) pair
            for (int k = 0; k < n; k++) {
                sum += A[i*n + k] * B[k*n + j];
            }
            C[i*n + j] = sum;
        }
    }
    // Variables n, i, j, k, sum have MUCH stronger temporal locality
    // than the matrix elements themselves
}
```

Temporal locality isn't binary—it decays over time. A location accessed 10 instructions ago is more likely to be accessed again than one accessed 1000 instructions ago. This decay rate varies by program phase and workload type. Understanding this decay is crucial for tuning working set windows and cache replacement policies.
Why Temporal Locality Matters for Working Sets:
Temporal locality is the theoretical justification for the working set model. If programs didn't exhibit temporal locality—if they truly accessed memory randomly—then predicting future memory needs from past behavior would be impossible. The working set model assumes that pages accessed in the recent past will likely be accessed in the near future.
The Working Set Connection:
The working set at time t with window Δ is defined as the set of pages accessed in the interval [t-Δ, t]. This definition directly exploits temporal locality: pages in the working set were accessed recently, and temporal locality tells us they're likely to be accessed again soon. The stronger the temporal locality in a program, the more accurate the working set model's predictions, and the more effective working set-based memory management becomes.
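As a toy illustration of this definition, the function below (a hypothetical helper, not part of any OS API) computes W(t, Δ) directly from a reference string of page numbers:

```python
def working_set(trace, t, delta):
    """W(t, delta): the set of distinct pages referenced in [t - delta, t].

    trace[i] is the page referenced at virtual time i.
    """
    start = max(0, t - delta)
    return set(trace[start:t + 1])

# Hypothetical reference string: a tight loop over pages 1-3, then a phase change
trace = [1, 2, 3, 1, 2, 3, 1, 2, 3, 7, 8, 9, 7, 8, 9]

print(working_set(trace, t=8, delta=5))   # {1, 2, 3}: mid-loop
print(working_set(trace, t=14, delta=5))  # {7, 8, 9}: the window has aged out the old phase
```

Note how the working set automatically tracks the phase change: by t=14, the old pages have aged out of the window without any explicit invalidation.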
Spatial locality is the tendency for programs to access memory locations that are close to recently accessed locations. This behavior emerges from fundamental aspects of program structure and data organization.
Mathematical Formalization:
Let addr(t) be the address accessed at time t. Spatial locality can be expressed as:
P(|addr(t+1) - addr(t)| < d) > P(|addr(t+1) - x| < d) for any random x
The probability that our next access is within distance d of our current access is higher than the probability of being within distance d of any random location. This probability typically decreases as d increases—accesses are most likely to be very close, less likely to be somewhat close, and least likely to be far away.
Three access patterns illustrate the spectrum of spatial locality:

- for(i) a[i] accesses consecutive elements
- p = p->next jumps unpredictably through memory
- for(i) a[i*1000] skips large regions
```c
// Example 1: Excellent spatial locality - sequential access
void sum_array_good(int *arr, int n) {
    long sum = 0;
    for (int i = 0; i < n; i++) {
        sum += arr[i];  // Accesses arr[0], arr[1], arr[2], ...
    }
    // Each access is 4 bytes from the previous
    // Result: Each cache line (64 bytes) serves 16 consecutive accesses
    //         Each page (4KB) serves 1024 consecutive accesses
}

// Example 2: Terrible spatial locality - strided access
void sum_array_bad(int *arr, int n, int stride) {
    long sum = 0;
    for (int i = 0; i < n; i++) {
        sum += arr[i * stride];  // Accesses arr[0], arr[stride], arr[2*stride], ...
    }
    // If stride = 1024, each access is 4KB apart
    // Each access potentially hits a different cache line AND different page
    // Result: Catastrophic cache and TLB miss rates
}

// Example 3: Matrix traversal - row vs column order
#define N 1000
double matrix[N][N];  // Row-major storage in C

// GOOD: Row-major access matches storage layout
double sum_row_major() {
    double sum = 0;
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            sum += matrix[i][j];  // Consecutive memory access
        }
    }
    return sum;
    // ~1 million accesses touching ~2000 distinct 4KB pages (8 MB of data)
}

// BAD: Column-major access fights storage layout
double sum_column_major() {
    double sum = 0;
    for (int j = 0; j < N; j++) {
        for (int i = 0; i < N; i++) {
            sum += matrix[i][j];  // Stride of N elements = 8000 bytes
        }
    }
    return sum;
    // ~1 million accesses, and nearly every access lands on a
    // different page than the previous one!
}

// Example 4: Structure layout affects spatial locality
struct BadLayout {
    char flags;            // 1 byte
    double values[1000];   // 8000 bytes
    char more_flags;       // 1 byte
};
// If you iterate over an array of BadLayouts accessing only flags,
// you jump ~8KB between accesses - terrible spatial locality

struct GoodLayout {
    char flags;
    char more_flags;
    // padding to alignment
    double values[1000];
};
// If the struct layout is designed for the access pattern, spatial locality improves
```

Spatial locality interacts critically with page size. A 4KB page contains 1024 integers or 512 doubles.
If your access pattern exhibits spatial locality at scales smaller than the page size, you benefit from TLB efficiency. If your stride exceeds the page size, each access may require a TLB miss and potentially a page fault. This is why large stride access patterns can devastate virtual memory performance even when all data fits in physical memory.
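A quick back-of-the-envelope script (illustrative; the helper name is invented) shows how stride determines the number of distinct 4KB pages a fixed number of accesses touches:

```python
def pages_touched(num_elems, elem_size, stride_elems, page_size=4096):
    """Distinct pages referenced when reading num_elems elements at a given stride."""
    addrs = (i * stride_elems * elem_size for i in range(num_elems))
    return len({a // page_size for a in addrs})

# 1000 4-byte ints at unit stride: all accesses land on a single page
print(pages_touched(1000, 4, 1))     # 1

# The same 1000 accesses at stride 1024 (4KB apart): a new page every access
print(pages_touched(1000, 4, 1024))  # 1000
```

Same work, a thousandfold difference in pages (and therefore TLB entries) required: this is the stride effect described above in miniature.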
Understanding theoretical locality is important, but recognizing how locality manifests in real programs is essential for systems practitioners. Different program phases exhibit dramatically different locality characteristics.
Program Phases and Locality:
Programs don't exhibit uniform locality throughout their execution. Instead, they transition through phases, each with distinct memory access patterns:
Initialization Phase: Programs load configuration, initialize data structures, and set up state. Access patterns are often sequential (high spatial locality) but one-time (low temporal locality for specific addresses).
Main Computation Phase: The core algorithm executes. Locality characteristics depend heavily on the algorithm—matrix algorithms show different patterns than graph algorithms.
Transition/Shuffle Phases: When switching tasks or processing new inputs, programs may exhibit temporary poor locality as they abandon one working set and build another.
Cleanup Phase: Memory deallocation, file closing, and shutdown. Often exhibits spatial locality through data structures but low temporal reuse.
| Workload Type | Temporal Locality | Spatial Locality | Working Set Size |
|---|---|---|---|
| Numerical/Scientific | High (loop iterations) | Very High (matrix/array access) | Predictable, often large |
| Database Query | Medium (index reuse) | Variable (depends on query) | Dynamic, query-dependent |
| Compiler | High within phases | High (sequential parsing) | Phase-dependent |
| Web Server | High (hot resources) | Medium (scattered requests) | Depends on caching |
| Graph Algorithms | Low (each node visited once) | Low (pointer chasing) | Unpredictable |
| Text Editor | Very High (cursor region) | High (text buffers) | Small and stable |
| Machine Learning | High (batch iterations) | Very High (tensor ops) | Model-dependent, often huge |
Algorithm designers increasingly consider locality as a first-class design criterion. Cache-oblivious algorithms, recursive blocked algorithms, and space-filling curve orderings are all techniques to improve locality. A theoretically slower algorithm with better locality often outperforms a theoretically faster algorithm with poor locality on real hardware.
Measuring Locality in Practice:
Locality can be quantified through several metrics:
Working Set Size: The number of distinct pages accessed in a time window. Smaller working sets indicate more reuse (temporal locality within that set).
Reuse Distance: For a memory access, how many other memory accesses occur before this address is accessed again. Smaller reuse distances indicate stronger temporal locality.
Stride Distribution: Histogram of distances between consecutive accesses. Concentration near zero indicates strong spatial locality.
Miss Rate Curves: Plot cache miss rate vs. cache size. Steeper curves indicate stronger locality—the program benefits greatly from each additional increment of cache capacity.
```python
# Conceptual implementation of locality metrics

class LocalityAnalyzer:
    """Analyze a memory access trace for locality characteristics."""

    def __init__(self, page_size=4096):
        self.page_size = page_size
        self.trace = []  # List of (timestamp, address)

    def record_access(self, timestamp, address):
        """Record a memory access."""
        self.trace.append((timestamp, address))

    def compute_working_set(self, window_size):
        """
        Compute working set size over time with the given window.
        Returns a list of (time, working_set_size) pairs.

        This directly implements Denning's working set definition:
        W(t, Δ) = {pages accessed in interval [t-Δ, t]}
        """
        results = []
        for i, (t, addr) in enumerate(self.trace):
            # Find all accesses in window [t - window_size, t]
            window_start = t - window_size
            pages_in_window = set()
            for j in range(i, -1, -1):
                access_time, access_addr = self.trace[j]
                if access_time < window_start:
                    break
                pages_in_window.add(access_addr // self.page_size)
            results.append((t, len(pages_in_window)))
        return results

    def compute_reuse_distance(self):
        """
        Compute the reuse distance distribution.
        Reuse distance = number of distinct pages accessed since last use.
        Lower reuse distances indicate stronger temporal locality.
        """
        reuse_distances = []
        access_stack = []  # LRU stack: most recently used page at index 0
        for t, addr in self.trace:
            page = addr // self.page_size
            if page in access_stack:
                # Reuse distance = LRU stack depth at the time of reuse
                distance = access_stack.index(page)
                reuse_distances.append(distance)
                access_stack.remove(page)
            else:
                reuse_distances.append(float('inf'))  # First access (cold miss)
            access_stack.insert(0, page)  # Move page to top of the stack
        return reuse_distances

    def compute_stride_distribution(self):
        """
        Compute the distribution of strides (distances between consecutive accesses).
        Concentration near zero indicates strong spatial locality.
        """
        strides = []
        for i in range(1, len(self.trace)):
            prev_addr = self.trace[i - 1][1]
            curr_addr = self.trace[i][1]
            strides.append(abs(curr_addr - prev_addr))
        return strides

    def summarize_locality(self):
        """Generate locality summary statistics."""
        strides = self.compute_stride_distribution()
        reuse = self.compute_reuse_distance()

        # Spatial locality: fraction of strides within one cache line
        cache_line = 64
        spatial_score = sum(1 for s in strides if s < cache_line) / len(strides)

        # Temporal locality: fraction of reuses within a small distance
        finite_reuse = [r for r in reuse if r != float('inf')]
        if finite_reuse:
            temporal_score = sum(1 for r in finite_reuse if r < 100) / len(finite_reuse)
        else:
            temporal_score = 0

        return {
            'spatial_locality_score': spatial_score,    # 0-1, higher = better
            'temporal_locality_score': temporal_score,  # 0-1, higher = better
            'average_stride': sum(strides) / len(strides),
            'median_reuse_distance': (sorted(finite_reuse)[len(finite_reuse) // 2]
                                      if finite_reuse else None),
            'unique_pages': len({addr // self.page_size for _, addr in self.trace}),
        }
```

The principle of locality justifies the entire modern memory hierarchy. Without locality, the fundamental assumption underlying caches, TLBs, and virtual memory would collapse.
The Memory Hierarchy Depends on Locality:
Modern computer systems use a hierarchy of memories with different speeds and sizes:
| Level | Typical Size | Typical Latency | Role |
|---|---|---|---|
| Registers | 1 KB | 0 cycles | Immediate operands |
| L1 Cache | 32-64 KB | 4 cycles | Hot data/instructions |
| L2 Cache | 256-512 KB | 12 cycles | Warm data |
| L3 Cache | 8-64 MB | 40 cycles | Shared warm data |
| Main Memory | 16-256 GB | 200 cycles | All resident data |
| SSD/Swap | 256+ GB | 50,000+ cycles | Non-resident data |
Without locality, this hierarchy collapses. If programs accessed memory randomly, each level of cache would provide no benefit—miss rates would approach 100%, and effective memory access time would approach the slowest tier.
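The standard average memory access time calculation makes this concrete. The latencies below come from the table above (4-cycle L1, 200-cycle main memory); the hit rates are illustrative assumptions for a program with and without locality:

```python
def effective_access_time(hit_rate, hit_latency, miss_latency):
    """Average memory access time for a single cache level, in cycles."""
    return hit_rate * hit_latency + (1 - hit_rate) * miss_latency

# With locality: ~95% of accesses hit the 4-cycle L1; misses go to 200-cycle DRAM
print(effective_access_time(0.95, 4, 200))  # 13.8 cycles on average

# Without locality: hit rate collapses and DRAM latency dominates
print(effective_access_time(0.05, 4, 200))  # 190.2 cycles on average
```

A 95% hit rate keeps average latency within a small factor of the cache itself; a 5% hit rate makes the cache nearly useless, which is exactly the collapse described above.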
Fast memory is expensive. SRAM (used in caches) costs ~100x more per byte than DRAM. If we built all memory from SRAM, a 32GB computer would cost tens of thousands of dollars more. Locality lets us use a small amount of fast memory to serve most accesses, achieving near-SRAM performance at near-DRAM prices. This is the economic miracle that locality enables.
Locality Exploitation at Each Level:
CPU Caches: Exploit both temporal locality (keeping recently accessed cache lines) and spatial locality (loading entire 64-byte cache lines, even when only one word is needed). LRU-like replacement assumes temporal locality.
TLB (Translation Lookaside Buffer): Exploits temporal locality in page table lookups. Without locality, every memory access would require multiple memory accesses for page table walking—a 4-5x slowdown.
Virtual Memory / Paging: The entire concept assumes temporal locality. Pages accessed recently are kept in physical memory; pages not accessed recently are candidates for eviction. The working set model formalizes this exploitation.
Prefetching: Exploits spatial locality by predicting future accesses based on current access patterns. Sequential prefetch brings in adjacent cache lines or pages before they're explicitly requested.
Memory-Mapped I/O: Assumes spatial locality in file access. Reading one byte of a memory-mapped file loads an entire page, betting that nearby bytes will be accessed soon.
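The payoff of sequential prefetching can be sketched with a toy next-line prefetcher (an illustrative model, not real hardware): after each access to line n it stages lines n+1 through n+lookahead, and we measure what fraction of accesses had already been staged.

```python
import random

def sequential_prefetch_hits(trace, lookahead=1):
    """Fraction of accesses that a toy next-line prefetcher staged in advance."""
    prefetched = set()
    covered = 0
    for line in trace:
        if line in prefetched:
            covered += 1
        for k in range(1, lookahead + 1):
            prefetched.add(line + k)  # stage the next `lookahead` lines
    return covered / len(trace)

sequential = list(range(100))  # streaming scan: perfect spatial locality

random.seed(0)
scattered = [random.randrange(10_000) for _ in range(100)]  # pointer-chasing-like

print(sequential_prefetch_hits(sequential))  # 0.99: all but the first access were staged
print(sequential_prefetch_hits(scattered))   # near 0: the prefetcher guesses wrong
```

On the scan, every access after the first was predicted; on the scattered trace, prefetching wastes bandwidth without helping, which is why real prefetchers try to detect streams before committing.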
The working set model is a direct formalization of temporal locality for memory management. Understanding this connection is essential for grasping why working set-based policies work and when they fail.
The Key Insight:
Peter Denning's working set model makes one fundamental observation: temporal locality means that pages accessed in the recent past predict pages needed in the near future. If we keep exactly those pages resident, we prevent page faults without wasting memory on unused pages.
Formal Connection:
Let W(t, Δ) be the working set at time t with window Δ:
W(t, Δ) = {p : page p was accessed in interval [t-Δ, t]}
Temporal locality implies:
P(next access ∈ W(t, Δ)) >> P(next access ∉ W(t, Δ))
The stronger the temporal locality, the higher the probability that our next access will be within our working set. This probability determines page fault rate under working set management.
The working set model is fundamentally a bet: we're betting that pages accessed recently will be accessed again soon. Temporal locality tells us this bet usually wins. The window parameter Δ adjusts how far back we look—a larger window captures more temporal locality but costs more memory. The optimal Δ balances fault rate reduction against memory consumption.
Why Window Size Matters:
The working set window Δ must be chosen to match the temporal locality characteristics of the workload:
Window too small (Δ << locality scale): The working set underestimates what the program actually needs. Pages are evicted while still in active use, and the fault rate climbs even though memory is available.

Window too large (Δ >> locality scale): Pages from long-finished phases linger in the working set. Memory is wasted holding stale pages, reducing how many processes can run without thrashing.

Window matched to locality: The working set tracks the program's truly active pages. The fault rate stays low while memory consumption stays close to the minimum needed.
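A small experiment (an illustrative sketch using a pure working-set policy with unbounded frames) shows how Δ relative to the program's reuse distance determines the fault rate. The trace is a loop over 10 pages, so every reuse distance is exactly 10.

```python
def ws_fault_rate(trace, delta):
    """Fault rate under a pure working-set policy:
    a page is resident iff it appears among the last `delta` references."""
    faults = 0
    for t, page in enumerate(trace):
        window = trace[max(0, t - delta):t]  # the delta references before t
        if page not in window:
            faults += 1
    return faults / len(trace)

# A loop over pages 0-9, repeated 100 times: reuse distance is 10 everywhere
trace = list(range(10)) * 100

print(ws_fault_rate(trace, delta=5))   # 1.0:  window shorter than the reuse distance
print(ws_fault_rate(trace, delta=20))  # 0.01: only the 10 cold faults remain
```

With Δ = 5, every page has already aged out of the window by the time it is reused, so every single access faults; doubling Δ past the reuse distance drops the fault rate to just the cold misses.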
Phase Transitions and Locality Shifts:
Programs transition between phases with different locality characteristics. When a program shifts to a new phase, its working set changes. The working set model naturally adapts—old pages age out of the window, new pages are added. However, during transitions, page fault rates may spike temporarily until the new working set stabilizes.
```python
# Demonstration: Locality strength determines working set effectiveness

import random

class WorkingSetSimulator:
    """
    Simulate working set model effectiveness under different
    locality conditions.
    """

    def __init__(self, num_pages, memory_frames, window_delta):
        self.num_pages = num_pages
        self.memory_frames = memory_frames
        self.window_delta = window_delta
        self.access_history = []  # Full list of page accesses, in order
        self.resident_pages = set()
        self.page_faults = 0
        self.total_accesses = 0

    def generate_high_locality_trace(self, length, hot_pages=10):
        """
        Generate a trace with strong temporal locality:
        most accesses go to a small set of 'hot' pages.
        """
        hot_set = list(range(hot_pages))
        trace = []
        for _ in range(length):
            if random.random() < 0.9:
                trace.append(random.choice(hot_set))                 # 90% to hot pages
            else:
                trace.append(random.randint(0, self.num_pages - 1))  # 10% random
        return trace

    def generate_low_locality_trace(self, length):
        """
        Generate a trace with poor temporal locality:
        accesses uniformly distributed across all pages.
        """
        return [random.randint(0, self.num_pages - 1) for _ in range(length)]

    def compute_working_set(self, current_time):
        """Compute the working set from recent history."""
        window_start = max(0, current_time - self.window_delta)
        return set(self.access_history[window_start:current_time + 1])

    def _last_use(self, page):
        """Index of the most recent access to a page (for LRU eviction)."""
        for i in range(len(self.access_history) - 1, -1, -1):
            if self.access_history[i] == page:
                return i
        return -1

    def access_page(self, page, current_time):
        """Process a page access using the working set policy."""
        self.total_accesses += 1
        self.access_history.append(page)

        if page not in self.resident_pages:
            self.page_faults += 1
            ws = self.compute_working_set(current_time)

            # Make room if needed
            if len(self.resident_pages) >= self.memory_frames:
                pages_to_evict = self.resident_pages - ws
                if pages_to_evict:
                    # Evict a page that has fallen out of the working set
                    self.resident_pages.remove(pages_to_evict.pop())
                else:
                    # All resident pages are in the working set: fall back to LRU
                    # (in a true working set model this signals insufficient memory)
                    self.resident_pages.remove(
                        min(self.resident_pages, key=self._last_use))

            self.resident_pages.add(page)

    def simulate(self, trace):
        """Run the simulation on a given trace."""
        for t, page in enumerate(trace):
            self.access_page(page, t)
        return {
            'page_faults': self.page_faults,
            'total_accesses': self.total_accesses,
            'fault_rate': self.page_faults / self.total_accesses,
            'final_working_set_size': len(self.compute_working_set(len(trace) - 1)),
        }

def compare_locality_impact():
    """
    Demonstrate that the working set model works well BECAUSE of locality.
    """
    print("Comparing Working Set Effectiveness vs Locality Strength")
    print("=" * 60)
    print("Pages: 1000, Memory Frames: 50, Window: 100 accesses")
    print("Trace length: 10000 accesses")
    print()

    # High locality workload
    sim_high = WorkingSetSimulator(num_pages=1000, memory_frames=50, window_delta=100)
    trace_high = sim_high.generate_high_locality_trace(10000, hot_pages=20)
    result_high = sim_high.simulate(trace_high)

    # Low locality workload
    sim_low = WorkingSetSimulator(num_pages=1000, memory_frames=50, window_delta=100)
    trace_low = sim_low.generate_low_locality_trace(10000)
    result_low = sim_low.simulate(trace_low)

    print("HIGH LOCALITY (90% to 20 hot pages):")
    print(f"  Page faults: {result_high['page_faults']}")
    print(f"  Fault rate:  {result_high['fault_rate']:.4f}")
    print()
    print("LOW LOCALITY (uniform random):")
    print(f"  Page faults: {result_low['page_faults']}")
    print(f"  Fault rate:  {result_low['fault_rate']:.4f}")
    print()
    print("CONCLUSION:")
    ratio = result_low['fault_rate'] / result_high['fault_rate']
    print(f"  High locality fault rate is {ratio:.1f}x better")
    print("  Working set model effectiveness DEPENDS on locality presence!")

# Typical result: the high-locality fault rate is roughly an order of
# magnitude lower than the low-locality one.
# Working set works because locality exists!
```

The principle of locality is the empirical foundation upon which the entire working set model rests. Without locality, predicting future memory needs from past behavior would be futile. With locality, it becomes not only possible but highly effective.
Key Takeaways:

- Locality is an empirical property of real programs, not a designed feature: it emerges from loops, functions, and data structures.
- Temporal locality (reuse of the same locations) and spatial locality (access to nearby locations) are distinct patterns, exploited by distinct mechanisms.
- The entire memory hierarchy, from registers to swap, is economically viable only because locality exists.
- The working set model is a direct formalization of temporal locality: pages accessed in the window [t-Δ, t] predict near-future accesses.
You now understand the principle of locality—the theoretical foundation of the working set model. In the next page, we'll formally define the working set and explore how this abstract concept translates into a concrete memory management mechanism.
Next Up: Working Set Definition
With locality understood, we're ready to formalize the working set concept. We'll develop Denning's definition of W(t, Δ), explore its mathematical properties, and understand how this abstraction maps to memory management decisions. The working set definition transforms the vague notion of 'pages the process is using' into a precise, measurable, and actionable concept.