Imagine trying to manage a library where patrons could request any book at any time with equal probability. You'd need every book instantly accessible—an impossibly expensive proposition. Fortunately, real library usage follows predictable patterns: a researcher studying Roman history requests books on ancient Rome, not random volumes on quantum physics or Renaissance art. This predictability transforms library management from an impossible task to a practical one.
Programs behave exactly like library patrons. They don't access memory randomly; they exhibit predictable patterns. This observation—the principle of locality—is perhaps the most important empirical fact in computer science. It explains why caches work, why virtual memory is practical, and why the working set model can effectively prevent thrashing.
By the end of this page, you will understand the principle of locality at a deep, intuitive level. You'll learn to distinguish temporal and spatial locality, recognize locality patterns in real code, quantify locality strength, and understand why locality is the foundation of all modern memory hierarchy design—from CPU caches to virtual memory to distributed systems.
The principle of locality states that programs tend to access a relatively small portion of their address space at any given time. This isn't a designed feature of programs—it's an emergent property of how programs are written and how algorithms work.
Definition (Principle of Locality):
Programs tend to reuse data and instructions they have used recently, or to access data near recently accessed locations. This tendency manifests as two distinct patterns: temporal locality and spatial locality.
This principle was first formally articulated by Peter Denning in 1968, though its effects had been observed and exploited in computer design for years before. Denning's insight was recognizing that this behavior could be formalized and used as the basis for memory management policies.
Locality isn't accidental—it emerges from fundamental aspects of computation: loops iterate over the same instructions, functions operate on local variables, data structures are accessed sequentially, and algorithms typically work on small 'active' portions of their data. The very structure of programs—with functions, loops, and data structures—creates locality.
The Two Forms of Locality:
Temporal Locality (Locality in Time): If a memory location is accessed, it is likely to be accessed again soon. This occurs because of loops that re-execute the same instructions, functions that are called repeatedly, and variables that are read and written many times.
Spatial Locality (Locality in Space): If a memory location is accessed, nearby locations are likely to be accessed soon. This occurs because of sequential instruction execution, array traversal, and related fields of a struct being used together.
| Aspect | Temporal Locality | Spatial Locality |
|---|---|---|
| Definition | Same location accessed repeatedly | Nearby locations accessed together |
| Time relationship | Reuse over time | Access pattern in address space |
| Primary cause | Loops, function calls, variable reuse | Sequential execution, arrays, structs |
| Cache exploitation | Keep recently used data in cache | Fetch entire cache lines |
| VM exploitation | Keep recently accessed pages resident | Prefetch adjacent pages |
| Example | Loop counter variable i | Array elements a[0], a[1], a[2]... |
Temporal locality is the phenomenon where recently accessed memory locations are likely to be accessed again in the near future. This behavior is so pervasive in programs that it forms the theoretical basis for most cache replacement policies (LRU, for instance, directly exploits temporal locality).
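The connection to LRU can be made concrete in a few lines. The sketch below is illustrative only (the function name and traces are invented for this example): it counts cache hits for a reuse-heavy trace versus a pure scan, showing that LRU pays off exactly when temporal locality exists.

```python
from collections import OrderedDict

def simulate_lru(trace, capacity):
    """Count hits for an LRU cache of `capacity` entries over an access trace."""
    cache = OrderedDict()  # keys kept in LRU order: oldest first
    hits = 0
    for page in trace:
        if page in cache:
            hits += 1
            cache.move_to_end(page)        # refresh on reuse: exploits temporal locality
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used entry
            cache[page] = True
    return hits

# A loop-like trace (strong temporal locality) vs. a scan (no reuse at all)
loopy = [0, 1, 2, 3] * 250     # 1000 accesses, only 4 distinct pages
scan = list(range(1000))       # 1000 accesses, all distinct

print(simulate_lru(loopy, 8))  # 996: everything after the 4 cold misses hits
print(simulate_lru(scan, 8))   # 0: with no reuse, LRU cannot help
```

The scan trace is the classic worst case for LRU: without temporal locality, no replacement policy based on recency can do better than random.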
Mathematical Formalization:
Let r(t) be the memory reference made at time t. Temporal locality can be expressed as:
P(r(t+Δ) = r(t)) > P(r(t+Δ) = x) for any random location x
In other words, the probability that we'll access the same location shortly after accessing it is higher than the probability of accessing any random location. This probability typically follows a roughly exponential decay curve—the longer since we accessed a location, the less likely we are to access it again.
```c
// Example 1: Loop variables exhibit extreme temporal locality
// The variable 'i' is accessed on every iteration
void loop_temporal_locality() {
    int sum = 0;  // Accessed 1001 times (initialization + all iterations)
    for (int i = 0; i < 1000; i++) {  // 'i' accessed ~3000 times
        sum += i;  // sum accessed every iteration
    }
    // During execution:
    // - Instruction at 'for' statement: accessed 1000+ times
    // - 'i' variable: accessed ~3000 times (init, compare, increment)
    // - 'sum' variable: accessed ~1001 times
    // This creates EXTREME temporal locality for these locations
}

// Example 2: Function calls create temporal locality in instruction space
void frequently_called_function(int x) {
    // These instructions are accessed every time the function is called
    static int call_count = 0;  // Strong temporal locality
    call_count++;
    // Function code exhibits temporal locality through repeated calls
}

void caller() {
    for (int i = 0; i < 10000; i++) {
        frequently_called_function(i);  // Code of this function accessed 10000 times
    }
}

// Example 3: Working with a small active set of variables
void matrix_multiply(int n, double *A, double *B, double *C) {
    // 'n' is accessed in every loop iteration (temporal locality)
    // Loop indices i, j, k have extreme temporal locality
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;  // sum accessed n times for each (i,j) pair
            for (int k = 0; k < n; k++) {
                sum += A[i*n + k] * B[k*n + j];
            }
            C[i*n + j] = sum;
        }
    }
    // Variables n, i, j, k, sum have MUCH stronger temporal locality
    // than the matrix elements themselves
}
```

Temporal locality isn't binary—it decays over time. A location accessed 10 instructions ago is more likely to be accessed again than one accessed 1000 instructions ago. This decay rate varies by program phase and workload type. Understanding this decay is crucial for tuning working set windows and cache replacement policies.
Why Temporal Locality Matters for Working Sets:
Temporal locality is the theoretical justification for the working set model. If programs didn't exhibit temporal locality—if they truly accessed memory randomly—then predicting future memory needs from past behavior would be impossible. The working set model assumes that pages accessed in the recent past will likely be accessed in the near future.
The Working Set Connection:
The working set at time t with window Δ is defined as the set of pages accessed in the interval [t-Δ, t]. This definition directly exploits temporal locality: pages in the working set were accessed recently, and temporal locality tells us they're likely to be accessed again soon. The stronger the temporal locality in a program, the more accurate the working set model's predictions, and the more effective working set-based memory management becomes.
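As a toy illustration of this definition, the function below (a hypothetical helper, not part of any OS API) computes W(t, Δ) directly from a reference string of page numbers:

```python
def working_set(trace, t, delta):
    """W(t, delta): the set of distinct pages referenced in [t - delta, t].

    trace[i] is the page referenced at virtual time i.
    """
    start = max(0, t - delta)
    return set(trace[start:t + 1])

# Hypothetical reference string: a tight loop over pages 1-3, then a phase change
trace = [1, 2, 3, 1, 2, 3, 1, 2, 3, 7, 8, 9, 7, 8, 9]

print(working_set(trace, t=8, delta=5))   # {1, 2, 3}: mid-loop
print(working_set(trace, t=14, delta=5))  # {7, 8, 9}: the window has aged out the old phase
```

Note how the working set automatically tracks the phase change: by t=14, the old pages have aged out of the window without any explicit invalidation.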
Spatial locality is the tendency for programs to access memory locations that are close to recently accessed locations. This behavior emerges from fundamental aspects of program structure and data organization.
Mathematical Formalization:
Let addr(t) be the address accessed at time t. Spatial locality can be expressed as:
P(|addr(t+1) - addr(t)| < d) > P(|addr(t+1) - x| < d) for any random x
The probability that our next access is within distance d of our current access is higher than the probability of being within distance d of any random location. This probability typically decreases as d increases—accesses are most likely to be very close, less likely to be somewhat close, and least likely to be far away.
Three access patterns illustrate the spectrum of spatial locality:

- for(i) a[i] accesses consecutive elements
- p = p->next jumps unpredictably through memory
- for(i) a[i*1000] skips large regions
```c
// Example 1: Excellent spatial locality - sequential access
void sum_array_good(int *arr, int n) {
    long sum = 0;
    for (int i = 0; i < n; i++) {
        sum += arr[i];  // Accesses arr[0], arr[1], arr[2], ...
    }
    // Each access is 4 bytes from the previous
    // Result: Each cache line (64 bytes) serves 16 consecutive accesses
    //         Each page (4KB) serves 1024 consecutive accesses
}

// Example 2: Terrible spatial locality - strided access
void sum_array_bad(int *arr, int n, int stride) {
    long sum = 0;
    for (int i = 0; i < n; i++) {
        sum += arr[i * stride];  // Accesses arr[0], arr[stride], arr[2*stride], ...
    }
    // If stride = 1024, each access is 4KB apart
    // Each access potentially hits a different cache line AND different page
    // Result: Catastrophic cache and TLB miss rates
}

// Example 3: Matrix traversal - row vs column order
#define N 1000
double matrix[N][N];  // Row-major storage in C

// GOOD: Row-major access matches storage layout
double sum_row_major() {
    double sum = 0;
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            sum += matrix[i][j];  // Consecutive memory access
        }
    }
    return sum;
    // ~1 million accesses touching ~2000 distinct 4KB pages (8 MB of data)
}

// BAD: Column-major access fights storage layout
double sum_column_major() {
    double sum = 0;
    for (int j = 0; j < N; j++) {
        for (int i = 0; i < N; i++) {
            sum += matrix[i][j];  // Stride of N elements = 8000 bytes
        }
    }
    return sum;
    // ~1 million accesses, and nearly every access lands on a
    // different page than the previous one!
}

// Example 4: Structure layout affects spatial locality
struct BadLayout {
    char flags;            // 1 byte
    double values[1000];   // 8000 bytes
    char more_flags;       // 1 byte
};
// If you iterate over an array of BadLayouts accessing only flags,
// you jump ~8KB between accesses - terrible spatial locality

struct GoodLayout {
    char flags;
    char more_flags;
    // padding to alignment
    double values[1000];
};
// If the struct layout is designed for the access pattern, spatial locality improves
```

Spatial locality interacts critically with page size. A 4KB page contains 1024 integers or 512 doubles.
If your access pattern exhibits spatial locality at scales smaller than the page size, you benefit from TLB efficiency. If your stride exceeds the page size, each access may require a TLB miss and potentially a page fault. This is why large stride access patterns can devastate virtual memory performance even when all data fits in physical memory.
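A quick back-of-the-envelope script (illustrative; the helper name is invented) shows how stride determines the number of distinct 4KB pages a fixed number of accesses touches:

```python
def pages_touched(num_elems, elem_size, stride_elems, page_size=4096):
    """Distinct pages referenced when reading num_elems elements at a given stride."""
    addrs = (i * stride_elems * elem_size for i in range(num_elems))
    return len({a // page_size for a in addrs})

# 1000 4-byte ints at unit stride: all accesses land on a single page
print(pages_touched(1000, 4, 1))     # 1

# The same 1000 accesses at stride 1024 (4KB apart): a new page every access
print(pages_touched(1000, 4, 1024))  # 1000
```

Same work, a thousandfold difference in pages (and therefore TLB entries) required: this is the stride effect described above in miniature.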
Understanding theoretical locality is important, but recognizing how locality manifests in real programs is essential for systems practitioners. Different program phases exhibit dramatically different locality characteristics.
Program Phases and Locality:
Programs don't exhibit uniform locality throughout their execution. Instead, they transition through phases, each with distinct memory access patterns:
Initialization Phase: Programs load configuration, initialize data structures, and set up state. Access patterns are often sequential (high spatial locality) but one-time (low temporal locality for specific addresses).
Main Computation Phase: The core algorithm executes. Locality characteristics depend heavily on the algorithm—matrix algorithms show different patterns than graph algorithms.
Transition/Shuffle Phases: When switching tasks or processing new inputs, programs may exhibit temporary poor locality as they abandon one working set and build another.
Cleanup Phase: Memory deallocation, file closing, and shutdown. Often exhibits spatial locality through data structures but low temporal reuse.
| Workload Type | Temporal Locality | Spatial Locality | Working Set Size |
|---|---|---|---|
| Numerical/Scientific | High (loop iterations) | Very High (matrix/array access) | Predictable, often large |
| Database Query | Medium (index reuse) | Variable (depends on query) | Dynamic, query-dependent |
| Compiler | High within phases | High (sequential parsing) | Phase-dependent |
| Web Server | High (hot resources) | Medium (scattered requests) | Depends on caching |
| Graph Algorithms | Low (each node visited once) | Low (pointer chasing) | Unpredictable |
| Text Editor | Very High (cursor region) | High (text buffers) | Small and stable |
| Machine Learning | High (batch iterations) | Very High (tensor ops) | Model-dependent, often huge |
Algorithm designers increasingly consider locality as a first-class design criterion. Cache-oblivious algorithms, recursive blocked algorithms, and space-filling curve orderings are all techniques to improve locality. A theoretically slower algorithm with better locality often outperforms a theoretically faster algorithm with poor locality on real hardware.
Measuring Locality in Practice:
Locality can be quantified through several metrics:
Working Set Size: The number of distinct pages accessed in a time window. Smaller working sets indicate more reuse (temporal locality within that set).
Reuse Distance: For a memory access, how many other memory accesses occur before this address is accessed again. Smaller reuse distances indicate stronger temporal locality.
Stride Distribution: Histogram of distances between consecutive accesses. Concentration near zero indicates strong spatial locality.
Miss Rate Curves: Plot cache miss rate vs. cache size. Steeper curves indicate stronger locality—the program benefits greatly from each additional increment of cache capacity.
```python
# Conceptual implementation of locality metrics

class LocalityAnalyzer:
    """Analyze a memory access trace for locality characteristics."""

    def __init__(self, page_size=4096):
        self.page_size = page_size
        self.trace = []  # List of (timestamp, address)

    def record_access(self, timestamp, address):
        """Record a memory access."""
        self.trace.append((timestamp, address))

    def compute_working_set(self, window_size):
        """
        Compute working set size over time with the given window.
        Returns a list of (time, working_set_size) pairs.

        This directly implements Denning's working set definition:
        W(t, Δ) = {pages accessed in interval [t-Δ, t]}
        """
        results = []
        for i, (t, addr) in enumerate(self.trace):
            # Find all accesses in window [t - window_size, t]
            window_start = t - window_size
            pages_in_window = set()
            for j in range(i, -1, -1):
                access_time, access_addr = self.trace[j]
                if access_time < window_start:
                    break
                pages_in_window.add(access_addr // self.page_size)
            results.append((t, len(pages_in_window)))
        return results

    def compute_reuse_distance(self):
        """
        Compute the reuse distance distribution.
        Reuse distance = number of distinct pages accessed since last use.
        Lower reuse distances indicate stronger temporal locality.
        """
        reuse_distances = []
        access_stack = []  # LRU stack: most recently used page at index 0
        for t, addr in self.trace:
            page = addr // self.page_size
            if page in access_stack:
                # Reuse distance = LRU stack depth at the time of reuse
                distance = access_stack.index(page)
                reuse_distances.append(distance)
                access_stack.remove(page)
            else:
                reuse_distances.append(float('inf'))  # First access (cold miss)
            access_stack.insert(0, page)  # Move page to top of the stack
        return reuse_distances

    def compute_stride_distribution(self):
        """
        Compute the distribution of strides (distances between consecutive accesses).
        Concentration near zero indicates strong spatial locality.
        """
        strides = []
        for i in range(1, len(self.trace)):
            prev_addr = self.trace[i - 1][1]
            curr_addr = self.trace[i][1]
            strides.append(abs(curr_addr - prev_addr))
        return strides

    def summarize_locality(self):
        """Generate locality summary statistics."""
        strides = self.compute_stride_distribution()
        reuse = self.compute_reuse_distance()

        # Spatial locality: fraction of strides within one cache line
        cache_line = 64
        spatial_score = sum(1 for s in strides if s < cache_line) / len(strides)

        # Temporal locality: fraction of reuses within a small distance
        finite_reuse = [r for r in reuse if r != float('inf')]
        if finite_reuse:
            temporal_score = sum(1 for r in finite_reuse if r < 100) / len(finite_reuse)
        else:
            temporal_score = 0

        return {
            'spatial_locality_score': spatial_score,    # 0-1, higher = better
            'temporal_locality_score': temporal_score,  # 0-1, higher = better
            'average_stride': sum(strides) / len(strides),
            'median_reuse_distance': (sorted(finite_reuse)[len(finite_reuse) // 2]
                                      if finite_reuse else None),
            'unique_pages': len({addr // self.page_size for _, addr in self.trace}),
        }
```

The principle of locality justifies the entire modern memory hierarchy. Without locality, the fundamental assumption underlying caches, TLBs, and virtual memory would collapse.
The Memory Hierarchy Depends on Locality:
Modern computer systems use a hierarchy of memories with different speeds and sizes:
| Level | Typical Size | Typical Latency | Role |
|---|---|---|---|
| Registers | 1 KB | 0 cycles | Immediate operands |
| L1 Cache | 32-64 KB | 4 cycles | Hot data/instructions |
| L2 Cache | 256-512 KB | 12 cycles | Warm data |
| L3 Cache | 8-64 MB | 40 cycles | Shared warm data |
| Main Memory | 16-256 GB | 200 cycles | All resident data |
| SSD/Swap | 256+ GB | 50,000+ cycles | Non-resident data |
Without locality, this hierarchy collapses. If programs accessed memory randomly, each level of cache would provide no benefit—miss rates would approach 100%, and effective memory access time would approach the slowest tier.
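The standard average memory access time calculation makes this concrete. The latencies below come from the table above (4-cycle L1, 200-cycle main memory); the hit rates are illustrative assumptions for a program with and without locality:

```python
def effective_access_time(hit_rate, hit_latency, miss_latency):
    """Average memory access time for a single cache level, in cycles."""
    return hit_rate * hit_latency + (1 - hit_rate) * miss_latency

# With locality: ~95% of accesses hit the 4-cycle L1; misses go to 200-cycle DRAM
print(effective_access_time(0.95, 4, 200))  # 13.8 cycles on average

# Without locality: hit rate collapses and DRAM latency dominates
print(effective_access_time(0.05, 4, 200))  # 190.2 cycles on average
```

A 95% hit rate keeps average latency within a small factor of the cache itself; a 5% hit rate makes the cache nearly useless, which is exactly the collapse described above.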
Fast memory is expensive. SRAM (used in caches) costs ~100x more per byte than DRAM. If we built all memory from SRAM, a 32GB computer would cost tens of thousands of dollars more. Locality lets us use a small amount of fast memory to serve most accesses, achieving near-SRAM performance at near-DRAM prices. This is the economic miracle that locality enables.
Locality Exploitation at Each Level:
CPU Caches: Exploit both temporal locality (keeping recently accessed cache lines) and spatial locality (loading entire 64-byte cache lines, even when only one word is needed). LRU-like replacement assumes temporal locality.
TLB (Translation Lookaside Buffer): Exploits temporal locality in page table lookups. Without locality, every memory access would require multiple memory accesses for page table walking—a 4-5x slowdown.
Virtual Memory / Paging: The entire concept assumes temporal locality. Pages accessed recently are kept in physical memory; pages not accessed recently are candidates for eviction. The working set model formalizes this exploitation.
Prefetching: Exploits spatial locality by predicting future accesses based on current access patterns. Sequential prefetch brings in adjacent cache lines or pages before they're explicitly requested.
Memory-Mapped I/O: Assumes spatial locality in file access. Reading one byte of a memory-mapped file loads an entire page, betting that nearby bytes will be accessed soon.
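The payoff of sequential prefetching can be sketched with a toy next-line prefetcher (an illustrative model, not real hardware): after each access to line n it stages lines n+1 through n+lookahead, and we measure what fraction of accesses had already been staged.

```python
import random

def sequential_prefetch_hits(trace, lookahead=1):
    """Fraction of accesses that a toy next-line prefetcher staged in advance."""
    prefetched = set()
    covered = 0
    for line in trace:
        if line in prefetched:
            covered += 1
        for k in range(1, lookahead + 1):
            prefetched.add(line + k)  # stage the next `lookahead` lines
    return covered / len(trace)

sequential = list(range(100))  # streaming scan: perfect spatial locality

random.seed(0)
scattered = [random.randrange(10_000) for _ in range(100)]  # pointer-chasing-like

print(sequential_prefetch_hits(sequential))  # 0.99: all but the first access were staged
print(sequential_prefetch_hits(scattered))   # near 0: the prefetcher guesses wrong
```

On the scan, every access after the first was predicted; on the scattered trace, prefetching wastes bandwidth without helping, which is why real prefetchers try to detect streams before committing.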
The working set model is a direct formalization of temporal locality for memory management. Understanding this connection is essential for grasping why working set-based policies work and when they fail.
The Key Insight:
Peter Denning's working set model makes one fundamental observation: temporal locality means that pages accessed in the recent past predict pages needed in the near future. If we keep exactly those pages resident, we prevent page faults without wasting memory on unused pages.
Formal Connection:
Let W(t, Δ) be the working set at time t with window Δ:
W(t, Δ) = {p : page p was accessed in interval [t-Δ, t]}
Temporal locality implies:
P(next access ∈ W(t, Δ)) >> P(next access ∉ W(t, Δ))
The stronger the temporal locality, the higher the probability that our next access will be within our working set. This probability determines page fault rate under working set management.
The working set model is fundamentally a bet: we're betting that pages accessed recently will be accessed again soon. Temporal locality tells us this bet usually wins. The window parameter Δ adjusts how far back we look—a larger window captures more temporal locality but costs more memory. The optimal Δ balances fault rate reduction against memory consumption.
Why Window Size Matters:
The working set window Δ must be chosen to match the temporal locality characteristics of the workload:
Window too small (Δ << locality scale): The working set underestimates what the program actually needs. Pages are evicted while still in active use, and the fault rate climbs even though memory is available.

Window too large (Δ >> locality scale): Pages from long-finished phases linger in the working set. Memory is wasted holding stale pages, reducing how many processes can run without thrashing.

Window matched to locality: The working set tracks the program's truly active pages. The fault rate stays low while memory consumption stays close to the minimum needed.
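A small experiment (an illustrative sketch using a pure working-set policy with unbounded frames) shows how Δ relative to the program's reuse distance determines the fault rate. The trace is a loop over 10 pages, so every reuse distance is exactly 10.

```python
def ws_fault_rate(trace, delta):
    """Fault rate under a pure working-set policy:
    a page is resident iff it appears among the last `delta` references."""
    faults = 0
    for t, page in enumerate(trace):
        window = trace[max(0, t - delta):t]  # the delta references before t
        if page not in window:
            faults += 1
    return faults / len(trace)

# A loop over pages 0-9, repeated 100 times: reuse distance is 10 everywhere
trace = list(range(10)) * 100

print(ws_fault_rate(trace, delta=5))   # 1.0:  window shorter than the reuse distance
print(ws_fault_rate(trace, delta=20))  # 0.01: only the 10 cold faults remain
```

With Δ = 5, every page has already aged out of the window by the time it is reused, so every single access faults; doubling Δ past the reuse distance drops the fault rate to just the cold misses.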
Phase Transitions and Locality Shifts:
Programs transition between phases with different locality characteristics. When a program shifts to a new phase, its working set changes. The working set model naturally adapts—old pages age out of the window, new pages are added. However, during transitions, page fault rates may spike temporarily until the new working set stabilizes.
```python
# Demonstration: Locality strength determines working set effectiveness

import random

class WorkingSetSimulator:
    """
    Simulate working set model effectiveness under different
    locality conditions.
    """

    def __init__(self, num_pages, memory_frames, window_delta):
        self.num_pages = num_pages
        self.memory_frames = memory_frames
        self.window_delta = window_delta
        self.access_history = []  # Full list of page accesses, in order
        self.resident_pages = set()
        self.page_faults = 0
        self.total_accesses = 0

    def generate_high_locality_trace(self, length, hot_pages=10):
        """
        Generate a trace with strong temporal locality:
        most accesses go to a small set of 'hot' pages.
        """
        hot_set = list(range(hot_pages))
        trace = []
        for _ in range(length):
            if random.random() < 0.9:
                trace.append(random.choice(hot_set))                 # 90% to hot pages
            else:
                trace.append(random.randint(0, self.num_pages - 1))  # 10% random
        return trace

    def generate_low_locality_trace(self, length):
        """
        Generate a trace with poor temporal locality:
        accesses uniformly distributed across all pages.
        """
        return [random.randint(0, self.num_pages - 1) for _ in range(length)]

    def compute_working_set(self, current_time):
        """Compute the working set from recent history."""
        window_start = max(0, current_time - self.window_delta)
        return set(self.access_history[window_start:current_time + 1])

    def _last_use(self, page):
        """Index of the most recent access to a page (for LRU eviction)."""
        for i in range(len(self.access_history) - 1, -1, -1):
            if self.access_history[i] == page:
                return i
        return -1

    def access_page(self, page, current_time):
        """Process a page access using the working set policy."""
        self.total_accesses += 1
        self.access_history.append(page)

        if page not in self.resident_pages:
            self.page_faults += 1
            ws = self.compute_working_set(current_time)

            # Make room if needed
            if len(self.resident_pages) >= self.memory_frames:
                pages_to_evict = self.resident_pages - ws
                if pages_to_evict:
                    # Evict a page that has fallen out of the working set
                    self.resident_pages.remove(pages_to_evict.pop())
                else:
                    # All resident pages are in the working set: fall back to LRU
                    # (in a true working set model this signals insufficient memory)
                    self.resident_pages.remove(
                        min(self.resident_pages, key=self._last_use))

            self.resident_pages.add(page)

    def simulate(self, trace):
        """Run the simulation on a given trace."""
        for t, page in enumerate(trace):
            self.access_page(page, t)
        return {
            'page_faults': self.page_faults,
            'total_accesses': self.total_accesses,
            'fault_rate': self.page_faults / self.total_accesses,
            'final_working_set_size': len(self.compute_working_set(len(trace) - 1)),
        }

def compare_locality_impact():
    """
    Demonstrate that the working set model works well BECAUSE of locality.
    """
    print("Comparing Working Set Effectiveness vs Locality Strength")
    print("=" * 60)
    print("Pages: 1000, Memory Frames: 50, Window: 100 accesses")
    print("Trace length: 10000 accesses")
    print()

    # High locality workload
    sim_high = WorkingSetSimulator(num_pages=1000, memory_frames=50, window_delta=100)
    trace_high = sim_high.generate_high_locality_trace(10000, hot_pages=20)
    result_high = sim_high.simulate(trace_high)

    # Low locality workload
    sim_low = WorkingSetSimulator(num_pages=1000, memory_frames=50, window_delta=100)
    trace_low = sim_low.generate_low_locality_trace(10000)
    result_low = sim_low.simulate(trace_low)

    print("HIGH LOCALITY (90% to 20 hot pages):")
    print(f"  Page faults: {result_high['page_faults']}")
    print(f"  Fault rate:  {result_high['fault_rate']:.4f}")
    print()
    print("LOW LOCALITY (uniform random):")
    print(f"  Page faults: {result_low['page_faults']}")
    print(f"  Fault rate:  {result_low['fault_rate']:.4f}")
    print()
    print("CONCLUSION:")
    ratio = result_low['fault_rate'] / result_high['fault_rate']
    print(f"  High locality fault rate is {ratio:.1f}x better")
    print("  Working set model effectiveness DEPENDS on locality presence!")

# Typical result: the high-locality fault rate is roughly an order of
# magnitude lower than the low-locality one.
# Working set works because locality exists!
```

The principle of locality is the empirical foundation upon which the entire working set model rests. Without locality, predicting future memory needs from past behavior would be futile. With locality, it becomes not only possible but highly effective.
Key Takeaways:

- Locality is an empirical property of real programs, not a designed feature: it emerges from loops, functions, and data structures.
- Temporal locality (reuse of the same locations) and spatial locality (access to nearby locations) are distinct patterns, exploited by distinct mechanisms.
- The entire memory hierarchy, from registers to swap, is economically viable only because locality exists.
- The working set model is a direct formalization of temporal locality: pages accessed in the window [t-Δ, t] predict near-future accesses.
You now understand the principle of locality—the theoretical foundation of the working set model. In the next page, we'll formally define the working set and explore how this abstract concept translates into a concrete memory management mechanism.
Next Up: Working Set Definition
With locality understood, we're ready to formalize the working set concept. We'll develop Denning's definition of W(t, Δ), explore its mathematical properties, and understand how this abstraction maps to memory management decisions. The working set definition transforms the vague notion of 'pages the process is using' into a precise, measurable, and actionable concept.