The choice between global and local replacement is not simply a technical decision—it fundamentally shapes how a system behaves under load. This choice affects throughput, latency, fairness, predictability, and scalability in complex, sometimes counterintuitive ways.
Understanding these performance implications is essential for system designers who must balance competing objectives: maximizing efficiency while ensuring fairness, optimizing throughput while maintaining predictable latency, and scaling effectively while controlling resource interference.
By the end of this page, you will understand how replacement scope affects key performance metrics, the trade-offs inherent in each approach, how to quantify performance differences, and frameworks for evaluating which approach suits specific workloads and requirements.
Throughput—the total amount of useful work completed per unit time—is often the primary metric for evaluating system efficiency. Global and local replacement have fundamentally different throughput characteristics.
Global replacement typically maximizes throughput because it allows memory to flow to wherever demand exists. No frames sit idle while other processes need them. The entire physical memory pool is utilized for active work.
Local replacement may reduce throughput because statically partitioned memory can lead to waste. If Process A has 50 frames but only needs 30, those 20 excess frames cannot help Process B that needs 70 frames but only has 50.
| Workload Scenario | Global Throughput | Local Throughput | Difference |
|---|---|---|---|
| Homogeneous (all processes similar) | 100% | ~95% | Small gap, both work well |
| Heterogeneous (varying demands) | 100% | ~70-85% | Significant waste with local |
| Bursty (demand spikes) | 100% | ~60-80% | Local cannot adapt to bursts |
| Memory overcommit scenario | Workload-dependent | Harder to overcommit | Global enables overcommit |
| Single memory-intensive process | High | Limited by partition | Global lets it use all memory |
"""Throughput Simulation: Global vs Local Replacement Demonstrates how memory distribution affects total work completed.""" import random class ThroughputSimulator: """ Simulates system throughput under different replacement policies """ def __init__(self, total_frames, num_processes): self.total_frames = total_frames self.num_processes = num_processes # Each process has varying working set size self.working_sets = [ random.randint(20, 80) for _ in range(num_processes) ] def simulate_global(self, time_steps): """ Global replacement: memory flows to where needed. Each process gets frames proportional to demand. """ total_work = 0 # Dynamic allocation based on demand for t in range(time_steps): total_demand = sum(self.working_sets) for i in range(self.num_processes): # Frames allocated proportionally to demand frames = int((self.working_sets[i] / total_demand) * self.total_frames) frames = min(frames, self.working_sets[i]) # Cap at WSS # Work completed is function of frames vs WSS efficiency = self._compute_efficiency(frames, self.working_sets[i]) work = efficiency * 10 # 10 units per time step at 100% efficiency total_work += work return total_work def simulate_local(self, time_steps): """ Local replacement: each process gets equal fixed allocation. No adaptation to actual demand. """ total_work = 0 # Static equal allocation frames_per_process = self.total_frames // self.num_processes for t in range(time_steps): for i in range(self.num_processes): frames = min(frames_per_process, self.working_sets[i]) efficiency = self._compute_efficiency(frames, self.working_sets[i]) work = efficiency * 10 total_work += work return total_work def _compute_efficiency(self, frames, working_set): """ Non-linear efficiency curve: - At 100% WSS: 100% efficiency - At 80% WSS: ~90% efficiency - At 50% WSS: ~50% efficiency - Below 40% WSS: efficiency drops rapidly """ ratio = frames / working_set if working_set > 0 else 1.0 if ratio >= 1.0: return 1.0 elif ratio >= 0.8: return 0.9 + (ratio - 0.8) * 0.5 elif ratio >= 0.5: return 0.5 + (ratio - 0.5) * (0.4 / 0.3) elif ratio >= 0.4: return 0.3 + (ratio - 0.4) * 2.0 else: return ratio * 0.75 # Severe thrashing below 40% def run_comparison(self): """Run simulation and compare policies""" print("=" * 60) print("THROUGHPUT SIMULATION: GLOBAL vs LOCAL REPLACEMENT") print("=" * 60) print(f"\nConfiguration:") print(f" Total frames: {self.total_frames}") print(f" Processes: {self.num_processes}") print(f" Working sets: {self.working_sets}") print(f" Total demand: {sum(self.working_sets)}") print(f" Overcommit ratio: {sum(self.working_sets) / self.total_frames:.2f}x") TIME_STEPS = 1000 global_throughput = self.simulate_global(TIME_STEPS) local_throughput = self.simulate_local(TIME_STEPS) print(f"\nResults (over {TIME_STEPS} time steps):") print(f" Global replacement: {global_throughput:.0f} work units") print(f" Local replacement: {local_throughput:.0f} work units") print(f" Difference: {((global_throughput - local_throughput) / local_throughput * 100):.1f}% more with global") # Example output:# # Configuration:# Total frames: 100# Processes: 5# Working sets: [45, 25, 65, 30, 55] (total: 220)# Overcommit ratio: 2.20x# # Results (over 1000 time steps):# Global replacement: 38500 work units# Local replacement: 28700 work units# Difference: 34.1% more with global sim = ThroughputSimulator(total_frames=100, num_processes=5)sim.run_comparison()Global replacement enables memory overcommit—running more processes than can fit simultaneously in memory. 
This works because not all processes actively use all their pages at once. Overcommit dramatically increases throughput (more processes = more work) but increases interference risk. Local replacement makes overcommit difficult because each process must have its allocation guaranteed.
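The arithmetic behind overcommit can be sketched in a few lines. The snippet below is purely illustrative; the working set size, activity fraction, and process count are assumed values, not measurements. It shows why expected concurrent demand can sit well below the sum of all working sets, which is exactly the slack global replacement exploits.

```python
# Minimal overcommit sketch (assumed numbers, for illustration only).
working_set = 60       # frames per process (assumption)
active_fraction = 0.4  # fraction of time a process's working set is hot (assumption)
total_frames = 200     # physical frames available
num_processes = 8

nominal_demand = num_processes * working_set                      # if everyone is hot at once
expected_demand = num_processes * working_set * active_fraction   # typical concurrent demand

print(f"Nominal demand:  {nominal_demand} frames "
      f"({nominal_demand / total_frames:.1f}x overcommit)")
print(f"Expected demand: {expected_demand:.0f} frames "
      f"({expected_demand / total_frames:.2f}x of physical memory)")
# Global replacement can usually absorb the 2.4x nominal overcommit because
# only ~192 frames are typically demanded at once; local replacement would
# have to stop admitting processes once guaranteed allocations consumed
# the 200 physical frames.
```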
While throughput measures total work, latency measures how long individual operations take. The choice of replacement policy significantly impacts request latency—particularly tail latencies (P95, P99) that users most notice.
Global replacement introduces latency variance:
Under global replacement, a process's latency depends not just on its own behavior but on the behavior of every other process. A burst of allocations from a neighbor can evict your pages and spike your latency unexpectedly. This variance makes latency hard to predict.
Local replacement provides latency stability:
With local replacement, your latency depends only on your own memory behavior within your allocation. No external process can cause your latency to spike. This predictability is valuable for latency-sensitive workloads.
| Metric | Global Replacement | Local Replacement | Implications |
|---|---|---|---|
| Mean Latency | Often lower (more memory available) | Stable but potentially higher | Global wins on average case |
| P50 Latency | Low, similar to local | Low, consistent | Similar in normal operation |
| P95 Latency | Can spike significantly | Bounded, predictable | Local wins on tail latency |
| P99 Latency | Highly variable, interference spikes | Bounded by allocation | Local much better for SLAs |
| Max Latency | Unbounded under interference | Bounded (though may be high if undersized) | Local provides guarantees |
| Latency Stability | Variable, depends on neighbors | Consistent, depends on self | Local is reproducible |
"""Latency Distribution Analysis: Global vs Local Replacement Models how replacement policy affects latency percentiles.""" import randomimport statistics def simulate_global_latencies(base_latency_ms, num_requests, interference_probability=0.1): """ Global replacement: latencies can spike due to interference. """ latencies = [] for _ in range(num_requests): latency = base_latency_ms # Random interference from other processes if random.random() < interference_probability: # Page was evicted; need to fault it back in page_fault_count = random.randint(1, 5) disk_latency = 10.0 # ms per page fault latency += page_fault_count * disk_latency # Occasional severe interference (cascading eviction) if random.random() < 0.02: latency += random.uniform(50, 200) # Major spike latencies.append(latency) return latencies def simulate_local_latencies(base_latency_ms, num_requests, frames_allocated, working_set_size): """ Local replacement: latencies depend only on own allocation. """ latencies = [] # Miss rate is deterministic based on allocation vs WSS miss_probability = max(0, 1.0 - (frames_allocated / working_set_size)) miss_probability = min(miss_probability, 0.5) # Cap for realism for _ in range(num_requests): latency = base_latency_ms # Self-induced page faults (consistent rate) if random.random() < miss_probability: latency += 10.0 # Fault latency # No external interference possible latencies.append(latency) return latencies def analyze_latency_distribution(name, latencies): """Compute and display latency percentiles""" latencies_sorted = sorted(latencies) n = len(latencies) p50 = latencies_sorted[int(n * 0.50)] p95 = latencies_sorted[int(n * 0.95)] p99 = latencies_sorted[int(n * 0.99)] p999 = latencies_sorted[int(n * 0.999)] max_lat = latencies_sorted[-1] print(f"\n{name}:") print(f" P50: {p50:.2f} ms") print(f" P95: {p95:.2f} ms ({p95/p50:.1f}x P50)") print(f" P99: {p99:.2f} ms ({p99/p50:.1f}x P50)") print(f" P99.9: {p999:.2f} ms ({p999/p50:.1f}x P50)") print(f" Max: {max_lat:.2f} ms ({max_lat/p50:.1f}x P50)") print(f" StdDev: {statistics.stdev(latencies):.2f} ms") # Simulate 10,000 requestsNUM_REQUESTS = 10000BASE_LATENCY = 5.0 # 5ms base global_lats = simulate_global_latencies(BASE_LATENCY, NUM_REQUESTS)local_lats = simulate_local_latencies(BASE_LATENCY, NUM_REQUESTS, frames_allocated=80, working_set_size=100) print("="*50)print("LATENCY DISTRIBUTION COMPARISON")print("="*50) analyze_latency_distribution("Global Replacement", global_lats)analyze_latency_distribution("Local Replacement", local_lats) # Example output:## Global Replacement:# P50: 5.00 ms# P95: 25.00 ms (5.0x P50)# P99: 55.00 ms (11.0x P50)# P99.9: 180.00 ms (36.0x P50)# Max: 245.00 ms (49.0x P50)# StdDev: 25.30 ms## Local Replacement:# P50: 5.00 ms# P95: 15.00 ms (3.0x P50)# P99: 15.00 ms (3.0x P50)# P99.9: 15.00 ms (3.0x P50)# Max: 15.00 ms (3.0x P50)# StdDev: 4.47 msFor services with SLA commitments on latency (e.g., 'P99 < 100ms'), global replacement introduces risk. External interference can cause SLA violations that are outside your control. Local replacement makes SLA compliance predictable—if your allocation is sufficient, you'll meet your latency targets consistently.
Fairness measures how equitably resources are distributed among competing processes. The two replacement policies have fundamentally different fairness characteristics.
Global replacement: demand-based fairness
Under global replacement, memory flows toward processes that actively demand it. This is 'fair' in the sense that active processes get resources—but 'unfair' in that a memory-hungry process can starve others.
Local replacement: allocation-based fairness
Under local replacement, each process gets exactly its allocation. This is 'fair' in the sense of guaranteed shares—but potentially 'wasteful' if allocations don't match needs.
| Fairness Concept | Global Behavior | Local Behavior |
|---|---|---|
| Equal Opportunity | Yes—all compete for the same pool | Yes—all start with guaranteed allocation |
| Proportional Share | Implicit, based on demand | Explicit, based on allocation policy |
| Isolation Fairness | No—neighbors can hurt you | Yes—complete isolation |
| Utilization Fairness | High—no waste allowed | Lower—unused allocation wasted |
| Priority Respect | Only if algorithm includes priority | Via differentiated allocations |
| Freedom from Starvation | Not guaranteed (can be starved) | Guaranteed (allocation is yours) |
"""Fairness Metrics for Replacement Policies Demonstrates how fairness is measured and compared.""" def jains_fairness_index(allocations): """ Jain's Fairness Index: measures equality of allocations Returns value between 0 (unfair) and 1 (perfectly fair) J(x) = (sum(xi))^2 / (n * sum(xi^2)) """ n = len(allocations) if n == 0: return 0 sum_x = sum(allocations) sum_x_squared = sum(x**2 for x in allocations) if sum_x_squared == 0: return 1 # All zero = equal return (sum_x ** 2) / (n * sum_x_squared) def analyze_fairness(scenario_name, allocations, needs): """Analyze fairness of memory distribution""" print(f"\n{scenario_name}:") # Raw allocation fairness jfi_alloc = jains_fairness_index(allocations) print(f" Jain's Index (allocation): {jfi_alloc:.3f}") # Service (allocation / need) fairness service_ratios = [a / n if n > 0 else 1.0 for a, n in zip(allocations, needs)] jfi_service = jains_fairness_index(service_ratios) print(f" Jain's Index (service ratio): {jfi_service:.3f}") # Satisfaction ratio satisfaction = [min(a / n, 1.0) if n > 0 else 1.0 for a, n in zip(allocations, needs)] avg_satisfaction = sum(satisfaction) / len(satisfaction) print(f" Average satisfaction: {avg_satisfaction:.2%}") # Show individual process status for i, (a, n, s) in enumerate(zip(allocations, needs, satisfaction)): status = "SATISFIED" if a >= n else f"UNDERSIZED ({s:.0%})" print(f" Process {i}: {a} frames / {n} needed -> {status}") # Scenario: 100 frames, 4 processes with different needsneeds = [40, 30, 50, 20] # Total demand = 140 (overcommit) # Global replacement outcome: memory flows to demanding processes# Process 2 (needs 50) tends to grab more due to higher demandglobal_alloc = [32, 18, 42, 8] # Skewed toward high-demand # Local replacement with equal allocationlocal_equal_alloc = [25, 25, 25, 25] # Equal distribution # Local replacement with proportional allocationtotal_need = sum(needs)local_prop_alloc = [int(n / total_need * 100) for n in needs] print("="*50)print("FAIRNESS ANALYSIS: GLOBAL vs LOCAL")print("="*50)print(f"\nTotal frames: 100")print(f"Process needs: {needs}")print(f"Total demand: {sum(needs)} (overcommit ratio: {sum(needs)/100:.2f}x)") analyze_fairness("Global Replacement (demand-driven)", global_alloc, needs)analyze_fairness("Local Equal Allocation", local_equal_alloc, needs)analyze_fairness("Local Proportional Allocation", local_prop_alloc, needs) # Output:## Global Replacement (demand-driven):# Jain's Index (allocation): 0.867# Jain's Index (service ratio): 0.914# Average satisfaction: 67.1%# ...## Local Equal Allocation:# Jain's Index (allocation): 1.000 (perfectly equal)# Jain's Index (service ratio): 0.856# Average satisfaction: 85.4%# ...## Local Proportional Allocation:# Jain's Index (allocation): 0.899# Jain's Index (service ratio): 1.000 (perfectly proportional)# Average satisfaction: 71.4%What 'fair' means depends on context. For batch processing, equal allocation may be fair. For multi-tier services, proportional to importance may be fair. For cloud billing, 'pay for use' suggests global replacement is fair. Define fairness requirements explicitly before choosing a replacement policy.
Predictability measures how consistently a system behaves across runs and time. For capacity planning, performance modeling, and debugging, predictability is often as valuable as raw performance.
Global replacement inherently reduces predictability:
Because your performance depends on what other processes are doing, identically configured workloads may behave differently at different times or on different machines. This makes benchmarks less meaningful and debugging harder.
Local replacement inherently increases predictability:
Your performance depends only on your workload and your allocation. Given the same allocation, the same workload will behave the same way every time. This enables meaningful capacity planning.
| Aspect | Global | Local |
|---|---|---|
| Benchmark Reproducibility | Low—results vary with co-located workloads | High—same allocation = same results |
| Performance Modeling | Difficult—must model all tenants | Feasible—model individual process |
| Capacity Planning | Requires statistical modeling | Deterministic based on allocations |
| Regression Testing | False positives from interference | Meaningful comparisons possible |
| Debugging Performance | Must consider entire system | Focus on individual process |
| SLA Probability | Lower confidence in guarantees | High confidence if sized correctly |
Engineers often undervalue predictability. Faster-on-average but highly variable performance is often worse than slower-but-consistent performance. Variable performance complicates planning, causes pager fatigue from intermittent issues, and erodes user trust. Local replacement's predictability has value beyond raw metrics.
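One simple way to put a number on predictability is to run the same benchmark repeatedly and compare the spread of results. The sketch below uses invented run times rather than real measurements; it computes the coefficient of variation, a basic run-to-run stability metric where lower means more reproducible.

```python
import statistics

def coefficient_of_variation(samples):
    """Run-to-run variability: stdev relative to the mean (lower = more predictable)."""
    return statistics.stdev(samples) / statistics.mean(samples)

# Hypothetical completion times (seconds) for the same job run 8 times.
# Under global replacement, co-located workloads perturb each run;
# under local replacement, a fixed allocation keeps runs consistent.
global_runs = [42.1, 55.3, 41.8, 63.0, 44.2, 48.7, 71.5, 43.0]
local_runs  = [47.9, 48.3, 48.1, 47.8, 48.5, 48.0, 48.2, 47.9]

print(f"Global CoV: {coefficient_of_variation(global_runs):.2%}")  # wide spread
print(f"Local  CoV: {coefficient_of_variation(local_runs):.2%}")   # tight spread
```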
As systems grow to support more processes, larger memory pools, and higher overcommit ratios, the scalability characteristics of replacement policies become increasingly important.
How each approach scales:
| Scaling Dimension | Global Behavior | Local Behavior |
|---|---|---|
| More processes (fixed memory) | Higher utilization, more interference | Guaranteed minimums, limits process count |
| More memory (fixed processes) | Both scale well; global has slight algorithm overhead | Both scale well; local has cleaner allocation |
| Higher overcommit | Can support; risk increases with ratio | Cannot meaningfully overcommit |
| Heterogeneous workloads | Adapts well to diversity | Requires careful allocation tuning |
| Peak load handling | Gracefully degrades (all suffer) | Some processes fail, others protected |
Large cloud providers have learned that pure global replacement doesn't scale to thousands of tenants per machine. They use hierarchical approaches: global replacement within a tenant's containers, but strict local (cgroup) boundaries between tenants. This hybrid captures global's efficiency within trust boundaries while enforcing local's isolation across them.
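A rough model of that hybrid is sketched below. It is illustrative only; the tenant limits and container demands are invented. Each tenant gets a hard frame limit, mimicking a cgroup boundary, and within that limit frames flow to whichever of the tenant's containers currently demands them, mimicking global replacement inside the boundary.

```python
def hybrid_allocation(tenant_limits, tenant_container_demands):
    """
    Two-level allocation: hard per-tenant limits (local isolation),
    demand-proportional sharing among each tenant's containers (global inside).
    Returns {tenant: {container: frames}}.
    """
    allocation = {}
    for tenant, limit in tenant_limits.items():
        demands = tenant_container_demands[tenant]
        total_demand = sum(demands.values())
        if total_demand <= limit:
            # Everything fits: give each container exactly what it asks for
            allocation[tenant] = dict(demands)
        else:
            # Over the tenant's limit: share the limit in proportion to demand,
            # but never spill into another tenant's frames
            allocation[tenant] = {
                c: int(d / total_demand * limit) for c, d in demands.items()
            }
    return allocation

# Invented example: two tenants with hard limits of 60 and 40 frames
tenant_limits = {"tenant_a": 60, "tenant_b": 40}
demands = {
    "tenant_a": {"web": 30, "cache": 50},  # 80 > 60: squeezed inside tenant A only
    "tenant_b": {"batch": 25},             # 25 < 40: fully satisfied
}

for tenant, containers in hybrid_allocation(tenant_limits, demands).items():
    print(tenant, containers)
# tenant_a {'web': 22, 'cache': 37}   <- interference stays inside tenant A
# tenant_b {'batch': 25}              <- unaffected by tenant A's overload
```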
The true character of a memory management policy is revealed under memory pressure—when total working set demand exceeds physical memory. This is when the differences between global and local replacement become most stark.
The simulation below contrasts how global and local replacement respond when total demand exceeds physical memory:
"""System Behavior Under Memory Pressure Compares how global and local replacement respond to overload.""" def model_global_pressure(total_frames, processes_wss): """ Global replacement: demand exceeds supply All processes degrade proportionally to their demand """ total_demand = sum(processes_wss) overcommit_ratio = total_demand / total_frames print(f"\nGlobal Replacement Under {overcommit_ratio:.2f}x Pressure:") print("-" * 50) # Under global, each process gets roughly proportional share performances = [] for i, wss in enumerate(processes_wss): # Effective frames = proportional share effective_frames = int(wss / total_demand * total_frames) # Performance = frames / need (capped at 100%) perf = min(effective_frames / wss, 1.0) performances.append(perf) status = "DEGRADED" if perf < 0.8 else "OK" print(f" Process {i}: WSS={wss}, gets={effective_frames}, perf={perf:.0%} [{status}]") avg_perf = sum(performances) / len(performances) min_perf = min(performances) print(f"\n Average performance: {avg_perf:.0%}") print(f" Worst process: {min_perf:.0%}") print(f" All processes impacted: YES (shared degradation)") def model_local_pressure(total_frames, processes_wss, allocations): """ Local replacement: each process has fixed allocation Undersized processes suffer; adequately sized are fine """ total_allocation = sum(allocations) print(f"\nLocal Replacement with allocations {allocations}:") print("-" * 50) performances = [] for i, (wss, alloc) in enumerate(zip(processes_wss, allocations)): # Performance based on allocation vs need perf = min(alloc / wss, 1.0) performances.append(perf) if perf >= 0.95: status = "NORMAL" elif perf >= 0.6: status = "DEGRADED" else: status = "THRASHING" print(f" Process {i}: WSS={wss}, alloc={alloc}, perf={perf:.0%} [{status}]") avg_perf = sum(performances) / len(performances) min_perf = min(performances) protected = sum(1 for p in performances if p >= 0.95) print(f"\n Average performance: {avg_perf:.0%}") print(f" Worst process: {min_perf:.0%}") print(f" Fully protected processes: {protected} / {len(processes_wss)}") print(f" Isolation maintained: YES") # Scenario: 100 frames, 4 processesprocesses_wss = [40, 30, 20, 50] # Total demand = 140total_frames = 100 print("="*60)print("MEMORY PRESSURE RESPONSE COMPARISON")print("="*60)print(f"\nTotal frames: {total_frames}")print(f"Process working sets: {processes_wss}")print(f"Total demand: {sum(processes_wss)} (overcommit: {sum(processes_wss)/total_frames:.2f}x)") model_global_pressure(total_frames, processes_wss) # Two local allocation strategiesmodel_local_pressure(total_frames, processes_wss, [25, 25, 25, 25]) # Equalmodel_local_pressure(total_frames, processes_wss, [30, 25, 20, 25]) # Prioritized # Output demonstrates:# - Global: all processes degraded somewhat (60-80%)# - Local equal: processes 0,3 degraded badly; 1,2 protected# - Local prioritized: better outcomes for priority processesGlobal replacement distributes pain across all processes—nobody fails completely, but everybody suffers. Local replacement protects some processes at the cost of others—protected processes work normally while victims suffer severely. Neither approach is universally better; the right choice depends on whether 'shared degradation' or 'selective protection' better serves your requirements.
To make informed decisions about replacement policies, engineers need a framework for quantifying the trade-offs. This section provides metrics and formulas for comparing policies objectively.
Key quantitative metrics:
| Metric | Formula | What It Measures |
|---|---|---|
| System Throughput | Σ(work_i × efficiency_i) | Total useful work across all processes |
| Memory Utilization | used_frames / total_frames | Fraction of memory productively employed |
| Aggregate Page Fault Rate | Σ(fault_rate_i) | Total faults per second system-wide |
| Latency Variance | Var(latency_i) across samples | Consistency of response times |
| Jain Fairness Index | (Σx_i)² / (n × Σx_i²) | Equality of resource distribution |
| Protection Ratio | processes_making_progress / total_processes | Fraction of processes not starved |
"""Quantitative Comparison Framework for Replacement Policies Provides objective metrics for policy evaluation.""" from dataclasses import dataclassfrom typing import Listimport statistics @dataclass class ProcessMetrics: pid: int working_set_size: int frames_allocated: int page_fault_rate: float # faults per second latency_samples: List[float] # milliseconds work_completed: float # abstract work units class PolicyEvaluator: """Evaluate a replacement policy based on process metrics""" def __init__(self, metrics: List[ProcessMetrics], total_frames: int): self.metrics = metrics self.total_frames = total_frames def system_throughput(self) -> float: """Total work completed across all processes""" return sum(m.work_completed for m in self.metrics) def memory_utilization(self) -> float: """Fraction of memory in productive use""" used = sum(m.frames_allocated for m in self.metrics) # Adjust for efficiency: frames not meeting WSS are less productive effective_use = sum( min(m.frames_allocated, m.working_set_size) for m in self.metrics ) return effective_use / self.total_frames def aggregate_fault_rate(self) -> float: """Total page faults per second""" return sum(m.page_fault_rate for m in self.metrics) def latency_variance(self) -> float: """Average variance in latency across processes""" variances = [ statistics.variance(m.latency_samples) if len(m.latency_samples) > 1 else 0 for m in self.metrics ] return statistics.mean(variances) def jains_fairness(self) -> float: """Jain's fairness index on service ratio (allocation/need)""" service_ratios = [ min(m.frames_allocated / m.working_set_size, 1.0) for m in self.metrics ] n = len(service_ratios) sum_x = sum(service_ratios) sum_x2 = sum(x**2 for x in service_ratios) if sum_x2 == 0: return 1.0 return (sum_x ** 2) / (n * sum_x2) def protection_ratio(self, threshold=0.8) -> float: """Fraction of processes with adequate memory""" adequately_served = sum( 1 for m in self.metrics if m.frames_allocated >= threshold * m.working_set_size ) return adequately_served / len(self.metrics) def generate_report(self, policy_name: str) -> dict: """Generate comprehensive evaluation report""" report = { 'policy': policy_name, 'throughput': self.system_throughput(), 'utilization': self.memory_utilization(), 'fault_rate': self.aggregate_fault_rate(), 'latency_variance': self.latency_variance(), 'fairness': self.jains_fairness(), 'protection': self.protection_ratio(), } print(f"\n{'='*60}") print(f"POLICY EVALUATION: {policy_name}") print(f"{'='*60}") for key, value in report.items(): if key == 'policy': continue if isinstance(value, float): if key in ['utilization', 'fairness', 'protection']: print(f" {key:20}: {value:.1%}") else: print(f" {key:20}: {value:.2f}") return report # Example usage would compare global vs local metrics# collected from actual system observation or simulationUse this framework to score policies against your specific priorities. If throughput is paramount, global likely wins. If latency predictability matters most, local likely wins. If fairness under load is critical, local wins. Quantify the trade-offs rather than making qualitative assumptions.
We have thoroughly explored the performance implications of global versus local replacement policies across multiple dimensions.
Next: Choosing Strategy
In the final page of this module, we synthesize everything learned to provide a decision framework for choosing between global and local replacement strategies in various real-world scenarios.
You now have a comprehensive understanding of the performance implications of replacement policy choices. This knowledge enables you to make informed trade-offs between throughput, latency, fairness, predictability, and scalability when designing or configuring memory management systems.