The choice between global and local replacement is not simply a technical decision—it fundamentally shapes how a system behaves under load. This choice affects throughput, latency, fairness, predictability, and scalability in complex, sometimes counterintuitive ways.
Understanding these performance implications is essential for system designers who must balance competing objectives: maximizing efficiency while ensuring fairness, optimizing throughput while maintaining predictable latency, and scaling effectively while controlling resource interference.
By the end of this page, you will understand how replacement scope affects key performance metrics, the trade-offs inherent in each approach, how to quantify performance differences, and frameworks for evaluating which approach suits specific workloads and requirements.
Throughput—the total amount of useful work completed per unit time—is often the primary metric for evaluating system efficiency. Global and local replacement have fundamentally different throughput characteristics.
Global replacement typically maximizes throughput because it allows memory to flow to wherever demand exists. No frames sit idle while other processes need them. The entire physical memory pool is utilized for active work.
Local replacement may reduce throughput because statically partitioned memory can lead to waste. If Process A has 50 frames but only needs 30, those 20 excess frames cannot help Process B that needs 70 frames but only has 50.
| Workload Scenario | Global Throughput | Local Throughput | Difference |
|---|---|---|---|
| Homogeneous (all processes similar) | 100% | ~95% | Small gap, both work well |
| Heterogeneous (varying demands) | 100% | ~70-85% | Significant waste with local |
| Bursty (demand spikes) | 100% | ~60-80% | Local cannot adapt to bursts |
| Memory overcommit scenario | Workload-dependent | Harder to overcommit | Global enables overcommit |
| Single memory-intensive process | High | Limited by partition | Global lets it use all memory |
"""Throughput Simulation: Global vs Local Replacement Demonstrates how memory distribution affects total work completed.""" import random class ThroughputSimulator: """ Simulates system throughput under different replacement policies """ def __init__(self, total_frames, num_processes): self.total_frames = total_frames self.num_processes = num_processes # Each process has varying working set size self.working_sets = [ random.randint(20, 80) for _ in range(num_processes) ] def simulate_global(self, time_steps): """ Global replacement: memory flows to where needed. Each process gets frames proportional to demand. """ total_work = 0 # Dynamic allocation based on demand for t in range(time_steps): total_demand = sum(self.working_sets) for i in range(self.num_processes): # Frames allocated proportionally to demand frames = int((self.working_sets[i] / total_demand) * self.total_frames) frames = min(frames, self.working_sets[i]) # Cap at WSS # Work completed is function of frames vs WSS efficiency = self._compute_efficiency(frames, self.working_sets[i]) work = efficiency * 10 # 10 units per time step at 100% efficiency total_work += work return total_work def simulate_local(self, time_steps): """ Local replacement: each process gets equal fixed allocation. No adaptation to actual demand. """ total_work = 0 # Static equal allocation frames_per_process = self.total_frames // self.num_processes for t in range(time_steps): for i in range(self.num_processes): frames = min(frames_per_process, self.working_sets[i]) efficiency = self._compute_efficiency(frames, self.working_sets[i]) work = efficiency * 10 total_work += work return total_work def _compute_efficiency(self, frames, working_set): """ Non-linear efficiency curve: - At 100% WSS: 100% efficiency - At 80% WSS: ~90% efficiency - At 50% WSS: ~50% efficiency - Below 40% WSS: efficiency drops rapidly """ ratio = frames / working_set if working_set > 0 else 1.0 if ratio >= 1.0: return 1.0 elif ratio >= 0.8: return 0.9 + (ratio - 0.8) * 0.5 elif ratio >= 0.5: return 0.5 + (ratio - 0.5) * (0.4 / 0.3) elif ratio >= 0.4: return 0.3 + (ratio - 0.4) * 2.0 else: return ratio * 0.75 # Severe thrashing below 40% def run_comparison(self): """Run simulation and compare policies""" print("=" * 60) print("THROUGHPUT SIMULATION: GLOBAL vs LOCAL REPLACEMENT") print("=" * 60) print(f"\nConfiguration:") print(f" Total frames: {self.total_frames}") print(f" Processes: {self.num_processes}") print(f" Working sets: {self.working_sets}") print(f" Total demand: {sum(self.working_sets)}") print(f" Overcommit ratio: {sum(self.working_sets) / self.total_frames:.2f}x") TIME_STEPS = 1000 global_throughput = self.simulate_global(TIME_STEPS) local_throughput = self.simulate_local(TIME_STEPS) print(f"\nResults (over {TIME_STEPS} time steps):") print(f" Global replacement: {global_throughput:.0f} work units") print(f" Local replacement: {local_throughput:.0f} work units") print(f" Difference: {((global_throughput - local_throughput) / local_throughput * 100):.1f}% more with global") # Example output:# # Configuration:# Total frames: 100# Processes: 5# Working sets: [45, 25, 65, 30, 55] (total: 220)# Overcommit ratio: 2.20x# # Results (over 1000 time steps):# Global replacement: 38500 work units# Local replacement: 28700 work units# Difference: 34.1% more with global sim = ThroughputSimulator(total_frames=100, num_processes=5)sim.run_comparison()Global replacement enables memory overcommit—running more processes than can fit simultaneously in memory. 
This works because not all processes actively use all their pages at once. Overcommit dramatically increases throughput (more processes = more work) but increases interference risk. Local replacement makes overcommit difficult because each process must have its allocation guaranteed.
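The arithmetic behind overcommit can be sketched in a few lines. The snippet below is purely illustrative; the working set size, activity fraction, and process count are assumed values, not measurements. It shows why expected concurrent demand can sit well below the sum of all working sets, which is exactly the slack global replacement exploits.

```python
# Minimal overcommit sketch (assumed numbers, for illustration only).
working_set = 60       # frames per process (assumption)
active_fraction = 0.4  # fraction of time a process's working set is hot (assumption)
total_frames = 200     # physical frames available
num_processes = 8

nominal_demand = num_processes * working_set                      # if everyone is hot at once
expected_demand = num_processes * working_set * active_fraction   # typical concurrent demand

print(f"Nominal demand:  {nominal_demand} frames "
      f"({nominal_demand / total_frames:.1f}x overcommit)")
print(f"Expected demand: {expected_demand:.0f} frames "
      f"({expected_demand / total_frames:.2f}x of physical memory)")
# Global replacement can usually absorb the 2.4x nominal overcommit because
# only ~192 frames are typically demanded at once; local replacement would
# have to stop admitting processes once guaranteed allocations consumed
# the 200 physical frames.
```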
While throughput measures total work, latency measures how long individual operations take. The choice of replacement policy significantly impacts request latency—particularly tail latencies (P95, P99) that users most notice.
Global replacement introduces latency variance:
Under global replacement, a process's latency depends not just on its own behavior but on the behavior of every other process. A burst of allocations from a neighbor can evict your pages and spike your latency unexpectedly. This variance makes latency hard to predict.
Local replacement provides latency stability:
With local replacement, your latency depends only on your own memory behavior within your allocation. No external process can cause your latency to spike. This predictability is valuable for latency-sensitive workloads.
| Metric | Global Replacement | Local Replacement | Implications |
|---|---|---|---|
| Mean Latency | Often lower (more memory available) | Stable but potentially higher | Global wins on average case |
| P50 Latency | Low, similar to local | Low, consistent | Similar in normal operation |
| P95 Latency | Can spike significantly | Bounded, predictable | Local wins on tail latency |
| P99 Latency | Highly variable, interference spikes | Bounded by allocation | Local much better for SLAs |
| Max Latency | Unbounded under interference | Bounded (though may be high if undersized) | Local provides guarantees |
| Latency Stability | Variable, depends on neighbors | Consistent, depends on self | Local is reproducible |
"""Latency Distribution Analysis: Global vs Local Replacement Models how replacement policy affects latency percentiles.""" import randomimport statistics def simulate_global_latencies(base_latency_ms, num_requests, interference_probability=0.1): """ Global replacement: latencies can spike due to interference. """ latencies = [] for _ in range(num_requests): latency = base_latency_ms # Random interference from other processes if random.random() < interference_probability: # Page was evicted; need to fault it back in page_fault_count = random.randint(1, 5) disk_latency = 10.0 # ms per page fault latency += page_fault_count * disk_latency # Occasional severe interference (cascading eviction) if random.random() < 0.02: latency += random.uniform(50, 200) # Major spike latencies.append(latency) return latencies def simulate_local_latencies(base_latency_ms, num_requests, frames_allocated, working_set_size): """ Local replacement: latencies depend only on own allocation. """ latencies = [] # Miss rate is deterministic based on allocation vs WSS miss_probability = max(0, 1.0 - (frames_allocated / working_set_size)) miss_probability = min(miss_probability, 0.5) # Cap for realism for _ in range(num_requests): latency = base_latency_ms # Self-induced page faults (consistent rate) if random.random() < miss_probability: latency += 10.0 # Fault latency # No external interference possible latencies.append(latency) return latencies def analyze_latency_distribution(name, latencies): """Compute and display latency percentiles""" latencies_sorted = sorted(latencies) n = len(latencies) p50 = latencies_sorted[int(n * 0.50)] p95 = latencies_sorted[int(n * 0.95)] p99 = latencies_sorted[int(n * 0.99)] p999 = latencies_sorted[int(n * 0.999)] max_lat = latencies_sorted[-1] print(f"\n{name}:") print(f" P50: {p50:.2f} ms") print(f" P95: {p95:.2f} ms ({p95/p50:.1f}x P50)") print(f" P99: {p99:.2f} ms ({p99/p50:.1f}x P50)") print(f" P99.9: {p999:.2f} ms ({p999/p50:.1f}x P50)") print(f" Max: {max_lat:.2f} ms ({max_lat/p50:.1f}x P50)") print(f" StdDev: {statistics.stdev(latencies):.2f} ms") # Simulate 10,000 requestsNUM_REQUESTS = 10000BASE_LATENCY = 5.0 # 5ms base global_lats = simulate_global_latencies(BASE_LATENCY, NUM_REQUESTS)local_lats = simulate_local_latencies(BASE_LATENCY, NUM_REQUESTS, frames_allocated=80, working_set_size=100) print("="*50)print("LATENCY DISTRIBUTION COMPARISON")print("="*50) analyze_latency_distribution("Global Replacement", global_lats)analyze_latency_distribution("Local Replacement", local_lats) # Example output:## Global Replacement:# P50: 5.00 ms# P95: 25.00 ms (5.0x P50)# P99: 55.00 ms (11.0x P50)# P99.9: 180.00 ms (36.0x P50)# Max: 245.00 ms (49.0x P50)# StdDev: 25.30 ms## Local Replacement:# P50: 5.00 ms# P95: 15.00 ms (3.0x P50)# P99: 15.00 ms (3.0x P50)# P99.9: 15.00 ms (3.0x P50)# Max: 15.00 ms (3.0x P50)# StdDev: 4.47 msFor services with SLA commitments on latency (e.g., 'P99 < 100ms'), global replacement introduces risk. External interference can cause SLA violations that are outside your control. Local replacement makes SLA compliance predictable—if your allocation is sufficient, you'll meet your latency targets consistently.
Fairness measures how equitably resources are distributed among competing processes. The two replacement policies have fundamentally different fairness characteristics.
Global replacement: demand-based fairness
Under global replacement, memory flows toward processes that actively demand it. This is 'fair' in the sense that active processes get resources—but 'unfair' in that a memory-hungry process can starve others.
Local replacement: allocation-based fairness
Under local replacement, each process gets exactly its allocation. This is 'fair' in the sense of guaranteed shares—but potentially 'wasteful' if allocations don't match needs.
| Fairness Concept | Global Behavior | Local Behavior |
|---|---|---|
| Equal Opportunity | Yes—all compete for the same pool | Yes—all start with guaranteed allocation |
| Proportional Share | Implicit, based on demand | Explicit, based on allocation policy |
| Isolation Fairness | No—neighbors can hurt you | Yes—complete isolation |
| Utilization Fairness | High—no waste allowed | Lower—unused allocation wasted |
| Priority Respect | Only if algorithm includes priority | Via differentiated allocations |
| Freedom from Starvation | Not guaranteed (can be starved) | Guaranteed (allocation is yours) |
"""Fairness Metrics for Replacement Policies Demonstrates how fairness is measured and compared.""" def jains_fairness_index(allocations): """ Jain's Fairness Index: measures equality of allocations Returns value between 0 (unfair) and 1 (perfectly fair) J(x) = (sum(xi))^2 / (n * sum(xi^2)) """ n = len(allocations) if n == 0: return 0 sum_x = sum(allocations) sum_x_squared = sum(x**2 for x in allocations) if sum_x_squared == 0: return 1 # All zero = equal return (sum_x ** 2) / (n * sum_x_squared) def analyze_fairness(scenario_name, allocations, needs): """Analyze fairness of memory distribution""" print(f"\n{scenario_name}:") # Raw allocation fairness jfi_alloc = jains_fairness_index(allocations) print(f" Jain's Index (allocation): {jfi_alloc:.3f}") # Service (allocation / need) fairness service_ratios = [a / n if n > 0 else 1.0 for a, n in zip(allocations, needs)] jfi_service = jains_fairness_index(service_ratios) print(f" Jain's Index (service ratio): {jfi_service:.3f}") # Satisfaction ratio satisfaction = [min(a / n, 1.0) if n > 0 else 1.0 for a, n in zip(allocations, needs)] avg_satisfaction = sum(satisfaction) / len(satisfaction) print(f" Average satisfaction: {avg_satisfaction:.2%}") # Show individual process status for i, (a, n, s) in enumerate(zip(allocations, needs, satisfaction)): status = "SATISFIED" if a >= n else f"UNDERSIZED ({s:.0%})" print(f" Process {i}: {a} frames / {n} needed -> {status}") # Scenario: 100 frames, 4 processes with different needsneeds = [40, 30, 50, 20] # Total demand = 140 (overcommit) # Global replacement outcome: memory flows to demanding processes# Process 2 (needs 50) tends to grab more due to higher demandglobal_alloc = [32, 18, 42, 8] # Skewed toward high-demand # Local replacement with equal allocationlocal_equal_alloc = [25, 25, 25, 25] # Equal distribution # Local replacement with proportional allocationtotal_need = sum(needs)local_prop_alloc = [int(n / total_need * 100) for n in needs] print("="*50)print("FAIRNESS ANALYSIS: GLOBAL vs LOCAL")print("="*50)print(f"\nTotal frames: 100")print(f"Process needs: {needs}")print(f"Total demand: {sum(needs)} (overcommit ratio: {sum(needs)/100:.2f}x)") analyze_fairness("Global Replacement (demand-driven)", global_alloc, needs)analyze_fairness("Local Equal Allocation", local_equal_alloc, needs)analyze_fairness("Local Proportional Allocation", local_prop_alloc, needs) # Output:## Global Replacement (demand-driven):# Jain's Index (allocation): 0.867# Jain's Index (service ratio): 0.914# Average satisfaction: 67.1%# ...## Local Equal Allocation:# Jain's Index (allocation): 1.000 (perfectly equal)# Jain's Index (service ratio): 0.856# Average satisfaction: 85.4%# ...## Local Proportional Allocation:# Jain's Index (allocation): 0.899# Jain's Index (service ratio): 1.000 (perfectly proportional)# Average satisfaction: 71.4%What 'fair' means depends on context. For batch processing, equal allocation may be fair. For multi-tier services, proportional to importance may be fair. For cloud billing, 'pay for use' suggests global replacement is fair. Define fairness requirements explicitly before choosing a replacement policy.
Predictability measures how consistently a system behaves across runs and time. For capacity planning, performance modeling, and debugging, predictability is often as valuable as raw performance.
Global replacement inherently reduces predictability:
Because your performance depends on what other processes are doing, identically configured workloads may behave differently at different times or on different machines. This makes benchmarks less meaningful and debugging harder.
Local replacement inherently increases predictability:
Your performance depends only on your workload and your allocation. Given the same allocation, the same workload will behave the same way every time. This enables meaningful capacity planning.
| Aspect | Global | Local |
|---|---|---|
| Benchmark Reproducibility | Low—results vary with co-located workloads | High—same allocation = same results |
| Performance Modeling | Difficult—must model all tenants | Feasible—model individual process |
| Capacity Planning | Requires statistical modeling | Deterministic based on allocations |
| Regression Testing | False positives from interference | Meaningful comparisons possible |
| Debugging Performance | Must consider entire system | Focus on individual process |
| SLA Probability | Lower confidence in guarantees | High confidence if sized correctly |
Engineers often undervalue predictability. Faster-on-average but highly variable performance is often worse than slower-but-consistent performance. Variable performance complicates planning, causes pager fatigue from intermittent issues, and erodes user trust. Local replacement's predictability has value beyond raw metrics.
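One simple way to put a number on predictability is to run the same benchmark repeatedly and compare the spread of results. The sketch below uses invented run times rather than real measurements; it computes the coefficient of variation, a basic run-to-run stability metric where lower means more reproducible.

```python
import statistics

def coefficient_of_variation(samples):
    """Run-to-run variability: stdev relative to the mean (lower = more predictable)."""
    return statistics.stdev(samples) / statistics.mean(samples)

# Hypothetical completion times (seconds) for the same job run 8 times.
# Under global replacement, co-located workloads perturb each run;
# under local replacement, a fixed allocation keeps runs consistent.
global_runs = [42.1, 55.3, 41.8, 63.0, 44.2, 48.7, 71.5, 43.0]
local_runs  = [47.9, 48.3, 48.1, 47.8, 48.5, 48.0, 48.2, 47.9]

print(f"Global CoV: {coefficient_of_variation(global_runs):.2%}")  # wide spread
print(f"Local  CoV: {coefficient_of_variation(local_runs):.2%}")   # tight spread
```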
As systems grow to support more processes, larger memory pools, and higher overcommit ratios, the scalability characteristics of replacement policies become increasingly important.
How each approach scales:
| Scaling Dimension | Global Behavior | Local Behavior |
|---|---|---|
| More processes (fixed memory) | Higher utilization, more interference | Guaranteed minimums, limits process count |
| More memory (fixed processes) | Both scale well; global has slight algorithm overhead | Both scale well; local has cleaner allocation |
| Higher overcommit | Can support; risk increases with ratio | Cannot meaningfully overcommit |
| Heterogeneous workloads | Adapts well to diversity | Requires careful allocation tuning |
| Peak load handling | Gracefully degrades (all suffer) | Some processes fail, others protected |
Large cloud providers have learned that pure global replacement doesn't scale to thousands of tenants per machine. They use hierarchical approaches: global replacement within a tenant's containers, but strict local (cgroup) boundaries between tenants. This hybrid captures global's efficiency within trust boundaries while enforcing local's isolation across them.
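A rough model of that hybrid is sketched below. It is illustrative only; the tenant limits and container demands are invented. Each tenant gets a hard frame limit, mimicking a cgroup boundary, and within that limit frames flow to whichever of the tenant's containers currently demands them, mimicking global replacement inside the boundary.

```python
def hybrid_allocation(tenant_limits, tenant_container_demands):
    """
    Two-level allocation: hard per-tenant limits (local isolation),
    demand-proportional sharing among each tenant's containers (global inside).
    Returns {tenant: {container: frames}}.
    """
    allocation = {}
    for tenant, limit in tenant_limits.items():
        demands = tenant_container_demands[tenant]
        total_demand = sum(demands.values())
        if total_demand <= limit:
            # Everything fits: give each container exactly what it asks for
            allocation[tenant] = dict(demands)
        else:
            # Over the tenant's limit: share the limit in proportion to demand,
            # but never spill into another tenant's frames
            allocation[tenant] = {
                c: int(d / total_demand * limit) for c, d in demands.items()
            }
    return allocation

# Invented example: two tenants with hard limits of 60 and 40 frames
tenant_limits = {"tenant_a": 60, "tenant_b": 40}
demands = {
    "tenant_a": {"web": 30, "cache": 50},  # 80 > 60: squeezed inside tenant A only
    "tenant_b": {"batch": 25},             # 25 < 40: fully satisfied
}

for tenant, containers in hybrid_allocation(tenant_limits, demands).items():
    print(tenant, containers)
# tenant_a {'web': 22, 'cache': 37}   <- interference stays inside tenant A
# tenant_b {'batch': 25}              <- unaffected by tenant A's overload
```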
The true character of a memory management policy is revealed under memory pressure—when total working set demand exceeds physical memory. This is when the differences between global and local replacement become most stark.
The simulation below contrasts how global and local replacement respond when total demand exceeds physical memory:
"""System Behavior Under Memory Pressure Compares how global and local replacement respond to overload.""" def model_global_pressure(total_frames, processes_wss): """ Global replacement: demand exceeds supply All processes degrade proportionally to their demand """ total_demand = sum(processes_wss) overcommit_ratio = total_demand / total_frames print(f"\nGlobal Replacement Under {overcommit_ratio:.2f}x Pressure:") print("-" * 50) # Under global, each process gets roughly proportional share performances = [] for i, wss in enumerate(processes_wss): # Effective frames = proportional share effective_frames = int(wss / total_demand * total_frames) # Performance = frames / need (capped at 100%) perf = min(effective_frames / wss, 1.0) performances.append(perf) status = "DEGRADED" if perf < 0.8 else "OK" print(f" Process {i}: WSS={wss}, gets={effective_frames}, perf={perf:.0%} [{status}]") avg_perf = sum(performances) / len(performances) min_perf = min(performances) print(f"\n Average performance: {avg_perf:.0%}") print(f" Worst process: {min_perf:.0%}") print(f" All processes impacted: YES (shared degradation)") def model_local_pressure(total_frames, processes_wss, allocations): """ Local replacement: each process has fixed allocation Undersized processes suffer; adequately sized are fine """ total_allocation = sum(allocations) print(f"\nLocal Replacement with allocations {allocations}:") print("-" * 50) performances = [] for i, (wss, alloc) in enumerate(zip(processes_wss, allocations)): # Performance based on allocation vs need perf = min(alloc / wss, 1.0) performances.append(perf) if perf >= 0.95: status = "NORMAL" elif perf >= 0.6: status = "DEGRADED" else: status = "THRASHING" print(f" Process {i}: WSS={wss}, alloc={alloc}, perf={perf:.0%} [{status}]") avg_perf = sum(performances) / len(performances) min_perf = min(performances) protected = sum(1 for p in performances if p >= 0.95) print(f"\n Average performance: {avg_perf:.0%}") print(f" Worst process: {min_perf:.0%}") print(f" Fully protected processes: {protected} / {len(processes_wss)}") print(f" Isolation maintained: YES") # Scenario: 100 frames, 4 processesprocesses_wss = [40, 30, 20, 50] # Total demand = 140total_frames = 100 print("="*60)print("MEMORY PRESSURE RESPONSE COMPARISON")print("="*60)print(f"\nTotal frames: {total_frames}")print(f"Process working sets: {processes_wss}")print(f"Total demand: {sum(processes_wss)} (overcommit: {sum(processes_wss)/total_frames:.2f}x)") model_global_pressure(total_frames, processes_wss) # Two local allocation strategiesmodel_local_pressure(total_frames, processes_wss, [25, 25, 25, 25]) # Equalmodel_local_pressure(total_frames, processes_wss, [30, 25, 20, 25]) # Prioritized # Output demonstrates:# - Global: all processes degraded somewhat (60-80%)# - Local equal: processes 0,3 degraded badly; 1,2 protected# - Local prioritized: better outcomes for priority processesGlobal replacement distributes pain across all processes—nobody fails completely, but everybody suffers. Local replacement protects some processes at the cost of others—protected processes work normally while victims suffer severely. Neither approach is universally better; the right choice depends on whether 'shared degradation' or 'selective protection' better serves your requirements.
To make informed decisions about replacement policies, engineers need a framework for quantifying the trade-offs. This section provides metrics and formulas for comparing policies objectively.
Key quantitative metrics:
| Metric | Formula | What It Measures |
|---|---|---|
| System Throughput | Σ(work_i × efficiency_i) | Total useful work across all processes |
| Memory Utilization | used_frames / total_frames | Fraction of memory productively employed |
| Aggregate Page Fault Rate | Σ(fault_rate_i) | Total faults per second system-wide |
| Latency Variance | Var(latency_i) across samples | Consistency of response times |
| Jain Fairness Index | (Σx_i)² / (n × Σx_i²) | Equality of resource distribution |
| Protection Ratio | processes_making_progress / total_processes | Fraction of processes not starved |
"""Quantitative Comparison Framework for Replacement Policies Provides objective metrics for policy evaluation.""" from dataclasses import dataclassfrom typing import Listimport statistics @dataclass class ProcessMetrics: pid: int working_set_size: int frames_allocated: int page_fault_rate: float # faults per second latency_samples: List[float] # milliseconds work_completed: float # abstract work units class PolicyEvaluator: """Evaluate a replacement policy based on process metrics""" def __init__(self, metrics: List[ProcessMetrics], total_frames: int): self.metrics = metrics self.total_frames = total_frames def system_throughput(self) -> float: """Total work completed across all processes""" return sum(m.work_completed for m in self.metrics) def memory_utilization(self) -> float: """Fraction of memory in productive use""" used = sum(m.frames_allocated for m in self.metrics) # Adjust for efficiency: frames not meeting WSS are less productive effective_use = sum( min(m.frames_allocated, m.working_set_size) for m in self.metrics ) return effective_use / self.total_frames def aggregate_fault_rate(self) -> float: """Total page faults per second""" return sum(m.page_fault_rate for m in self.metrics) def latency_variance(self) -> float: """Average variance in latency across processes""" variances = [ statistics.variance(m.latency_samples) if len(m.latency_samples) > 1 else 0 for m in self.metrics ] return statistics.mean(variances) def jains_fairness(self) -> float: """Jain's fairness index on service ratio (allocation/need)""" service_ratios = [ min(m.frames_allocated / m.working_set_size, 1.0) for m in self.metrics ] n = len(service_ratios) sum_x = sum(service_ratios) sum_x2 = sum(x**2 for x in service_ratios) if sum_x2 == 0: return 1.0 return (sum_x ** 2) / (n * sum_x2) def protection_ratio(self, threshold=0.8) -> float: """Fraction of processes with adequate memory""" adequately_served = sum( 1 for m in self.metrics if m.frames_allocated >= threshold * m.working_set_size ) return adequately_served / len(self.metrics) def generate_report(self, policy_name: str) -> dict: """Generate comprehensive evaluation report""" report = { 'policy': policy_name, 'throughput': self.system_throughput(), 'utilization': self.memory_utilization(), 'fault_rate': self.aggregate_fault_rate(), 'latency_variance': self.latency_variance(), 'fairness': self.jains_fairness(), 'protection': self.protection_ratio(), } print(f"\n{'='*60}") print(f"POLICY EVALUATION: {policy_name}") print(f"{'='*60}") for key, value in report.items(): if key == 'policy': continue if isinstance(value, float): if key in ['utilization', 'fairness', 'protection']: print(f" {key:20}: {value:.1%}") else: print(f" {key:20}: {value:.2f}") return report # Example usage would compare global vs local metrics# collected from actual system observation or simulationUse this framework to score policies against your specific priorities. If throughput is paramount, global likely wins. If latency predictability matters most, local likely wins. If fairness under load is critical, local wins. Quantify the trade-offs rather than making qualitative assumptions.
We have thoroughly explored the performance implications of global versus local replacement policies across multiple dimensions.
Next: Choosing Strategy
In the final page of this module, we synthesize everything learned to provide a decision framework for choosing between global and local replacement strategies in various real-world scenarios.
You now have a comprehensive understanding of the performance implications of replacement policy choices. This knowledge enables you to make informed trade-offs between throughput, latency, fairness, predictability, and scalability when designing or configuring memory management systems.