We've explored how to determine a single process's memory needs—through working set tracking or page fault frequency. But an operating system doesn't manage memory for one process in isolation. It must coordinate allocation across all processes while pursuing multiple, sometimes conflicting goals: throughput, fairness, responsiveness, and protection of high-priority work.
This page explores dynamic allocation—the mechanisms and policies that translate per-process memory needs into system-wide allocation decisions.
By the end of this page, you will understand: (1) The global allocation problem and its constraints, (2) Allocation policies (equal, proportional, priority-based), (3) How to balance local needs with global resources, (4) Frame allocation algorithms in practice, (5) Handling over-commitment through load control, and (6) Modern approaches in production systems.
Consider a system with M physical frames and N running processes, where process i has working set size WSS_i. The global allocation problem is:
Given: M frames, N processes with varying memory needs
Goal: Allocate frames to each process to optimize system performance
Constraint: Sum of allocations ≤ M (can't allocate more than exists)
Three Fundamental Cases:
| Case | Condition | Situation | Solution |
|---|---|---|---|
| Under-committed | Σ WSS_i < M | More frames than needed | Give each process its WSS; pool remainder |
| Balanced | Σ WSS_i ≈ M | Memory matches demand | Each process gets approximately its WSS |
| Over-committed | Σ WSS_i > M | More demand than supply | Someone must be shortchanged or suspended |
The balanced case is the operating point we seek. The under-committed case is easy (everyone's happy). The over-committed case is the challenge.
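The three cases can be classified directly from the working set sizes. Below is a minimal sketch; the tolerance used to decide when demand is "approximately" equal to supply is an assumed tuning knob:

```python
def commitment_state(wss_list, total_frames, tolerance=0.05):
    """Classify the system as under-committed, balanced, or over-committed.

    tolerance is the fraction of total_frames within which total demand
    counts as 'balanced' -- an illustrative knob, not a standard value.
    """
    demand = sum(wss_list)
    if abs(demand - total_frames) <= tolerance * total_frames:
        return "balanced"
    return "under-committed" if demand < total_frames else "over-committed"

# 100 frames; working sets of 20 + 30 + 10 = 60 leave plenty of slack
print(commitment_state([20, 30, 10], 100))  # under-committed
```

The same call with working sets of 80 and 70 would report over-commitment, the case the rest of this page is concerned with.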
Without coordination, over-commitment leads to a tragedy of the commons: each process grabs frames for its own benefit, but collectively they exhaust resources, and everyone suffers. This is exactly the thrashing phenomenon. Dynamic allocation exists to prevent this tragedy through centralized coordination.
Constraints on Allocation:
Any allocation scheme must respect several constraints:
1. Minimum Frames: Every process needs a minimum number of frames to execute at all. This minimum is architecturally determined:
Example: If an instruction can reference itself, a source operand, a destination operand, and two levels of indirect addressing, the minimum is 5 frames.
2. Maximum Frames: A process shouldn't get unlimited frames: beyond its working set, additional frames yield diminishing returns while starving other processes.
3. Total Constraint: Allocations must sum to at most M. This seems obvious but has profound implications—allocation is a zero-sum game. Giving more to one process means taking from others.
4. Dynamic Nature: Allocations must change over time as processes start, exit, and move between program phases with different working sets.
Several policies determine how to divide frames among processes. Each has strengths and weaknesses.
Equal Allocation:
Formula: Each process gets M/N frames
Example: 100 frames, 5 processes → 20 frames each
| Pro | Con |
|---|---|
| Simple to implement | Ignores process differences |
| Clearly fair in one sense | Small processes waste frames |
| No measurement overhead | Large processes may thrash |
| Easy to understand | Doesn't adapt to behavior |
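A sketch of equal allocation, with leftover frames kept in a free pool rather than handed out unevenly (the pooling choice is an assumption):

```python
def equal_allocation(total_frames, n_processes):
    """Give each process floor(M/N) frames; the remainder stays in a free pool."""
    per_process = total_frames // n_processes
    pool = total_frames - per_process * n_processes
    return per_process, pool

per_proc, pool = equal_allocation(100, 5)
print(per_proc, pool)  # 20 frames per process, 0 left in the pool
```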
Proportional Allocation:
Formula: Process i gets (S_i / Σ S_j) × M frames, where S_i is process i's size
Example: Two processes with sizes 10MB and 40MB, 100 frames available: the first gets (10/50) × 100 = 20 frames, the second gets (40/50) × 100 = 80 frames.
| Pro | Con |
|---|---|
| Considers process size | Size ≠ working set |
| Fair in proportion | Sparse address spaces over-allocated |
| No runtime measurement | Doesn't adapt to phases |
A critical distinction: process size (virtual address space) differs from need (working set). A 1GB address space process might actively use only 50MB. Proportional allocation based on address space size can severely over-allocate to sparse processes.
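The proportional formula, made runnable for the 10MB/40MB example above. Largest-remainder rounding is an assumed tie-breaking rule to keep the allocations summing to M:

```python
def proportional_allocation(sizes, total_frames):
    """Allocate floor((S_i / sum(S)) * M) frames per process, then hand the
    leftover frames to the processes with the largest fractional remainders."""
    total_size = sum(sizes)
    raw = [s / total_size * total_frames for s in sizes]
    alloc = [int(r) for r in raw]
    leftover = total_frames - sum(alloc)
    # Largest-remainder rounding (an assumed policy, not the only option)
    order = sorted(range(len(sizes)), key=lambda i: raw[i] - alloc[i], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc

# Sizes 10MB and 40MB, 100 frames: shares are 10/50 and 40/50
print(proportional_allocation([10, 40], 100))  # [20, 80]
```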
Priority-Based Allocation:
Priority can influence allocation in multiple ways: for example, by weighting each process's share of frames, or by steering victim selection during replacement so that low-priority processes lose frames first.
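One simple form is share weighting: substitute a priority weight for size in the proportional formula. The weights below are illustrative:

```python
def priority_weighted_allocation(weights, total_frames):
    """Process i gets (w_i / sum(w)) * M frames (truncated; remainder pooled)."""
    total_weight = sum(weights)
    return [int(w / total_weight * total_frames) for w in weights]

# Three processes with priority weights 1, 2, 2 sharing 100 frames
print(priority_weighted_allocation([1, 2, 2], 100))  # [20, 40, 40]
```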
Need-Based (Dynamic) Allocation:
This is where working set and PFF concepts are applied: each process's allocation target tracks its measured working set size or fault rate rather than a static share.
This is the most sophisticated approach but requires accurate measurement of working set or fault rate.
A crucial design decision is whether page replacement operates globally (across all processes) or locally (within each process).
Local Replacement: When process P needs a frame, it evicts one of its own pages. Its allocation stays constant.
Global Replacement: When process P needs a frame, it can take a frame from any process (including itself). Allocations shift based on demand.
The Hybrid Approach: Scoped Global Replacement
Most modern systems use a hybrid: replacement is global, but each process's allocation is bounded between soft and hard minimums and maximums. This gives the benefits of both: global flexibility in responding to demand, within per-process guarantees that prevent starvation.
Implementation Sketch:
```c
// Scoped global replacement policy

typedef struct {
    int soft_min;  // Preferred minimum allocation
    int hard_min;  // Absolute minimum (never violate)
    int soft_max;  // Preferred maximum
    int hard_max;  // Absolute maximum (cap)
    int current;   // Currently allocated frames
} ProcessMemoryLimits;

// Called when a process needs a frame
FrameAllocation find_frame_for_process(Process* requester) {
    // Strategy 1: Take from own excess (if above soft max)
    if (requester->limits.current > requester->limits.soft_max) {
        return evict_from_process(requester, LRU);
    }

    // Strategy 2: Take from global free pool
    if (global_free_frames > 0) {
        return allocate_free_frame();
    }

    // Strategy 3: Take from other process with excess
    for (Process* p : all_processes_by_priority_ascending) {
        if (p != requester &&
            p->limits.current > p->limits.soft_min &&
            p->priority < requester->priority) {
            return evict_from_process(p, LRU);
        }
    }

    // Strategy 4: Take from self (standard local replacement)
    if (requester->limits.current > requester->limits.hard_min) {
        return evict_from_process(requester, LRU);
    }

    // Strategy 5: Emergency - steal from anyone above hard_min
    for (Process* p : all_processes_by_priority_ascending) {
        if (p->limits.current > p->limits.hard_min) {
            return evict_from_process(p, LRU);
        }
    }

    // Strategy 6: No frames available - must suspend a process
    return initiate_swapping();
}
```

Global replacement can cause priority inversion: a low-priority process grabs memory from a high-priority one. This is why priority-aware victim selection is essential—always prefer to steal from lower-priority processes.
When total memory demand exceeds supply, no amount of clever allocation can satisfy everyone. The system must reduce the number of competing processes—this is load control.
The Load Control Decision:
When Σ WSS_i > M (working sets exceed frames):
| Criterion | Why It Matters | Tradeoff |
|---|---|---|
| Lowest Priority | Protect important processes | Low-priority work may never complete |
| Largest Working Set | Frees most memory per suspension | May suspend process close to completion |
| Longest Wait Time | Round-robin fairness | May suspend most productive process |
| Most Recently Swapped In | Avoid thrashing same process | May repeatedly suspend same 'unlucky' process |
| Blocked Process | Not actively running anyway | Will need frames when unblocked |
| Highest Recent Allocation Growth | Penalize memory-hungry process | May be legitimate growing need |
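The criteria in the table can be combined into a single victim-selection rule. The sketch below prefers the lowest-priority process and breaks ties by largest working set; this particular combination is an illustrative choice, not the only reasonable one:

```python
def select_swap_victim(processes):
    """Pick a suspension victim: lowest priority first, then largest
    working set as a tie-breaker (frees the most memory).

    Each process is a dict with 'name', 'priority' (higher = more
    important), and 'wss' (frames freed if suspended).
    """
    return min(processes, key=lambda p: (p["priority"], -p["wss"]))

procs = [
    {"name": "db",    "priority": 9, "wss": 500},
    {"name": "batch", "priority": 2, "wss": 300},
    {"name": "spec",  "priority": 2, "wss": 800},
]
print(select_swap_victim(procs)["name"])  # spec: lowest priority, largest WSS
```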
Degree of Multiprogramming Control:
The degree of multiprogramming (number of processes in memory) must be controlled to prevent thrashing:
Feedback Control Loop:
Measure: system-wide page fault rate, CPU utilization, swap activity
- If thrashing detected (high faults + low CPU): decrease multiprogramming (swap out processes)
- If under-utilized (low faults + low CPU due to blocking): increase multiprogramming (swap in processes)
- If healthy (low faults + high CPU): maintain the current multiprogramming level
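The feedback loop reduces to a three-way decision. A minimal sketch, where the threshold values are assumed tuning knobs:

```python
def multiprogramming_action(fault_rate, cpu_util,
                            high_faults=50.0, low_faults=5.0, low_cpu=0.4):
    """Decide how to adjust the degree of multiprogramming.

    fault_rate is system-wide faults/sec; cpu_util is in [0, 1].
    The thresholds are illustrative, not standard values.
    """
    if fault_rate > high_faults and cpu_util < low_cpu:
        return "decrease"  # thrashing: swap out processes
    if fault_rate < low_faults and cpu_util < low_cpu:
        return "increase"  # under-utilized: swap in processes
    return "maintain"

print(multiprogramming_action(fault_rate=120.0, cpu_util=0.2))  # decrease
```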
Swapping Mechanics:
Swapping a process out involves suspending it, writing its resident pages (at minimum, the dirty ones) to backing store, and freeing its frames for other processes.
Swapping back in reverses this: frames are allocated, pages are reloaded either eagerly (pre-faulting the prior working set) or lazily via demand paging, and the process is made runnable again.
Swapping is expensive—writing/reading many pages to/from disk. It should be a last resort. Good allocation policies prevent the need for swapping by keeping the system near the balanced operating point. Frequent swapping indicates fundamental memory shortage.
Let's examine concrete algorithms for dynamic allocation that combine the concepts we've discussed.
Algorithm 1: Combined WSS + PFF
This algorithm tracks both working set size and page fault frequency:
```
for each process:
    wss = estimate_working_set_size(process)
    pff = measure_page_fault_frequency(process)

    // Allocation target based on WSS
    target_allocation = wss * (1 + headroom_factor)

    // Adjust based on PFF
    if pff > upper_threshold:
        target_allocation *= 1.2  // Need more than WSS suggests
    else if pff < lower_threshold:
        target_allocation *= 0.9  // Can get by with less

    // Constrain to limits
    target_allocation = clamp(target_allocation, min_frames, max_frames)
    process.target_frames = target_allocation

// Global reconciliation
if sum(targets) <= available_frames:
    // Under-committed: everyone gets their target
    allocate_to_targets()
else:
    // Over-committed: scale down proportionally, prioritize
    scale_allocations_to_fit()
```
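The per-process target computation above, made runnable. The headroom factor and PFF thresholds are assumed values for illustration:

```python
def compute_target(wss, pff, min_frames=5, max_frames=1000,
                   headroom=0.1, upper=30.0, lower=5.0):
    """Per-process allocation target combining WSS and PFF, as in Algorithm 1."""
    target = wss * (1 + headroom)
    if pff > upper:
        target *= 1.2  # faulting heavily: need more than WSS suggests
    elif pff < lower:
        target *= 0.9  # faulting rarely: can get by with less
    return int(max(min_frames, min(max_frames, target)))

# WSS of 100 frames, fault rate above the upper threshold
print(compute_target(wss=100, pff=40.0))  # 132 frames (100 * 1.1 * 1.2)
```

Note how the minimum-frames constraint dominates for tiny processes: a WSS of 2 still yields the architectural floor of 5 frames.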
Algorithm 2: Pure PFF with Global Balancing
This simpler algorithm uses only PFF:
```
periodically:
    for each process p:
        if p.pff > upper_threshold and free_frames > 0:
            give_frame_to(p)
        else if p.pff < lower_threshold and p.frames > min_frames:
            take_frame_from(p)

    // Check for systemic over-commitment
    if global_pff > emergency_threshold:
        victim = select_swap_victim()
        swap_out(victim)

    // Check for under-utilization
    if global_pff < low_threshold and swapped_processes exist:
        candidate = select_swap_in_candidate()
        swap_in(candidate)
```
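A runnable sketch of the per-process rebalancing step above. The dict-based process records and threshold values are illustrative:

```python
def pff_rebalance(processes, free_frames, upper=30.0, lower=5.0, min_frames=5):
    """One pass of pure-PFF balancing: feed faulting processes from the free
    pool, reclaim frames from quiet ones. Returns the updated free pool.

    Each process is a dict with 'pff' (faults/sec) and 'frames'.
    """
    for p in processes:
        if p["pff"] > upper and free_frames > 0:
            p["frames"] += 1
            free_frames -= 1
        elif p["pff"] < lower and p["frames"] > min_frames:
            p["frames"] -= 1
            free_frames += 1
    return free_frames

workload = [{"pff": 50.0, "frames": 10}, {"pff": 1.0, "frames": 20}]
free = pff_rebalance(workload, free_frames=1)
print([p["frames"] for p in workload], free)  # [11, 19] 1
```

One pass moves at most one frame per process; repeated passes let allocations converge gradually, which matches the stability argument made for the adaptive algorithm below.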
```c
// Adaptive allocation algorithm implementation

typedef struct {
    Process* proc;
    int current_frames;
    int target_frames;
    double priority_weight;
    double pff;
    int wss_estimate;
} ProcessAllocationEntry;

void compute_target_allocations(ProcessAllocationEntry* entries, int n) {
    int total_available = total_physical_frames - kernel_frames;

    // Phase 1: Compute unconstrained targets
    int total_demand = 0;
    for (int i = 0; i < n; i++) {
        ProcessAllocationEntry* e = &entries[i];

        // Base target is working set estimate with headroom
        e->target_frames = e->wss_estimate * 1.1;

        // Adjust based on fault rate
        if (e->pff > UPPER_PFF_THRESHOLD) {
            e->target_frames *= 1.0 + min(0.5,
                (e->pff - UPPER_PFF_THRESHOLD) / UPPER_PFF_THRESHOLD);
        } else if (e->pff < LOWER_PFF_THRESHOLD) {
            e->target_frames *= 0.9;
        }

        // Apply limits
        e->target_frames = max(e->target_frames, MIN_FRAMES);
        e->target_frames = min(e->target_frames, e->proc->max_frames);

        total_demand += e->target_frames;
    }

    // Phase 2: Scale if over-committed
    if (total_demand > total_available) {
        double scale = (double)total_available / total_demand;

        // Priority-weighted scaling: high priority gets less reduction
        double total_weighted_demand = 0;
        for (int i = 0; i < n; i++) {
            total_weighted_demand +=
                entries[i].target_frames * entries[i].priority_weight;
        }

        for (int i = 0; i < n; i++) {
            ProcessAllocationEntry* e = &entries[i];
            // Higher weight = less scaling (closer to 1.0)
            double adjusted_scale = scale + (1.0 - scale) *
                (e->priority_weight - 1.0) / MAX_PRIORITY_WEIGHT;
            e->target_frames *= adjusted_scale;
            e->target_frames = max(e->target_frames, MIN_FRAMES);
        }
    }
}

void apply_allocations(ProcessAllocationEntry* entries, int n) {
    // Apply targets gradually to avoid thrashing
    for (int i = 0; i < n; i++) {
        ProcessAllocationEntry* e = &entries[i];
        int delta = e->target_frames - e->current_frames;

        // Don't change more than 20% per interval for stability
        int max_change = max(1, e->current_frames / 5);
        delta = clamp(delta, -max_change, max_change);

        if (delta > 0) {
            add_frames_to_process(e->proc, delta);
        } else if (delta < 0) {
            remove_frames_from_process(e->proc, -delta);
        }
    }
}
```

Note the 20% limit on per-interval change. Rapid allocation changes can cause instability—a process suddenly losing 50% of its frames will spike in fault rate, triggering allocation increase, causing oscillation. Gradual adjustment provides stability.
Modern operating systems don't strictly implement either working set or PFF. Instead, they use pragmatic approaches that achieve similar goals with simpler mechanisms.
Linux: Memory Pressure-Based Reclamation
Linux doesn't track per-process working sets or fault rates explicitly. Instead, it reclaims pages system-wide in response to memory pressure, using the mechanisms below.
This is essentially global replacement with recency-based victim selection, plus load control via OOM.
| Concept | Mechanism | Purpose |
|---|---|---|
| kswapd | Kernel daemon that reclaims pages in background | Keep free memory above watermarks |
| Direct Reclaim | Process reclaims synchronously when allocating | Handle sudden memory needs |
| LRU Lists | Active and inactive lists track page age | Approximate LRU for victim selection |
| Memory Cgroups | Hierarchical memory limits per container/group | Isolation and resource control |
| OOM Killer | Terminates process when memory exhausted | Last-resort load control |
| Writeback | Flush dirty pages to disk in background | Prepare pages for eviction |
Windows: Working Set Manager
Windows is closer to the classic working set model: each process has an explicit working set with configurable minimum and maximum sizes, and a working set manager trims processes' working sets when memory runs low.
Key Differences from Classic Model:
In containerized environments (Docker, Kubernetes), memory cgroups provide hierarchical allocation. Each container has memory limits, and the OS enforces them. This moves allocation decisions up a level—the container runtime configures limits; the OS enforces them. This model dominates cloud computing.
FreeBSD: Soft/Hard Limit Model
FreeBSD uses resource limits with soft and hard bounds:
| Limit Type | Behavior |
|---|---|
| Soft limit | Process receives signals/warnings when exceeded |
| Hard limit | Allocation fails if would exceed |
| Per-process | Individual process constraints |
| Per-user | Total across all user's processes |
| System-wide | Global constraints |
This provides flexibility for administrators to tune behavior.
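The soft/hard distinction in the table can be sketched as a simple admission check. The names and return values below are illustrative, not the FreeBSD API:

```python
def check_allocation(current, request, soft_limit, hard_limit):
    """Classify an allocation request against soft and hard limits."""
    new_total = current + request
    if new_total > hard_limit:
        return "fail"  # hard limit: the allocation is refused
    if new_total > soft_limit:
        return "warn"  # soft limit: allowed, but the process is signalled
    return "ok"

print(check_allocation(current=90, request=20, soft_limit=100, hard_limit=150))  # warn
```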
VMware ESXi: Hypervisor Memory Management
Virtualization adds another layer: the hypervisor divides machine memory among guest VMs using reservations (guaranteed minimums), limits (caps), and shares (proportional weights), with techniques such as ballooning to reclaim memory from guests under pressure.
This is essentially priority-based proportional allocation with guaranteed minimums.
Let's examine how allocation plays out in realistic scenarios.
Scenario 1: Database Server with Web Frontend
Workload:
Allocation Strategy:
Behavior:
Result: Stable performance; database protected; web absorbs variation
Scenario 2: Build Server with Multiple Jobs

Workload:
- Build jobs: variable memory needs (500MB to 8GB each)
- Concurrent jobs: up to 16 parallel builds
- Total RAM: 64GB

Challenge: build jobs have phases:
1. Parse/compile: small WSS
2. Link: large WSS (loads all objects)
3. Test: variable WSS

Jobs in the link phase need much more memory temporarily.

Allocation Strategy:

Priority Tiers:

| Tier | Jobs | Memory Share | Preemption |
|---|---|---|---|
| High | CI/critical path | 40% (25.6GB) | Protected |
| Medium | Developer builds | 35% (22.4GB) | Preemptable |
| Low | Speculative builds | 25% (16GB) | First to suspend |

Dynamic Adjustment:
1. Jobs start with a 1GB allocation
2. PFF is monitored; allocation grows as needed
3. When a job enters the link phase (detected by syscall patterns): pre-allocate additional frames, and delay other job starts if memory is tight
4. After the link phase, excess is reclaimed

Over-Commitment Handling:
1. First: delay starting new low-priority jobs
2. Second: suspend idle speculative builds
3. Third: OOM-kill if truly exhausted (rare)

Result:
- CI jobs reliably complete
- Developer builds usually fast
- Speculative builds best-effort
- System rarely thrashes

Scenario 3: Mixed Interactive and Batch
Workload:
Challenge:
Allocation Strategy:
- Interactive tier: high priority, guaranteed response
- Batch tier: lower priority, best effort
- Pressure response: under memory pressure, batch jobs give up frames (and are suspended) before interactive processes are touched
Key Insight: Interactive responsiveness depends on having working set in memory. Even 100ms of page fault latency is noticeable. Batch jobs can tolerate higher fault rates—they just take longer.
For interactive workloads, preventing page faults is more important than maximizing throughput. A user waiting 200ms for a page fault notices; a batch job taking 60 instead of 58 minutes doesn't matter. Allocation policies should encode this asymmetry.
We've explored dynamic allocation in comprehensive depth—from fundamental policies to practical algorithms and real-world case studies.
What's Next:
In the final page of this module, we'll explore Thrashing Prevention—how operating systems detect, respond to, and prevent the catastrophic condition where page faulting dominates useful work. This brings together everything we've learned about working sets, PFF, and dynamic allocation.
You now understand dynamic frame allocation—how operating systems coordinate memory across all processes to maximize performance while preventing thrashing. This understanding connects individual process needs (working set, PFF) to system-wide behavior.