In 1968, the same year Peter Denning introduced the working set model, operators of early multiprogramming systems encountered a terrifying phenomenon. Under certain conditions, systems that had been running smoothly would suddenly grind to a halt. CPU utilization would plummet to near zero. Disk activity would spike to 100%. Users would see their terminals freeze. And the more the operators tried to increase throughput by adding processes, the worse the situation became.
They called this phenomenon thrashing.
Thrashing is the most severe failure mode in memory management: a state where the system spends more time moving pages between memory and disk than executing actual instructions. It is a positive feedback catastrophe: insufficient memory causes page faults, servicing those faults evicts pages that other processes still need, those processes fault in turn, and the system collapses.
This page explores everything about thrashing: what it is, why it happens, how to detect it, and—most importantly—how to prevent it. This is where working set theory, page fault frequency, and dynamic allocation all come together to solve one of operating systems' most critical challenges.
By the end of this page, you will understand: (1) The precise dynamics of thrashing, (2) Why thrashing exhibits positive feedback, (3) Detection mechanisms and metrics, (4) Prevention strategies using working set and PFF, (5) Recovery procedures when thrashing occurs, and (6) Modern anti-thrashing mechanisms in production systems.
Definition:
Thrashing occurs when a system spends more time paging (transferring pages between memory and disk) than executing processes. It's characterized by:
- An extremely high page fault rate
- Disk I/O dominated by swap/page-file traffic
- CPU utilization near zero
- Little or no forward progress by any process
The Cause:
Thrashing has a simple root cause: Σ WSS_i > M — the sum of all processes' working set sizes exceeds available physical memory.
When this occurs, no process can keep its full working set resident: every process faults frequently, and servicing each fault evicts a page that some other process will soon need.
Thrashing gets worse over time, not better. Each page fault takes ~10ms (disk access). During that time, other processes run and possibly fault. With everyone faulting, no one makes progress. The more processes fault, the longer each must wait in the I/O queue. The longer they wait, the more likely their other pages get evicted. It's a death spiral.
Quantifying the Damage:
Let's illustrate with numbers:
| Scenario | Memory Access | Fault Rate | Effective Access Time |
|---|---|---|---|
| Normal | 100ns | ≈ 0% fault rate | ~100ns average |
| Mild Pressure | 100ns | 1% fault rate | ~100μs average |
| Thrashing | 100ns | 50% fault rate | ~5ms average |
| Severe Thrashing | 100ns | 90% fault rate | ~9ms average |
Under thrashing, effective memory access time degrades by 50,000x to 90,000x. A program that would complete in 1 minute would take roughly 35 to 62 days.
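These figures follow from the standard effective access time calculation, assuming the ~10ms fault service time used above (a rotating-disk figure; faster swap devices shrink the numbers but not the shape of the collapse):

EAT = (1 − p) × t_mem + p × t_fault

With t_mem = 100ns and t_fault ≈ 10ms: at p = 1%, EAT ≈ 0.99 × 100ns + 0.01 × 10ms ≈ 100μs; at p = 50%, EAT ≈ 5ms; at p = 90%, EAT ≈ 9ms, matching the table values above.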
The CPU Utilization Paradox:
Operators initially responded to perceived CPU idleness by adding more processes—surely the CPU could do more work. But this made things worse:
| Processes | Total WSS | Frame Shortage | CPU Utilization |
|---|---|---|---|
| 5 | 400MB | None (M=500MB) | 85% |
| 10 | 600MB | 100MB | 60% |
| 15 | 900MB | 400MB | 30% |
| 20 | 1200MB | 700MB | 5% |
The graph of CPU utilization vs. degree of multiprogramming rises initially, then collapses. The point just before collapse is the optimal operating point.
Thrashing vs. High I/O:
It's important to distinguish thrashing from legitimate high I/O activity:
| Characteristic | Thrashing | High I/O Workload |
|---|---|---|
| Page fault rate | Extremely high | Low (pages cached) |
| I/O type | Swap/page file | Application data |
| CPU utilization | Near zero | Variable but productive |
| Forward progress | None | Normal |
| Cured by adding RAM | Yes | Not necessarily |
| Cured by reducing processes | Yes | No |
Let's trace through exactly how thrashing develops, step by step.
Phase 1: Normal Operation. Each process holds its working set in memory; faults are rare and the CPU stays busy.
Phase 2: Increased Load. More processes are admitted, and the sum of their working set sizes creeps toward, then past, available memory.
Phase 3: Contention Begins. With demand exceeding supply, every page brought in must evict a page that some other process is still using, as the cascade below shows.
```
Thrashing Cascade Timeline

T=0: Equilibrium (15 processes, need 600 pages, have 500)
Process A faults on page P1
├── Evict page P2 from Process B
├── Load P1 for Process A
└── A resumes

T=10ms: Cascade Begins
Process B faults on P2 (just evicted!)
├── Evict page P3 from Process C
├── Load P2 for Process B
├── B resumes
└── C will fault on P3 when scheduled...

T=20ms: Cascade Spreads
Process C faults on P3
├── Evict page P4 from Process D
├── Load P3 for Process C
└── Now 4 processes involved in cascade

T=50ms: Full Thrashing
├── 12 of 15 processes have missing working set pages
├── I/O queue: 8 processes waiting for page load
├── Running queue: 3 processes (soon to fault)
├── Disk: 100% utilized doing page I/O
├── CPU: ~5% utilized (mostly OS overhead)
└── User-visible progress: ~0%
```

Phase 4: Positive Feedback Acceleration
Once thrashing begins, several mechanisms make it worse:
1. I/O Queue Buildup: Every fault joins the disk queue; as the queue grows, each fault takes longer to service, so processes stay blocked longer and lose more of their resident pages while they wait.
2. TLB Thrashing: Constant context switches and page replacements invalidate TLB entries, so even accesses to resident pages pay for translation misses.
3. Page Table Walking Overhead: With TLBs and caches cold, each miss requires a multi-level page table walk, adding overhead to the little work that does execute.
4. Context Switch Amplification: Every fault blocks a process and forces a context switch; the next process scheduled promptly faults too, so the CPU spends its time switching rather than computing.
Phase 5: System Collapse
Once severe thrashing takes hold, the system often cannot recover automatically. Even if load decreases (processes terminate), the remaining processes may have fragmented working sets that take significant time to reconstitute. Manual intervention, such as killing processes or rebooting, may be required.
Early detection is crucial—thrashing is far easier to prevent than to recover from. Several metrics can detect thrashing or impending thrashing.
| Metric | Normal | Warning | Thrashing | How to Measure |
|---|---|---|---|---|
| Page Fault Rate | < 10/sec | 10-100/sec | > 100/sec | Fault counter / time |
| CPU Utilization | > 70% | 30-70% | < 30% | Idle time measurement |
| Swap I/O Rate | < 10 MB/s | 10-50 MB/s | > 50 MB/s | Disk statistics |
| Free Pages | > Watermark | < Watermark | Near zero | Page allocator stats |
| I/O Wait % | < 10% | 10-30% | > 30% | CPU accounting |
| Memory Pressure | Mild | Moderate | Severe | PSI or similar |
Composite Detection:
No single metric is definitive. The key signature of thrashing is the combination:
Thrashing = (High Page Fault Rate) AND (Low CPU Utilization) AND (High Disk I/O)
High page faults with high CPU might just be a program loading (cold start). Low CPU with low I/O might be processes blocked on network. High disk I/O with high CPU is a normal I/O workload. Only the combination indicates thrashing.
Linux PSI (Pressure Stall Information):
Modern Linux provides PSI metrics that directly measure resource starvation:
$ cat /proc/pressure/memory
some avg10=0.00 avg60=0.00 avg300=0.00 total=0
full avg10=0.00 avg60=0.00 avg300=0.00 total=0
PSI 'full' memory pressure above 10-20% indicates thrashing.
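For a concrete sense of how a monitor might consume PSI, here is a minimal sketch that reads the "full" 10-second average from /proc/pressure/memory; the 10% warning threshold mirrors the rule of thumb above and is illustrative rather than canonical:

```c
// Minimal sketch: read the "full" memory-pressure average from Linux PSI.
// Assumes a PSI-enabled kernel (>= 4.20) exposing /proc/pressure/memory.
#include <stdio.h>
#include <string.h>

// Returns the "full avg10" value (percent of time ALL tasks were stalled on memory),
// or -1.0 if PSI is unavailable.
double read_psi_full_avg10(void) {
    FILE* f = fopen("/proc/pressure/memory", "r");
    if (!f) return -1.0;

    char line[256];
    double avg10 = -1.0;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "full", 4) == 0) {
            sscanf(line, "full avg10=%lf", &avg10);
            break;
        }
    }
    fclose(f);
    return avg10;
}

int main(void) {
    double full = read_psi_full_avg10();
    if (full < 0)
        printf("PSI not available on this system\n");
    else if (full >= 10.0)
        printf("WARNING: full memory stall %.2f%% - likely thrashing\n", full);
    else
        printf("full memory stall: %.2f%%\n", full);
    return 0;
}
```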
```c
// Thrashing detection algorithm
typedef struct {
    double page_fault_rate;    // Faults per second (system-wide)
    double cpu_utilization;    // 0.0 to 1.0
    double swap_io_rate;       // MB/s swap activity
    double io_wait_percentage; // Percentage of CPU time in I/O wait
    int free_pages;            // Current free page count
    int low_watermark;         // Page reclamation threshold
} SystemMetrics;

typedef enum {
    THRASHING_NONE,
    THRASHING_WARNING,  // Approaching thrashing
    THRASHING_MILD,     // Early thrashing, recoverable
    THRASHING_SEVERE,   // Full thrashing, intervention needed
    THRASHING_CRITICAL  // System unresponsive
} ThrashingLevel;

ThrashingLevel detect_thrashing(SystemMetrics* m) {
    // Composite scoring
    int score = 0;

    // Factor 1: Page fault rate
    if (m->page_fault_rate > 500) score += 4;
    else if (m->page_fault_rate > 100) score += 2;
    else if (m->page_fault_rate > 50) score += 1;

    // Factor 2: CPU utilization (inverted - low is bad)
    if (m->cpu_utilization < 0.10) score += 4;
    else if (m->cpu_utilization < 0.30) score += 2;
    else if (m->cpu_utilization < 0.50) score += 1;

    // Factor 3: I/O wait
    if (m->io_wait_percentage > 50) score += 3;
    else if (m->io_wait_percentage > 30) score += 2;
    else if (m->io_wait_percentage > 15) score += 1;

    // Factor 4: Swap activity
    if (m->swap_io_rate > 100) score += 3;
    else if (m->swap_io_rate > 50) score += 2;
    else if (m->swap_io_rate > 20) score += 1;

    // Factor 5: Free pages
    if (m->free_pages < m->low_watermark / 4) score += 2;
    else if (m->free_pages < m->low_watermark / 2) score += 1;

    // Classify based on composite score
    if (score >= 12) return THRASHING_CRITICAL;
    if (score >= 9) return THRASHING_SEVERE;
    if (score >= 6) return THRASHING_MILD;
    if (score >= 3) return THRASHING_WARNING;
    return THRASHING_NONE;
}
```

The best thrashing detection catches THRASHING_WARNING before it progresses. At this stage, gentle intervention (slowing new process admission, slightly reducing multiprogramming) can prevent crisis. By THRASHING_SEVERE, drastic action is needed.
Thrashing prevention is preferable to thrashing recovery. Several strategies keep the system away from the thrashing cliff.
Strategy 1: Working Set-Based Admission Control
This is the direct application of Denning's working set principle:
On process_create(new_process):
new_wss = estimate_working_set_size(new_process)
current_total_wss = sum(wss for all running processes)
if current_total_wss + new_wss <= available_frames:
admit(new_process)
else:
// Cannot admit without risking thrashing
option1: queue(new_process) until memory available
option2: suspend_process(select_victim()) then admit
option3: reject(new_process) with 'out of memory' error
Challenge: Estimating working set size before a process runs. Common options are to start with a conservative default allocation and let PFF feedback grow it, to reuse history from previous runs of the same program, or to accept a hint declared by the application or administrator; a sketch combining these appears below.
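A minimal sketch of such an estimator, under the assumptions just listed. The struct, lookup_historical_wss stub, and DEFAULT_INITIAL_FRAMES value are illustrative names for this sketch, not taken from any real kernel:

```c
// Hypothetical per-process information available at admission time (illustrative only).
typedef struct {
    const char* program_name;
    int declared_wss;            // Frames hinted by the application, 0 if none
} NewProcessInfo;

#define DEFAULT_INITIAL_FRAMES 64   // Conservative starting allocation (illustrative)

// Peak WSS (in frames) recorded for prior runs of this program, 0 if unknown.
// A real system would persist this history; here it is a stub for the sketch.
static int lookup_historical_wss(const char* program_name) {
    (void)program_name;
    return 0;
}

int estimate_working_set_size(const NewProcessInfo* p) {
    int historical = lookup_historical_wss(p->program_name);
    if (historical > 0) return historical;            // Prefer observed history
    if (p->declared_wss > 0) return p->declared_wss;  // Then an explicit hint
    return DEFAULT_INITIAL_FRAMES;                    // Else start small; PFF feedback grows it
}
```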
Strategy 2: PFF-Based Prevention
Use page fault frequency as a feedback signal:
periodically:
for each process p:
if p.pff > crisis_threshold:
if free_frames > 0:
allocate_frame_to(p)
else:
// System is at capacity and someone is suffering
victim = select_lowest_priority_process()
if victim != p:
suspend(victim) // Free up frames
else:
// Even lowest priority can't get frames - overloaded
suspend_newest_process() // LIFO fairness
| Strategy | When Applied | Overhead | Accuracy | Disruption |
|---|---|---|---|---|
| WSS Admission Control | Process creation | Low | Medium (estimation) | Queues new processes |
| PFF Control | Continuous | Low | High (observed) | Suspends processes |
| Multiprogramming Limit | Process scheduling | Very Low | Low (static) | Queues new processes |
| Load Shedding | As pressure rises | Medium | Medium | Progressive |
| Memory Reservation | Allocation time | Low | N/A | Reduces utilization |
The best systems use multiple strategies simultaneously. Admission control is the first line of defense. PFF monitoring is the second. Memory reservation provides buffer. Load shedding is the last resort before intervention. Each layer catches what the previous layer missed.
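Of the strategies above, the static multiprogramming limit is the simplest to picture. A minimal sketch, assuming illustrative scheduler helpers (count_active_processes, admit, enqueue_waiting) and a cap tuned by hand for the machine's RAM:

```c
#define MAX_MULTIPROGRAMMING 10   // Static cap, chosen for this machine's memory size

// Illustrative stand-ins for the scheduler's bookkeeping (assumed, not a real API).
extern int  count_active_processes(void);
extern void admit(int pid);
extern void enqueue_waiting(int pid);

// Admit a new process only while the static degree of multiprogramming allows it.
void on_process_create(int pid) {
    if (count_active_processes() < MAX_MULTIPROGRAMMING)
        admit(pid);             // Within the cap: run immediately
    else
        enqueue_waiting(pid);   // Over the cap: queue until an active process exits
}
```

Its weakness is the one noted in the table: the cap is static, so it cannot adapt when individual working sets grow.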
When thrashing occurs despite prevention, the system must recover. The fundamental principle is simple: reduce demand until it fits within supply.
Recovery Options (in order of increasing severity):
1. Stop admitting new processes.
2. Trigger aggressive page reclamation.
3. Suspend (swap out) low-priority processes to shrink total demand.
4. Terminate processes as a last resort (OOM kill).
Suspension vs. Termination:
| Action | Memory Freed | Process State | Recovery | User Impact |
|---|---|---|---|---|
| Suspend | All pages | Preserved (on disk) | Resume later | Work delayed |
| Terminate | All pages + resources | Lost | None | Work lost |
Suspension is reversible; termination is not. Systems should prefer suspension unless the process is a runaway (for example, leaking memory without bound), there is no swap space left to hold a suspended process's pages, or memory must be freed immediately to keep the system itself responsive.
Linux OOM Killer:
Linux uses an OOM (Out Of Memory) killer as the last resort: when memory cannot be reclaimed any other way, the kernel computes a badness score for each process (driven largely by memory usage and adjusted by the per-process oom_score_adj value) and sends SIGKILL to the highest-scoring victim to free memory immediately.
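Applications can influence this choice. As a small illustrative sketch (the specific value and the decision to volunteer are assumptions, not recommendations), a process can raise its own oom_score_adj so it is sacrificed before more important work:

```c
// Sketch: mark the current process as a preferred OOM victim on Linux.
// oom_score_adj ranges from -1000 (never kill) to +1000 (kill first); 0 is the default.
#include <stdio.h>

int main(void) {
    FILE* f = fopen("/proc/self/oom_score_adj", "w");
    if (f) {
        fprintf(f, "%d\n", 500);  // Volunteer as an early victim (raising the score needs no privilege)
        fclose(f);
    }
    return 0;
}
```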
```c
// Thrashing recovery algorithm
void recover_from_thrashing(ThrashingLevel level) {
    switch (level) {
        case THRASHING_WARNING: {
            // Level 0: Gentle intervention
            pause_process_admission();
            trigger_aggressive_reclamation();
            break;
        }
        case THRASHING_MILD: {
            // Level 1: Reduce load
            pause_process_admission();
            trigger_aggressive_reclamation();

            // Suspend some low-priority processes
            int to_suspend = count_processes() / 4;  // Suspend 25%
            for (int i = 0; i < to_suspend; i++) {
                Process* victim = select_suspension_victim();
                if (victim) suspend_process(victim);
            }
            break;
        }
        case THRASHING_SEVERE: {
            // Level 2: Aggressive reduction
            pause_process_admission();

            // Suspend half of processes (preferring low priority)
            int to_suspend = count_processes() / 2;
            for (int i = 0; i < to_suspend; i++) {
                Process* victim = select_suspension_victim();
                if (victim) suspend_process(victim);
            }

            // If still thrashing after suspensions, begin termination
            if (detect_thrashing(get_current_metrics()) >= THRASHING_SEVERE) {
                Process* kill_victim = select_oom_victim();
                terminate_process(kill_victim, SIGKILL);
            }
            break;
        }
        case THRASHING_CRITICAL: {
            // Level 3: Emergency - terminate memory hogs immediately
            for (int i = 0; i < 3; i++) {  // Kill up to 3
                Process* victim = select_oom_victim();
                if (victim) terminate_process(victim, SIGKILL);
            }
            // Suspend all except essential processes
            for (Process* p : all_processes) {
                if (!is_essential(p)) {
                    suspend_process(p);
                }
            }
            break;
        }
    }
}

Process* select_suspension_victim() {
    // Prefer: low priority, high memory, recently started
    Process* best = NULL;
    int best_score = -1;

    for (Process* p : all_processes) {
        if (is_suspended(p) || is_essential(p)) continue;

        int score = 0;
        score += (MAX_PRIORITY - p->priority) * 10;  // Low priority = high score
        score += p->memory_usage / 1024;             // More memory = higher score
        score -= p->runtime_seconds / 60;            // Longer running = lower score

        if (score > best_score) {
            best_score = score;
            best = p;
        }
    }
    return best;
}
```

Even after suspending/killing processes, recovery isn't instant. Remaining processes must reload their working sets. The disk queue must drain. This can take seconds to minutes. Systems should not resume normal operation until metrics confirm recovery.
Let's see how all the concepts—working set, PFF, dynamic allocation, and thrashing prevention—work together in an integrated system.
```c
// Integrated memory management system

// ===== CONSTANTS =====
#define PFF_UPPER 25.0          // Faults/sec - needs more frames
#define PFF_LOWER 5.0           // Faults/sec - can give up frames
#define THRASH_THRESHOLD 0.20   // 20% memory pressure = thrashing
#define RESERVE_PERCENT 0.10    // Keep 10% frames reserved

// ===== STATE =====
typedef struct {
    // Per-process tracking
    double pff;              // Page fault frequency
    int allocated_frames;    // Current allocation
    int min_frames;          // Minimum allowed
    int max_frames;          // Maximum allowed
    int estimated_wss;       // Estimated working set size
    int target;              // Allocation target computed each tick
    Process* proc;
} ProcessMemoryState;

// ===== PERIODIC MEMORY MANAGER (runs every 100ms) =====
void memory_management_tick() {
    SystemMetrics metrics = collect_metrics();
    ThrashingLevel thrash = detect_thrashing(&metrics);

    // Step 1: Handle thrashing if detected
    if (thrash >= THRASHING_MILD) {
        recover_from_thrashing(thrash);
        return;  // Don't do normal allocation during recovery
    }

    // Step 2: Calculate allocation targets
    int total_target = 0;
    for (ProcessMemoryState* pms : all_active_processes) {
        update_pff(pms);

        // Base target is estimated WSS
        pms->target = pms->estimated_wss;

        // Adjust based on PFF feedback
        if (pms->pff > PFF_UPPER) {
            pms->target = pms->allocated_frames * 1.2;
        } else if (pms->pff < PFF_LOWER) {
            pms->target = pms->allocated_frames * 0.95;
        }

        // Clamp to limits
        pms->target = clamp(pms->target, pms->min_frames, pms->max_frames);
        total_target += pms->target;
    }

    // Step 3: Scale if over-committed
    int available = total_frames * (1 - RESERVE_PERCENT);
    if (total_target > available) {
        double scale = (double)available / total_target;
        for (ProcessMemoryState* pms : all_active_processes) {
            pms->target = max(pms->min_frames, pms->target * scale);
        }
    }

    // Step 4: Apply allocations gradually
    for (ProcessMemoryState* pms : all_active_processes) {
        int delta = pms->target - pms->allocated_frames;
        delta = clamp(delta, -MAX_CHANGE, MAX_CHANGE);

        if (delta > 0 && free_frames >= delta) {
            allocate_frames(pms->proc, delta);
        } else if (delta < 0) {
            reclaim_frames(pms->proc, -delta);
        }
    }

    // Step 5: Admission control for waiting processes
    if (thrash == THRASHING_NONE && waiting_processes_exist()) {
        ProcessMemoryState* candidate = first_waiting();
        if (can_admit(candidate)) {
            admit_process(candidate);
        }
    }
}

// ===== ADMISSION CONTROL =====
bool can_admit(ProcessMemoryState* new_pms) {
    int required = new_pms->estimated_wss;
    int currently_used = sum_allocated_frames();
    int reserve = total_frames * RESERVE_PERCENT;

    return (currently_used + required) <= (total_frames - reserve);
}
```

The Integrated System's Behavior:
Normal Operation: PFF feedback fine-tunes each process's allocation around its working set size, the reserve buffer stays intact, and waiting processes are admitted as memory allows.
Approaching Overload: Total targets exceed the available frames, so allocations are scaled down proportionally (never below each process's minimum) and admission of new processes pauses.
At Overload: Thrashing detection fires, normal allocation stops, and the recovery path suspends or terminates processes until demand fits within memory again.
Recovery: Suspended processes are resumed gradually, and admission restarts only once metrics confirm the system is healthy.
A well-designed system should rarely or never enter thrashing. Prevention layers should catch overload early. If thrashing occurs, it indicates insufficient prevention—either thresholds were wrong, or an unexpected workload spike occurred. Post-thrashing analysis should tune the system to prevent recurrence.
Modern operating systems and platforms have evolved sophisticated anti-thrashing mechanisms beyond the classic approaches.
Memory Cgroups in Detail:
Linux cgroups provide per-group memory limits:
# Set 1GB limit for a container
echo 1G > /sys/fs/cgroup/memory/mycontainer/memory.limit_in_bytes
# Set soft limit (reclaimed under pressure)
echo 512M > /sys/fs/cgroup/memory/mycontainer/memory.soft_limit_in_bytes
# Enable OOM killer for this group
echo 0 > /sys/fs/cgroup/memory/mycontainer/memory.oom_control
With cgroups, a runaway workload can only thrash within its own group's limit: its excess paging and any OOM kills are contained to that container, while other groups keep their memory and continue to make progress.
Pressure Stall Information Use:
# Monitor memory pressure
$ cat /proc/pressure/memory
some avg10=0.50 avg60=0.30 avg300=0.10 total=123456
full avg10=0.00 avg60=0.00 avg300=0.00 total=0
Tools like systemd can react to PSI:
# systemd service snippet (resource-control settings)
[Service]
MemoryPressureWatch=auto
MemoryPressureThresholdSec=2s
IOSchedulingPriority=7 # Run this service at the lowest I/O priority
| Platform | Primary Mechanism | Scope | Preemptive? |
|---|---|---|---|
| Linux Desktop | OOM killer, swappiness tuning | System-wide | Mostly reactive |
| Linux Container | Cgroups + OOM per container | Per-container | Can be proactive |
| Android | Low Memory Killer | Per-app | Proactive |
| Windows | Working set trimming + paging file | Per-process | Mixed |
| macOS | Memory compression + app nap | Per-app | Proactive |
| VMware ESXi | Ballooning + swapping + TPS | Per-VM | Proactive |
| Kubernetes | Pod eviction + limits | Per-pod | Proactive |
Modern systems often have multiple layers: application-level (well-behaved apps limit themselves), container-level (cgroups), OS-level (OOM killer), and hypervisor-level (ballooning). Each layer catches what lower layers miss. This defense in depth makes thrashing rare in well-configured systems.
We've explored thrashing in comprehensive depth, from its devastating dynamics to modern prevention mechanisms. Let's consolidate the key insights:
- Thrashing means the system pages more than it computes; the root cause is Σ WSS_i > M.
- It is a positive feedback loop: once started, it worsens on its own.
- Its signature is high page fault rate AND low CPU utilization AND heavy swap I/O.
- Prevention (admission control, PFF feedback, multiprogramming limits, memory reserves) is far cheaper than recovery.
- Recovery means reducing demand: suspend processes first, terminate only as a last resort.
- Modern systems layer these defenses, from applications through containers and the OS down to the hypervisor.
Module Complete:
This concludes our exploration of the Working Set Model and Page Fault Frequency. We've traced the complete arc from Denning's foundational theory through practical implementation to thrashing prevention. Together, these concepts form the intellectual backbone of memory management—enabling operating systems to efficiently share limited physical memory among competing processes while maintaining stability and performance.
The principles you've learned apply not just to traditional operating systems but to any resource management scenario: container orchestration, database buffer management, cache sizing, and distributed systems coordination. Wherever limited resources must be shared among competing demands, working set concepts and feedback-based allocation remain relevant.
Congratulations! You've mastered the Working Set Model and Page Fault Frequency module. You now understand the theoretical foundations (working set, locality), practical mechanisms (PFF, dynamic allocation), and failure prevention (thrashing detection and recovery). These concepts are essential for understanding how modern operating systems manage memory at scale.