In 1968, the same year Peter Denning introduced the working set model, operators of early multiprogramming systems encountered a terrifying phenomenon. Under certain conditions, systems that had been running smoothly would suddenly grind to a halt. CPU utilization would plummet to near zero. Disk activity would spike to 100%. Users would see their terminals freeze. And the more the operators tried to increase throughput by adding processes, the worse the situation became.
They called this phenomenon thrashing.
Thrashing is the most severe failure mode in memory management: a state where the system spends more time moving pages between memory and disk than executing actual instructions. It is a positive feedback catastrophe: insufficient memory causes page faults, servicing those faults evicts pages that other processes still need, those processes fault in turn, and the system collapses.
This page explores everything about thrashing: what it is, why it happens, how to detect it, and—most importantly—how to prevent it. This is where working set theory, page fault frequency, and dynamic allocation all come together to solve one of operating systems' most critical challenges.
By the end of this page, you will understand: (1) The precise dynamics of thrashing, (2) Why thrashing exhibits positive feedback, (3) Detection mechanisms and metrics, (4) Prevention strategies using working set and PFF, (5) Recovery procedures when thrashing occurs, and (6) Modern anti-thrashing mechanisms in production systems.
Definition:
Thrashing occurs when a system spends more time paging (transferring pages between memory and disk) than executing processes. It's characterized by:
- An extremely high page fault rate
- Disk I/O dominated by swap/page-file traffic
- CPU utilization near zero
- Little or no forward progress by any process
The Cause:
Thrashing has a simple root cause: Σ WSS_i > M — the sum of all processes' working set sizes exceeds available physical memory.
When this occurs, no process can keep its full working set resident: every process faults frequently, and servicing each fault evicts a page that some other process will soon need.
Thrashing gets worse over time, not better. Each page fault takes ~10ms (disk access). During that time, other processes run and possibly fault. With everyone faulting, no one makes progress. The more processes fault, the longer each must wait in the I/O queue. The longer they wait, the more likely their other pages get evicted. It's a death spiral.
Quantifying the Damage:
Let's illustrate with numbers:
| Scenario | Memory Access | Fault Rate | Effective Access Time |
|---|---|---|---|
| Normal | 100ns | ≈ 0% fault rate | ~100ns average |
| Mild Pressure | 100ns | 1% fault rate | ~100μs average |
| Thrashing | 100ns | 50% fault rate | ~5ms average |
| Severe Thrashing | 100ns | 90% fault rate | ~9ms average |
Under thrashing, effective memory access time degrades by 50,000x to 90,000x. A program that would complete in 1 minute would take roughly 35 to 62 days.
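These figures follow from the standard effective access time calculation, assuming the ~10ms fault service time used above (a rotating-disk figure; faster swap devices shrink the numbers but not the shape of the collapse):

EAT = (1 − p) × t_mem + p × t_fault

With t_mem = 100ns and t_fault ≈ 10ms: at p = 1%, EAT ≈ 0.99 × 100ns + 0.01 × 10ms ≈ 100μs; at p = 50%, EAT ≈ 5ms; at p = 90%, EAT ≈ 9ms, matching the table values above.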
The CPU Utilization Paradox:
Operators initially responded to perceived CPU idleness by adding more processes—surely the CPU could do more work. But this made things worse:
| Processes | Total WSS | Frame Shortage | CPU Utilization |
|---|---|---|---|
| 5 | 400MB | None (M=500MB) | 85% |
| 10 | 600MB | 100MB | 60% |
| 15 | 900MB | 400MB | 30% |
| 20 | 1200MB | 700MB | 5% |
The graph of CPU utilization vs. degree of multiprogramming rises initially, then collapses. The point just before collapse is the optimal operating point.
Thrashing vs. High I/O:
It's important to distinguish thrashing from legitimate high I/O activity:
| Characteristic | Thrashing | High I/O Workload |
|---|---|---|
| Page fault rate | Extremely high | Low (pages cached) |
| I/O type | Swap/page file | Application data |
| CPU utilization | Near zero | Variable but productive |
| Forward progress | None | Normal |
| Cured by adding RAM | Yes | Not necessarily |
| Cured by reducing processes | Yes | No |
Let's trace through exactly how thrashing develops, step by step.
Phase 1: Normal Operation. Each process holds its working set in memory; faults are rare and the CPU stays busy.
Phase 2: Increased Load. More processes are admitted, and the sum of their working set sizes creeps toward, then past, available memory.
Phase 3: Contention Begins. With demand exceeding supply, every page brought in must evict a page that some other process is still using, as the cascade below shows.
```
Thrashing Cascade Timeline

T=0: Equilibrium (15 processes, need 600 pages, have 500)
Process A faults on page P1
├── Evict page P2 from Process B
├── Load P1 for Process A
└── A resumes

T=10ms: Cascade Begins
Process B faults on P2 (just evicted!)
├── Evict page P3 from Process C
├── Load P2 for Process B
├── B resumes
└── C will fault on P3 when scheduled...

T=20ms: Cascade Spreads
Process C faults on P3
├── Evict page P4 from Process D
├── Load P3 for Process C
└── Now 4 processes involved in cascade

T=50ms: Full Thrashing
├── 12 of 15 processes have missing working set pages
├── I/O queue: 8 processes waiting for page load
├── Running queue: 3 processes (soon to fault)
├── Disk: 100% utilized doing page I/O
├── CPU: ~5% utilized (mostly OS overhead)
└── User-visible progress: ~0%
```

Phase 4: Positive Feedback Acceleration
Once thrashing begins, several mechanisms make it worse:
1. I/O Queue Buildup: Every fault joins the disk queue; as the queue grows, each fault takes longer to service, so processes stay blocked longer and lose more of their resident pages while they wait.
2. TLB Thrashing: Constant context switches and page replacements invalidate TLB entries, so even accesses to resident pages pay for translation misses.
3. Page Table Walking Overhead: With TLBs and caches cold, each miss requires a multi-level page table walk, adding overhead to the little work that does execute.
4. Context Switch Amplification: Every fault blocks a process and forces a context switch; the next process scheduled promptly faults too, so the CPU spends its time switching rather than computing.
Phase 5: System Collapse
Once severe thrashing takes hold, the system often cannot recover automatically. Even if load decreases (processes terminate), the remaining processes may have fragmented working sets that take significant time to reconstitute. Manual intervention, such as killing processes or rebooting, may be required.
Early detection is crucial—thrashing is far easier to prevent than to recover from. Several metrics can detect thrashing or impending thrashing.
| Metric | Normal | Warning | Thrashing | How to Measure |
|---|---|---|---|---|
| Page Fault Rate | < 10/sec | 10-100/sec | > 100/sec | Fault counter / time |
| CPU Utilization | > 70% | 30-70% | < 30% | Idle time measurement |
| Swap I/O Rate | < 10 MB/s | 10-50 MB/s | > 50 MB/s | Disk statistics |
| Free Pages | > Watermark | < Watermark | Near zero | Page allocator stats |
| I/O Wait % | < 10% | 10-30% | > 30% | CPU accounting |
| Memory Pressure | Mild | Moderate | Severe | PSI or similar |
Composite Detection:
No single metric is definitive. The key signature of thrashing is the combination:
Thrashing = (High Page Fault Rate) AND (Low CPU Utilization) AND (High Disk I/O)
High page faults with high CPU might just be a program loading (cold start). Low CPU with low I/O might be processes blocked on network. High disk I/O with high CPU is a normal I/O workload. Only the combination indicates thrashing.
Linux PSI (Pressure Stall Information):
Modern Linux provides PSI metrics that directly measure resource starvation:
$ cat /proc/pressure/memory
some avg10=0.00 avg60=0.00 avg300=0.00 total=0
full avg10=0.00 avg60=0.00 avg300=0.00 total=0
PSI 'full' memory pressure above 10-20% indicates thrashing.
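For a concrete sense of how a monitor might consume PSI, here is a minimal sketch that reads the "full" 10-second average from /proc/pressure/memory; the 10% warning threshold mirrors the rule of thumb above and is illustrative rather than canonical:

```c
// Minimal sketch: read the "full" memory-pressure average from Linux PSI.
// Assumes a PSI-enabled kernel (>= 4.20) exposing /proc/pressure/memory.
#include <stdio.h>
#include <string.h>

// Returns the "full avg10" value (percent of time ALL tasks were stalled on memory),
// or -1.0 if PSI is unavailable.
double read_psi_full_avg10(void) {
    FILE* f = fopen("/proc/pressure/memory", "r");
    if (!f) return -1.0;

    char line[256];
    double avg10 = -1.0;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "full", 4) == 0) {
            sscanf(line, "full avg10=%lf", &avg10);
            break;
        }
    }
    fclose(f);
    return avg10;
}

int main(void) {
    double full = read_psi_full_avg10();
    if (full < 0)
        printf("PSI not available on this system\n");
    else if (full >= 10.0)
        printf("WARNING: full memory stall %.2f%% - likely thrashing\n", full);
    else
        printf("full memory stall: %.2f%%\n", full);
    return 0;
}
```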
```c
// Thrashing detection algorithm
typedef struct {
    double page_fault_rate;    // Faults per second (system-wide)
    double cpu_utilization;    // 0.0 to 1.0
    double swap_io_rate;       // MB/s swap activity
    double io_wait_percentage; // Percentage of CPU time in I/O wait
    int free_pages;            // Current free page count
    int low_watermark;         // Page reclamation threshold
} SystemMetrics;

typedef enum {
    THRASHING_NONE,
    THRASHING_WARNING,  // Approaching thrashing
    THRASHING_MILD,     // Early thrashing, recoverable
    THRASHING_SEVERE,   // Full thrashing, intervention needed
    THRASHING_CRITICAL  // System unresponsive
} ThrashingLevel;

ThrashingLevel detect_thrashing(SystemMetrics* m) {
    // Composite scoring
    int score = 0;

    // Factor 1: Page fault rate
    if (m->page_fault_rate > 500) score += 4;
    else if (m->page_fault_rate > 100) score += 2;
    else if (m->page_fault_rate > 50) score += 1;

    // Factor 2: CPU utilization (inverted - low is bad)
    if (m->cpu_utilization < 0.10) score += 4;
    else if (m->cpu_utilization < 0.30) score += 2;
    else if (m->cpu_utilization < 0.50) score += 1;

    // Factor 3: I/O wait
    if (m->io_wait_percentage > 50) score += 3;
    else if (m->io_wait_percentage > 30) score += 2;
    else if (m->io_wait_percentage > 15) score += 1;

    // Factor 4: Swap activity
    if (m->swap_io_rate > 100) score += 3;
    else if (m->swap_io_rate > 50) score += 2;
    else if (m->swap_io_rate > 20) score += 1;

    // Factor 5: Free pages
    if (m->free_pages < m->low_watermark / 4) score += 2;
    else if (m->free_pages < m->low_watermark / 2) score += 1;

    // Classify based on composite score
    if (score >= 12) return THRASHING_CRITICAL;
    if (score >= 9) return THRASHING_SEVERE;
    if (score >= 6) return THRASHING_MILD;
    if (score >= 3) return THRASHING_WARNING;
    return THRASHING_NONE;
}
```

The best thrashing detection catches THRASHING_WARNING before it progresses. At this stage, gentle intervention (slowing new process admission, slightly reducing multiprogramming) can prevent crisis. By THRASHING_SEVERE, drastic action is needed.
Thrashing prevention is preferable to thrashing recovery. Several strategies keep the system away from the thrashing cliff.
Strategy 1: Working Set-Based Admission Control
This is the direct application of Denning's working set principle:
On process_create(new_process):
new_wss = estimate_working_set_size(new_process)
current_total_wss = sum(wss for all running processes)
if current_total_wss + new_wss <= available_frames:
admit(new_process)
else:
// Cannot admit without risking thrashing
option1: queue(new_process) until memory available
option2: suspend_process(select_victim()) then admit
option3: reject(new_process) with 'out of memory' error
Challenge: Estimating working set size before a process runs. Common options are to start with a conservative default allocation and let PFF feedback grow it, to reuse history from previous runs of the same program, or to accept a hint declared by the application or administrator; a sketch combining these appears below.
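A minimal sketch of such an estimator, under the assumptions just listed. The struct, lookup_historical_wss stub, and DEFAULT_INITIAL_FRAMES value are illustrative names for this sketch, not taken from any real kernel:

```c
// Hypothetical per-process information available at admission time (illustrative only).
typedef struct {
    const char* program_name;
    int declared_wss;            // Frames hinted by the application, 0 if none
} NewProcessInfo;

#define DEFAULT_INITIAL_FRAMES 64   // Conservative starting allocation (illustrative)

// Peak WSS (in frames) recorded for prior runs of this program, 0 if unknown.
// A real system would persist this history; here it is a stub for the sketch.
static int lookup_historical_wss(const char* program_name) {
    (void)program_name;
    return 0;
}

int estimate_working_set_size(const NewProcessInfo* p) {
    int historical = lookup_historical_wss(p->program_name);
    if (historical > 0) return historical;            // Prefer observed history
    if (p->declared_wss > 0) return p->declared_wss;  // Then an explicit hint
    return DEFAULT_INITIAL_FRAMES;                    // Else start small; PFF feedback grows it
}
```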
Strategy 2: PFF-Based Prevention
Use page fault frequency as a feedback signal:
periodically:
for each process p:
if p.pff > crisis_threshold:
if free_frames > 0:
allocate_frame_to(p)
else:
// System is at capacity and someone is suffering
victim = select_lowest_priority_process()
if victim != p:
suspend(victim) // Free up frames
else:
// Even lowest priority can't get frames - overloaded
suspend_newest_process() // LIFO fairness
| Strategy | When Applied | Overhead | Accuracy | Disruption |
|---|---|---|---|---|
| WSS Admission Control | Process creation | Low | Medium (estimation) | Queues new processes |
| PFF Control | Continuous | Low | High (observed) | Suspends processes |
| Multiprogramming Limit | Process scheduling | Very Low | Low (static) | Queues new processes |
| Load Shedding | As pressure rises | Medium | Medium | Progressive |
| Memory Reservation | Allocation time | Low | N/A | Reduces utilization |
The best systems use multiple strategies simultaneously. Admission control is the first line of defense. PFF monitoring is the second. Memory reservation provides buffer. Load shedding is the last resort before intervention. Each layer catches what the previous layer missed.
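Of the strategies above, the static multiprogramming limit is the simplest to picture. A minimal sketch, assuming illustrative scheduler helpers (count_active_processes, admit, enqueue_waiting) and a cap tuned by hand for the machine's RAM:

```c
#define MAX_MULTIPROGRAMMING 10   // Static cap, chosen for this machine's memory size

// Illustrative stand-ins for the scheduler's bookkeeping (assumed, not a real API).
extern int  count_active_processes(void);
extern void admit(int pid);
extern void enqueue_waiting(int pid);

// Admit a new process only while the static degree of multiprogramming allows it.
void on_process_create(int pid) {
    if (count_active_processes() < MAX_MULTIPROGRAMMING)
        admit(pid);             // Within the cap: run immediately
    else
        enqueue_waiting(pid);   // Over the cap: queue until an active process exits
}
```

Its weakness is the one noted in the table: the cap is static, so it cannot adapt when individual working sets grow.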
When thrashing occurs despite prevention, the system must recover. The fundamental principle is simple: reduce demand until it fits within supply.
Recovery Options (in order of increasing severity):
1. Stop admitting new processes.
2. Trigger aggressive page reclamation.
3. Suspend (swap out) low-priority processes to shrink total demand.
4. Terminate processes as a last resort (OOM kill).
Suspension vs. Termination:
| Action | Memory Freed | Process State | Recovery | User Impact |
|---|---|---|---|---|
| Suspend | All pages | Preserved (on disk) | Resume later | Work delayed |
| Terminate | All pages + resources | Lost | None | Work lost |
Suspension is reversible; termination is not. Systems should prefer suspension unless the process is a runaway (for example, leaking memory without bound), there is no swap space left to hold a suspended process's pages, or memory must be freed immediately to keep the system itself responsive.
Linux OOM Killer:
Linux uses an OOM (Out Of Memory) killer as the last resort: when memory cannot be reclaimed any other way, the kernel computes a badness score for each process (driven largely by memory usage and adjusted by the per-process oom_score_adj value) and sends SIGKILL to the highest-scoring victim to free memory immediately.
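Applications can influence this choice. As a small illustrative sketch (the specific value and the decision to volunteer are assumptions, not recommendations), a process can raise its own oom_score_adj so it is sacrificed before more important work:

```c
// Sketch: mark the current process as a preferred OOM victim on Linux.
// oom_score_adj ranges from -1000 (never kill) to +1000 (kill first); 0 is the default.
#include <stdio.h>

int main(void) {
    FILE* f = fopen("/proc/self/oom_score_adj", "w");
    if (f) {
        fprintf(f, "%d\n", 500);  // Volunteer as an early victim (raising the score needs no privilege)
        fclose(f);
    }
    return 0;
}
```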
```c
// Thrashing recovery algorithm
void recover_from_thrashing(ThrashingLevel level) {
    switch (level) {
        case THRASHING_WARNING: {
            // Level 0: Gentle intervention
            pause_process_admission();
            trigger_aggressive_reclamation();
            break;
        }
        case THRASHING_MILD: {
            // Level 1: Reduce load
            pause_process_admission();
            trigger_aggressive_reclamation();

            // Suspend some low-priority processes
            int to_suspend = count_processes() / 4;  // Suspend 25%
            for (int i = 0; i < to_suspend; i++) {
                Process* victim = select_suspension_victim();
                if (victim) suspend_process(victim);
            }
            break;
        }
        case THRASHING_SEVERE: {
            // Level 2: Aggressive reduction
            pause_process_admission();

            // Suspend half of processes (preferring low priority)
            int to_suspend = count_processes() / 2;
            for (int i = 0; i < to_suspend; i++) {
                Process* victim = select_suspension_victim();
                if (victim) suspend_process(victim);
            }

            // If still thrashing after suspensions, begin termination
            if (detect_thrashing(get_current_metrics()) >= THRASHING_SEVERE) {
                Process* kill_victim = select_oom_victim();
                terminate_process(kill_victim, SIGKILL);
            }
            break;
        }
        case THRASHING_CRITICAL: {
            // Level 3: Emergency - terminate memory hogs immediately
            for (int i = 0; i < 3; i++) {  // Kill up to 3
                Process* victim = select_oom_victim();
                if (victim) terminate_process(victim, SIGKILL);
            }
            // Suspend all except essential processes
            for (Process* p : all_processes) {
                if (!is_essential(p)) {
                    suspend_process(p);
                }
            }
            break;
        }
    }
}

Process* select_suspension_victim() {
    // Prefer: low priority, high memory, recently started
    Process* best = NULL;
    int best_score = -1;

    for (Process* p : all_processes) {
        if (is_suspended(p) || is_essential(p)) continue;

        int score = 0;
        score += (MAX_PRIORITY - p->priority) * 10;  // Low priority = high score
        score += p->memory_usage / 1024;             // More memory = higher score
        score -= p->runtime_seconds / 60;            // Longer running = lower score

        if (score > best_score) {
            best_score = score;
            best = p;
        }
    }
    return best;
}
```

Even after suspending/killing processes, recovery isn't instant. Remaining processes must reload their working sets. The disk queue must drain. This can take seconds to minutes. Systems should not resume normal operation until metrics confirm recovery.
Let's see how all the concepts—working set, PFF, dynamic allocation, and thrashing prevention—work together in an integrated system.
```c
// Integrated memory management system

// ===== CONSTANTS =====
#define PFF_UPPER 25.0          // Faults/sec - needs more frames
#define PFF_LOWER 5.0           // Faults/sec - can give up frames
#define THRASH_THRESHOLD 0.20   // 20% memory pressure = thrashing
#define RESERVE_PERCENT 0.10    // Keep 10% frames reserved

// ===== STATE =====
typedef struct {
    // Per-process tracking
    double pff;              // Page fault frequency
    int allocated_frames;    // Current allocation
    int min_frames;          // Minimum allowed
    int max_frames;          // Maximum allowed
    int estimated_wss;       // Estimated working set size
    int target;              // Allocation target computed each tick
    Process* proc;
} ProcessMemoryState;

// ===== PERIODIC MEMORY MANAGER (runs every 100ms) =====
void memory_management_tick() {
    SystemMetrics metrics = collect_metrics();
    ThrashingLevel thrash = detect_thrashing(&metrics);

    // Step 1: Handle thrashing if detected
    if (thrash >= THRASHING_MILD) {
        recover_from_thrashing(thrash);
        return;  // Don't do normal allocation during recovery
    }

    // Step 2: Calculate allocation targets
    int total_target = 0;
    for (ProcessMemoryState* pms : all_active_processes) {
        update_pff(pms);

        // Base target is estimated WSS
        pms->target = pms->estimated_wss;

        // Adjust based on PFF feedback
        if (pms->pff > PFF_UPPER) {
            pms->target = pms->allocated_frames * 1.2;
        } else if (pms->pff < PFF_LOWER) {
            pms->target = pms->allocated_frames * 0.95;
        }

        // Clamp to limits
        pms->target = clamp(pms->target, pms->min_frames, pms->max_frames);
        total_target += pms->target;
    }

    // Step 3: Scale if over-committed
    int available = total_frames * (1 - RESERVE_PERCENT);
    if (total_target > available) {
        double scale = (double)available / total_target;
        for (ProcessMemoryState* pms : all_active_processes) {
            pms->target = max(pms->min_frames, pms->target * scale);
        }
    }

    // Step 4: Apply allocations gradually
    for (ProcessMemoryState* pms : all_active_processes) {
        int delta = pms->target - pms->allocated_frames;
        delta = clamp(delta, -MAX_CHANGE, MAX_CHANGE);

        if (delta > 0 && free_frames >= delta) {
            allocate_frames(pms->proc, delta);
        } else if (delta < 0) {
            reclaim_frames(pms->proc, -delta);
        }
    }

    // Step 5: Admission control for waiting processes
    if (thrash == THRASHING_NONE && waiting_processes_exist()) {
        ProcessMemoryState* candidate = first_waiting();
        if (can_admit(candidate)) {
            admit_process(candidate);
        }
    }
}

// ===== ADMISSION CONTROL =====
bool can_admit(ProcessMemoryState* new_pms) {
    int required = new_pms->estimated_wss;
    int currently_used = sum_allocated_frames();
    int reserve = total_frames * RESERVE_PERCENT;

    return (currently_used + required) <= (total_frames - reserve);
}
```

The Integrated System's Behavior:
Normal Operation: PFF feedback fine-tunes each process's allocation around its working set size, the reserve buffer stays intact, and waiting processes are admitted as memory allows.
Approaching Overload: Total targets exceed the available frames, so allocations are scaled down proportionally (never below each process's minimum) and admission of new processes pauses.
At Overload: Thrashing detection fires, normal allocation stops, and the recovery path suspends or terminates processes until demand fits within memory again.
Recovery: Suspended processes are resumed gradually, and admission restarts only once metrics confirm the system is healthy.
A well-designed system should rarely or never enter thrashing. Prevention layers should catch overload early. If thrashing occurs, it indicates insufficient prevention—either thresholds were wrong, or an unexpected workload spike occurred. Post-thrashing analysis should tune the system to prevent recurrence.
Modern operating systems and platforms have evolved sophisticated anti-thrashing mechanisms beyond the classic approaches.
Memory Cgroups in Detail:
Linux cgroups provide per-group memory limits:
# Set 1GB limit for a container
echo 1G > /sys/fs/cgroup/memory/mycontainer/memory.limit_in_bytes
# Set soft limit (reclaimed under pressure)
echo 512M > /sys/fs/cgroup/memory/mycontainer/memory.soft_limit_in_bytes
# Enable OOM killer for this group
echo 0 > /sys/fs/cgroup/memory/mycontainer/memory.oom_control
With cgroups, a runaway workload can only thrash within its own group's limit: its excess paging and any OOM kills are contained to that container, while other groups keep their memory and continue to make progress.
Pressure Stall Information Use:
# Monitor memory pressure
$ cat /proc/pressure/memory
some avg10=0.50 avg60=0.30 avg300=0.10 total=123456
full avg10=0.00 avg60=0.00 avg300=0.00 total=0
Tools like systemd can react to PSI:
# systemd service snippet (resource-control settings)
[Service]
MemoryPressureWatch=auto
MemoryPressureThresholdSec=2s
IOSchedulingPriority=7 # Run this service at the lowest I/O priority
| Platform | Primary Mechanism | Scope | Preemptive? |
|---|---|---|---|
| Linux Desktop | OOM killer, swappiness tuning | System-wide | Mostly reactive |
| Linux Container | Cgroups + OOM per container | Per-container | Can be proactive |
| Android | Low Memory Killer | Per-app | Proactive |
| Windows | Working set trimming + paging file | Per-process | Mixed |
| macOS | Memory compression + app nap | Per-app | Proactive |
| VMware ESXi | Ballooning + swapping + TPS | Per-VM | Proactive |
| Kubernetes | Pod eviction + limits | Per-pod | Proactive |
Modern systems often have multiple layers: application-level (well-behaved apps limit themselves), container-level (cgroups), OS-level (OOM killer), and hypervisor-level (ballooning). Each layer catches what lower layers miss. This defense in depth makes thrashing rare in well-configured systems.
We've explored thrashing in comprehensive depth, from its devastating dynamics to modern prevention mechanisms. Let's consolidate the key insights:
- Thrashing means the system pages more than it computes; the root cause is Σ WSS_i > M.
- It is a positive feedback loop: once started, it worsens on its own.
- Its signature is high page fault rate AND low CPU utilization AND heavy swap I/O.
- Prevention (admission control, PFF feedback, multiprogramming limits, memory reserves) is far cheaper than recovery.
- Recovery means reducing demand: suspend processes first, terminate only as a last resort.
- Modern systems layer these defenses, from applications through containers and the OS down to the hypervisor.
Module Complete:
This concludes our exploration of the Working Set Model and Page Fault Frequency. We've traced the complete arc from Denning's foundational theory through practical implementation to thrashing prevention. Together, these concepts form the intellectual backbone of memory management—enabling operating systems to efficiently share limited physical memory among competing processes while maintaining stability and performance.
The principles you've learned apply not just to traditional operating systems but to any resource management scenario: container orchestration, database buffer management, cache sizing, and distributed systems coordination. Wherever limited resources must be shared among competing demands, working set concepts and feedback-based allocation remain relevant.
Congratulations! You've mastered the Working Set Model and Page Fault Frequency module. You now understand the theoretical foundations (working set, locality), practical mechanisms (PFF, dynamic allocation), and failure prevention (thrashing detection and recovery). These concepts are essential for understanding how modern operating systems manage memory at scale.