In the previous page, we defined thrashing and observed its devastating effects. Now we examine the root cause: insufficient frame allocation. Understanding this cause is essential because it reveals that thrashing is not random or unpredictable—it's the direct, deterministic result of a fundamental resource shortage.
The core insight is straightforward but profound: thrashing occurs when processes receive fewer frames than they need to hold their working sets in memory. This page explores every dimension of this problem.
By the end of this page, you will understand precisely why insufficient frames cause thrashing, how the relationship between frame allocation and page faults creates instability, the role of the operating system scheduler in exacerbating the problem, and why some allocation strategies are more vulnerable than others.
Every operating system faces a fundamental resource allocation challenge: how should limited physical memory frames be distributed among competing processes? This is the frame allocation problem, and its solution directly determines whether the system thrashes.
The Mathematical Reality:

Let's formalize the constraint. Given:

- m = total physical frames available for allocation
- f(i) = frames allocated to process i
- WSS(i) = working set size of process i

The fundamental constraint is:
∑ f(i) ≤ m (Cannot allocate more frames than exist)
For stability, we need:
f(i) ≥ WSS(i) for all i (Each process needs enough frames)
Combining these:
∑ WSS(i) ≤ m (Sum of working sets must fit in memory)
When this condition is violated—when processes collectively need more frames than exist—thrashing is inevitable.
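The feasibility condition can be expressed directly as a quick check. A minimal sketch; the frame counts are illustrative, not from any real system:

```python
def allocation_is_stable(working_set_sizes, total_frames):
    """Stable only if the combined working sets fit: sum(WSS) <= m."""
    return sum(working_set_sizes) <= total_frames

# Three processes needing 300 + 400 + 200 = 900 frames on a 1024-frame machine:
print(allocation_is_stable([300, 400, 200], 1024))  # fits: stable
print(allocation_is_stable([300, 400, 500], 1024))  # 1200 > 1024: thrashing risk
```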
Modern operating systems allow memory overcommitment—processes can request more virtual memory than physical memory exists. This is normally fine because processes rarely use all allocated memory simultaneously. But when memory pressure increases and processes actually need their allocations, the bill comes due. There's no way to satisfy everyone—thrashing begins.
Before discussing how many frames processes should receive, we must establish how many they must receive. Every process has a minimum frame requirement: the smallest number of frames needed for the process to execute at all.
This minimum is determined by the architecture's instruction set and addressing modes:
| Addressing Mode | Pages Potentially Accessed | Minimum Frames Required |
|---|---|---|
| Direct addressing | Instruction page + operand page | 2 |
| Indirect addressing | Instruction + pointer + data pages | 3 |
| Indexed addressing | Instruction + base + index + data | 4 |
| Memory-to-memory | Instruction + source + destination | 3 |
| Complex (VAX MOVC3) | Instruction + control + src + dst segments | 6+ |
Why the Minimum Matters:
Consider an instruction like MOV [addr1], [addr2] that copies data from one memory location to another. Executing it touches at least three pages: the page holding the instruction itself, the page containing the source operand, and the page containing the destination operand.

Now imagine the system allocates only 2 frames to this process. To bring in the third required page, the OS must evict either the instruction page or one of the operand pages, and whichever it evicts is still needed to finish this very instruction.

Neither option allows the instruction to complete!
If a process receives fewer than its minimum frames, it cannot complete even a single instruction. Every attempt to make progress causes a page fault, which evicts a page needed for the current instruction. The process enters an infinite fault loop—a degenerate form of thrashing where literally no progress is possible. The OS must guarantee the minimum or refuse to run the process.
Minimum vs. Adequate:
Meeting the minimum prevents infinite loops but doesn't prevent thrashing. The minimum allows a single instruction to complete, but if the working set far exceeds the minimum, the process will still thrash—just not infinitely.
| Frames Allocated | Behavior |
|---|---|
| < Minimum | Cannot complete any instruction; infinite fault loop |
| = Minimum | Can execute but severe thrashing; every instruction causes faults |
| > Minimum, < Working Set | Thrashing; frequent page faults degrade performance |
| ≥ Working Set | Normal operation; occasional faults at phase transitions |
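The table above can be captured as a tiny classifier. This is a sketch with hypothetical inputs, not a real kernel interface:

```python
def paging_regime(frames, minimum, wss):
    """Map a frame allocation to the behavior regimes in the table."""
    if frames < minimum:
        return "infinite fault loop"   # cannot complete any instruction
    if frames < wss:
        return "thrashing"             # runs, but faults dominate progress
    return "normal"                    # occasional faults at phase transitions

print(paging_regime(2, 3, 50))    # below the architectural minimum
print(paging_regime(20, 3, 50))   # above minimum, below working set
print(paging_regime(64, 3, 50))   # working set fits
```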
Thrashing fundamentally stems from a demand-supply imbalance. Processes demand frames (through their working sets), and the system supplies them (through physical memory). When demand exceeds supply, something must give.
Visualizing the Imbalance:
DEMAND SIDE                          SUPPLY SIDE
┌─────────────────────┐              ┌─────────────────────┐
│ Process 1: Web      │              │ Total Physical      │
│ Server (WSS=800)    │              │ Memory: 2048 frames │
├─────────────────────┤              ├─────────────────────┤
│ Process 2: DB       │              │ Kernel Reserve:     │
│ Server (WSS=600)    │              │ 256 frames          │
├─────────────────────┤              ├─────────────────────┤
│ Process 3: App      │              │ Buffer Cache:       │
│ Server (WSS=500)    │              │ 256 frames          │
├─────────────────────┤              ├─────────────────────┤
│ Process 4: Redis    │              │ Available:          │
│ Cache (WSS=400)     │              │ 1536 frames         │
└─────────────────────┘              └─────────────────────┘

Total Demand: 800+600+500+400 = 2300 frames

DEMAND (2300) > SUPPLY (1536) → THRASHING INEVITABLE
In this scenario, there's no allocation strategy that prevents thrashing. The collective working set exceeds available memory by 764 frames (50% oversubscribed).
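The arithmetic behind this scenario is worth making explicit. A small audit using the numbers from the figure above:

```python
# Supply side: physical memory minus fixed reservations.
total_physical = 2048
kernel_reserve = 256
buffer_cache = 256
supply = total_physical - kernel_reserve - buffer_cache  # frames usable by processes

# Demand side: sum of the four servers' working set sizes.
demand = 800 + 600 + 500 + 400

shortfall = demand - supply
print(supply, demand, shortfall)                   # 1536 2300 764
print(f"{shortfall / supply:.0%} oversubscribed")  # 50% oversubscribed
```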
When demand exceeds supply, frame allocation becomes a zero-sum game. Every frame given to one process is a frame taken from another. The OS can reshuffle frames endlessly, but the total shortage remains. This is why thrashing cannot be solved by cleverer allocation—only by reducing demand (fewer processes, smaller working sets) or increasing supply (more RAM).
Sources of Demand Increase:
Understanding where demand comes from helps prevent thrashing:
| Demand Source | Description | Example |
|---|---|---|
| New processes | Scheduler admits more work | Fork bomb, service startup |
| Process growth | Existing process expands working set | Loading large dataset |
| Phase transition | Process enters memory-intensive phase | Initialization → main loop |
| Shared page invalidation | Pages become private copies | Copy-on-write after fork |
| Memory leaks | Process accumulates unreleased memory | Application bug |
Sources of Supply Decrease:
| Supply Reduction | Description | Example |
|---|---|---|
| Kernel expansion | OS needs more memory | Module loading, buffer growth |
| Memory fragmentation | Usable but not contiguous | Huge page allocation failure |
| Hot plugging | Physical memory removed | NUMA node offline |
| Reservation increase | More memory pre-allocated | Huge page reservation |
Different frame allocation policies handle memory pressure differently, but all eventually fail when demand exceeds supply. Understanding how each policy fails illuminates the fundamental nature of the problem.
The Priority Allocation Trap:
Priority-based allocation gives more frames to high-priority processes. This seems sensible: important work should proceed smoothly. But it creates a perverse effect. The frames must come from somewhere, so low-priority processes are starved into thrashing, and, unable to make progress, they cannot complete and release the resources the high-priority processes may be waiting on.
The key insight is that allocation policies only determine who suffers—they cannot solve the underlying shortage. Equal allocation spreads suffering equally. Proportional allocation spreads it proportionally. Priority allocation concentrates suffering on low-priority processes. But in all cases, total demand exceeds total supply, so someone thrashes.
Comparative Failure Analysis:
| Policy | Frame Distribution | Who Thrashes | System Throughput |
|---|---|---|---|
| Equal | Uniform | Large processes | Highest if all similar size |
| Proportional | By VM size | All (proportionally) | Often best overall |
| Priority | By importance | Low-priority processes | High-priority work proceeds |
| First-Come | Arrival order | Latecomers | Early processes succeed |
| Working Set | By actual usage | Last to arrive | Best utilization, complex |
Each policy makes different tradeoffs, but none escapes the fundamental constraint.
One of the most tragic aspects of thrashing is how the operating system's own scheduling logic accelerates the collapse. The scheduler, following reasonable heuristics, makes decisions that worsen the very problem it should prevent.
The Scheduler's Logic:
The scheduler's error is treating a memory problem as a CPU problem. Low CPU utilization can mean either (a) not enough processes, or (b) all processes blocked on I/O. The standard scheduler assumes (a) and acts accordingly. But when (b) is the cause—and specifically when I/O is page fault service—adding more processes is exactly wrong.
Timeline of Scheduler-Induced Collapse:
| Time | Event | CPU% | Page Faults/s | Processes |
|---|---|---|---|---|
| t₀ | Normal operation | 80% | 50 | 10 |
| t₁ | Memory pressure begins | 75% | 100 | 10 |
| t₂ | Scheduler observes low CPU | 70% | 200 | 10 |
| t₃ | Scheduler admits 2 new procs | 60% | 500 | 12 |
| t₄ | Thrashing intensifies | 40% | 1,500 | 12 |
| t₅ | Scheduler admits 3 more procs | 20% | 5,000 | 15 |
| t₆ | System nearly unresponsive | 5% | 20,000 | 15 |
| t₇ | Scheduler still trying | 2% | 50,000 | 18 |
| t₈ | Complete collapse | <1% | 100,000+ | 18 |
Notice how CPU utilization and page fault rate move in opposite directions. The scheduler's attempts to raise CPU utilization only accelerate the collapse.
The Missing Feedback:
The core problem is that the scheduler lacks memory-aware feedback. It optimizes for CPU utilization without considering memory constraints.
Modern schedulers incorporate memory pressure signals, but the fundamental tension remains.
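A memory-aware admission check might look like the following sketch. The function name, parameters, and thresholds are illustrative assumptions, not any real scheduler's API:

```python
def should_admit(cpu_util, faults_per_sec, free_frames, new_wss,
                 fault_limit=500):
    """Admit new work only when low CPU really means idleness rather than
    paging I/O, and only when the newcomer's working set actually fits."""
    if faults_per_sec > fault_limit:
        return False              # low CPU is a memory problem; admitting worsens it
    return free_frames >= new_wss

# Healthy system: plenty of free frames, low fault rate.
print(should_admit(cpu_util=0.80, faults_per_sec=50, free_frames=400, new_wss=300))    # True
# Timeline's t4: a CPU-only scheduler would admit; the fault rate says refuse.
print(should_admit(cpu_util=0.40, faults_per_sec=1500, free_frames=400, new_wss=300))  # False
```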
We can formalize the relationship between frame shortage and thrashing intensity using quantitative analysis.
Define the shortage ratio S as:
S = (∑ WSS(i) - m) / m
Where:

- ∑ WSS(i) = total working set demand across all processes
- m = total frames available for allocation

Interpreting the ratio:

- If S ≤ 0: no shortage, system stable
- If S > 0: shortage exists, thrashing risk
- If S > 0.5: severe shortage, heavy thrashing
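The ratio and its interpretation, as a sketch:

```python
def shortage_ratio(working_set_sizes, total_frames):
    """S = (sum of WSS - m) / m."""
    return (sum(working_set_sizes) - total_frames) / total_frames

def interpret(s):
    if s <= 0:
        return "stable"
    if s <= 0.5:
        return "thrashing risk"
    return "severe thrashing"

# The four-server scenario from earlier: 2300 frames demanded, 1536 supplied.
s = shortage_ratio([800, 600, 500, 400], 1536)
print(round(s, 2), interpret(s))  # 0.5 thrashing risk
```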
Page Fault Rate as a Function of Shortage:
Empirical studies show that page fault rate increases non-linearly with shortage ratio:
Page Fault Rate
    ▲
1.0 ┤                          ╱──── Saturation
    │                        ╱
0.8 ┤                      ╱
    │                    ╱
0.6 ┤                  ╱
    │                ╱   Thrashing zone
0.4 ┤              ╱
    │            ╱
0.2 ┤          ╱
    │        ╱
0.1 ┤───────╱
    │ Stable
    └─────┴─────┴─────┴─────┴─────┴──► Shortage Ratio
    0    0.2   0.4   0.6   0.8   1.0
The curve shows:
| Shortage Ratio | Expected Fault Rate | Effective Throughput | User Experience |
|---|---|---|---|
| S ≤ 0 (adequate) | < 1% | ~100% | Normal, responsive |
| 0 < S ≤ 0.1 | 1-5% | 80-95% | Slightly sluggish |
| 0.1 < S ≤ 0.3 | 5-20% | 50-80% | Noticeably slow |
| 0.3 < S ≤ 0.5 | 20-50% | 20-50% | Very slow, frustrating |
| 0.5 < S ≤ 0.8 | 50-80% | 5-20% | Nearly unusable |
| S > 0.8 | > 80% | < 5% | System appears frozen |
The Non-Linear Cliff:
The transition from "slightly sluggish" to "completely unusable" can occur with a small change in shortage ratio. A system at S=0.2 might feel "a bit slow." At S=0.4, it's "very slow." At S=0.6, it's "frozen."
This non-linearity is what makes thrashing dangerous: the window between "noticeably slow" and "frozen" is narrow, so by the time symptoms are obvious to users, the system may already be past the point where gradual corrective action helps.
The page replacement scope—global or local—significantly affects how frame shortage manifests and how thrashing propagates through the system.
Under global replacement, a single memory-intensive process can cause system-wide thrashing. Imagine a runaway process that rapidly touches a huge working set. With global replacement, it continually evicts pages from other processes. Those processes then page fault, trying to reload evicted pages. But the aggressive process evicts them again. The entire system grinds to a halt because of one misbehaving process.
Scenario Comparison:
Consider a system with 1000 frames and 4 processes, one of which begins rapidly expanding its working set:

With Global Replacement: the aggressive process steals frames from the other three, which then fault trying to reload their evicted pages, only to lose them again. All four processes thrash and system-wide throughput collapses.

With Local Replacement: the aggressive process thrashes within its own fixed partition, while the other three keep their frames and continue at normal speed. The damage is contained.
Tradeoff Summary:
| Aspect | Global Replacement | Local Replacement |
|---|---|---|
| Memory utilization | Higher | Lower |
| Fairness | None guaranteed | Guaranteed |
| Isolation | None | Complete |
| Thrashing propagation | System-wide | Per-process |
| Optimal for | Cooperative workloads | Adversarial workloads |
| Implementation | Simpler | More complex |
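The propagation difference can be demonstrated with a toy LRU simulation. This is a sketch under invented assumptions: one well-behaved process cycling over 4 pages, one greedy process that only ever touches brand-new pages:

```python
from collections import OrderedDict

def simulate(trace, total_frames=8, quotas=None):
    """Count page faults per process for a reference trace.
    quotas=None -> one global LRU pool shared by all processes;
    otherwise each process gets its own LRU-managed partition."""
    faults = {}
    pools = {None: OrderedDict()} if quotas is None else {p: OrderedDict() for p in quotas}
    for proc, page in trace:
        pool = pools[None] if quotas is None else pools[proc]
        key = (proc, page) if quotas is None else page
        cap = total_frames if quotas is None else quotas[proc]
        if key in pool:
            pool.move_to_end(key)            # hit: refresh recency
        else:
            faults[proc] = faults.get(proc, 0) + 1
            if len(pool) >= cap:
                pool.popitem(last=False)     # evict least-recently-used victim
            pool[key] = True
    return faults

# Interleave: one reference by "normal" (cycling 4 pages), then three
# never-before-seen pages by "greedy".
trace = []
for i in range(100):
    trace.append(("normal", i % 4))
    trace += [("greedy", 1000 + 3 * i + j) for j in range(3)]

print(simulate(trace))                                     # global: greedy evicts normal's pages
print(simulate(trace, quotas={"normal": 4, "greedy": 4}))  # local: normal keeps its 4 pages
```

Under global replacement the well-behaved process faults on every reference because the greedy process keeps flushing its pages; under local replacement it faults only on its 4 cold misses.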
Now that we understand why insufficient frames cause thrashing, we can preview the solutions. All solutions address the fundamental equation: ∑ WSS(i) > m.
Fundamentally, there are only three levers: increase supply, reduce demand, or manage the allocation dynamically. Common mechanisms include:
| Approach | Mechanism | Tradeoff | When Appropriate |
|---|---|---|---|
| Add RAM | Install more memory | Cost | Long-term solution |
| Swap out processes | Suspend low-priority processes | Reduced concurrency | Temporary overload |
| Working set tracking | Monitor actual page usage | Overhead | Dynamic workloads |
| PFF control | Adjust allocation based on fault rate | Complexity | Proactive prevention |
| Load shedding | Reject new work when saturated | Lost requests | Server systems |
The most robust solution combines detection with action. First, detect when thrashing is occurring or imminent (high page fault rate, memory pressure). Then, take corrective action (reduce multiprogramming, swap processes). Prevention is better than cure—tracking working sets allows proactive management before thrashing begins.
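One step of that detect-then-act loop might be sketched as follows. The threshold and the suspend-the-victim policy are hypothetical placeholders, not a real OS interface:

```python
def thrashing_control_step(fault_rate, processes, high=1000):
    """Detect then act: if the system-wide fault rate exceeds the threshold,
    pick the lowest-priority process as a victim to suspend (swap out),
    freeing its frames. processes: list of (name, priority) pairs.
    Returns the victim's name, or None if no action is needed."""
    if fault_rate <= high:
        return None                                   # no thrashing detected
    return min(processes, key=lambda p: p[1])[0]      # suspend lowest priority

procs = [("web", 10), ("db", 9), ("batch-report", 1)]
print(thrashing_control_step(50, procs))    # None: system healthy
print(thrashing_control_step(5000, procs))  # batch-report: swap out to relieve pressure
```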
What's Next:
With the cause clearly understood—insufficient frames relative to total working set demand—the next page examines the symptom that most directly indicates thrashing: high page fault rate. Understanding this symptom enables detection and informed response.
You now understand the root cause of thrashing: frame shortage relative to working set demand. You've seen how allocation policies fail, how the scheduler worsens the problem, and why both global and local replacement have limitations. Next, we explore the high page fault rate that signals thrashing.