In 1969, László Bélády of IBM Research discovered something that defied intuition: for certain page replacement algorithms, adding more physical memory can increase the number of page faults.
This discovery sent ripples through the operating systems community. Memory was expensive. Administrators bought more RAM expecting performance to improve. Yet Bélády demonstrated mathematically that their systems could actually perform worse with the upgrade.
This phenomenon, now known as Bélády's Anomaly (often written as Belady's Anomaly without the accent), remains one of the most counterintuitive results in computer science. Understanding it reveals deep truths about caching, the nature of FIFO, and why certain algorithm properties matter more than raw simplicity.
With FIFO page replacement, increasing the number of frames from 3 to 4 can transform a reference string that caused 9 page faults into one that causes 10 page faults. More memory, worse performance. This isn't a bug—it's a fundamental property of FIFO.
By the end of this page, you will understand what Belady's anomaly is, why it occurs, which algorithms are immune to it, and the theoretical framework (stack algorithms) that explains this behavior. You'll be able to construct examples exhibiting the anomaly and identify when an algorithm is susceptible.
Let's examine the most famous example of Belady's anomaly. We'll trace FIFO with the same reference string using 3 frames and then 4 frames, and observe the paradox directly.
Reference String: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
This 12-reference sequence was carefully constructed by Bélády to exhibit the anomaly.
FIFO with 3 Frames:
| Step | Page | Frame 0 | Frame 1 | Frame 2 | Result | Evicted |
|---|---|---|---|---|---|---|
| 1 | 1 | 1 | – | – | MISS | – |
| 2 | 2 | 1 | 2 | – | MISS | – |
| 3 | 3 | 1 | 2 | 3 | MISS | – |
| 4 | 4 | 4 | 2 | 3 | MISS | 1 |
| 5 | 1 | 4 | 1 | 3 | MISS | 2 |
| 6 | 2 | 4 | 1 | 2 | MISS | 3 |
| 7 | 5 | 5 | 1 | 2 | MISS | 4 |
| 8 | 1 | 5 | 1 | 2 | HIT | – |
| 9 | 2 | 5 | 1 | 2 | HIT | – |
| 10 | 3 | 5 | 3 | 2 | MISS | 1 |
| 11 | 4 | 5 | 3 | 4 | MISS | 2 |
| 12 | 5 | 5 | 3 | 4 | HIT | – |
Result with 3 Frames: 9 page faults, 3 hits
FIFO with 4 Frames:
| Step | Page | Frame 0 | Frame 1 | Frame 2 | Frame 3 | Result | Evicted |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | – | – | – | MISS | – |
| 2 | 2 | 1 | 2 | – | – | MISS | – |
| 3 | 3 | 1 | 2 | 3 | – | MISS | – |
| 4 | 4 | 1 | 2 | 3 | 4 | MISS | – |
| 5 | 1 | 1 | 2 | 3 | 4 | HIT | – |
| 6 | 2 | 1 | 2 | 3 | 4 | HIT | – |
| 7 | 5 | 5 | 2 | 3 | 4 | MISS | 1 |
| 8 | 1 | 5 | 1 | 3 | 4 | MISS | 2 |
| 9 | 2 | 5 | 1 | 2 | 4 | MISS | 3 |
| 10 | 3 | 5 | 1 | 2 | 3 | MISS | 4 |
| 11 | 4 | 4 | 1 | 2 | 3 | MISS | 5 |
| 12 | 5 | 4 | 5 | 2 | 3 | MISS | 1 |
Result with 4 Frames: 10 page faults, 2 hits
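Both traces can be checked mechanically. Below is a minimal FIFO simulator (a sketch; `fifo_faults` is an illustrative name, not part of any standard library) that reproduces the fault counts from the two tables:

```python
from collections import deque

def fifo_faults(refs, frames):
    """Simulate FIFO page replacement and return the number of page faults."""
    queue = deque()    # resident pages in arrival order; leftmost = oldest
    resident = set()
    faults = 0
    for page in refs:
        if page in resident:
            continue                            # hit: FIFO ignores accesses
        faults += 1
        if len(queue) == frames:
            resident.discard(queue.popleft())   # evict the oldest arrival
        queue.append(page)
        resident.add(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3))  # 9
print(fifo_faults(refs, 4))  # 10
```

With 5 frames all five distinct pages fit, so only the 5 compulsory faults remain, which is why the anomaly appears only at the 3-to-4 transition for this string.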
The Anomaly:
| Metric | 3 Frames | 4 Frames | Difference |
|---|---|---|---|
| Page Faults | 9 | 10 | +1 (worse!) |
| Page Hits | 3 | 2 | -1 (worse!) |
| Fault Rate | 75% | 83.3% | +8.3 points |

With one additional frame (33% more memory), FIFO incurred one extra fault, an 11% increase in faults. This is Belady's anomaly in action.
This isn't a trick or edge case—it's a mathematically proven property of FIFO. For any number of frames n, there exists a reference string where FIFO with n+1 frames produces more faults than FIFO with n frames.
To understand Belady's anomaly, we need to examine what happens during the critical transitions in our example.
The Key Insight: Eviction Timing Matters
With 3 frames, certain pages get evicted at particular moments. With 4 frames, the eviction timing shifts—and this shift can be harmful.
Let's trace the critical moments:
After Step 4 (Page 4 loaded):
With 3 frames, page 1 has just been evicted. With 4 frames, page 1 is still in memory. This seems good for the larger memory.
Steps 5-6 (Pages 1, 2 referenced):
With 3 frames, pages 1 and 2 must be reloaded (two faults); with 4 frames, both are hits. So far, 4 frames is winning.
Step 7 (Page 5 loaded):
Here's where things diverge critically. With 3 frames, pages 1 and 2 were just reloaded, so the oldest arrival is page 4 and it is evicted; pages 1 and 2 survive. With 4 frames, the oldest arrival is page 1, and it is evicted just before it's needed again.
Step 8 (Page 1 referenced):
3 frames: HIT. 4 frames: MISS, and reloading page 1 evicts page 2.
Step 9 (Page 2 referenced):
3 frames: HIT. 4 frames: MISS, and reloading page 2 evicts page 3.
The Cascade Effect:
With 3 frames, the eviction of page 1 at step 4 caused reloads at steps 5-6. But this "bad" situation actually set up a good state: pages 1 and 2 were fresh in memory for steps 8-9.
With 4 frames, keeping page 1 until step 7 seemed good, but evicting it at step 7 caused faults at steps 8-9. The "better" early situation led to a worse later situation.
The anomaly occurs because extra frames delay evictions, but FIFO's eviction order is fixed by arrival time. Delaying an eviction can cause a page to be evicted at a worse time—just before it's needed again—rather than being evicted earlier when there's time to reload it before it's needed.
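The shift in eviction timing can be made concrete. This sketch (assuming only the reference string from the example; `fifo_evictions` is an illustrative helper) logs each eviction as a (step, page) pair for both frame counts:

```python
from collections import deque

def fifo_evictions(refs, frames):
    """Return a list of (step, evicted_page) pairs under FIFO replacement."""
    queue, resident, log = deque(), set(), []
    for step, page in enumerate(refs, start=1):
        if page in resident:
            continue                       # hit: nothing is evicted
        if len(queue) == frames:
            victim = queue.popleft()       # oldest arrival goes first
            resident.discard(victim)
            log.append((step, victim))
        queue.append(page)
        resident.add(page)
    return log

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_evictions(refs, 3))  # page 1 first evicted at step 4
print(fifo_evictions(refs, 4))  # page 1 first evicted at step 7, right before step 8 needs it
```

The logs show the cascade directly: with 3 frames page 1 leaves early (step 4) and returns in time for steps 8-9; with 4 frames it survives until step 7 and is gone exactly when it is needed.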
Not all page replacement algorithms suffer from Belady's anomaly. Those that are immune share a common property: the stack property (also called the inclusion property).
Definition:
An algorithm satisfies the stack property if and only if:
For any reference string, the set of pages in memory with n frames is always a subset of the set of pages in memory with n+1 frames.
Formally, if S_n(t) is the set of pages in memory at time t with n frames:
S_n(t) ⊆ S_{n+1}(t) for all t and all n
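The inclusion condition can be tested empirically. This sketch runs FIFO side by side with n and n+1 frames and reports the first step t at which S_n(t) ⊄ S_{n+1}(t), or None if inclusion always holds (`first_violation` is an illustrative name):

```python
from collections import deque

def fifo_resident_sets(refs, frames):
    """Yield the resident page set after each reference under FIFO."""
    queue, resident = deque(), set()
    for page in refs:
        if page not in resident:
            if len(queue) == frames:
                resident.discard(queue.popleft())  # evict oldest arrival
            queue.append(page)
            resident.add(page)
        yield set(resident)                        # snapshot S_n(t)

def first_violation(refs, n):
    """First 1-based step where the n-frame set is not a subset of the
    (n+1)-frame set, or None if the stack property holds throughout."""
    small = fifo_resident_sets(refs, n)
    large = fifo_resident_sets(refs, n + 1)
    for t, (s, l) in enumerate(zip(small, large), start=1):
        if not s <= l:
            return t
    return None

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(first_violation(refs, 3))  # 7 — FIFO breaks inclusion at step 7
```

On the classic string, the violation appears exactly at step 7: S_3(7) = {1, 2, 5} while S_4(7) = {2, 3, 4, 5}.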
Intuition:
If an algorithm has the stack property, then having more memory means you keep everything you would have kept with less memory, plus potentially more. You never "lose" a page by having more frames.
Why the Stack Property Prevents the Anomaly:
If S_n(t) ⊆ S_{n+1}(t) at every step, then any reference that hits with n frames also hits with n+1 frames, so the fault count with n+1 frames can never exceed the fault count with n frames.
The stack property guarantees monotonic improvement: more memory → fewer (or equal) faults. Never worse.
Does FIFO Have the Stack Property?
No. Our example proves this. Verify at step 7, just after page 5 is referenced:
S_3(7) = {1, 2, 5} but S_4(7) = {2, 3, 4, 5}.
Page 1 is in the 3-frame set but NOT in the 4-frame set. This violates the stack property: S_3(7) ⊄ S_4(7).
| Algorithm | Has Stack Property? | Susceptible to Anomaly? |
|---|---|---|
| FIFO | No | Yes |
| LRU (Least Recently Used) | Yes | No |
| LFU (Least Frequently Used) | Yes | No |
| OPT (Optimal) | Yes | No |
| Random | No | Yes (probabilistically) |
| Clock (Second Chance) | No | Yes |
| NRU (Not Recently Used) | No | Yes |
LRU always evicts the least recently used page, so with n frames the resident set is exactly the n most recently referenced distinct pages. The n most recently referenced pages are always a subset of the n+1 most recently referenced pages, so S_n(t) ⊆ S_{n+1}(t) holds at every step. This gives LRU the stack property.
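LRU's inclusion property can be checked on the same example string. A minimal sketch (helper names are illustrative), using an ordered map to track recency:

```python
from collections import OrderedDict

def lru_resident_sets(refs, frames):
    """Yield the resident page set after each reference under LRU."""
    cache = OrderedDict()   # keys ordered least-recently-used -> most-recently-used
    for page in refs:
        if page in cache:
            cache.move_to_end(page)            # refresh recency on a hit
        else:
            if len(cache) == frames:
                cache.popitem(last=False)      # evict least recently used
            cache[page] = True
        yield set(cache)

def inclusion_holds(refs, n):
    """True if the n-frame set is contained in the (n+1)-frame set at every step."""
    return all(s <= l for s, l in zip(lru_resident_sets(refs, n),
                                      lru_resident_sets(refs, n + 1)))

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(inclusion_holds(refs, 3))  # True — LRU never breaks inclusion
```

Unlike the FIFO case, this check succeeds at every step: the very string that makes FIFO anomalous leaves LRU's inclusion intact.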
Let's formalize our understanding with mathematical rigor.
Notation: Let F_n(R) denote the number of page faults the algorithm incurs on reference string R when given n frames.
Bélády's Theorem (1969):
For the FIFO page replacement algorithm, there exist reference strings R and frame counts n such that F_{n+1}(R) > F_n(R).
Constructive Proof:
The reference string R = 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 with n = 3 and n+1 = 4 provides a constructive proof: F_3(R) = 9 while F_4(R) = 10.
Since 10 > 9, the theorem is proved.
Characterizing Anomaly-Prone Reference Strings:
Not all reference strings exhibit the anomaly. The anomaly requires pages that are re-referenced after being evicted, with timing such that the larger memory's delayed evictions remove a page just before it is needed again, while the smaller memory, having evicted and reloaded it earlier, still holds it. Sequential scans and strongly local access patterns do not trigger it.
Bounding the Anomaly:
In any single example the effect is typically modest: in the classic string the penalty is exactly one extra fault. Real workloads rarely approach the adversarial structure the anomaly requires.
Probability Analysis:
For random reference strings drawn uniformly over p distinct pages, anomalous frame-count pairs do occur, but they are the exception: for most strings the fault count decreases monotonically as frames are added.
While Belady's anomaly is mathematically proven and can be demonstrated, it's relatively rare in practice. Real workloads typically show improved performance with more memory. However, the anomaly's existence disqualifies FIFO from situations where guaranteed monotonic improvement is required.
Understanding how to construct anomaly-exhibiting reference strings deepens comprehension of the phenomenon.
Recipe for Anomaly Construction:
1. Fill n+1 frames with pages 1 through n+1 (misses at both sizes).
2. Re-reference the first n-1 pages: hits with n+1 frames, misses with n frames.
3. Introduce a brand-new page to force an eviction at both sizes. With n frames, the recently reloaded pages survive; with n+1 frames, page 1 is evicted.
4. Re-reference the first n-1 pages again: now hits with n frames, misses with n+1 frames.
5. Finish with pages n, n+1, and the new page to complete the divergence.
Example Construction (n=3, n+1=4):

```python
from collections import deque

def simulate_fifo(ref_string, frames):
    """Count page faults under FIFO replacement with the given frame count."""
    queue, resident, faults = deque(), set(), 0
    for page in ref_string:
        if page in resident:
            continue                            # hit: FIFO ignores accesses
        faults += 1
        if len(queue) == frames:
            resident.discard(queue.popleft())   # evict the oldest arrival
        queue.append(page)
        resident.add(page)
    return faults

def construct_anomaly(n):
    """
    Construct a reference string exhibiting Belady's anomaly between
    n and n+1 frames, generalizing Belady's classic 12-reference string.
    (Verified for n = 3 and n = 4; for other n, check with verify_anomaly.)
    """
    filling = list(range(1, n + 2))    # Phase 1: fill n+1 frames with 1..n+1
    early_refs = list(range(1, n))     # Phase 2: hits with n+1 frames, misses with n
    new_page = n + 2                   # Phase 3: force an eviction at both sizes
    divergence = list(range(1, n))     # Phase 4: hits with n frames, misses with n+1
    tail = [n, n + 1, n + 2]           # Phase 5: complete the divergence
    return filling + early_refs + [new_page] + divergence + tail

# For n = 3 this yields Belady's classic string:
# 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5

def verify_anomaly(ref_string, n):
    """Verify that ref_string shows the anomaly between n and n+1 frames."""
    faults_n = simulate_fifo(ref_string, n)
    faults_n1 = simulate_fifo(ref_string, n + 1)
    print(f"Frames={n}: {faults_n} faults")
    print(f"Frames={n+1}: {faults_n1} faults")
    if faults_n1 > faults_n:
        print(f"ANOMALY DETECTED: {faults_n1 - faults_n} more faults with more memory!")
        return True
    print("No anomaly with this reference string")
    return False
```

Known Anomaly-Exhibiting Patterns:
| Reference String | Best n | Faults(n) | Faults(n+1) | Difference |
|---|---|---|---|---|
| 1,2,3,4,1,2,5,1,2,3,4,5 | 3 | 9 | 10 | +1 |
| 1,2,3,4,5,1,2,3,4,5 | 4 | 10 | 5 | -5 (no anomaly) |
| 1,2,3,1,2,3,1,2,3,4 | 2 | 10 | 4 | -6 (no anomaly) |
| 1,2,3,4,5,6,1,2,3 | 3 | 9 | 9 | 0 (no anomaly) |
| 3,2,1,0,3,2,4,3,2,1,0,4 | 3 | 9 | 10 | +1 |
To systematically find anomaly examples: (1) Generate reference strings of length m over p pages, (2) Simulate FIFO for all frame counts from 1 to p, (3) Check if faults ever increase as frames increase. This brute-force approach reveals anomaly instances reliably.
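The brute-force procedure above can be sketched as follows. For speed, this version samples random strings with a fixed seed instead of enumerating every string; `is_anomalous` and `search_anomalies` are illustrative names:

```python
import random
from collections import deque

def fifo_faults(refs, frames):
    """Count page faults under FIFO replacement."""
    queue, resident, faults = deque(), set(), 0
    for page in refs:
        if page in resident:
            continue
        faults += 1
        if len(queue) == frames:
            resident.discard(queue.popleft())
        queue.append(page)
        resident.add(page)
    return faults

def is_anomalous(refs, max_frames):
    """True if the fault count ever rises as the frame count increases."""
    faults = [fifo_faults(refs, n) for n in range(1, max_frames + 1)]
    return any(later > earlier for earlier, later in zip(faults, faults[1:]))

def search_anomalies(samples, length, pages, seed=0):
    """Sample random reference strings and return the anomalous ones."""
    rng = random.Random(seed)
    found = []
    for _ in range(samples):
        refs = [rng.randint(1, pages) for _ in range(length)]
        if is_anomalous(refs, pages):
            found.append(refs)
    return found

# Belady's classic string is flagged; a string of all-distinct pages is not.
print(is_anomalous([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], 5))  # True
print(is_anomalous([1, 2, 3, 4, 5], 5))                        # False
```

Swapping the sampling loop for `itertools.product` gives the exhaustive search described above, at the cost of exponential running time in the string length.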
While Belady's anomaly is mathematically fascinating, what are its real-world implications?
Why It Matters (Historically):
In the 1960s-70s, when Bélády discovered this phenomenon, memory was scarce and expensive, FIFO was a common choice for its simplicity, and adding RAM was a major capital expense. An upgrade that could worsen paging performance was therefore a serious operational concern, not a mere curiosity.
Why It's Less Critical Today:
- **Modern algorithms avoid FIFO:** Production OS kernels use LRU approximations (Clock algorithm), which don't exhibit the anomaly
- **Memory is abundant:** Most systems have enough RAM that page replacement is infrequent
- **Workloads are locality-dominated:** Real programs have strong locality, making anomaly-prone reference patterns rare
- **SSD backing stores:** When page faults do occur, SSDs make them much faster than HDD-era faults
Decision Framework:
When choosing whether to use FIFO despite the anomaly risk:
| Factor | Favor FIFO | Avoid FIFO |
|---|---|---|
| Workload pattern | Sequential/streaming | Looping/random |
| Memory abundance | Frames ≫ working set | Frames ≈ working set |
| Upgrade expectations | None planned | Memory may be added |
| Determinism needed | High (reproducible) | Low |
| Implementation constraint | Severe (minimal code) | Relaxed |
| Performance criticality | Low | High |
If you're using FIFO and considering adding memory to improve performance: test first! Simulate your workload with the new frame count. Belady's anomaly means your intuition that 'more is better' could be wrong.
Belady's anomaly connects to broader concepts in caching and algorithm theory.
Competitive Analysis:
In competitive analysis, algorithms are compared against the optimal offline algorithm (OPT). For page replacement with k frames, both FIFO and LRU are k-competitive: each incurs at most k times the faults of OPT (up to an additive constant). Yet FIFO exhibits Belady's anomaly and LRU does not.
This illustrates that competitive ratio alone doesn't capture all relevant algorithm properties.
The Monotonicity Property:
Belady's anomaly is about monotonicity: does increasing resources always help (or at least not hurt)?
Algorithms can be classified by this property: weakly monotonic algorithms are never hurt by additional resources, while non-monotonic algorithms sometimes are.
FIFO is non-monotonic. LRU is weakly monotonic.
Connection to Scheduling Theory:
Similar anomalies exist in other domains: in multiprocessor scheduling, Graham's anomalies show that adding processors or shortening tasks can increase total completion time; in traffic routing, Braess's paradox shows that adding a road can worsen congestion for every driver.
All these phenomena share a common structure: local improvements (more resources, more choice) don't guarantee global improvements due to system dynamics.
Information-Theoretic View:
FIFO uses only one piece of information: arrival order. LRU uses access recency as well: every reference, not just the first, updates a page's position in the eviction order.
The additional information in LRU allows it to maintain the stack property. FIFO's limited information makes it susceptible to pathological cases where arrival order doesn't correlate with future utility.
Belady's anomaly teaches that algorithm design must consider edge cases and theoretical properties, not just average-case behavior. An algorithm that's 'usually fine' may have surprising failure modes. The stack property provides a theoretical guarantee that FIFO lacks.
Belady's anomaly stands as one of the most counterintuitive results in systems design. Let's consolidate what we've learned:
- With FIFO, adding frames can increase faults: the classic 12-reference string incurs 9 faults with 3 frames but 10 with 4.
- The cause is FIFO's fixed arrival-order evictions: extra frames delay evictions, sometimes to the worst possible moment.
- Stack algorithms (LRU, OPT) satisfy S_n(t) ⊆ S_{n+1}(t) at every step and are therefore immune: more memory never hurts.
- The anomaly is rare in practice, but it disqualifies FIFO wherever guaranteed monotonic improvement is required.
What's Next:
Having understood FIFO's most surprising property, we'll now examine its performance characteristics more broadly. The next page analyzes FIFO's performance issues systematically—when it performs well, when it performs poorly, and how to predict its behavior for different workloads.
You now understand Belady's anomaly—not just that it exists, but why it occurs, how to demonstrate it, and what it means for algorithm selection. This understanding elevates your comprehension of caching systems from practical to theoretical.