In 1966, László Bélády at IBM Research published a proof that would shape memory management for decades. He demonstrated that the optimal page to replace is the one that will not be used for the longest time in the future. This sounds obvious in retrospect, but Bélády proved something profound: knowing which page to evict is computationally equivalent to predicting the future.
Since we cannot predict the future, every practical page replacement algorithm is an approximation—a heuristic guessing which page is least likely to be needed soon. Some approximations are better than others. Some fall prey to pathological behaviors. Some are simple to implement; others require elaborate data structures.
Victim selection is the moment of truth in page replacement. Choose well, and the evicted page isn't needed again for a long time. Choose poorly, and you'll fault it back almost immediately—wasting two disk I/O operations and degrading system performance.
By the end of this page, you will understand: (1) The theoretical optimal solution and why it's impossible to implement, (2) How heuristics approximate optimal behavior using observable information, (3) The tradeoffs between selection accuracy and implementation cost, (4) Criteria for evaluating victim selection quality, and (5) The concept of stack algorithms and Bélády's anomaly.
Before exploring practical algorithms, we must understand the theoretical ideal. Bélády's Optimal Algorithm (OPT) provides the theoretical minimum number of page faults possible for any given reference string and frame count.
The OPT Rule:
Replace the page that will not be used for the longest time in the future.
This is provably optimal: any other choice could be improved by swapping it with the OPT choice without increasing faults.
Example:
Reference string: 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1
Frames available: 3
Step-by-step OPT execution (positions below index the reference string starting from 0):
Ref: 7 Frames: [7, -, -] Fault (cold start)
Ref: 0 Frames: [7, 0, -] Fault (cold start)
Ref: 1 Frames: [7, 0, 1] Fault (cold start)
Ref: 2 Frames: [2, 0, 1] Fault: Replace 7 (used at position 17)
Ref: 0 Frames: [2, 0, 1] Hit
Ref: 3 Frames: [2, 0, 3] Fault: Replace 1 (used at position 13)
Ref: 0 Frames: [2, 0, 3] Hit
Ref: 4 Frames: [2, 4, 3] Fault: Replace 0 (used at position 10, later than 2 at 8 or 3 at 9)
Ref: 2 Frames: [2, 4, 3] Hit
Ref: 3 Frames: [2, 4, 3] Hit
... continuing this process (the next fault replaces 4, which is never used again, when 0 is referenced) yields 9 total faults for 20 references.
No algorithm can do better than OPT for this reference string with 3 frames.
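To see the rule work mechanically, here is a minimal C simulation of OPT (a sketch, not production code; the reference string and frame count are hard-coded to the example above, and the helper `next_use` stands in for the "future knowledge" a real system cannot have). It reports 9 faults.

#include <stdio.h>

#define NFRAMES 3

/* Position of the next use of 'page' after 'pos' (past the end if never used again) */
static int next_use(const int *refs, int n, int pos, int page) {
    for (int i = pos + 1; i < n; i++)
        if (refs[i] == page)
            return i;
    return n + 1;                       /* never used again: ideal victim */
}

int main(void) {
    int refs[] = {7,0,1,2,0,3,0,4,2,3,0,3,2,1,2,0,1,7,0,1};
    int n = sizeof(refs) / sizeof(refs[0]);
    int frames[NFRAMES];
    int used = 0, faults = 0;

    for (int i = 0; i < n; i++) {
        int hit = 0;
        for (int f = 0; f < used; f++)
            if (frames[f] == refs[i]) { hit = 1; break; }
        if (hit)
            continue;

        faults++;
        if (used < NFRAMES) {           /* cold start: fill a free frame */
            frames[used++] = refs[i];
        } else {                        /* OPT: evict the page used farthest in the future */
            int victim = 0;
            for (int f = 1; f < NFRAMES; f++)
                if (next_use(refs, n, i, frames[f]) > next_use(refs, n, i, frames[victim]))
                    victim = f;
            frames[victim] = refs[i];
        }
    }
    printf("OPT faults: %d\n", faults);  /* prints 9 for this string and 3 frames */
    return 0;
}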
OPT requires knowledge of future memory accesses—which doesn't exist. At runtime, the OS only knows the past, not the future. OPT is useful only for: (1) Analyzing trace data after execution, (2) Benchmarking other algorithms (how close do they get to optimal?), and (3) Theoretical understanding of limits.
Why OPT Matters Despite Being Impractical:
OPT establishes an upper bound on achievable performance. When evaluating real algorithms, we compare them to OPT:
Algorithm Quality = OPT Faults / Algorithm Faults × 100%
An algorithm achieving 80% of OPT quality (only 25% more faults) is reasonably good. An algorithm achieving only 40% (150% more faults) has significant room for improvement.
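A quick worked check of the formula, applied to the example above (the 12-fault figure is simply an assumed result for some practical algorithm on the same reference string):

    Quality      = 9 OPT faults / 12 algorithm faults × 100% = 75%
    Extra faults = 12 / 9 − 1 ≈ 33% more than OPT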
The Insight That Enabled Progress:
Since we can't know the future, we must infer it from the past. This is where the principle of locality becomes crucial:
Pages accessed in the recent past are likely to be accessed in the near future.
This principle—backed by extensive empirical observation—allows us to approximate OPT:
OPT asks: "Which page won't be used for the longest future time?"
We approximate: "Which page hasn't been used for the longest past time?"
This transforms an impossible question into a tractable one.
This approximation is the foundation of LRU (Least Recently Used) and its variants—the most successful family of page replacement algorithms.
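The symmetry between the two questions shows up directly in code. In this sketch, `struct frame` and its fields are purely illustrative: `next_use` is the unknowable quantity OPT would need, while `last_used` is the observable quantity LRU tracks. The two victim pickers differ only in which direction they look.

/* A frame with the two timestamps the algorithms care about.
 * 'next_use' exists only to illustrate OPT -- a running kernel cannot know it. */
struct frame {
    int  page;
    long last_used;   /* observable: updated on each access            */
    long next_use;    /* unknowable at runtime; known only from a trace */
};

/* OPT: evict the page whose NEXT use lies farthest in the future */
int pick_victim_opt(const struct frame *frames, int nframes) {
    int victim = 0;
    for (int f = 1; f < nframes; f++)
        if (frames[f].next_use > frames[victim].next_use)
            victim = f;
    return victim;
}

/* LRU: evict the page whose LAST use lies farthest in the past */
int pick_victim_lru(const struct frame *frames, int nframes) {
    int victim = 0;
    for (int f = 1; f < nframes; f++)
        if (frames[f].last_used < frames[victim].last_used)
            victim = f;
    return victim;
}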
Practical algorithms can only use information that's actually observable. Understanding what's available helps us appreciate why different algorithms exist and what tradeoffs they make.
Category 1: Temporal Information
When was the page last used?
| Information | Collection Method | Algorithm Using It |
|---|---|---|
| Load time | Record timestamp when page loaded | FIFO |
| Last access time | Update timestamp on every access | LRU |
| Last reference window | Reference bit + periodic clearing | Clock/Second-Chance |
Category 2: Frequency Information
How often is the page used?
| Information | Collection Method | Algorithm Using It |
|---|---|---|
| Access count | Increment counter on each access | LFU |
| Recent access count | Counter with decay over time | LRFU |
Category 3: State Information
What is the current state of the page?
| Information | Source | Why It Matters |
|---|---|---|
| Dirty (modified) bit | Hardware | Dirty pages require write before eviction |
| Reference bit | Hardware | Recently referenced pages likely needed |
| Page type | OS metadata | Different page types have different value |
| Sharing count | Reverse mapping | Shared pages affect multiple processes |
Category 4: Process Context
What do we know about the owning process?
| Information | Source | Why It Matters |
|---|---|---|
| Process priority | Scheduler | High-priority processes may need faster access |
| Process activity | Scheduler | Idle processes unlikely to need pages |
| Working set membership | Working set tracking | Pages in WS are actively needed |
| Memory allocation | Resource accounting | Over-allocating processes may be deprioritized |
How do these information sources compare in prediction quality and collection cost?
| Information Type | Quality of Prediction | Collection Cost | Example Algorithm |
|---|---|---|---|
| Load time only | Poor | Very Low | FIFO |
| Reference bit (1-bit) | Moderate | Low | Clock |
| Reference + Dirty bits | Good | Low | Enhanced Clock |
| Exact access timestamps | Excellent | High | LRU (ideal) |
| Access frequency + recency | Excellent | Very High | ARC, LRFU |
More information enables better decisions, but collecting and maintaining that information has overhead. The best algorithms find the sweet spot: enough information to make good decisions without spending more resources on tracking than they save by avoiding faults.
Victim selection typically involves multiple criteria applied in priority order. Understanding this hierarchy clarifies how algorithms combine different signals.
Priority 1: Evictability
Some pages simply cannot be evicted:
Unevictable Pages (Absolute Exclusions):
├── Kernel text and data pages
├── Locked pages (mlock, mlockall)
├── Pages undergoing DMA
├── Active file system buffers
└── Pages pinned by drivers
These must be skipped entirely.
Priority 2: Cost Reduction
Among evictable pages, prefer those cheaper to evict:
Cost Preference Order:
1. Clean pages (no disk write needed)
2. Dirty pages backed by swap (write to swap)
3. Dirty pages backed by file (write to file)
4. Pages requiring special handling
Rationale: Clean page eviction is instant.
Dirty page eviction requires disk I/O.
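One way to express this preference order in code is a small cost-classification helper, shown below as a sketch: the `page_info` fields and enum names are illustrative, not from any real kernel. Lower class means cheaper eviction.

#include <stdbool.h>

/* Illustrative per-page state */
struct page_info {
    bool dirty;           /* modified since load?                    */
    bool swap_backed;     /* anonymous page backed by swap space     */
    bool needs_special;   /* e.g. under writeback or DMA setup       */
};

enum evict_cost {
    COST_CLEAN      = 0,  /* no I/O needed: just drop the page       */
    COST_DIRTY_SWAP = 1,  /* one write to swap                       */
    COST_DIRTY_FILE = 2,  /* one write back to the backing file      */
    COST_SPECIAL    = 3   /* extra handling required before eviction */
};

static enum evict_cost eviction_cost(const struct page_info *p) {
    if (p->needs_special) return COST_SPECIAL;
    if (!p->dirty)        return COST_CLEAN;
    return p->swap_backed ? COST_DIRTY_SWAP : COST_DIRTY_FILE;
}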
Composite Selection Example: Enhanced Second-Chance
This algorithm combines reference and dirty bits into a 4-class priority system:
Class 0: Reference=0, Dirty=0 → Best victim
Not recently used, clean
Class 1: Reference=0, Dirty=1 → Good victim
Not recently used, but dirty
Class 2: Reference=1, Dirty=0 → Poor victim
Recently used, but clean
Class 3: Reference=1, Dirty=1 → Worst victim
Recently used AND dirty
The algorithm scans the frames in circular order looking for a Class 0 page; if none is found, it scans again accepting Class 1 (clearing reference bits along the way), and repeats for higher classes if necessary. This multi-pass scan ensures we prefer clean, unreferenced pages.
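Because a page's class is just its two bits read together as a number, it can be computed directly. A minimal sketch (the helper names are hypothetical):

/* Enhanced second-chance class: 0 = best victim, 3 = worst.
 * Equivalent to reading (reference, dirty) as a 2-bit number. */
static int victim_class(int referenced, int dirty) {
    return (referenced << 1) | dirty;
}

/* During pass p of the circular scan, accept the first page whose
 * class is <= p: pass 0 takes only Class 0, pass 1 also takes Class 1, ... */
static int acceptable(int referenced, int dirty, int pass) {
    return victim_class(referenced, dirty) <= pass;
}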
Priority 3: Page Type Differentiation (Linux)
Linux distinguishes page types:
Page Type Priority (lower = evict first):
1. Clean file-backed pages (can reload from file)
2. Dirty file-backed pages (must write, but recoverable)
3. Anonymous pages (must swap, no backing file)
4. Mapped executable pages (may cause major faults)
File pages can be evicted without swap; anonymous pages require swap space.
The vm.swappiness parameter (0-100) controls the relative weight of reclaiming anonymous vs file pages. At 0, Linux strongly prefers evicting file pages. At 100, it equally considers both. This allows tuning for workloads that are file-I/O heavy (prefer low swappiness) vs compute-heavy with large anonymous allocations (may tolerate higher swappiness).
What happens when victim selection goes wrong? Understanding the consequences helps appreciate why algorithm choice matters.
Scenario: Immediate Re-fault
The worst outcome is evicting a page that's immediately needed:
Time T+0: Page fault on page A
Select page B as victim
Evict B, load A
Cost: 1 disk read (+ 1 write if dirty)
Time T+1: Instruction needs page B
Page fault on page B
Select page C as victim
Evict C, load B
Cost: 1 disk read (+ 1 write if dirty)
Time T+2: Instruction needs page C
... cascade continues ...
This is thrashing: constant page faults, no useful work.
Quantifying the Impact:
Let's compare good vs poor victim selection:
Good Selection (pages not needed soon):
- Evict page X at T+0
- Page X re-faulted at T+1000
- 1000 useful memory accesses between faults
- Effective overhead: 1 fault / 1000 accesses = 0.1%
Poor Selection (pages needed immediately):
- Evict page X at T+0
- Page X re-faulted at T+10
- 10 useful memory accesses between faults
- Effective overhead: 1 fault / 10 accesses = 10%
The poor selection is 100x worse!
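To put rough times on those fault rates, here is a back-of-the-envelope calculation; the 100 ns memory access and 100 µs fault-service costs are assumed for illustration, not taken from the text. The added per-access overhead differs by the same factor of 100.

#include <stdio.h>

int main(void) {
    /* Assumed costs (illustrative): 100 ns per memory access,
     * 100 us to service a page fault from SSD-class swap. */
    const double mem_ns   = 100.0;
    const double fault_ns = 100000.0;

    const double fault_rates[] = { 1.0 / 1000.0,   /* good selection */
                                   1.0 / 10.0 };   /* poor selection */

    for (int i = 0; i < 2; i++) {
        double p        = fault_rates[i];
        double overhead = p * fault_ns;        /* extra ns per access, on average */
        double eat      = mem_ns + overhead;   /* effective access time           */
        printf("fault rate %5.2f%%: +%6.0f ns per access, %5.0fx slowdown\n",
               p * 100.0, overhead, eat / mem_ns);
    }
    return 0;
}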
| Selection Quality | Avg Time Between Re-faults | Page Fault Rate | System Behavior |
|---|---|---|---|
| Excellent (near OPT) | 10,000 references | < 0.01% | Near-native performance |
| Good | 1,000 - 10,000 references | 0.01-0.1% | Slightly elevated latency |
| Acceptable | 100 - 1,000 references | 0.1-1% | Noticeable slowdowns |
| Poor | 10 - 100 references | 1-10% | Significant degradation |
| Pathological | < 10 references | > 10% | Thrashing, unusable |
Real-World Example: FIFO Under Scan Workload
Consider a sequential scan through data larger than memory:
Scenario:
- 1 GB file, 4 KB pages = 262,144 pages
- 512 MB memory = 131,072 frames
- Sequential read from start to end
With FIFO:
- Pages loaded in order: 0, 1, 2, ... 131,071
- Memory full, start evicting oldest
- Evict page 0, load page 131,072
- Evict page 1, load page 131,073
- ... continue for entire file ...
Result: Every single page faults once.
Total faults: 262,144 (100% fault rate)
This is actually acceptable for sequential access!
The Disaster Case: Looping Scan
Scenario:
- Same 1 GB file
- Program loops: read entire file, repeat
With FIFO after first pass:
- Memory contains pages 131,072 - 262,143
- Loop restarts, needs page 0
- Page 0 was evicted! Fault.
- Evict 131,072 to load 0
- Next access: page 1 was evicted! Fault.
- Evict 131,073 to load 1
- ... every single access faults ...
With Perfect LRU:
- Memory contains pages 131,072 - 262,143
- Loop restarts, needs page 0
- Evict page 131,072 (least recently used)
- Access page 1: also a fault (it was evicted during the first pass)
- Still bad—the looping pattern defeats recency-based replacement just as badly
This demonstrates: no replacement policy can rescue a workload whose working set exceeds available memory.
When the working set exceeds available memory, no victim selection algorithm can prevent thrashing. The only solutions are: (1) Add more memory, (2) Reduce working set size, (3) Swap out entire processes to free bulk memory, or (4) Terminate processes (OOM killer). Victim selection algorithms optimize within feasibility bounds—they can't create physical memory that doesn't exist.
One might expect that adding more frames to a system would always reduce page faults—more memory means more pages can stay resident. Surprisingly, this intuition is wrong for some algorithms.
Bélády's Anomaly is the counterintuitive phenomenon where increasing the number of frames can increase the number of page faults for certain algorithms and reference strings.
Classic Example with FIFO:
Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
With 3 frames:
┌─────┬───────────────┬───────────────┐
│ Ref │ Frames │ Event │
├─────┼───────────────┼───────────────┤
│ 1 │ [1, -, -] │ Fault │
│ 2 │ [1, 2, -] │ Fault │
│ 3 │ [1, 2, 3] │ Fault │
│ 4 │ [4, 2, 3] │ Fault (evict 1) │
│ 1 │ [4, 1, 3] │ Fault (evict 2) │
│ 2 │ [4, 1, 2] │ Fault (evict 3) │
│ 5 │ [5, 1, 2] │ Fault (evict 4) │
│ 1 │ [5, 1, 2] │ Hit │
│ 2 │ [5, 1, 2] │ Hit │
│ 3 │ [5, 3, 2] │ Fault (evict 1) │
│ 4 │ [5, 3, 4] │ Fault (evict 2) │
│ 5 │ [5, 3, 4] │ Hit │
└─────┴───────────────┴───────────────┘
Total faults with 3 frames: 9
With 4 frames:
┌─────┬─────────────────┬───────────────┐
│ Ref │ Frames │ Event │
├─────┼─────────────────┼───────────────┤
│ 1 │ [1, -, -, -] │ Fault │
│ 2 │ [1, 2, -, -] │ Fault │
│ 3 │ [1, 2, 3, -] │ Fault │
│ 4 │ [1, 2, 3, 4] │ Fault │
│ 1 │ [1, 2, 3, 4] │ Hit │
│ 2 │ [1, 2, 3, 4] │ Hit │
│ 5 │ [5, 2, 3, 4] │ Fault (evict 1) │
│ 1 │ [5, 1, 3, 4] │ Fault (evict 2) │
│ 2 │ [5, 1, 2, 4] │ Fault (evict 3) │
│ 3 │ [5, 1, 2, 3] │ Fault (evict 4) │
│ 4 │ [4, 1, 2, 3] │ Fault (evict 5) │
│ 5 │ [4, 5, 2, 3] │ Fault (evict 1) │
└─────┴─────────────────┴───────────────┘
Total faults with 4 frames: 10
10 > 9: Adding a frame INCREASED faults!
Bélády's anomaly occurs because FIFO's eviction order depends on load order, which changes when more frames are available. With different eviction timing, pages that were hits become misses and vice versa. The net effect can be negative. This counterintuitive behavior reveals that FIFO doesn't capture the true 'value' of keeping a page resident.
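A short FIFO simulation reproduces both counts; this is a sketch with the reference string hard-coded, and it prints 9 faults for 3 frames and 10 for 4.

#include <stdio.h>

/* Count FIFO page faults for a reference string with 'nframes' frames */
static int fifo_faults(const int *refs, int n, int nframes) {
    int frames[16];                        /* assumes nframes <= 16 */
    int used = 0, oldest = 0, faults = 0;

    for (int i = 0; i < n; i++) {
        int hit = 0;
        for (int f = 0; f < used; f++)
            if (frames[f] == refs[i]) { hit = 1; break; }
        if (hit)
            continue;

        faults++;
        if (used < nframes) {
            frames[used++] = refs[i];      /* fill a free frame           */
        } else {
            frames[oldest] = refs[i];      /* evict the oldest-loaded page */
            oldest = (oldest + 1) % nframes;
        }
    }
    return faults;
}

int main(void) {
    int refs[] = {1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5};
    int n = sizeof(refs) / sizeof(refs[0]);
    printf("3 frames: %d faults\n", fifo_faults(refs, n, 3));  /* 9  */
    printf("4 frames: %d faults\n", fifo_faults(refs, n, 4));  /* 10 */
    return 0;
}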
Stack Algorithms: Immune to Bélády's Anomaly
A class of algorithms called stack algorithms are provably immune to Bélády's anomaly. These algorithms maintain the property:
The set of pages in memory with n frames is always a subset of pages in memory with n+1 frames.
Formal Definition:
An algorithm is a stack algorithm if, at every point in time:
S(t, n) ⊆ S(t, n+1) for all n
Where S(t, n) = set of pages in memory at time t with n frames
Examples: LRU and OPT are stack algorithms; FIFO is not—which is exactly why it can exhibit Bélády's anomaly.
The Stack Property Guarantees:
For stack algorithms:
Faults(n frames) ≥ Faults(n+1 frames)
More frames never hurts.
Why LRU Has the Stack Property:
LRU maintains a strict ordering by recency. With n frames, the n most recently used pages are resident. With n+1 frames, the n+1 most recently used are resident—which includes all of the previous n. Thus:
LRU: Pages in memory with 3 frames at time t ⊆ Pages with 4 frames at time t
This is always true because both are "most recent k pages."
Practical Implication:
When choosing between algorithms, the stack property matters for predictability: with LRU or another stack algorithm, adding memory is guaranteed not to increase faults, so capacity upgrades behave as expected. With FIFO-like algorithms there is no such guarantee—performance after adding frames must be measured, not assumed.
Each victim selection strategy involves tradeoffs between multiple dimensions. Understanding these helps explain why simple algorithms often win in practice.
Dimension 1: Selection Accuracy vs Overhead
More Accurate Selection:
+ Fewer page faults
+ Better system throughput
- Higher per-access bookkeeping
- More complex data structures
- Longer critical sections (potential contention)
Less Accurate Selection:
+ Simpler implementation
+ Lower per-access overhead
+ Faster victim identification
- More page faults
- Potentially pathological cases
Example: Perfect LRU vs Clock
| Aspect | Perfect LRU | Clock |
|---|---|---|
| Selection quality | Optimal among history-based | ~80-90% of LRU |
| Per-access cost | Update linked list or counter | None (hardware reference bit) |
| Selection cost | O(1) with proper data structure | O(n) worst case scan |
| Memory overhead | 8+ bytes per page | 1 bit per page |
| Scalability | Limited by list operations | Excellent |
Dimension 2: Memory Overhead
Per-Page Metadata Requirements:
FIFO: ~8 bytes (position in queue)
LRU: ~16 bytes (list pointers + timestamp)
LFU: ~12 bytes (pointer + counter)
Clock: ~0 bytes (uses hardware reference bit)
ARC: ~32 bytes (dual lists + ghost entries)
With 16 GB RAM and 4 KB pages:
Total pages: 4 million
FIFO: 32 MB metadata
LRU: 64 MB metadata
LFU: 48 MB metadata
Clock: 0 MB additional
ARC: 128 MB metadata
Dimension 3: Scalability with CPUs
Multiprocessor systems face contention on shared data structures:
Global LRU list (naive):
Every memory access updates list position
List protected by lock
Lock becomes severe bottleneck
Per-CPU or partitioned structures:
Each CPU has local list
Global coordination only on eviction
Much better scalability
Slightly less accurate
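A sketch of how partitioning reduces contention; the types, names, and MAX_CPUS value are illustrative (real kernels are far more elaborate, and lock initialization via pthread_spin_init is omitted here). Each CPU updates only its own list under a local lock; cross-CPU coordination happens only when a victim must be chosen.

#include <pthread.h>
#include <stddef.h>

#define MAX_CPUS 8   /* illustrative */

struct page_frame {
    struct page_frame *prev, *next;
    unsigned long pfn;
    unsigned long last_used;
};

/* One LRU list per CPU, each with its own lock */
struct lru_list {
    pthread_spinlock_t lock;   /* protects only this CPU's list */
    struct page_frame *mru;    /* most recently used end        */
    struct page_frame *lru;    /* least recently used end       */
};

static struct lru_list percpu_lru[MAX_CPUS];

/* Fast path: an access touches only the local CPU's lock */
void note_access(int cpu, struct page_frame *pf, unsigned long now) {
    struct lru_list *l = &percpu_lru[cpu];
    pthread_spin_lock(&l->lock);
    pf->last_used = now;       /* real code would also relink pf to the MRU end */
    pthread_spin_unlock(&l->lock);
}

/* Slow path: eviction compares the tails of all per-CPU lists.
 * This is the only cross-CPU coordination, so it is rare -- and slightly
 * less accurate than one globally ordered list (a real implementation
 * would also re-check or pin the chosen victim before evicting it). */
struct page_frame *pick_global_victim(void) {
    struct page_frame *victim = NULL;
    for (int cpu = 0; cpu < MAX_CPUS; cpu++) {
        struct lru_list *l = &percpu_lru[cpu];
        pthread_spin_lock(&l->lock);
        if (l->lru && (!victim || l->lru->last_used < victim->last_used))
            victim = l->lru;
        pthread_spin_unlock(&l->lock);
    }
    return victim;
}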
| Algorithm | Accuracy | Implementation Complexity | Scalability | Real-World Use |
|---|---|---|---|---|
| OPT | Perfect | Impossible online | N/A | Benchmark only |
| LRU (exact) | Excellent | High | Poor | Rarely used |
| LRU (approx) | Very Good | Medium | Good | Common basis |
| Clock | Good | Low | Excellent | Very common |
| FIFO | Poor-Fair | Very Low | Excellent | Backup option |
| LFU | Situation-dependent | Medium | Medium | Specific workloads |
| ARC | Excellent | High | Medium | ZFS, databases |
Clock (and its variants) dominate production systems because they achieve ~85-90% of perfect LRU performance with near-zero overhead. The hardware reference bit does the work that would otherwise require expensive software tracking. This is why most real operating systems use Clock variants rather than pure LRU.
Real operating systems use sophisticated combinations of the principles we've discussed. Let's examine how major systems implement victim selection.
Linux Page Frame Reclamation Algorithm (PFRA):
Linux maintains two LRU-like lists per memory zone:
┌────────────────────────────────┐
│ Active List │
│ Recently accessed pages │
│ Protected from reclamation │
└────────────┬───────────────────┘
│ Demotion (aging)
▼
┌────────────────────────────────┐
│ Inactive List │
│ Candidates for reclamation │
│ Evicted from tail │
└────────────────────────────────┘
Pages enter at active list.
Periodic scanning demotes old pages to inactive.
Eviction takes from inactive tail.
Accessed inactive pages promoted back to active.
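A heavily simplified sketch of that two-list flow—illustrative structures only, not actual kernel code: periodic aging demotes unreferenced pages from active to inactive, and a reference observed while a page waits on the inactive list promotes it back.

#include <stdbool.h>
#include <stddef.h>

/* Illustrative page descriptor and list -- not the kernel's real structures */
struct page {
    struct page *next;
    bool referenced;          /* reference bit sampled by the scanner */
};

struct list {
    struct page *oldest;      /* demotion / reclaim happen at this end */
    struct page *youngest;    /* new arrivals are linked at this end   */
};

static struct list active_list, inactive_list;

static void list_add(struct list *l, struct page *p) {
    p->next = NULL;
    if (l->youngest) l->youngest->next = p;
    else             l->oldest = p;
    l->youngest = p;
}

static struct page *list_take(struct list *l) {
    struct page *p = l->oldest;
    if (p) {
        l->oldest = p->next;
        if (!l->oldest) l->youngest = NULL;
    }
    return p;
}

/* Periodic aging: unreferenced pages at the old end of the active list
 * are demoted; referenced ones get another round on the active list. */
void age_active(int nr_to_scan) {
    for (int i = 0; i < nr_to_scan; i++) {
        struct page *p = list_take(&active_list);
        if (!p) break;
        if (p->referenced) {
            p->referenced = false;
            list_add(&active_list, p);
        } else {
            list_add(&inactive_list, p);   /* now a reclaim candidate */
        }
    }
}

/* Reclaim: evict from the old end of the inactive list, promoting any
 * page that was referenced again while waiting there. */
struct page *reclaim_one(void) {
    struct page *p;
    while ((p = list_take(&inactive_list)) != NULL) {
        if (p->referenced) {
            p->referenced = false;
            list_add(&active_list, p);     /* promote back to active */
            continue;
        }
        return p;                          /* victim */
    }
    return NULL;
}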
Linux Selection Criteria (Simplified):
1. Scan inactive list for unreferenced pages
2. Prefer clean pages over dirty
3. Prefer file-backed over anonymous (tunable via swappiness)
4. Consider cgroup memory limits
5. Avoid recently dirtied pages (give time to coalesce writes)
6. Skip locked/pinned pages
7. Balance across memory zones
FreeBSD: Multi-Queue LRU
FreeBSD uses four page queues:
┌──────────────────────────────────────────────────┐
│ Active Queue │ Recently referenced │
├───────────────────┼──────────────────────────────┤
│ Inactive Queue │ Not recently referenced │
├───────────────────┼──────────────────────────────┤
│ Laundry Queue │ Dirty, scheduled for write │
├───────────────────┼──────────────────────────────┤
│ Free Queue │ Available for allocation │
└───────────────────┴──────────────────────────────┘
Daemon (pagedaemon) moves pages through queues:
Active → Inactive (aging)
Inactive dirty → Laundry (write scheduling)
Inactive clean → Free (available)
Laundry after write → Free
Windows Memory Manager:
Windows uses Working Set management:
1. Each process has a Working Set (resident pages)
2. Working Set has min and max size
3. When memory pressure high:
a. Trim Working Sets toward min
b. Aged pages go to Standby List
c. Modified pages go to Modified List
4. Standby List: Clean pages, still valid
"Soft" faults if re-accessed
5. Modified List: Dirty pages queued for write
Become Standby after written
6. Free List: Zeroed pages ready for allocation
/* Simplified Victim Selection (Clock-like) */
struct page_frame *select_victim(void)
{
    struct page_frame *victim = NULL;
    int passes = 0;

    /* Circular scan through all frames */
    while (victim == NULL && passes < 2) {
        struct page_frame *pf = get_clock_hand();

        /* Skip unevictable pages */
        if (!(pf->flags & (PF_LOCKED | PF_KERNEL | PF_DMA))) {

            /* First pass: accept only unreferenced, clean pages */
            if (passes == 0 && !pf->referenced && !pf->dirty) {
                victim = pf;
                break;
            }

            /* Second pass: accept any unreferenced page */
            if (passes == 1 && !pf->referenced) {
                victim = pf;
                break;
            }

            /* Referenced pages get a second chance: clear the bit */
            if (pf->referenced)
                pf->referenced = 0;
        }

        advance_clock_hand();

        /* Count complete sweeps of the clock */
        if (clock_hand_at_start())
            passes++;
    }

    return victim;    /* May be NULL if all pages are locked */
}

We've explored victim selection—the heart of page replacement: OPT defines the unreachable ideal, locality lets us approximate it from the observable past, and every practical algorithm trades selection accuracy against tracking overhead.
What's Next:
We've established what victim selection aims to achieve and how different approaches compare. The next page examines the dirty bit—a crucial piece of hardware support that dramatically affects the cost of evicting specific pages and influences which victims algorithms prefer.
You now understand victim selection at a deep level—from the theoretical optimal through practical approximations, and from simple heuristics to production implementations. This foundation prepares you for understanding the specific hardware and software mechanisms that enable efficient page replacement.