When physical memory is exhausted and a process needs a page that isn't in memory, the operating system faces a classical resource allocation problem: something must yield for something else to be served.
This isn't merely a technical challenge—it's a decision with profound consequences. The page we choose to evict determines whether the system runs smoothly or descends into a cascade of additional page faults. Choose well, and the evicted page won't be needed for a long time. Choose poorly, and we'll immediately fault it back in, having wasted two disk I/O operations for nothing.
Page replacement is where virtual memory theory meets operational reality. This page explores the fundamental concept, mechanics, and critical decisions involved in this pivotal operating system operation.
By the end of this page, you will understand: (1) The formal definition of page replacement and its role in virtual memory, (2) The complete sequence of events during a page replacement operation, (3) The key decisions the OS must make and their performance implications, (4) The difference between local and global replacement scopes, and (5) The metrics by which we evaluate replacement quality.
Page replacement is the process by which an operating system selects a page currently resident in physical memory, removes it, and uses the freed frame to satisfy a new page request.
Formal Definition:
Given:
- A fixed set of physical frames, all currently occupied by resident pages
- A requested page that is not currently in memory
Page Replacement is the operation of:
1. Selecting a resident page to evict (the victim)
2. Writing the victim back to disk if it has been modified
3. Loading the requested page into the freed frame
4. Updating the page tables (and TLB) to reflect both changes
The core challenge is step 1: victim selection. This is where page replacement algorithms differ and where performance is won or lost.
Why "Replacement" Instead of "Allocation"?
The term "replacement" emphasizes that we're not finding new resources—we're reallocating existing ones. Unlike initial page allocation where we simply use a free frame, replacement forces us to dispossess one page to accommodate another. This zero-sum nature is fundamental to understanding why the choice matters so much.
The terms 'page replacement' and 'page eviction' are often used interchangeably, but there's a subtle distinction. Eviction refers specifically to removing a page from memory, while replacement encompasses the complete process of evicting one page to make room for another. Eviction can also occur proactively (to maintain free frame reserves) without an immediate replacement need.
The Page Replacement Problem:
Formally stated, the page replacement problem asks:
Given a sequence of page references R = r₁, r₂, r₃, ..., rₙ and a fixed number of frames k, determine which pages to keep in memory at each step to minimize the total number of page faults.
This is a classic online algorithm problem—we must make decisions without knowing future references. The theoretically optimal solution (Bélády's OPT) requires future knowledge, making it impossible to implement but useful as a benchmark.
Dimensions of the Problem:
| Dimension | Question | Options |
|---|---|---|
| When to replace | Proactively or on demand? | On-fault, anticipatory, background |
| What to replace | Which page to evict? | Various algorithms (FIFO, LRU, Clock, etc.) |
| Where to replace | From which process? | Local, global, or partitioned |
| How to replace | What operations are required? | Clean vs dirty handling |
Understanding page replacement requires tracing through the complete sequence of events, from the initial page fault to the resumption of the faulting process.
Phase 1: Page Fault Detection
The process begins when a memory access cannot be satisfied from physical memory:
1. CPU executes instruction accessing virtual address VA
2. MMU consults page table entry for VA's page
3. Valid bit is 0 → page not in memory
4. MMU raises page fault exception
5. CPU saves state and transfers to kernel page fault handler
Phase 2: Page Identification
The kernel must determine what page is needed and where it resides:
1. Extract faulting virtual address from CPU register
2. Identify faulting process from current context
3. Look up page in process's virtual memory map
4. Determine page source:
- Anonymous page → allocate zeroed or from swap
- File-backed page → identify file and offset
- Private mapping → may be copy-on-write
Phase 3: Frame Acquisition
The kernel attempts to obtain a free frame:
1. Check free frame list
2. If frames available → skip to Phase 5
3. If no free frames → proceed to Phase 4 (replacement)
Phase 4: Victim Selection and Eviction
This is the heart of page replacement:
1. Invoke page replacement algorithm to select victim
2. Lock victim frame to prevent concurrent modification
3. Check if victim page is modified (dirty bit)
4. If dirty:
a. Determine write-back location (swap or file)
b. Initiate disk write
c. Wait for I/O completion
d. Clear dirty bit
5. Update victim's page table entry:
- Clear valid bit (page no longer in memory)
- Store disk location for future reference
6. Remove frame from any caching structures
7. Frame is now logically free
Phase 5: Page Load
1. Initiate disk read to load new page into frame
2. Wait for I/O completion
3. Page contents now in frame
Phase 6: Mapping Establishment
1. Update faulting process's page table entry:
- Set frame number field
- Set valid bit = 1
- Clear reference bit
- Clear dirty bit
- Set protection bits as appropriate
2. Flush TLB entry for this page (if cached)
Phase 7: Process Resumption
1. Return from page fault handler
2. CPU restores saved state
3. Original instruction is re-executed
4. This time, page table lookup succeeds
5. Memory access completes normally
The time-consuming steps are disk I/O operations. A clean page eviction requires only one disk read (for the new page). A dirty page eviction requires both a disk write and a disk read—potentially doubling the page fault latency. This is why many systems prefer evicting clean pages when possible.
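To make the flow concrete, here is a minimal C sketch of Phases 3 through 7 in a single routine. Every helper name (free_list_pop, select_victim, disk_read, and so on) is a hypothetical placeholder for illustration, not any real kernel's API; the point is where a dirty victim adds the second disk I/O.
/* Hedged sketch of Phases 3-7: obtain a frame, evict if needed,    */
/* load the new page, and remap. Helper functions are hypothetical. */
typedef struct pte {
    unsigned int frame;
    unsigned int valid : 1;
    unsigned int dirty : 1;
    unsigned int referenced : 1;
} pte_t;
extern int   free_list_pop(void);               /* returns -1 if no free frame   */
extern int   select_victim(void);               /* replacement algorithm's choice */
extern pte_t *pte_of_frame(int frame);          /* reverse map: frame -> PTE      */
extern void  disk_write(int frame);             /* write frame to swap or file    */
extern void  disk_read(int frame, pte_t *pte);  /* read page contents into frame  */
extern void  tlb_flush_page(pte_t *pte);
void handle_fault(pte_t *faulting_pte)
{
    /* Phase 3: try to get a free frame */
    int frame = free_list_pop();
    if (frame < 0) {
        /* Phase 4: no free frame, run replacement */
        frame = select_victim();
        pte_t *victim = pte_of_frame(frame);
        if (victim->dirty)              /* dirty victim: extra disk write */
            disk_write(frame);
        victim->valid = 0;              /* victim no longer resident */
        tlb_flush_page(victim);
    }
    /* Phase 5: load the requested page into the frame */
    disk_read(frame, faulting_pte);
    /* Phase 6: establish the new mapping */
    faulting_pte->frame = frame;
    faulting_pte->valid = 1;
    faulting_pte->dirty = 0;
    faulting_pte->referenced = 0;
    /* Phase 7: on return, the faulting instruction is re-executed */
}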
Let's trace through a concrete example of a page fault requiring replacement. This detailed walkthrough illuminates every data structure and operation involved.
Scenario Setup: Process A is about to reference virtual page 7, which is not resident. Physical memory has four frames, all occupied, and the free frame list is empty.
Initial State:
Physical Memory (4 frames):
┌─────────┬─────────────┬─────────┬───────┬───────────┐
│ Frame # │ Contents │ Process │ VPage │ Dirty Bit │
├─────────┼─────────────┼─────────┼───────┼───────────┤
│ 0 │ Code page │ A │ 2 │ 0 │
│ 1 │ Stack page │ B │ 15 │ 1 │
│ 2 │ Heap data │ A │ 5 │ 1 │
│ 3 │ Shared lib │ Shared │ N/A │ 0 │
└─────────┴─────────────┴─────────┴───────┴───────────┘
Process A's Page Table:
┌─────────┬───────┬───────┬─────────┐
│ VPage # │ Valid │ Frame │ Dirty │
├─────────┼───────┼───────┼─────────┤
│ 2 │ 1 │ 0 │ 0 │
│ 5 │ 1 │ 2 │ 1 │
│ 7 │ 0 │ N/A │ N/A │ ← Page to load
└─────────┴───────┴───────┴─────────┘
Free Frame List: [empty]
Step-by-Step Execution:
Step 1: Page Fault Occurs
Process A executes MOV AX, [page7_address]. The MMU looks up VPage 7, finds its valid bit is 0, and raises a page fault exception.
Step 2: Kernel Analyzes Fault
The handler extracts the faulting address, identifies Process A and VPage 7, and determines where the page's contents reside on disk.
Step 3: Check Free Frames
The free frame list is empty, so replacement is required.
Step 4: Select Victim (Example: Choose Frame 0)
The replacement algorithm selects Frame 0, which holds Process A's code page (VPage 2) and is clean (dirty bit = 0).
Step 5: Evict Victim
Because the victim is clean, no write-back is needed. VPage 2's page table entry is marked invalid, any TLB entry for it is flushed, and Frame 0 is now free.
Step 6: Load New Page
The kernel issues a disk read to bring page 7's contents into Frame 0 and waits for the I/O to complete.
Step 7: Update Page Table
VPage 7's entry is set to Frame 0 with valid = 1, and its reference and dirty bits are cleared.
Step 8: Resume Process
The fault handler returns, the MOV instruction is re-executed, and this time the access to page 7 succeeds.
Final State:
Physical Memory (4 frames):
┌─────────┬─────────────┬─────────┬───────┬───────────┐
│ Frame # │ Contents │ Process │ VPage │ Dirty Bit │
├─────────┼─────────────┼─────────┼───────┼───────────┤
│ 0 │ New page 7 │ A │ 7 │ 0 │ ← Changed
│ 1 │ Stack page │ B │ 15 │ 1 │
│ 2 │ Heap data │ A │ 5 │ 1 │
│ 3 │ Shared lib │ Shared │ N/A │ 0 │
└─────────┴─────────────┴─────────┴───────┴───────────┘
Process A's Page Table:
┌─────────┬───────┬───────┬─────────┐
│ VPage # │ Valid │ Frame │ Dirty │
├─────────┼───────┼───────┼─────────┤
│ 2 │ 0 │ N/A │ N/A │ ← Now invalid (evicted)
│ 5 │ 1 │ 2 │ 1 │
│ 7 │ 1 │ 0 │ 0 │ ← Now valid
└─────────┴───────┴───────┴─────────┘
Outcome Analysis:
If Process A immediately accesses VPage 2 after this fault, we'll fault again—loading page 2 back into memory. If this cascades, we're in a pathological state. This is why victim selection algorithms try to evict pages that won't be needed soon. The ideal victim is the page whose next access is furthest in the future.
Page replacement involves several critical decisions that significantly impact system performance. Understanding these decisions is essential for appreciating why different algorithms exist and how operating systems are configured.
Decision 1: When to Replace?
There are two fundamental approaches:
On-Demand Replacement:
- Replacement happens only when a page fault occurs and no free frame is available
- The faulting process pays the entire eviction cost as part of its fault latency
Anticipatory Replacement (Background Reclamation):
- A kernel daemon (such as Linux's kswapd) proactively evicts pages whenever free memory falls below a threshold
- Most page faults then find a free frame immediately, keeping fault latency low
Most modern systems use anticipatory replacement with on-demand as fallback.
| Strategy | When Replacement Occurs | Advantages | Disadvantages |
|---|---|---|---|
| Pure On-Demand | Only during page fault when no free frames | Simple, no wasted evictions | Variable fault latency, potential blocking |
| Low Watermark | When free frames drop below threshold | Maintains reserve, consistent latency | May evict pages still needed |
| Continuous | Constantly maintain target free percentage | Very consistent latency | Continuous overhead, may over-evict |
| Adaptive | Based on memory pressure and workload | Best of all worlds | Complex to tune correctly |
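As a concrete illustration of the low-watermark strategy, below is a hedged sketch of a background reclaimer loop in the spirit of kswapd. The thresholds and helpers (count_free_frames, reclaim_one_page, sleep_until_woken) are assumptions made for the example, not Linux's actual interfaces.
/* Illustrative background reclaimer: keep free frames between a    */
/* low and a high watermark. Helpers are hypothetical placeholders. */
#define LOW_WATERMARK   64      /* wake up and start reclaiming        */
#define HIGH_WATERMARK 256      /* stop once this many frames are free */
extern unsigned long count_free_frames(void);
extern int  reclaim_one_page(void);     /* evict one victim, 0 on success         */
extern void sleep_until_woken(void);    /* block until memory pressure rises      */
void background_reclaimer(void)
{
    for (;;) {
        if (count_free_frames() >= LOW_WATERMARK) {
            sleep_until_woken();        /* nothing to do yet */
            continue;
        }
        /* Under pressure: evict pages until the high watermark is reached, */
        /* so the next several faults can be served from the free list.     */
        while (count_free_frames() < HIGH_WATERMARK) {
            if (reclaim_one_page() != 0)
                break;                  /* nothing reclaimable right now */
        }
    }
}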
Decision 2: Replacement Scope—Local vs Global
This decision determines the pool of pages eligible for eviction:
Local Replacement:
- The victim must come from the faulting process's own set of frames
- Each process's fault behavior is isolated from the others, but frame allocations can become stale
Global Replacement:
- The victim may come from any process's frames
- Memory flows toward the processes that need it most, but one process can degrade another's performance
Partitioned Schemes:
- Frames are divided into pools (per process, per group, or per zone), with replacement local to each pool
- Combines isolation with flexibility, at the cost of configuration complexity
Local Replacement:
┌──────────────────────────────────────────────┐
│ Process A Pool │ Process B Pool │ Kernel Pool │
│ [5 frames] │ [5 frames] │ [6 frames] │
│ Isolated │ Isolated │ Protected │
└──────────────────────────────────────────────┘
A can only evict from its own 5 frames.
Global Replacement:
┌──────────────────────────────────────────────┐
│ Shared Replaceable Pool │
│ [All user frames - 10 total] │
│ Any page can be evicted for any process │
└──────────────────────────────────────────────┘
A can cause B's page to be evicted.
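In code, the difference between the two scopes comes down to which frames are candidates during the victim scan. A minimal sketch, assuming an illustrative frame table with an owner field:
/* Sketch: victim candidacy under local vs global scope. */
#define NUM_FRAMES 16
struct frame_info {
    int owner_pid;          /* process owning the resident page (-1 = kernel) */
    int evictable;          /* 0 for pinned or kernel frames                  */
};
extern struct frame_info frame_table[NUM_FRAMES];
extern int pick_best(int *candidates, int n);   /* hypothetical: algorithm's choice */
int select_victim_scoped(int faulting_pid, int local_scope)
{
    int candidates[NUM_FRAMES];
    int n = 0;
    for (int f = 0; f < NUM_FRAMES; f++) {
        if (!frame_table[f].evictable)
            continue;                               /* skip pinned/kernel frames */
        if (local_scope && frame_table[f].owner_pid != faulting_pid)
            continue;                               /* local: own frames only    */
        candidates[n++] = f;
    }
    return (n > 0) ? pick_best(candidates, n) : -1; /* -1: nothing evictable */
}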
Decision 3: What to Replace?
This is where page replacement algorithms come in. The goal is to evict the page least likely to be needed soon. Common strategies:
| Algorithm | Criterion | Approximates |
|---|---|---|
| FIFO | Oldest page in memory | Arrival time |
| OPT | Page used farthest in future | Perfect knowledge |
| LRU | Least recently accessed | Recent past |
| Clock | Not recently referenced | LRU approximation |
| LFU | Least frequently accessed | Usage frequency |
Each algorithm trades accuracy for implementation cost. We'll explore these in detail in subsequent modules.
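To show the shape of such an algorithm without jumping ahead, here is a minimal FIFO-style victim selection over an assumed per-frame load_time field; later modules cover the real algorithms in depth.
/* Minimal FIFO victim selection over a frame table: evict the page   */
/* that has been resident the longest. Structures are illustrative.   */
#define NUM_FRAMES 8
struct frame_meta {
    unsigned long load_time;    /* recorded when the page was brought in */
    int in_use;
};
extern struct frame_meta frames[NUM_FRAMES];
int fifo_select_victim(void)
{
    int victim = -1;
    unsigned long oldest = (unsigned long)-1;
    for (int f = 0; f < NUM_FRAMES; f++) {
        if (frames[f].in_use && frames[f].load_time < oldest) {
            oldest = frames[f].load_time;
            victim = f;
        }
    }
    return victim;      /* frame holding the oldest resident page */
}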
Decision 4: How to Handle Dirty Pages?
When a dirty page is selected for eviction:
Synchronous Write-Back:
- The faulting process waits while the dirty victim is written to disk, then waits again while the new page is read in
- Simple to implement, but roughly doubles the page fault latency
Asynchronous Write-Back:
- Dirty pages are written out in the background before they are chosen as victims
- By the time replacement happens, the page is already clean and its frame can be reused immediately
Modern systems use page cleaning daemons (like Linux's pdflush or flush workers) to write dirty pages asynchronously, so when replacement is needed, more pages are clean.
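A hedged sketch of that idea: a cleaning thread sweeps the frame table and queues write-backs for dirty pages so later evictions find them clean. The helper names are assumptions, not the actual pdflush/flush-worker code.
/* Illustrative page cleaner: convert dirty frames to clean ones in   */
/* the background so later evictions need no synchronous disk write.  */
#define NUM_FRAMES 1024
extern int  frame_is_dirty(int frame);
extern void queue_async_writeback(int frame);   /* dirty bit cleared when the I/O completes */
extern void sleep_briefly(void);
void page_cleaner(void)
{
    for (;;) {
        for (int f = 0; f < NUM_FRAMES; f++)
            if (frame_is_dirty(f))
                queue_async_writeback(f);
        sleep_briefly();    /* rescan periodically, or sooner under memory pressure */
    }
}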
Linux uses global replacement with per-cgroup limits, a Clock-based algorithm (PFRA), watermark-triggered background reclamation (kswapd), and asynchronous dirty page writeback (flush workers). This combination balances efficiency, fairness, and responsiveness for diverse workloads.
How do we measure whether a page replacement algorithm is working well? Several metrics capture different aspects of performance.
Primary Metric: Page Fault Rate
The most fundamental measure is the page fault rate—the fraction of memory accesses that result in page faults:
Page Fault Rate = Number of Page Faults / Total Memory References
For a reference string of length n with f faults:
PFR = f / n
Lower is better. As the note further below observes, even a fault rate of 0.01% causes noticeable slowdown, and a rate near 1% makes the system feel sluggish.
Page Fault Rate Example:
Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5. With 3 frames and, for example, FIFO replacement, this string produces 9 faults in 12 references (the first three are unavoidable compulsory misses), so PFR = 9/12 = 0.75.
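The example can be verified with a short, self-contained user-space simulation (FIFO assumed, as above):
#include <stdio.h>
/* Simulate FIFO replacement over a reference string and report the  */
/* page fault rate. Prints 9 faults / 12 refs = 0.75 for this input. */
int main(void)
{
    int refs[] = {1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5};
    int n = sizeof(refs) / sizeof(refs[0]);
    int frames[3] = {-1, -1, -1};   /* -1 means the frame is empty */
    int next_evict = 0;             /* FIFO pointer                */
    int faults = 0;
    for (int i = 0; i < n; i++) {
        int hit = 0;
        for (int f = 0; f < 3; f++)
            if (frames[f] == refs[i])
                hit = 1;
        if (!hit) {                 /* page fault: load into the oldest slot */
            frames[next_evict] = refs[i];
            next_evict = (next_evict + 1) % 3;
            faults++;
        }
    }
    printf("faults = %d, references = %d, PFR = %.2f\n",
           faults, n, (double)faults / n);
    return 0;
}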
Effective Access Time Calculation:
EAT combines normal memory access time with the cost of page faults:
EAT = (1 - p) × memory_access_time + p × page_fault_time
Where:
p = probability of page fault (page fault rate)
memory_access_time ≈ 100 nanoseconds
page_fault_time ≈ 10 milliseconds (HDD) or 100 microseconds (SSD)
Example Calculation (HDD):
p = 0.001 (1 fault per 1000 accesses)
memory_access = 100 ns
fault_time = 10 ms = 10,000,000 ns
EAT = (0.999)(100) + (0.001)(10,000,000)
= 99.9 + 10,000
= 10,099.9 ns ≈ 10 μs
Even with a 0.1% fault rate, access time increases by 100x!
Example Calculation (SSD):
p = 0.001
memory_access = 100 ns
fault_time = 100 μs = 100,000 ns
EAT = (0.999)(100) + (0.001)(100,000)
= 99.9 + 100
= 199.9 ns ≈ 200 ns
SSD reduces the impact to a 2x slowdown—demonstrating why SSDs dramatically improve system responsiveness under memory pressure.
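The same arithmetic as a tiny runnable helper, using the round-number latencies from the examples above:
#include <stdio.h>
/* Effective access time: EAT = (1 - p) * mem_ns + p * fault_ns */
static double eat_ns(double p, double mem_ns, double fault_ns)
{
    return (1.0 - p) * mem_ns + p * fault_ns;
}
int main(void)
{
    double p = 0.001;                                   /* 1 fault per 1000 accesses */
    printf("HDD: %.1f ns\n", eat_ns(p, 100.0, 10e6));   /* 10 ms fault time          */
    printf("SSD: %.1f ns\n", eat_ns(p, 100.0, 100e3));  /* 100 us fault time         */
    return 0;
}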
Memory access is ~100 nanoseconds; disk access is ~10 milliseconds on HDD. That's a 100,000× difference. A page fault rate of 0.01% still causes noticeable slowdown. A rate of 1% makes the system feel sluggish. This extreme asymmetry is why page replacement algorithm quality matters so much—even small improvements in fault rate yield significant user-perceived performance gains.
Page replacement algorithms implement a specific interface that the memory management subsystem invokes. Understanding this interface clarifies what information algorithms receive and what decisions they must make.
The Conceptual Interface:
/**
* Page Replacement Algorithm Interface
*
* The memory manager calls these functions at specific points
* in the page lifecycle.
*/
// Called when a page fault occurs and no free frames exist
// Returns: the frame to evict (the victim)
frame_t select_victim(void);
// Called when a page is first loaded into a frame
void page_loaded(frame_t frame, page_t page);
// Called when a page is accessed (may be used for LRU tracking)
void page_accessed(frame_t frame);
// Called when a page is modified (write reference)
void page_modified(frame_t frame);
// Called when a page is explicitly freed (process exits, munmap)
void page_freed(frame_t frame);
Information Available to Algorithms:
Algorithms have access to various metadata to make decisions:
| Information | Source | Used By |
|---|---|---|
| Page load time | Timestamp at page_loaded() | FIFO |
| Last access time | Updated on page_accessed() | LRU |
| Access frequency | Counter incremented on access | LFU |
| Reference bit | Hardware sets on any access | Clock, Second-Chance |
| Dirty/modify bit | Hardware sets on write | Enhanced Clock; any algorithm that prefers clean victims |
| Page type | Anonymous, file-backed, shared | Linux PFRA |
| Working set info | Per-process access history | Working Set Algorithm |
Hardware Support Requirements:
Efficient page replacement requires specific hardware support:
Reference Bit:
- Set automatically by the MMU whenever the page is read or written
- Cleared periodically by the OS, so it indicates "used recently"
- Drives Clock, Second-Chance, and other LRU approximations
Dirty (Modify) Bit:
- Set automatically by the MMU on any write to the page
- Tells the OS whether the page must be written back before its frame is reused
- Cleared by the OS after a successful write-back
Both bits are per-page table entry:
Page Table Entry (simplified):
┌─────────────────────────────────────────────────────────┐
│ Frame # │ Valid │ Reference │ Dirty │ Protection │ ... │
│ 20 bit │ 1 bit │ 1 bit │ 1 bit │ 3 bit │ │
└─────────────────────────────────────────────────────────┘
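In software, these fields are typically tested and cleared with simple bit operations. A sketch using an assumed 32-bit PTE layout; the bit positions here are illustrative, and real architectures differ.
#include <stdint.h>
/* Illustrative 32-bit PTE layout: low bits hold flags, upper bits the frame number. */
#define PTE_VALID       (1u << 0)
#define PTE_REFERENCED  (1u << 1)
#define PTE_DIRTY       (1u << 2)
#define PTE_FRAME_SHIFT 12              /* frame number stored above the flag bits */
static int pte_was_referenced(uint32_t pte) { return (pte & PTE_REFERENCED) != 0; }
static int pte_is_dirty(uint32_t pte)       { return (pte & PTE_DIRTY) != 0; }
/* Clock-style algorithms clear the reference bit as they sweep. */
static uint32_t pte_clear_referenced(uint32_t pte) { return pte & ~PTE_REFERENCED; }
static uint32_t pte_frame(uint32_t pte) { return pte >> PTE_FRAME_SHIFT; }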
Software-Only Alternatives:
Some architectures lack hardware reference bits. Software simulation:
- Periodically mark resident pages as invalid (or access-protected) in the page table
- The first access to such a page triggers a minor fault; the handler records a software "referenced" flag and restores the mapping
- The extra faults cost some performance but yield the same recency information
This technique was used extensively on older systems and remains available as a fallback.
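A sketch of the mechanism, with hypothetical helpers standing in for the architecture-specific parts: access is periodically revoked so the next touch traps, and the fault handler records a software reference flag before restoring access.
/* Software-simulated reference bit: periodically revoke access so the */
/* next touch traps, then record the access and restore the mapping.   */
extern void set_page_no_access(int page);     /* hypothetical: force a trap on next use */
extern void restore_page_access(int page);
extern char soft_referenced[/* NUM_PAGES */];
/* Called periodically (e.g., by a clock-style sweep): forget old history. */
void clear_soft_reference(int page)
{
    soft_referenced[page] = 0;
    set_page_no_access(page);
}
/* Called from the minor-fault handler when a protected page is touched. */
void note_soft_reference(int page)
{
    soft_referenced[page] = 1;
    restore_page_access(page);      /* subsequent accesses run at full speed */
}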
/* Example Page Replacement Algorithm Framework */

struct page_frame {
    unsigned long frame_number;
    struct page *page;              /* Virtual page mapped here */
    unsigned int flags;             /* State flags */
    unsigned long load_time;        /* When page was loaded (FIFO) */
    unsigned long last_access;      /* Last access time (LRU) */
    unsigned int access_count;      /* Access frequency (LFU) */
    struct list_head lru_list;      /* Position in replacement queue */
};

/* Replacement algorithm operations */
struct replacement_ops {
    /* Select a victim frame for eviction */
    struct page_frame *(*select_victim)(void);

    /* Called when page loaded into frame */
    void (*page_loaded)(struct page_frame *pf);

    /* Called on page access (if tracked) */
    void (*page_accessed)(struct page_frame *pf);

    /* Called when page modified */
    void (*page_modified)(struct page_frame *pf);

    /* Initialize algorithm state */
    int (*init)(void);

    /* Cleanup algorithm state */
    void (*cleanup)(void);
};

/* Current active algorithm */
static struct replacement_ops *current_algorithm;

/* Victim selection wrapper */
struct page_frame *find_victim(void)
{
    ASSERT(current_algorithm != NULL);
    return current_algorithm->select_victim();
}

Page replacement seems conceptually simple—pick a victim and swap pages. In practice, numerous edge cases and challenges complicate implementation.
Challenge 1: Pinned/Locked Pages
Some pages cannot be evicted:
Non-evictable pages:
- Kernel code and data pages
- Pages undergoing DMA transfers
- Pages locked by mlock() system call
- Real-time application pages
- File system journal pages
Algorithms must skip these when scanning for victims.
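For example, a victim scan might test a per-frame flags word before considering a frame at all; the flag names here are illustrative, not any particular kernel's.
/* Sketch: a victim scan must skip frames the kernel is not allowed to evict. */
#define FRAME_KERNEL   (1u << 0)
#define FRAME_DMA      (1u << 1)   /* device I/O in flight  */
#define FRAME_MLOCKED  (1u << 2)   /* pinned via mlock()    */
static int frame_is_evictable(unsigned int flags)
{
    return (flags & (FRAME_KERNEL | FRAME_DMA | FRAME_MLOCKED)) == 0;
}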
Challenge 2: Shared Pages
Pages shared between processes have multiple PTE references:
Frame 42 is mapped by:
- Process A, VPage 10 (read-only)
- Process B, VPage 7 (read-only)
- Process C, VPage 12 (read-only)
Eviction must:
1. Update ALL three page table entries
2. Flush TLB entries for ALL processes
3. Track reverse mappings from frames to PTEs
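A hedged sketch of that reverse-mapping step, with structures invented for illustration: the eviction path walks every mapping of the frame and invalidates it.
/* Evicting a shared frame: walk the reverse map and invalidate every PTE. */
struct pte;                                  /* opaque page table entry */
struct rmap_entry {
    int process_id;
    struct pte *pte;
    struct rmap_entry *next;
};
extern struct rmap_entry *rmap_of_frame(int frame);   /* hypothetical reverse map */
extern void pte_mark_not_present(struct pte *pte);
extern void tlb_shootdown(int process_id, struct pte *pte);
void unmap_all_sharers(int frame)
{
    for (struct rmap_entry *e = rmap_of_frame(frame); e != NULL; e = e->next) {
        pte_mark_not_present(e->pte);            /* clear valid bit         */
        tlb_shootdown(e->process_id, e->pte);    /* flush stale TLB entries */
    }
}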
Challenge 3: Concurrent Page Faults
Multiple CPUs may fault simultaneously:
CPU 0: Faults on page P1, selects frame F
CPU 1: Faults on page P2, also selects frame F
Without synchronization:
- Both load pages into F
- One clobbers the other
- Silent corruption
Solution: Frame-level locks, atomic victim selection
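A sketch of the locking idea using a per-frame mutex and a re-check; a real kernel would use spinlocks and finer-grained state, so treat every name here as an assumption.
#include <pthread.h>
/* Per-frame lock plus a re-check: if another CPU already resolved this  */
/* fault while we waited for the lock, back off instead of loading again. */
extern pthread_mutex_t frame_locks[/* NUM_FRAMES */];
extern int  page_already_resident(const void *fault_addr);   /* hypothetical */
extern void evict_and_load(int frame, const void *fault_addr);
void resolve_fault(int victim_frame, const void *fault_addr)
{
    pthread_mutex_lock(&frame_locks[victim_frame]);
    if (!page_already_resident(fault_addr))      /* re-check under the lock */
        evict_and_load(victim_frame, fault_addr);
    pthread_mutex_unlock(&frame_locks[victim_frame]);
}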
Challenge 4: NUMA Considerations
On Non-Uniform Memory Access systems, not all frames are equal:
NUMA Node 0 (CPU 0-3):
- Local memory: 16 GB, access time: 100ns
- Remote memory (Node 1): access time: 300ns
NUMA Node 1 (CPU 4-7):
- Local memory: 16 GB, access time: 100ns
- Remote memory (Node 0): access time: 300ns
Replacement implications:
- Prefer evicting remote pages (less costly to re-fault)
- Try to keep process pages on local node
- Balance memory pressure across nodes
Challenge 5: Large Pages (Huge Pages)
2 MB or 1 GB pages complicate replacement:
2 MB huge page = 512 standard 4 KB pages
Evicting one huge page:
- Frees 512 frames at once (good)
- Internal fragmentation if only part needed
- May evict actively-used sub-regions
- Shattering: break huge page into small pages first
Modern systems support transparent huge pages (THP) with dynamic promotion/demotion, adding complexity to replacement decisions.
Real page replacement implementations handle dozens of edge cases not covered in textbooks. Linux's PFRA (Page Frame Reclamation Algorithm) contains thousands of lines managing referenced lists, active/inactive tracking, memory cgroups, zone balancing, compaction, and more. The core concept is simple; robust implementation is deeply complex.
We've established a comprehensive understanding of page replacement as a concept. Let's consolidate the key points:
- Page replacement reallocates an occupied frame: one resident page is evicted so a needed page can be loaded
- A complete replacement spans fault detection, victim selection, optional dirty-page write-back, page load, and mapping update
- The key decisions are when to replace (on-demand vs anticipatory), where (local vs global scope), what (the algorithm), and how (clean vs dirty handling)
- Quality is measured by page fault rate and effective access time; because disk is orders of magnitude slower than RAM, even tiny fault rates dominate latency
- Hardware reference and dirty bits make practical algorithms possible, and real implementations must also handle pinned, shared, concurrent, NUMA, and huge-page cases
What's Next:
With a solid understanding of the page replacement concept, we now turn to the critical question: how do we choose which page to evict? The next page explores victim selection—the heart of every page replacement algorithm and the decision that most directly impacts system performance.
You now understand page replacement at a conceptual level—what it is, why it's necessary, and the key decisions involved. This foundation prepares you for the algorithmic details to come, where we'll see how different victim selection strategies produce dramatically different performance characteristics.