When physical memory is exhausted and a process needs a page that isn't in memory, the operating system faces a classical resource allocation problem: something must yield for something else to be served.
This isn't merely a technical challenge—it's a decision with profound consequences. The page we choose to evict determines whether the system runs smoothly or descends into a cascade of additional page faults. Choose well, and the evicted page won't be needed for a long time. Choose poorly, and we'll immediately fault it back in, having wasted two disk I/O operations for nothing.
Page replacement is where virtual memory theory meets operational reality. This page explores the fundamental concept, mechanics, and critical decisions involved in this pivotal operating system operation.
By the end of this page, you will understand: (1) The formal definition of page replacement and its role in virtual memory, (2) The complete sequence of events during a page replacement operation, (3) The key decisions the OS must make and their performance implications, (4) The difference between local and global replacement scopes, and (5) The metrics by which we evaluate replacement quality.
Page replacement is the process by which an operating system selects a page currently resident in physical memory, removes it, and uses the freed frame to satisfy a new page request.
Formal Definition:
Given:
- A fixed set of physical frames, all currently occupied by resident pages
- A requested page that is not currently in memory
Page Replacement is the operation of:
1. Selecting a resident page to evict (the victim)
2. Writing the victim back to disk if it has been modified
3. Loading the requested page into the freed frame
4. Updating the page tables (and TLB) to reflect both changes
The core challenge is step 1: victim selection. This is where page replacement algorithms differ and where performance is won or lost.
Why "Replacement" Instead of "Allocation"?
The term "replacement" emphasizes that we're not finding new resources—we're reallocating existing ones. Unlike initial page allocation where we simply use a free frame, replacement forces us to dispossess one page to accommodate another. This zero-sum nature is fundamental to understanding why the choice matters so much.
The terms 'page replacement' and 'page eviction' are often used interchangeably, but there's a subtle distinction. Eviction refers specifically to removing a page from memory, while replacement encompasses the complete process of evicting one page to make room for another. Eviction can also occur proactively (to maintain free frame reserves) without an immediate replacement need.
The Page Replacement Problem:
Formally stated, the page replacement problem asks:
Given a sequence of page references R = r₁, r₂, r₃, ..., rₙ and a fixed number of frames k, determine which pages to keep in memory at each step to minimize the total number of page faults.
This is a classic online algorithm problem—we must make decisions without knowing future references. The theoretically optimal solution (Bélády's OPT) requires future knowledge, making it impossible to implement but useful as a benchmark.
Dimensions of the Problem:
| Dimension | Question | Options |
|---|---|---|
| When to replace | Proactively or on demand? | On-fault, anticipatory, background |
| What to replace | Which page to evict? | Various algorithms (FIFO, LRU, Clock, etc.) |
| Where to replace | From which process? | Local, global, or partitioned |
| How to replace | What operations are required? | Clean vs dirty handling |
Understanding page replacement requires tracing through the complete sequence of events, from the initial page fault to the resumption of the faulting process.
Phase 1: Page Fault Detection
The process begins when a memory access cannot be satisfied from physical memory:
1. CPU executes instruction accessing virtual address VA
2. MMU consults page table entry for VA's page
3. Valid bit is 0 → page not in memory
4. MMU raises page fault exception
5. CPU saves state and transfers to kernel page fault handler
Phase 2: Page Identification
The kernel must determine what page is needed and where it resides:
1. Extract faulting virtual address from CPU register
2. Identify faulting process from current context
3. Look up page in process's virtual memory map
4. Determine page source:
- Anonymous page → allocate zeroed or from swap
- File-backed page → identify file and offset
- Private mapping → may be copy-on-write
Phase 3: Frame Acquisition
The kernel attempts to obtain a free frame:
1. Check free frame list
2. If frames available → skip to Phase 5
3. If no free frames → proceed to Phase 4 (replacement)
Phase 4: Victim Selection and Eviction
This is the heart of page replacement:
1. Invoke page replacement algorithm to select victim
2. Lock victim frame to prevent concurrent modification
3. Check if victim page is modified (dirty bit)
4. If dirty:
a. Determine write-back location (swap or file)
b. Initiate disk write
c. Wait for I/O completion
d. Clear dirty bit
5. Update victim's page table entry:
- Clear valid bit (page no longer in memory)
- Store disk location for future reference
6. Remove frame from any caching structures
7. Frame is now logically free
Phase 5: Page Load
1. Initiate disk read to load new page into frame
2. Wait for I/O completion
3. Page contents now in frame
Phase 6: Mapping Establishment
1. Update faulting process's page table entry:
- Set frame number field
- Set valid bit = 1
- Clear reference bit
- Clear dirty bit
- Set protection bits as appropriate
2. Flush TLB entry for this page (if cached)
Phase 7: Process Resumption
1. Return from page fault handler
2. CPU restores saved state
3. Original instruction is re-executed
4. This time, page table lookup succeeds
5. Memory access completes normally
The time-consuming steps are disk I/O operations. A clean page eviction requires only one disk read (for the new page). A dirty page eviction requires both a disk write and a disk read—potentially doubling the page fault latency. This is why many systems prefer evicting clean pages when possible.
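To make the flow concrete, here is a minimal C sketch of Phases 3 through 7 in a single routine. Every helper name (free_list_pop, select_victim, disk_read, and so on) is a hypothetical placeholder for illustration, not any real kernel's API; the point is where a dirty victim adds the second disk I/O.
/* Hedged sketch of Phases 3-7: obtain a frame, evict if needed,    */
/* load the new page, and remap. Helper functions are hypothetical. */
typedef struct pte {
    unsigned int frame;
    unsigned int valid : 1;
    unsigned int dirty : 1;
    unsigned int referenced : 1;
} pte_t;
extern int   free_list_pop(void);               /* returns -1 if no free frame   */
extern int   select_victim(void);               /* replacement algorithm's choice */
extern pte_t *pte_of_frame(int frame);          /* reverse map: frame -> PTE      */
extern void  disk_write(int frame);             /* write frame to swap or file    */
extern void  disk_read(int frame, pte_t *pte);  /* read page contents into frame  */
extern void  tlb_flush_page(pte_t *pte);
void handle_fault(pte_t *faulting_pte)
{
    /* Phase 3: try to get a free frame */
    int frame = free_list_pop();
    if (frame < 0) {
        /* Phase 4: no free frame, run replacement */
        frame = select_victim();
        pte_t *victim = pte_of_frame(frame);
        if (victim->dirty)              /* dirty victim: extra disk write */
            disk_write(frame);
        victim->valid = 0;              /* victim no longer resident */
        tlb_flush_page(victim);
    }
    /* Phase 5: load the requested page into the frame */
    disk_read(frame, faulting_pte);
    /* Phase 6: establish the new mapping */
    faulting_pte->frame = frame;
    faulting_pte->valid = 1;
    faulting_pte->dirty = 0;
    faulting_pte->referenced = 0;
    /* Phase 7: on return, the faulting instruction is re-executed */
}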
Let's trace through a concrete example of a page fault requiring replacement. This detailed walkthrough illuminates every data structure and operation involved.
Scenario Setup: Process A is about to reference virtual page 7, which is not resident. Physical memory has four frames, all occupied, and the free frame list is empty.
Initial State:
Physical Memory (4 frames):
┌─────────┬─────────────┬─────────┬───────┬───────────┐
│ Frame # │ Contents │ Process │ VPage │ Dirty Bit │
├─────────┼─────────────┼─────────┼───────┼───────────┤
│ 0 │ Code page │ A │ 2 │ 0 │
│ 1 │ Stack page │ B │ 15 │ 1 │
│ 2 │ Heap data │ A │ 5 │ 1 │
│ 3 │ Shared lib │ Shared │ N/A │ 0 │
└─────────┴─────────────┴─────────┴───────┴───────────┘
Process A's Page Table:
┌─────────┬───────┬───────┬─────────┐
│ VPage # │ Valid │ Frame │ Dirty │
├─────────┼───────┼───────┼─────────┤
│ 2 │ 1 │ 0 │ 0 │
│ 5 │ 1 │ 2 │ 1 │
│ 7 │ 0 │ N/A │ N/A │ ← Page to load
└─────────┴───────┴───────┴─────────┘
Free Frame List: [empty]
Step-by-Step Execution:
Step 1: Page Fault Occurs
Process A executes MOV AX, [page7_address]. The MMU looks up VPage 7, finds its valid bit is 0, and raises a page fault exception.
Step 2: Kernel Analyzes Fault
The handler extracts the faulting address, identifies Process A and VPage 7, and determines where the page's contents reside on disk.
Step 3: Check Free Frames
The free frame list is empty, so replacement is required.
Step 4: Select Victim (Example: Choose Frame 0)
The replacement algorithm selects Frame 0, which holds Process A's code page (VPage 2) and is clean (dirty bit = 0).
Step 5: Evict Victim
Because the victim is clean, no write-back is needed. VPage 2's page table entry is marked invalid, any TLB entry for it is flushed, and Frame 0 is now free.
Step 6: Load New Page
The kernel issues a disk read to bring page 7's contents into Frame 0 and waits for the I/O to complete.
Step 7: Update Page Table
VPage 7's entry is set to Frame 0 with valid = 1, and its reference and dirty bits are cleared.
Step 8: Resume Process
The fault handler returns, the MOV instruction is re-executed, and this time the access to page 7 succeeds.
Final State:
Physical Memory (4 frames):
┌─────────┬─────────────┬─────────┬───────┬───────────┐
│ Frame # │ Contents │ Process │ VPage │ Dirty Bit │
├─────────┼─────────────┼─────────┼───────┼───────────┤
│ 0 │ New page 7 │ A │ 7 │ 0 │ ← Changed
│ 1 │ Stack page │ B │ 15 │ 1 │
│ 2 │ Heap data │ A │ 5 │ 1 │
│ 3 │ Shared lib │ Shared │ N/A │ 0 │
└─────────┴─────────────┴─────────┴───────┴───────────┘
Process A's Page Table:
┌─────────┬───────┬───────┬─────────┐
│ VPage # │ Valid │ Frame │ Dirty │
├─────────┼───────┼───────┼─────────┤
│ 2 │ 0 │ N/A │ N/A │ ← Now invalid (evicted)
│ 5 │ 1 │ 2 │ 1 │
│ 7 │ 1 │ 0 │ 0 │ ← Now valid
└─────────┴───────┴───────┴─────────┘
Outcome Analysis:
If Process A immediately accesses VPage 2 after this fault, we'll fault again—loading page 2 back into memory. If this cascades, we're in a pathological state. This is why victim selection algorithms try to evict pages that won't be needed soon. The ideal victim is the page whose next access is furthest in the future.
Page replacement involves several critical decisions that significantly impact system performance. Understanding these decisions is essential for appreciating why different algorithms exist and how operating systems are configured.
Decision 1: When to Replace?
There are two fundamental approaches:
On-Demand Replacement:
- Replacement happens only when a page fault occurs and no free frame is available
- The faulting process pays the entire eviction cost as part of its fault latency
Anticipatory Replacement (Background Reclamation):
- A kernel daemon (such as Linux's kswapd) proactively evicts pages whenever free memory falls below a threshold
- Most page faults then find a free frame immediately, keeping fault latency low
Most modern systems use anticipatory replacement with on-demand as fallback.
| Strategy | When Replacement Occurs | Advantages | Disadvantages |
|---|---|---|---|
| Pure On-Demand | Only during page fault when no free frames | Simple, no wasted evictions | Variable fault latency, potential blocking |
| Low Watermark | When free frames drop below threshold | Maintains reserve, consistent latency | May evict pages still needed |
| Continuous | Constantly maintain target free percentage | Very consistent latency | Continuous overhead, may over-evict |
| Adaptive | Based on memory pressure and workload | Best of all worlds | Complex to tune correctly |
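As a concrete illustration of the low-watermark strategy, below is a hedged sketch of a background reclaimer loop in the spirit of kswapd. The thresholds and helpers (count_free_frames, reclaim_one_page, sleep_until_woken) are assumptions made for the example, not Linux's actual interfaces.
/* Illustrative background reclaimer: keep free frames between a    */
/* low and a high watermark. Helpers are hypothetical placeholders. */
#define LOW_WATERMARK   64      /* wake up and start reclaiming        */
#define HIGH_WATERMARK 256      /* stop once this many frames are free */
extern unsigned long count_free_frames(void);
extern int  reclaim_one_page(void);     /* evict one victim, 0 on success         */
extern void sleep_until_woken(void);    /* block until memory pressure rises      */
void background_reclaimer(void)
{
    for (;;) {
        if (count_free_frames() >= LOW_WATERMARK) {
            sleep_until_woken();        /* nothing to do yet */
            continue;
        }
        /* Under pressure: evict pages until the high watermark is reached, */
        /* so the next several faults can be served from the free list.     */
        while (count_free_frames() < HIGH_WATERMARK) {
            if (reclaim_one_page() != 0)
                break;                  /* nothing reclaimable right now */
        }
    }
}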
Decision 2: Replacement Scope—Local vs Global
This decision determines the pool of pages eligible for eviction:
Local Replacement:
- The victim must come from the faulting process's own set of frames
- Each process's fault behavior is isolated from the others, but frame allocations can become stale
Global Replacement:
- The victim may come from any process's frames
- Memory flows toward the processes that need it most, but one process can degrade another's performance
Partitioned Schemes:
- Frames are divided into pools (per process, per group, or per zone), with replacement local to each pool
- Combines isolation with flexibility, at the cost of configuration complexity
Local Replacement:
┌──────────────────────────────────────────────┐
│ Process A Pool │ Process B Pool │ Kernel Pool │
│ [5 frames] │ [5 frames] │ [6 frames] │
│ Isolated │ Isolated │ Protected │
└──────────────────────────────────────────────┘
A can only evict from its own 5 frames.
Global Replacement:
┌──────────────────────────────────────────────┐
│ Shared Replaceable Pool │
│ [All user frames - 10 total] │
│ Any page can be evicted for any process │
└──────────────────────────────────────────────┘
A can cause B's page to be evicted.
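In code, the difference between the two scopes comes down to which frames are candidates during the victim scan. A minimal sketch, assuming an illustrative frame table with an owner field:
/* Sketch: victim candidacy under local vs global scope. */
#define NUM_FRAMES 16
struct frame_info {
    int owner_pid;          /* process owning the resident page (-1 = kernel) */
    int evictable;          /* 0 for pinned or kernel frames                  */
};
extern struct frame_info frame_table[NUM_FRAMES];
extern int pick_best(int *candidates, int n);   /* hypothetical: algorithm's choice */
int select_victim_scoped(int faulting_pid, int local_scope)
{
    int candidates[NUM_FRAMES];
    int n = 0;
    for (int f = 0; f < NUM_FRAMES; f++) {
        if (!frame_table[f].evictable)
            continue;                               /* skip pinned/kernel frames */
        if (local_scope && frame_table[f].owner_pid != faulting_pid)
            continue;                               /* local: own frames only    */
        candidates[n++] = f;
    }
    return (n > 0) ? pick_best(candidates, n) : -1; /* -1: nothing evictable */
}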
Decision 3: What to Replace?
This is where page replacement algorithms come in. The goal is to evict the page least likely to be needed soon. Common strategies:
| Algorithm | Criterion | Approximates |
|---|---|---|
| FIFO | Oldest page in memory | Arrival time |
| OPT | Page used farthest in future | Perfect knowledge |
| LRU | Least recently accessed | Recent past |
| Clock | Not recently referenced | LRU approximation |
| LFU | Least frequently accessed | Usage frequency |
Each algorithm trades accuracy for implementation cost. We'll explore these in detail in subsequent modules.
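To show the shape of such an algorithm without jumping ahead, here is a minimal FIFO-style victim selection over an assumed per-frame load_time field; later modules cover the real algorithms in depth.
/* Minimal FIFO victim selection over a frame table: evict the page   */
/* that has been resident the longest. Structures are illustrative.   */
#define NUM_FRAMES 8
struct frame_meta {
    unsigned long load_time;    /* recorded when the page was brought in */
    int in_use;
};
extern struct frame_meta frames[NUM_FRAMES];
int fifo_select_victim(void)
{
    int victim = -1;
    unsigned long oldest = (unsigned long)-1;
    for (int f = 0; f < NUM_FRAMES; f++) {
        if (frames[f].in_use && frames[f].load_time < oldest) {
            oldest = frames[f].load_time;
            victim = f;
        }
    }
    return victim;      /* frame holding the oldest resident page */
}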
Decision 4: How to Handle Dirty Pages?
When a dirty page is selected for eviction:
Synchronous Write-Back:
- The faulting process waits while the dirty victim is written to disk, then waits again while the new page is read in
- Simple to implement, but roughly doubles the page fault latency
Asynchronous Write-Back:
- Dirty pages are written out in the background before they are chosen as victims
- By the time replacement happens, the page is already clean and its frame can be reused immediately
Modern systems use page cleaning daemons (like Linux's pdflush or flush workers) to write dirty pages asynchronously, so when replacement is needed, more pages are clean.
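A hedged sketch of that idea: a cleaning thread sweeps the frame table and queues write-backs for dirty pages so later evictions find them clean. The helper names are assumptions, not the actual pdflush/flush-worker code.
/* Illustrative page cleaner: convert dirty frames to clean ones in   */
/* the background so later evictions need no synchronous disk write.  */
#define NUM_FRAMES 1024
extern int  frame_is_dirty(int frame);
extern void queue_async_writeback(int frame);   /* dirty bit cleared when the I/O completes */
extern void sleep_briefly(void);
void page_cleaner(void)
{
    for (;;) {
        for (int f = 0; f < NUM_FRAMES; f++)
            if (frame_is_dirty(f))
                queue_async_writeback(f);
        sleep_briefly();    /* rescan periodically, or sooner under memory pressure */
    }
}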
Linux uses global replacement with per-cgroup limits, a Clock-based algorithm (PFRA), watermark-triggered background reclamation (kswapd), and asynchronous dirty page writeback (flush workers). This combination balances efficiency, fairness, and responsiveness for diverse workloads.
How do we measure whether a page replacement algorithm is working well? Several metrics capture different aspects of performance.
Primary Metric: Page Fault Rate
The most fundamental measure is the page fault rate—the fraction of memory accesses that result in page faults:
Page Fault Rate = Number of Page Faults / Total Memory References
For a reference string of length n with f faults:
PFR = f / n
Lower is better. As the note further below observes, even a fault rate of 0.01% causes noticeable slowdown, and a rate near 1% makes the system feel sluggish.
Page Fault Rate Example:
Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5. With 3 frames and, for example, FIFO replacement, this string produces 9 faults in 12 references (the first three are unavoidable compulsory misses), so PFR = 9/12 = 0.75.
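The example can be verified with a short, self-contained user-space simulation (FIFO assumed, as above):
#include <stdio.h>
/* Simulate FIFO replacement over a reference string and report the  */
/* page fault rate. Prints 9 faults / 12 refs = 0.75 for this input. */
int main(void)
{
    int refs[] = {1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5};
    int n = sizeof(refs) / sizeof(refs[0]);
    int frames[3] = {-1, -1, -1};   /* -1 means the frame is empty */
    int next_evict = 0;             /* FIFO pointer                */
    int faults = 0;
    for (int i = 0; i < n; i++) {
        int hit = 0;
        for (int f = 0; f < 3; f++)
            if (frames[f] == refs[i])
                hit = 1;
        if (!hit) {                 /* page fault: load into the oldest slot */
            frames[next_evict] = refs[i];
            next_evict = (next_evict + 1) % 3;
            faults++;
        }
    }
    printf("faults = %d, references = %d, PFR = %.2f\n",
           faults, n, (double)faults / n);
    return 0;
}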
Effective Access Time Calculation:
EAT combines normal memory access time with the cost of page faults:
EAT = (1 - p) × memory_access_time + p × page_fault_time
Where:
p = probability of page fault (page fault rate)
memory_access_time ≈ 100 nanoseconds
page_fault_time ≈ 10 milliseconds (HDD) or 100 microseconds (SSD)
Example Calculation (HDD):
p = 0.001 (1 fault per 1000 accesses)
memory_access = 100 ns
fault_time = 10 ms = 10,000,000 ns
EAT = (0.999)(100) + (0.001)(10,000,000)
= 99.9 + 10,000
= 10,099.9 ns ≈ 10 μs
Even with a 0.1% fault rate, access time increases by 100x!
Example Calculation (SSD):
p = 0.001
memory_access = 100 ns
fault_time = 100 μs = 100,000 ns
EAT = (0.999)(100) + (0.001)(100,000)
= 99.9 + 100
= 199.9 ns ≈ 200 ns
SSD reduces the impact to a 2x slowdown—demonstrating why SSDs dramatically improve system responsiveness under memory pressure.
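The same arithmetic as a tiny runnable helper, using the round-number latencies from the examples above:
#include <stdio.h>
/* Effective access time: EAT = (1 - p) * mem_ns + p * fault_ns */
static double eat_ns(double p, double mem_ns, double fault_ns)
{
    return (1.0 - p) * mem_ns + p * fault_ns;
}
int main(void)
{
    double p = 0.001;                                   /* 1 fault per 1000 accesses */
    printf("HDD: %.1f ns\n", eat_ns(p, 100.0, 10e6));   /* 10 ms fault time          */
    printf("SSD: %.1f ns\n", eat_ns(p, 100.0, 100e3));  /* 100 us fault time         */
    return 0;
}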
Memory access is ~100 nanoseconds; disk access is ~10 milliseconds on HDD. That's a 100,000× difference. A page fault rate of 0.01% still causes noticeable slowdown. A rate of 1% makes the system feel sluggish. This extreme asymmetry is why page replacement algorithm quality matters so much—even small improvements in fault rate yield significant user-perceived performance gains.
Page replacement algorithms implement a specific interface that the memory management subsystem invokes. Understanding this interface clarifies what information algorithms receive and what decisions they must make.
The Conceptual Interface:
/**
* Page Replacement Algorithm Interface
*
* The memory manager calls these functions at specific points
* in the page lifecycle.
*/
// Called when a page fault occurs and no free frames exist
// Returns: the frame to evict (the victim)
frame_t select_victim(void);
// Called when a page is first loaded into a frame
void page_loaded(frame_t frame, page_t page);
// Called when a page is accessed (may be used for LRU tracking)
void page_accessed(frame_t frame);
// Called when a page is modified (write reference)
void page_modified(frame_t frame);
// Called when a page is explicitly freed (process exits, munmap)
void page_freed(frame_t frame);
Information Available to Algorithms:
Algorithms have access to various metadata to make decisions:
| Information | Source | Used By |
|---|---|---|
| Page load time | Timestamp at page_loaded() | FIFO |
| Last access time | Updated on page_accessed() | LRU |
| Access frequency | Counter incremented on access | LFU |
| Reference bit | Hardware sets on any access | Clock, Second-Chance |
| Dirty/modify bit | Hardware sets on write | Enhanced Clock; any algorithm that prefers clean victims |
| Page type | Anonymous, file-backed, shared | Linux PFRA |
| Working set info | Per-process access history | Working Set Algorithm |
Hardware Support Requirements:
Efficient page replacement requires specific hardware support:
Reference Bit:
- Set automatically by the MMU whenever the page is read or written
- Cleared periodically by the OS, so it indicates "used recently"
- Drives Clock, Second-Chance, and other LRU approximations
Dirty (Modify) Bit:
- Set automatically by the MMU on any write to the page
- Tells the OS whether the page must be written back before its frame is reused
- Cleared by the OS after a successful write-back
Both bits are per-page table entry:
Page Table Entry (simplified):
┌─────────────────────────────────────────────────────────┐
│ Frame # │ Valid │ Reference │ Dirty │ Protection │ ... │
│ 20 bit │ 1 bit │ 1 bit │ 1 bit │ 3 bit │ │
└─────────────────────────────────────────────────────────┘
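In software, these fields are typically tested and cleared with simple bit operations. A sketch using an assumed 32-bit PTE layout; the bit positions here are illustrative, and real architectures differ.
#include <stdint.h>
/* Illustrative 32-bit PTE layout: low bits hold flags, upper bits the frame number. */
#define PTE_VALID       (1u << 0)
#define PTE_REFERENCED  (1u << 1)
#define PTE_DIRTY       (1u << 2)
#define PTE_FRAME_SHIFT 12              /* frame number stored above the flag bits */
static int pte_was_referenced(uint32_t pte) { return (pte & PTE_REFERENCED) != 0; }
static int pte_is_dirty(uint32_t pte)       { return (pte & PTE_DIRTY) != 0; }
/* Clock-style algorithms clear the reference bit as they sweep. */
static uint32_t pte_clear_referenced(uint32_t pte) { return pte & ~PTE_REFERENCED; }
static uint32_t pte_frame(uint32_t pte) { return pte >> PTE_FRAME_SHIFT; }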
Software-Only Alternatives:
Some architectures lack hardware reference bits. Software simulation:
- Periodically mark resident pages as invalid (or access-protected) in the page table
- The first access to such a page triggers a minor fault; the handler records a software "referenced" flag and restores the mapping
- The extra faults cost some performance but yield the same recency information
This technique was used extensively on older systems and remains available as a fallback.
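A sketch of the mechanism, with hypothetical helpers standing in for the architecture-specific parts: access is periodically revoked so the next touch traps, and the fault handler records a software reference flag before restoring access.
/* Software-simulated reference bit: periodically revoke access so the */
/* next touch traps, then record the access and restore the mapping.   */
extern void set_page_no_access(int page);     /* hypothetical: force a trap on next use */
extern void restore_page_access(int page);
extern char soft_referenced[/* NUM_PAGES */];
/* Called periodically (e.g., by a clock-style sweep): forget old history. */
void clear_soft_reference(int page)
{
    soft_referenced[page] = 0;
    set_page_no_access(page);
}
/* Called from the minor-fault handler when a protected page is touched. */
void note_soft_reference(int page)
{
    soft_referenced[page] = 1;
    restore_page_access(page);      /* subsequent accesses run at full speed */
}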
/* Example Page Replacement Algorithm Framework */

struct page_frame {
    unsigned long frame_number;
    struct page *page;              /* Virtual page mapped here */
    unsigned int flags;             /* State flags */
    unsigned long load_time;        /* When page was loaded (FIFO) */
    unsigned long last_access;      /* Last access time (LRU) */
    unsigned int access_count;      /* Access frequency (LFU) */
    struct list_head lru_list;      /* Position in replacement queue */
};

/* Replacement algorithm operations */
struct replacement_ops {
    /* Select a victim frame for eviction */
    struct page_frame *(*select_victim)(void);

    /* Called when page loaded into frame */
    void (*page_loaded)(struct page_frame *pf);

    /* Called on page access (if tracked) */
    void (*page_accessed)(struct page_frame *pf);

    /* Called when page modified */
    void (*page_modified)(struct page_frame *pf);

    /* Initialize algorithm state */
    int (*init)(void);

    /* Cleanup algorithm state */
    void (*cleanup)(void);
};

/* Current active algorithm */
static struct replacement_ops *current_algorithm;

/* Victim selection wrapper */
struct page_frame *find_victim(void)
{
    ASSERT(current_algorithm != NULL);
    return current_algorithm->select_victim();
}

Page replacement seems conceptually simple—pick a victim and swap pages. In practice, numerous edge cases and challenges complicate implementation.
Challenge 1: Pinned/Locked Pages
Some pages cannot be evicted:
Non-evictable pages:
- Kernel code and data pages
- Pages undergoing DMA transfers
- Pages locked by mlock() system call
- Real-time application pages
- File system journal pages
Algorithms must skip these when scanning for victims.
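For example, a victim scan might test a per-frame flags word before considering a frame at all; the flag names here are illustrative, not any particular kernel's.
/* Sketch: a victim scan must skip frames the kernel is not allowed to evict. */
#define FRAME_KERNEL   (1u << 0)
#define FRAME_DMA      (1u << 1)   /* device I/O in flight  */
#define FRAME_MLOCKED  (1u << 2)   /* pinned via mlock()    */
static int frame_is_evictable(unsigned int flags)
{
    return (flags & (FRAME_KERNEL | FRAME_DMA | FRAME_MLOCKED)) == 0;
}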
Challenge 2: Shared Pages
Pages shared between processes have multiple PTE references:
Frame 42 is mapped by:
- Process A, VPage 10 (read-only)
- Process B, VPage 7 (read-only)
- Process C, VPage 12 (read-only)
Eviction must:
1. Update ALL three page table entries
2. Flush TLB entries for ALL processes
3. Track reverse mappings from frames to PTEs
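A hedged sketch of that reverse-mapping step, with structures invented for illustration: the eviction path walks every mapping of the frame and invalidates it.
/* Evicting a shared frame: walk the reverse map and invalidate every PTE. */
struct pte;                                  /* opaque page table entry */
struct rmap_entry {
    int process_id;
    struct pte *pte;
    struct rmap_entry *next;
};
extern struct rmap_entry *rmap_of_frame(int frame);   /* hypothetical reverse map */
extern void pte_mark_not_present(struct pte *pte);
extern void tlb_shootdown(int process_id, struct pte *pte);
void unmap_all_sharers(int frame)
{
    for (struct rmap_entry *e = rmap_of_frame(frame); e != NULL; e = e->next) {
        pte_mark_not_present(e->pte);            /* clear valid bit         */
        tlb_shootdown(e->process_id, e->pte);    /* flush stale TLB entries */
    }
}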
Challenge 3: Concurrent Page Faults
Multiple CPUs may fault simultaneously:
CPU 0: Faults on page P1, selects frame F
CPU 1: Faults on page P2, also selects frame F
Without synchronization:
- Both load pages into F
- One clobbers the other
- Silent corruption
Solution: Frame-level locks, atomic victim selection
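A sketch of the locking idea using a per-frame mutex and a re-check; a real kernel would use spinlocks and finer-grained state, so treat every name here as an assumption.
#include <pthread.h>
/* Per-frame lock plus a re-check: if another CPU already resolved this  */
/* fault while we waited for the lock, back off instead of loading again. */
extern pthread_mutex_t frame_locks[/* NUM_FRAMES */];
extern int  page_already_resident(const void *fault_addr);   /* hypothetical */
extern void evict_and_load(int frame, const void *fault_addr);
void resolve_fault(int victim_frame, const void *fault_addr)
{
    pthread_mutex_lock(&frame_locks[victim_frame]);
    if (!page_already_resident(fault_addr))      /* re-check under the lock */
        evict_and_load(victim_frame, fault_addr);
    pthread_mutex_unlock(&frame_locks[victim_frame]);
}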
Challenge 4: NUMA Considerations
On Non-Uniform Memory Access systems, not all frames are equal:
NUMA Node 0 (CPU 0-3):
- Local memory: 16 GB, access time: 100ns
- Remote memory (Node 1): access time: 300ns
NUMA Node 1 (CPU 4-7):
- Local memory: 16 GB, access time: 100ns
- Remote memory (Node 0): access time: 300ns
Replacement implications:
- Prefer evicting remote pages (less costly to re-fault)
- Try to keep process pages on local node
- Balance memory pressure across nodes
Challenge 5: Large Pages (Huge Pages)
2 MB or 1 GB pages complicate replacement:
2 MB huge page = 512 standard 4 KB pages
Evicting one huge page:
- Frees 512 frames at once (good)
- Internal fragmentation if only part needed
- May evict actively-used sub-regions
- Shattering: break huge page into small pages first
Modern systems support transparent huge pages (THP) with dynamic promotion/demotion, adding complexity to replacement decisions.
Real page replacement implementations handle dozens of edge cases not covered in textbooks. Linux's PFRA (Page Frame Reclamation Algorithm) contains thousands of lines managing referenced lists, active/inactive tracking, memory cgroups, zone balancing, compaction, and more. The core concept is simple; robust implementation is deeply complex.
We've established a comprehensive understanding of page replacement as a concept. Let's consolidate the key points:
- Page replacement reallocates an occupied frame: one resident page is evicted so a needed page can be loaded
- A complete replacement spans fault detection, victim selection, optional dirty-page write-back, page load, and mapping update
- The key decisions are when to replace (on-demand vs anticipatory), where (local vs global scope), what (the algorithm), and how (clean vs dirty handling)
- Quality is measured by page fault rate and effective access time; because disk is orders of magnitude slower than RAM, even tiny fault rates dominate latency
- Hardware reference and dirty bits make practical algorithms possible, and real implementations must also handle pinned, shared, concurrent, NUMA, and huge-page cases
What's Next:
With a solid understanding of the page replacement concept, we now turn to the critical question: how do we choose which page to evict? The next page explores victim selection—the heart of every page replacement algorithm and the decision that most directly impacts system performance.
You now understand page replacement at a conceptual level—what it is, why it's necessary, and the key decisions involved. This foundation prepares you for the algorithmic details to come, where we'll see how different victim selection strategies produce dramatically different performance characteristics.