On the previous page, we established that LRU evicts the "least recently used" page. But what exactly does "recently" mean? How do we measure it? And how does recency translate into practical eviction decisions?
This page dives deep into the mechanics of recency-based decision making. We'll formalize the concept with mathematical precision, explore the stack distance model that underlies LRU's theoretical analysis, and understand exactly how an LRU system determines which page to evict at any given moment.
These foundations are crucial—they will inform our understanding of the implementation challenges and approximation strategies that follow.
By the end of this page, you will understand: (1) formal definitions of recency, (2) the LRU stack model and stack distance, (3) how LRU uses recency to order pages, (4) the relationship between stack distance and page faults, and (5) how recency decisions propagate through the memory hierarchy.
Recency in the context of page replacement has a precise definition. Let's formalize it.
Definition 1: Last Access Time
For a page p, let last_access(p) denote the logical timestamp of the most recent memory reference to page p. If p has never been accessed, last_access(p) = -∞.
Definition 2: Recency Value
The recency value of a page at logical time t is: $$recency(p, t) = t - last_access(p)$$
A larger recency value means the page is "older" (less recently used). A smaller value means the page is "fresher" (more recently used).
Definition 3: LRU Victim Selection
At time t, when eviction is necessary from memory set M: $$victim = \arg\max_{p \in M} recency(p, t) = \arg\min_{p \in M} last_access(p)$$
| Page | Last Access Time | Recency Value | Interpretation |
|---|---|---|---|
| P₁ | 98 | 100 - 98 = 2 | Very fresh (2 units ago) |
| P₂ | 95 | 100 - 95 = 5 | Fresh (5 units ago) |
| P₃ | 80 | 100 - 80 = 20 | Stale (20 units ago) |
| P₄ | 50 | 100 - 50 = 50 | Old (50 units ago) |
| P₅ | 10 | 100 - 10 = 90 | Most stale → LRU victim |
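To ground these definitions, here is a minimal sketch (the `select_victim` helper and `Page` struct are illustrative, not taken from any particular kernel) that applies Definition 3 to the table above by scanning for the minimum last access time, which is equivalent to the maximum recency:

```c
#include <stdio.h>

typedef struct {
    int  page_number;
    long last_access;   /* logical timestamp of most recent reference */
} Page;

/* Definition 3: the victim is the page with the minimum last_access. */
int select_victim(const Page pages[], int count) {
    int victim = 0;
    for (int i = 1; i < count; i++) {
        if (pages[i].last_access < pages[victim].last_access)
            victim = i;
    }
    return victim;
}

int main(void) {
    /* The example from the table above, at logical time t = 100. */
    Page memory[] = { {1, 98}, {2, 95}, {3, 80}, {4, 50}, {5, 10} };
    long t = 100;
    int v = select_victim(memory, 5);
    printf("Victim: P%d (recency = %ld)\n",
           memory[v].page_number, t - memory[v].last_access);   /* P5, recency 90 */
    return 0;
}
```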
LRU typically uses logical time—the count of memory references—rather than wall-clock time. A reference string like [A, B, C, A, D] has logical timestamps 1, 2, 3, 4, 5. This is independent of how long each access takes in real time. Logical time captures the sequence of program events, not durations.
The most elegant way to understand LRU behavior is through the LRU stack model, introduced by researchers in the 1970s. This model provides a unified view of recency ordering across all possible memory sizes.
The LRU Stack
Imagine an infinite stack containing all pages ever referenced. The stack is always ordered by recency:
Key Property: When a page is accessed, it moves to position 1 (top). All pages that were above it shift down by one position. Pages below it don't move.
LRU Stack Behavior for Reference String: A, B, C, B, A, D, C

| Step | Access | Stack After (top → bottom) | Effect |
|---|---|---|---|
| 1 | A | [A] | A at position 1 |
| 2 | B | [B, A] | B at top, A moves down |
| 3 | C | [C, B, A] | C at top |
| 4 | B | [B, C, A] | B moves to top, C shifts down |
| 5 | A | [A, B, C] | A moves to top, B shifts down |
| 6 | D | [D, A, B, C] | D at top |
| 7 | C | [C, D, A, B] | C moves from position 4 to position 1 |

The stack always maintains the recency order: position 1 is the most recently used page, and higher positions hold progressively less recently used pages.

The Inclusion Property
A crucial insight from the stack model: if you have n frames of memory, the pages in memory are exactly the top n pages of the LRU stack.
This is exactly why LRU is a stack algorithm: $$S_n(t) \subseteq S_{n+1}(t)$$
The pages in memory with n frames are always a subset of pages in memory with n+1 frames. This is why Bélády's anomaly is impossible for LRU.
The stack model lets us analyze LRU behavior for ALL memory sizes simultaneously. Process the reference string once to build the stack, then for any memory size n, you can instantly determine hits/faults by checking stack positions.
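As an illustrative sketch of this one-pass analysis (not production code), the following program maintains the LRU stack for the reference string from the trace above and prints each access's stack distance; the hit/fault outcome for any memory size n then follows from whether the distance is at most n:

```c
#include <stdio.h>

#define MAX_PAGES 64

int main(void) {
    char refs[] = {'A', 'B', 'C', 'B', 'A', 'D', 'C'};
    int n_refs = sizeof(refs) / sizeof(refs[0]);

    char stack[MAX_PAGES];   /* stack[0] is the MRU position */
    int depth = 0;           /* number of distinct pages seen so far */

    for (int t = 0; t < n_refs; t++) {
        char p = refs[t];

        /* Find p's current position (1-based); -1 if never referenced. */
        int pos = -1;
        for (int i = 0; i < depth; i++) {
            if (stack[i] == p) { pos = i + 1; break; }
        }

        if (pos == -1) {
            printf("t=%d  ref %c  stack distance = INF (first reference)\n", t + 1, p);
            pos = ++depth;   /* p will be inserted at the top; the rest shifts down */
        } else {
            printf("t=%d  ref %c  stack distance = %d\n", t + 1, p, pos);
        }

        /* Move p to the top: shift pages above its old position down by one. */
        for (int i = pos - 1; i > 0; i--) stack[i] = stack[i - 1];
        stack[0] = p;
    }
    return 0;
}
```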
Stack distance is the key metric for analyzing LRU performance. It connects recency to cache behavior.
Definition: Stack Distance
When page p is accessed at time t, its stack distance d(p, t) is its position in the LRU stack just before the access.
The Fundamental Theorem of LRU:
With n frames of memory, a page access results in a hit if and only if d ≤ n.
In other words: if a page's stack distance is at most the number of frames, the page is still resident and the access is a hit; if the distance exceeds the frame count (or the page is being referenced for the first time), the access is a fault.
| Time | Reference | Stack Before | Stack Distance | Hit when n ≥ | Stack After |
|---|---|---|---|---|---|
| 1 | A | [empty] | ∞ (first) | ∞ | [A] |
| 2 | B | [A] | ∞ (first) | ∞ | [B, A] |
| 3 | C | [B, A] | ∞ (first) | ∞ | [C, B, A] |
| 4 | B | [C, B, A] | 2 | 2 | [B, C, A] |
| 5 | A | [B, C, A] | 3 | 3 | [A, B, C] |
| 6 | D | [A, B, C] | ∞ (first) | ∞ | [D, A, B, C] |
| 7 | C | [D, A, B, C] | 4 | 4 | [C, D, A, B] |
| 8 | B | [C, D, A, B] | 4 | 4 | [B, C, D, A] |
| 9 | A | [B, C, D, A] | 4 | 4 | [A, B, C, D] |
Reading the Analysis:
For 9 accesses with various memory sizes:
| Memory Size | Hits (stack dist ≤ n) | Faults | Hit Rate |
|---|---|---|---|
| n = 1 | 0 | 9 | 0% |
| n = 2 | 1 (t=4) | 8 | 11% |
| n = 3 | 2 (t=4,5) | 7 | 22% |
| n = 4 | 5 (t=4,5,7,8,9) | 4 | 56% |
| n = 5+ | 5 | 4 | 56% |
Notice: more memory never increases faults (stack algorithm property). Each additional frame captures references at exactly one additional stack distance.
Real programs have characteristic stack distance distributions. Programs with strong locality have many references with small stack distances. The stack distance histogram reveals how much memory a program needs—if 90% of references have stack distance ≤ 100, then 100 frames will achieve 90% hit rate.
While the LRU stack is a theoretical model, a practical implementation typically uses a recency list—a doubly linked list of pages ordered by recency.
Data Structure: Doubly Linked List + Hash Table
```c
// LRU Recency List Data Structure
//
// Assumes a HashMap type with hashmap_get/put/remove helpers is provided
// by a separate hash table implementation.
#include <stdbool.h>
#include <stdlib.h>

typedef struct PageNode {
    int page_number;
    struct PageNode* prev;   // More recently used
    struct PageNode* next;   // Less recently used
} PageNode;

typedef struct LRUCache {
    PageNode* head;          // Most recently used (MRU)
    PageNode* tail;          // Least recently used (LRU) - eviction candidate
    HashMap*  page_map;      // page_number -> PageNode* for O(1) lookup
    int capacity;            // Maximum number of frames
    int size;                // Current number of pages in memory
} LRUCache;

/*
 * List Organization:
 *
 *   head (MRU)                                            tail (LRU)
 *      ↓                                                      ↓
 *  [Page A] ←→ [Page B] ←→ [Page C] ←→ [Page D] ←→ [Page E]
 *      ↑                                                      ↑
 *  Most recent                                         Evict this one
 *    access                                            (oldest access)
 */
```

Operations on the Recency List:
1. Access a page already in memory (Hit): look the page up in the hash table, unlink its node from its current position, and re-insert it at the head (MRU position). Recency is updated in O(1).
2. Access a page not in memory (Fault): if memory is full, evict first; then create a node for the new page, insert it at the head, and add it to the hash table.
3. Eviction: remove the tail node (the least recently used page), update the tail pointer, and delete the entry from the hash table.

The code below implements all three operations.
```c
// Move page to MRU position (head)
void move_to_head(LRUCache* cache, PageNode* node) {
    if (cache->head == node) return;   // Already the MRU page

    // Unlink from current position
    if (node->prev) node->prev->next = node->next;
    if (node->next) node->next->prev = node->prev;

    // Update tail if we're moving the tail
    if (cache->tail == node) cache->tail = node->prev;

    // Insert at head
    node->prev = NULL;
    node->next = cache->head;
    if (cache->head) cache->head->prev = node;
    cache->head = node;

    // Update tail if list was empty
    if (!cache->tail) cache->tail = node;
}

// Evict LRU page (tail); returns the evicted page number, or -1 if empty
int evict_lru(LRUCache* cache) {
    if (!cache->tail) return -1;       // Empty cache

    PageNode* victim = cache->tail;
    int evicted_page = victim->page_number;

    // Update tail pointer
    cache->tail = victim->prev;
    if (cache->tail) cache->tail->next = NULL;
    else cache->head = NULL;           // Cache now empty

    // Remove from hash map and free
    hashmap_remove(cache->page_map, evicted_page);
    free(victim);
    cache->size--;

    return evicted_page;
}

// Access a page (returns true if hit, false if fault)
bool access_page(LRUCache* cache, int page_number) {
    PageNode* node = hashmap_get(cache->page_map, page_number);

    if (node) {
        // HIT: Move to head (update recency)
        move_to_head(cache, node);
        return true;
    }

    // FAULT: Need to bring page into memory
    if (cache->size >= cache->capacity) {
        evict_lru(cache);              // Evict least recently used
    }

    // Create new node at head
    PageNode* new_node = malloc(sizeof(PageNode));
    new_node->page_number = page_number;
    new_node->prev = NULL;             // Must be initialized before linking
    new_node->next = NULL;
    move_to_head(cache, new_node);
    hashmap_put(cache->page_map, page_number, new_node);
    cache->size++;

    return false;
}
```

Without the hash table, finding a page in the list would take O(n) time, making every access linear in the number of pages. The hash map provides O(1) lookup, and all list manipulations are O(1) with doubly linked lists. The combination achieves O(1) for all operations.
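As a quick usage sketch (the `lru_create` constructor is hypothetical, assumed to allocate the structure and hash map and set the capacity), replaying the reference string A, B, C, B, A, D, C from earlier with 3 frames should produce 5 faults, matching the stack-distance analysis:

```c
#include <stdio.h>

int main(void) {
    LRUCache* cache = lru_create(3);        // hypothetical constructor: 3 frames
    int refs[] = {0, 1, 2, 1, 0, 3, 2};     // A=0, B=1, C=2, D=3
    int faults = 0;

    for (int i = 0; i < 7; i++) {
        if (!access_page(cache, refs[i]))
            faults++;
    }

    printf("Faults: %d\n", faults);         // Expect 5: A, B, C, D, and the re-fetch of C
    return 0;
}
```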
The recency list gives us O(1) operations, but there's a deeper problem: who performs these updates, and when?
The Challenge:
Every memory reference—every instruction fetch, every data load, every store—is a reference that should update the recency ordering. In a system running at 3 GHz, that can mean billions of references per second, each of which would have to update the recency list.
Even with O(1) operations, performing software updates for every memory reference would catastrophically slow the system.
| Scenario | Updates/Second | Overhead/Update | Total Overhead | CPU Time Left |
|---|---|---|---|---|
| Software LRU tracking | 3 billion | 50 cycles | 150 billion cycles | 0% (impossible) |
| Hardware TLB only | 3 billion | N/A | N/A | Would need per-access HW |
| Periodic sampling (1ms) | 1 thousand | 50 cycles | 50K cycles | ~99.998% |
| Page fault time only | ~10K (depends) | 50 cycles | 500K cycles | ~99.98% |
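A back-of-the-envelope check of these figures (illustrative arithmetic only; the 50-cycle update cost is the table's assumption, not a measured number):

```c
#include <stdio.h>

int main(void) {
    double cycles_per_second = 3e9;          // 3 GHz core
    double cost_per_update   = 50.0;         // assumed cycles per recency update

    double updates_per_second[] = { 3e9, 1e3, 1e4 };
    const char* names[] = { "Per-reference software LRU",
                            "Periodic sampling (1 ms)",
                            "Page-fault time only" };

    for (int i = 0; i < 3; i++) {
        double overhead_cycles = updates_per_second[i] * cost_per_update;
        double fraction = overhead_cycles / cycles_per_second;
        printf("%-28s %.0f cycles/s  (%.4f%% of the CPU)\n",
               names[i], overhead_cycles, fraction * 100.0);
    }
    return 0;
}
```

The first line demands far more cycles than the core has, which is the table's "0% CPU time left (impossible)" entry; the sampling and fault-time scenarios cost a tiny fraction of a percent.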
The Two Fundamental Approaches:
1. True LRU with Hardware Assistance: dedicated hardware records recency on every memory reference (for example, per-page counters or a hardware-maintained stack), so software pays nothing per access, but the hardware cost is substantial.
2. Approximate LRU with Periodic Sampling: hardware sets a single reference bit per page on access; the OS periodically samples and clears these bits, reconstructing an approximate recency ordering at a tiny fraction of the cost.
The next two pages explore exact implementations (counters and stacks), followed by the practical approximations used in real operating systems.
Perfect LRU requires tracking every memory access. Practical systems can't do this in software. The choice is: (1) expensive hardware for true LRU, (2) approximate LRU with acceptable overhead, or (3) simpler algorithms like CLOCK. Real systems choose approximation.
Recency-based replacement isn't unique to page replacement—it appears throughout the memory hierarchy. Understanding this reveals a unifying principle.
LRU at Every Level:
| Level | Unit | Replacement Policy | Who Updates Recency? | Time Scale |
|---|---|---|---|---|
| CPU Registers | Individual registers | Compiler allocation | Compiler (compile time) | N/A |
| L1 Cache | Cache lines (64B) | LRU or pseudo-LRU | Hardware (every access) | Nanoseconds |
| L2/L3 Cache | Cache lines (64B) | Pseudo-LRU/SRRIP | Hardware (every access) | Nanoseconds |
| TLB | Page table entries | LRU or FIFO | Hardware (every translation) | Nanoseconds |
| Page Cache | Pages (4KB) | LRU approximation | OS + hardware bits | Milliseconds |
| Disk Buffer | Disk blocks | LRU variants | Disk controller | Milliseconds |
Key Insight: Hardware Implements LRU for Caches
CPU caches can afford true (or near-true) LRU because a cache set holds only a handful of lines (typically 4 to 16 ways), so the recency state is just a few bits per set, and those bits are updated by dedicated circuitry in parallel with the access itself, at no cost in software.

Why the OS Can't Do This for Pages: main memory holds millions of pages, the recency state would be far larger, and the OS runs in software; it cannot intervene on every load and store, so the MMU records only a single reference bit per page on its behalf.

The result: the page cache must use approximations, while CPU caches use near-exact LRU implemented in silicon. A simplified sketch of the kind of hardware logic involved follows below.
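To make the hardware side concrete, here is a hedged sketch (in C, purely illustrative; real processors implement this as a few gates per set, and the exact scheme varies) of tree pseudo-LRU for a single 4-way cache set. Three bits per set are enough to pick a near-LRU victim:

```c
#include <stdbool.h>

// Tree pseudo-LRU state for one 4-way set: 3 bits.
// b0 selects the half containing the next victim (false = ways 0-1),
// b1/b2 select the victim within each half.
typedef struct {
    bool b0;
    bool b1;   // within left half:  false -> way 0, true -> way 1
    bool b2;   // within right half: false -> way 2, true -> way 3
} PLRUSet;

// On an access, point every bit on the path AWAY from the way just used.
void plru_touch(PLRUSet* s, int way) {
    if (way < 2) { s->b0 = true;  s->b1 = (way == 0); }   // used left half
    else         { s->b0 = false; s->b2 = (way == 2); }   // used right half
}

// Follow the bits to the (approximately) least recently used way.
int plru_victim(const PLRUSet* s) {
    if (!s->b0) return s->b1 ? 1 : 0;
    return s->b2 ? 3 : 2;
}
```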
The TLB (Translation Lookaside Buffer) uses LRU-like replacement for its entries. When the OS page replacement evicts a page, its TLB entry must also be invalidated. TLB misses after poor page replacement decisions compound the performance impact.
LRU uses recency—when was the page last used? An alternative approach uses frequency—how often is the page used? Understanding this distinction is crucial for appreciating LRU's strengths and weaknesses.
Comparison:

| Aspect | LRU (recency) | LFU (frequency) |
|---|---|---|
| Question asked | When was the page last used? | How many times has the page been used? |
| Adapts to phase changes | Quickly | Slowly (old counts linger) |
| Protects hot pages during one-time bursts | Weakly | Strongly |
Scenario: LRU Wins
A program runs phase A (using pages 1-10) for 1 million accesses, then switches to phase B (using pages 11-20). With LFU, pages 1-10 have massive frequency counts and will never be evicted, even though the program has moved on. LRU immediately adapts—pages 11-20 become most recent as they're accessed.
Scenario: LFU Wins
A popular page is accessed frequently but happened to be idle for a brief moment when a burst of one-time accesses occurs. LRU might evict the popular page. LFU would keep it due to its high frequency count.
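A minimal sketch of the contrast (hypothetical struct and function names; the two policies differ only in which per-page statistic they minimize when picking a victim):

```c
typedef struct {
    int  page_number;
    long last_access;    // recency metric: logical time of last reference
    long access_count;   // frequency metric: total references so far
} PageStats;

// LRU: evict the page whose last access is oldest.
int lru_victim(const PageStats p[], int n) {
    int v = 0;
    for (int i = 1; i < n; i++)
        if (p[i].last_access < p[v].last_access) v = i;
    return v;
}

// LFU: evict the page referenced the fewest times, regardless of when.
int lfu_victim(const PageStats p[], int n) {
    int v = 0;
    for (int i = 1; i < n; i++)
        if (p[i].access_count < p[v].access_count) v = i;
    return v;
}
```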
Modern Hybrid: ARC and LIRS
Advanced algorithms like ARC (Adaptive Replacement Cache) and LIRS (Low Inter-reference Recency Set) combine both metrics, maintaining separate lists for recency and frequency, and adaptively balancing between them based on workload characteristics.
For most workloads, LRU's quick adaptation to changing patterns is more valuable than LFU's historical memory. Phase changes in programs are common, and LRU handles them gracefully. This is why LRU (and its approximations) dominate in practice.
Competitive Analysis
How does LRU compare to the optimal algorithm? This question is answered by competitive analysis.
Theorem (LRU Competitive Ratio):
For any reference string, with k frames of memory: $$\text{Faults}_{\text{LRU}} \leq k \cdot \text{Faults}_{\text{OPT}}$$
LRU is k-competitive: its fault count is at most k times the optimal. This sounds bad (k could be large), but it's actually a strong guarantee, because it holds for any reference string, including adversarial ones. The standard argument partitions the reference string into phases that each touch k distinct pages: LRU faults at most k times within a phase, while even the optimal algorithm must fault at least once per phase after the first.
In practice, LRU is much closer to optimal because real programs have locality. The competitive ratio is a worst-case bound that rarely occurs.
| Workload Type | LRU Faults | OPT Faults | Ratio | Notes |
|---|---|---|---|---|
| Compiler (GCC) | 12,450 | 10,200 | 1.22 | Strong locality |
| Database (OLTP) | 8,320 | 7,050 | 1.18 | Working set behavior |
| Web server | 15,600 | 13,400 | 1.16 | Good locality |
| Sequential scan | 100,000 | 10,000 | 10.0 | Pathological case |
| Scientific (array) | 45,000 | 42,000 | 1.07 | Excellent locality |
The Stack Distance Model for Prediction
Given a program's stack distance distribution D, where D(d) is the probability of a reference having stack distance d, the expected hit rate with n frames is:
$$\text{Hit Rate}(n) = \sum_{d=1}^{n} D(d)$$
This formula lets us predict LRU performance for any memory size from a single stack distance profile. It's the foundation of cache simulation techniques used in performance analysis.
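A tiny sketch of this calculation (the histogram values are made up for illustration; in practice they come from a one-pass stack simulation like the one earlier on this page):

```c
#include <stdio.h>

int main(void) {
    // histogram[d] = number of references observed with stack distance d (d >= 1)
    long histogram[] = { 0, 500, 300, 120, 50, 20, 10 };   // index 0 unused
    int  max_dist    = 6;
    long total_refs  = 1050;   // includes 50 cold (first-reference) misses

    long cumulative = 0;
    for (int n = 1; n <= max_dist; n++) {
        cumulative += histogram[n];
        printf("n = %d frames: predicted hit rate = %.1f%%\n",
               n, 100.0 * cumulative / total_refs);
    }
    return 0;
}
```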
By measuring a program's stack distance distribution once, you can predict its LRU performance for any memory size. This technique is used by architects to size caches, by researchers to evaluate new algorithms, and by systems engineers to capacity plan memory allocation.
We've explored the precise mechanics of how LRU makes eviction decisions based on recency. To consolidate: recency is the logical time since a page's last access, and LRU evicts the page with the maximum recency; the LRU stack model keeps every page ordered by recency and yields the inclusion property, so more memory never hurts; stack distance determines hits and faults for every memory size from a single pass over the reference string; a doubly linked list plus hash table implements the recency list with O(1) operations; and the real obstacle is not algorithmic complexity but the cost of updating recency on every memory reference.
What's Next:
We've established how recency-based decisions work in theory. The next page explores the implementation challenges of LRU—specifically why maintaining exact recency ordering is prohibitively expensive, and the hardware/software trade-offs involved.
You now understand the mathematical foundations of recency-based page replacement: the formal definitions, the stack model, stack distance analysis, and the recency list data structure. This precision is essential for understanding both the elegance of LRU and the challenges of implementing it.