When the FIFO algorithm disappointed system designers with Bélády's anomaly, and the optimal algorithm remained forever out of reach, the search turned to something more practical: an algorithm that could approximate optimal behavior using information available to the operating system. The answer came from a fundamental insight about how programs actually use memory—the principle of temporal locality.
Least Recently Used (LRU) emerged as the gold standard of practical page replacement. Its premise is deceptively simple: if a page hasn't been accessed in a long time, it probably won't be accessed soon. This single assumption, grounded in decades of program behavior analysis, has proven remarkably effective across virtually all workload types.
By the end of this page, you will understand: (1) the theoretical basis of LRU in temporal locality, (2) why LRU approximates the optimal algorithm, (3) the mathematical formalization of recency, (4) LRU's behavior with reference strings, and (5) why LRU is considered the practical benchmark for page replacement.
Before diving into LRU, we must understand the limitation that made it necessary. FIFO (First-In, First-Out) seemed intuitive—replace the oldest page in memory. But FIFO has a fatal flaw: it ignores what happened after a page arrived.
Consider this scenario:
Under FIFO, when we need to evict a page, we replace A—the oldest—even though it's still actively in use. Meanwhile, B (completely unused) stays in memory simply because it arrived later. This is catastrophically wrong from a performance perspective.
| Page | Arrival Time | Last Access | Access Frequency | FIFO Priority | Actual Value |
|---|---|---|---|---|---|
| Page A | t=0 | t=100 (now) | High (active) | Evict first ❌ | Keep (critical) |
| Page B | t=1 | t=1 (never again) | Zero | Keep second ❌ | Evict (useless) |
| Page C | t=2 | t=50 | Medium | Keep last | Consider |
The insight: arrival time tells us nothing about future usefulness. What matters is recent access patterns—the recency of use. A page that was just accessed is likely to be accessed again soon. A page that hasn't been touched in a long time is probably no longer needed.
This observation, rooted in empirical studies of program behavior spanning decades, is called the principle of temporal locality.
Temporal locality is one of the two fundamental principles of program behavior (the other being spatial locality). It states:
If a memory location is accessed at time t, there is a high probability it will be accessed again at time t + Δ for small values of Δ.
In simpler terms: recently used memory will likely be used again soon.
This isn't just a theoretical assumption—it's an empirical observation that holds across virtually all program types:
- Loops: a `for` loop iterating 1000 times accesses the same code 1000 times consecutively.

Empirical studies consistently show that at any moment, a program actively uses a small 'working set' of pages. Peter Denning's seminal work demonstrated that this working-set size remains stable during phases of execution, and that sudden changes indicate phase transitions. This working-set behavior is a direct consequence of temporal locality.
The mathematical foundation of temporal locality gives rise to the stack distance model. For each memory access, we can measure how many distinct pages have been accessed since this page was last used. Pages with small stack distances (recently used) are likely to be used again; pages with large stack distances (long-unused) are likely cold.
LRU directly implements this stack distance intuition: it always evicts the page with the largest stack distance—the one accessed longest ago.
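To make stack distance concrete, here is a small illustrative sketch (our own example, not taken from any kernel): for each reference in a short sample string, it counts how many distinct pages were touched since that page's previous access. The helper name, the sample string, and the 64-page limit are assumptions for illustration.

```c
#include <stdio.h>

/* Illustrative sketch: stack distance = number of distinct pages accessed
 * since this page's previous reference; -1 marks a first-time (cold) access.
 * The reference string and the 64-page cap are assumptions for the example. */
int stack_distance(const int *refs, int i) {
    int seen[64] = {0}, distinct = 0;
    for (int j = i - 1; j >= 0; j--) {
        if (refs[j] == refs[i])
            return distinct;                 /* found the previous access */
        if (!seen[refs[j]]) { seen[refs[j]] = 1; distinct++; }
    }
    return -1;                               /* never accessed before */
}

int main(void) {
    int refs[] = {7, 0, 1, 2, 0, 3, 0, 4};
    int n = (int)(sizeof refs / sizeof refs[0]);
    for (int i = 0; i < n; i++)
        printf("ref %d: stack distance %d\n", refs[i], stack_distance(refs, i));
    return 0;
}
```

Under LRU with k frames, a reference is a hit exactly when its stack distance is less than k, which is another way of seeing why LRU always evicts the page with the largest stack distance.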
Least Recently Used (LRU) is a page replacement algorithm that, when a page fault occurs and no free frames are available, selects the page that has not been accessed for the longest time—the page whose most recent access is furthest in the past.
Formal Definition:
Let M be the set of pages currently in memory frames, and let last_access(p) denote the time of the most recent reference to page p. When a page replacement is necessary:
$$\text{victim} = \arg\min_{p \in M} \text{last\_access}(p)$$
In words: choose the page p in memory M whose last_access time is minimal (oldest).
The Core Assumption:
LRU makes a single assumption: the past predicts the future. Specifically, recent access patterns will continue into the near future. A page not used for a long time is assumed to not be needed soon.
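As a sketch of the formal rule, the following fragment scans the resident set and returns the index of the page with the smallest last_access timestamp. The struct layout, the linear scan, and the sample timestamps (mirroring the table below) are illustrative assumptions, not a real kernel's data structures.

```c
#include <stdio.h>
#include <stddef.h>

/* Illustrative sketch of victim = argmin over p in M of last_access(p).
 * A real OS does not keep resident pages in a flat array like this. */
struct page {
    int           id;
    unsigned long last_access;   /* logical time of most recent reference */
};

size_t lru_victim(const struct page *mem, size_t nframes) {
    size_t victim = 0;
    for (size_t i = 1; i < nframes; i++)
        if (mem[i].last_access < mem[victim].last_access)
            victim = i;          /* older last_access => better eviction candidate */
    return victim;
}

int main(void) {
    /* Timestamps mirror the example table that follows */
    struct page mem[] = { {1, 1000}, {2, 995}, {3, 980}, {4, 900}, {5, 500} };
    size_t v = lru_victim(mem, sizeof mem / sizeof mem[0]);
    printf("evict page P%d (last access t=%lu)\n", mem[v].id, mem[v].last_access);
    return 0;
}
```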
| Page in Memory | Last Access Time | Time Since Access | LRU Priority |
|---|---|---|---|
| Page P₁ | t = 1000 (current) | 0 time units | Keep (most recently used) |
| Page P₂ | t = 995 | 5 time units | Keep (recently used) |
| Page P₃ | t = 980 | 20 time units | Consider for eviction |
| Page P₄ | t = 900 | 100 time units | Strong eviction candidate |
| Page P₅ | t = 500 | 500 time units | EVICT (least recently used) |
LRU (Least Recently Used) considers recency—when was the page last accessed? LFU (Least Frequently Used) considers frequency—how many times has the page been accessed? These are different metrics. A page accessed 1000 times in the past but not at all recently might be evicted under LRU but kept under LFU. Each has different strengths for different workloads.
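A tiny sketch (with invented numbers, purely illustrative) shows how the two metrics can disagree: ranking the same resident pages by recency and by frequency yields different victims.

```c
#include <stdio.h>

/* Hypothetical per-page metadata: the same resident set gives different
 * victims under LRU (min last_access) and LFU (min access_count). */
struct page_stats {
    int           id;
    unsigned long last_access;   /* recency: when was it last referenced? */
    unsigned long access_count;  /* frequency: how often has it been referenced? */
};

int main(void) {
    struct page_stats mem[] = {
        { 3, 100, 1000 },        /* heavily used long ago, idle now         */
        { 9, 995,    2 },        /* barely used, but touched very recently  */
        { 5, 980,  300 },
    };
    int n = (int)(sizeof mem / sizeof mem[0]), lru = 0, lfu = 0;
    for (int i = 1; i < n; i++) {
        if (mem[i].last_access  < mem[lru].last_access)  lru = i;
        if (mem[i].access_count < mem[lfu].access_count) lfu = i;
    }
    printf("LRU would evict page %d; LFU would evict page %d\n",
           mem[lru].id, mem[lfu].id);   /* page 3 vs. page 9 */
    return 0;
}
```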
The Optimal (OPT/MIN) algorithm achieves the minimum possible page faults by evicting the page that won't be used for the longest time in the future. This requires knowing the future—impossible in practice.
LRU uses the past to approximate the future. This is the key insight:
| Optimal Algorithm | LRU Algorithm |
|---|---|
| Looks at future references | Looks at past references |
| Evicts page used farthest in future | Evicts page used farthest in past |
| Perfect decisions | Decisions based on locality assumption |
| Impossible to implement | Implementable (with challenges) |
Why does this approximation work? Because of temporal locality: if a page was used recently, it will likely be used again soon; if a page hasn't been used for a long time, it likely won't be needed soon. The recent past serves as a workable mirror of the near future.
Imagine the reference string as a timeline. OPT looks forward from the current moment; LRU looks backward. If memory access patterns are symmetric around the present (which they approximately are, due to locality), both strategies make similar decisions. This 'mirror property' explains why LRU performs close to optimal for most real workloads.
Mathematical Perspective: Stack Algorithms
LRU belongs to a class called stack algorithms. A page replacement algorithm is a stack algorithm if, for every reference string, the set of pages in memory with n frames is always a subset of the pages in memory with n+1 frames.
Formally, if S_n(t) is the set of pages in memory at time t with n frames:
$$S_n(t) \subseteq S_{n+1}(t) \text{ for all } t$$
This has a crucial implication: stack algorithms never exhibit Bélády's anomaly. More frames always mean fewer or equal page faults—never more.
Both OPT and LRU are stack algorithms. FIFO is not. This mathematical property is why LRU is fundamentally superior to FIFO for general workloads.
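The inclusion property can be checked empirically. The sketch below (our own illustration, using the reference string from the next section) simulates LRU with 3 and 4 frames in parallel and verifies that the 3-frame resident set is always a subset of the 4-frame resident set.

```c
#include <stdio.h>

/* Sketch: timestamp-based LRU simulation used to check the stack-algorithm
 * inclusion property S_n(t) ⊆ S_{n+1}(t). Array sizes and page-id range
 * are assumptions chosen for this small example. */
#define MAXPAGES 16

static void lru_step(int page, int nframes, int t,
                     int resident[MAXPAGES], int stamp[MAXPAGES]) {
    if (!resident[page]) {
        int count = 0, victim = -1;
        for (int p = 0; p < MAXPAGES; p++)
            if (resident[p]) {
                count++;
                if (victim < 0 || stamp[p] < stamp[victim]) victim = p;
            }
        if (count >= nframes) resident[victim] = 0;   /* evict the LRU page */
        resident[page] = 1;
    }
    stamp[page] = t;                                   /* update recency on every access */
}

int main(void) {
    int refs[] = {7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1};
    int n = (int)(sizeof refs / sizeof refs[0]);
    int res3[MAXPAGES] = {0}, st3[MAXPAGES] = {0};
    int res4[MAXPAGES] = {0}, st4[MAXPAGES] = {0};
    int ok = 1;
    for (int t = 0; t < n; t++) {
        lru_step(refs[t], 3, t + 1, res3, st3);
        lru_step(refs[t], 4, t + 1, res4, st4);
        for (int p = 0; p < MAXPAGES; p++)
            if (res3[p] && !res4[p]) ok = 0;           /* would violate inclusion */
    }
    printf("inclusion property holds: %s\n", ok ? "yes" : "no");
    return 0;
}
```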
Let's trace through LRU with a concrete example. Consider the reference string:
7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1
With 3 frames available, let's track each reference, noting LRU order (leftmost = least recently used, must be evicted next).
| Reference | Frame 0 | Frame 1 | Frame 2 | LRU Order | Fault? |
|---|---|---|---|---|---|
| 7 | 7 | | | [7] | Yes ✗ |
| 0 | 7 | 0 | | [7,0] | Yes ✗ |
| 1 | 7 | 0 | 1 | [7,0,1] | Yes ✗ |
| 2 | 2 | 0 | 1 | [0,1,2] | Yes ✗ (evict 7, LRU) |
| 0 | 2 | 0 | 1 | [1,2,0] | No (0 in memory, update recency) |
| 3 | 2 | 0 | 3 | [2,0,3] | Yes ✗ (evict 1, LRU) |
| 0 | 2 | 0 | 3 | [2,3,0] | No (update 0's recency) |
| 4 | 4 | 0 | 3 | [3,0,4] | Yes ✗ (evict 2, LRU) |
| 2 | 4 | 0 | 2 | [0,4,2] | Yes ✗ (evict 3, LRU) |
| 3 | 4 | 3 | 2 | [4,2,3] | Yes ✗ (evict 0, LRU) |
| 0 | 0 | 3 | 2 | [2,3,0] | Yes ✗ (evict 4, LRU) |
| 3 | 0 | 3 | 2 | [2,0,3] | No (update 3's recency) |
| 2 | 0 | 3 | 2 | [0,3,2] | No (update 2's recency) |
| 1 | 1 | 3 | 2 | [3,2,1] | Yes ✗ (evict 0, LRU) |
| 2 | 1 | 3 | 2 | [3,1,2] | No (update 2's recency) |
| 0 | 1 | 0 | 2 | [1,2,0] | Yes ✗ (evict 3, LRU) |
| 1 | 1 | 0 | 2 | [2,0,1] | No (update 1's recency) |
| 7 | 1 | 0 | 7 | [0,1,7] | Yes ✗ (evict 2, LRU) |
| 0 | 1 | 0 | 7 | [1,7,0] | No (update 0's recency) |
| 1 | 1 | 0 | 7 | [7,0,1] | No (update 1's recency) |
Analysis Results: out of 20 references, LRU incurs 12 page faults and 8 hits, a fault rate of 60%.
Comparison with other algorithms on the same string (3 frames):
| Algorithm | Page Faults | Fault Rate |
|---|---|---|
| OPT | 9 | 45% |
| LRU | 12 | 60% |
| FIFO | 15 | 75% |
LRU provides 3 fewer faults than FIFO (20% improvement) and is only 3 faults more than optimal. For this workload, LRU is a good approximation of optimal.
The key to LRU is maintaining the recency order. Every time a page is accessed (hit or fault), it moves to the 'most recently used' position. The page at the 'least recently used' position is the eviction candidate. This ordering changes with every memory reference, which is why LRU is expensive to implement exactly.
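The following sketch implements exactly that bookkeeping for the reference string above: an array ordered from least to most recently used, where every access (hit or fault) moves the page to the MRU end. Running it reproduces the 12 faults from the trace. The array representation is for clarity only; it is not how real systems store recency.

```c
#include <stdio.h>

/* Sketch: explicit recency list. order[0] is the LRU page, order[used-1]
 * is the MRU page. Every access moves the referenced page to the MRU end. */
int main(void) {
    int refs[] = {7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1};
    int nrefs = (int)(sizeof refs / sizeof refs[0]);
    enum { FRAMES = 3 };
    int order[FRAMES], used = 0, faults = 0;

    for (int i = 0; i < nrefs; i++) {
        int page = refs[i], pos = -1;
        for (int j = 0; j < used; j++)
            if (order[j] == page) { pos = j; break; }
        if (pos < 0) {                       /* page fault */
            faults++;
            if (used == FRAMES) {            /* evict LRU: drop order[0], shift left */
                for (int j = 1; j < used; j++) order[j - 1] = order[j];
                used--;
            }
            order[used++] = page;            /* insert at the MRU end */
        } else {                             /* hit: move page to the MRU end */
            for (int j = pos + 1; j < used; j++) order[j - 1] = order[j];
            order[used - 1] = page;
        }
    }
    printf("LRU faults with %d frames: %d\n", FRAMES, faults);  /* prints 12 */
    return 0;
}
```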
LRU is not perfect. Its effectiveness depends on programs exhibiting temporal locality. When this assumption breaks, LRU can perform poorly—sometimes as bad as random replacement or worse.
Pathological Case 1: Sequential Scans Larger Than Memory
```c
// A database scans a 5GB table with only 4GB of memory
// The scan accesses pages: 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, ...

// With 4 pages of memory and a 5-page working set:
// Access 1: [1, _, _, _] - FAULT
// Access 2: [1, 2, _, _] - FAULT
// Access 3: [1, 2, 3, _] - FAULT
// Access 4: [1, 2, 3, 4] - FAULT
// Access 5: [5, 2, 3, 4] - FAULT (evict 1, LRU)
// Access 1: [5, 1, 3, 4] - FAULT (evict 2, LRU) - 1 was JUST evicted!
// Access 2: [5, 1, 2, 4] - FAULT (evict 3, LRU) - 2 was JUST evicted!
// ... every single access is a fault!

// LRU achieves a 0% hit rate on this sequential scan
// This is called "LRU pollution" or "scan resistance failure"
```

Pathological Case 2: Loop Larger Than Memory
A loop that cycles through more pages than available frames causes every single access to be a fault under LRU. The "least recently used" page is always the one we're about to need.
Real systems address these weaknesses with algorithms like ARC (Adaptive Replacement Cache), 2Q, LIRS, and CLOCK-Pro. These maintain multiple lists or use frequency information alongside recency. Linux's page cache uses a variant called 'Multi-Generation LRU' (MGLRU) that addresses scan resistance by distinguishing between generations of page ages.
LRU's elegance as an algorithm contrasts sharply with the difficulty of implementing it efficiently. The core challenge:
Every memory access must update the recency ordering.
Consider a modern system: the CPU issues memory references at a rate of billions per second, and each one touches some page. For each of these accesses, a pure LRU implementation must locate the page's entry in the recency structure, remove it from its current position, and reinsert it at the most-recently-used position.
The Cost Analysis:
| Approach | Update Cost | Search Cost | Space Cost | Practical? |
|---|---|---|---|---|
| Linked List | O(n) to find + O(1) to move | O(n) | O(n) pointers | No (too slow) |
| Counter per Page | O(1) increment | O(n) to find min | O(n) counters | Maybe (search slow) |
| Clock/Hardware Counter | O(1) reference | N/A | O(1) | Only if HW supports |
| Stack (software) | O(n) manipulation | O(1) bottom | O(n) | No (too slow) |
| Hardware TLB Aging | O(1) | O(n) for MMU | Embedded | Impractical |
Why can't we just update on every reference?
At billions of memory references per second, even a handful of extra instructions per reference to maintain an exact recency ordering would dominate execution time and make every memory access dramatically more expensive.
The practical reality: no modern operating system implements pure LRU. Instead, they use approximations that rely on hardware reference bits, sample recency information periodically rather than on every access, and trade exact ordering for negligible per-reference overhead.
The subsequent pages in this module explore the exact implementations: counters, stacks, and ultimately the practical approximations that real systems use.
Most architectures include a 'reference bit' in each page table entry. The MMU sets this bit automatically when a page is accessed. The OS can periodically sample these bits to approximate recency. This hardware assist is what makes LRU-like behavior practical. We'll explore this in detail when we discuss the Clock algorithm and LRU approximations.
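As a preview, here is a simplified sketch of one such approximation, reference-bit "aging": periodically, the OS shifts each page's age counter right and ORs the reference bit into the top bit, then clears the bit, so pages with the smallest age value are the least recently referenced. The field names, the 8-bit counter width, and the software-visible reference bit are illustrative assumptions, not any specific kernel's code.

```c
#include <stdint.h>

/* Simplified sketch of reference-bit aging, an LRU approximation.
 * Field names and counter width are illustrative assumptions. */
struct frame_info {
    int     page;
    uint8_t referenced;   /* set by the MMU on access (modeled here in software) */
    uint8_t age;          /* aging counter: higher => more recently referenced */
};

/* Called once per sampling interval by the OS. */
void aging_tick(struct frame_info *frames, int nframes) {
    for (int i = 0; i < nframes; i++) {
        frames[i].age = (uint8_t)((frames[i].age >> 1) |
                                  (frames[i].referenced ? 0x80 : 0x00));
        frames[i].referenced = 0;             /* clear the bit for the next interval */
    }
}

/* Pick the frame whose page was referenced least recently (smallest age). */
int aging_victim(const struct frame_info *frames, int nframes) {
    int victim = 0;
    for (int i = 1; i < nframes; i++)
        if (frames[i].age < frames[victim].age)
            victim = i;
    return victim;
}
```

The key design point: the per-reference cost is paid by hardware (setting a single bit), while the OS only does O(n) work at each sampling interval, which is exactly the trade-off pure LRU cannot make.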
The Origins of LRU
LRU emerged in the 1960s as researchers at IBM, MIT, and other institutions grappled with the newly invented concept of virtual memory. The algorithm was formalized in the context of the Atlas computer (1962) and later refined through work on the Multics and IBM System/360 systems.
Key milestones include Bélády's 1966 study of replacement algorithms (which also defined the optimal MIN policy), Denning's 1968 working set model, and the 1970 Mattson, Gecsei, Slutz, and Traiger paper that formalized stack algorithms and stack-distance analysis.
Every major operating system uses LRU or LRU approximations: Linux (Multi-Generation LRU), Windows (working set manager with LRU-like aging), macOS (unified buffer cache with LRU), and all major database systems (buffer pool managers). The principle has stood for 60+ years as the foundation of practical memory management.
We've established the foundational understanding of LRU. To consolidate the key insights: LRU evicts the page whose most recent access is oldest, betting on temporal locality; it approximates the optimal algorithm by looking backward in time instead of forward; as a stack algorithm it never exhibits Bélády's anomaly; it degrades badly on scans and loops larger than memory; and exact LRU is too expensive to maintain on every reference, which is why real systems rely on approximations.
What's Next:
Now that we understand the LRU principle, the next page examines Recency-Based Decision Making in depth—how recency is precisely defined, how it relates to the concept of stack distance, and the mathematical models that predict LRU performance.
You now understand the foundational principle of LRU: use past reference behavior, specifically recency, to predict future needs. LRU's elegance lies in this simple insight, grounded in the empirical observation of temporal locality that pervades all program execution.