When the FIFO algorithm disappointed system designers with Bélády's anomaly, and the optimal algorithm remained forever out of reach, the search turned to something more practical: an algorithm that could approximate optimal behavior using information available to the operating system. The answer came from a fundamental insight about how programs actually use memory—the principle of temporal locality.
Least Recently Used (LRU) emerged as the gold standard of practical page replacement. Its premise is deceptively simple: if a page hasn't been accessed in a long time, it probably won't be accessed soon. This single assumption, grounded in decades of program behavior analysis, has proven remarkably effective across virtually all workload types.
By the end of this page, you will understand: (1) the theoretical basis of LRU in temporal locality, (2) why LRU approximates the optimal algorithm, (3) the mathematical formalization of recency, (4) LRU's behavior with reference strings, and (5) why LRU is considered the practical benchmark for page replacement.
Before diving into LRU, we must understand the limitation that made it necessary. FIFO (First-In, First-Out) seemed intuitive—replace the oldest page in memory. But FIFO has a fatal flaw: it ignores what happened after a page arrived.
Consider this scenario:
Under FIFO, when we need to evict a page, we replace A—the oldest—even though it's still actively in use. Meanwhile, B (completely unused) stays in memory simply because it arrived later. This is catastrophically wrong from a performance perspective.
| Page | Arrival Time | Last Access | Access Frequency | FIFO Priority | Actual Value |
|---|---|---|---|---|---|
| Page A | t=0 | t=100 (now) | High (active) | Evict first ❌ | Keep (critical) |
| Page B | t=1 | t=1 (never again) | Zero | Keep second ❌ | Evict (useless) |
| Page C | t=2 | t=50 | Medium | Keep last | Consider |
The insight: arrival time tells us nothing about future usefulness. What matters is recent access patterns—the recency of use. A page that was just accessed is likely to be accessed again soon. A page that hasn't been touched in a long time is probably no longer needed.
This observation, rooted in empirical studies of program behavior spanning decades, is called the principle of temporal locality.
Temporal locality is one of the two fundamental principles of program behavior (the other being spatial locality). It states:
If a memory location is accessed at time t, there is a high probability it will be accessed again at time t + Δ for small values of Δ.
In simpler terms: recently used memory will likely be used again soon.
This isn't just a theoretical assumption—it's an empirical observation that holds across virtually all program types:
- Loops: a `for` loop iterating 1000 times accesses the same code 1000 times consecutively.

Empirical studies consistently show that at any moment, a program actively uses a small 'working set' of pages. Peter Denning's seminal work demonstrated that this working-set size remains stable during phases of execution, and that sudden changes indicate phase transitions. This working-set behavior is a direct consequence of temporal locality.
The mathematical foundation of temporal locality gives rise to the stack distance model. For each memory access, we can measure how many distinct pages have been accessed since this page was last used. Pages with small stack distances (recently used) are likely to be used again; pages with large stack distances (long-unused) are likely cold.
LRU directly implements this stack distance intuition: it always evicts the page with the largest stack distance—the one accessed longest ago.
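To make stack distance concrete, here is a small illustrative sketch (our own example, not taken from any kernel): for each reference in a short sample string, it counts how many distinct pages were touched since that page's previous access. The helper name, the sample string, and the 64-page limit are assumptions for illustration.

```c
#include <stdio.h>

/* Illustrative sketch: stack distance = number of distinct pages accessed
 * since this page's previous reference; -1 marks a first-time (cold) access.
 * The reference string and the 64-page cap are assumptions for the example. */
int stack_distance(const int *refs, int i) {
    int seen[64] = {0}, distinct = 0;
    for (int j = i - 1; j >= 0; j--) {
        if (refs[j] == refs[i])
            return distinct;                 /* found the previous access */
        if (!seen[refs[j]]) { seen[refs[j]] = 1; distinct++; }
    }
    return -1;                               /* never accessed before */
}

int main(void) {
    int refs[] = {7, 0, 1, 2, 0, 3, 0, 4};
    int n = (int)(sizeof refs / sizeof refs[0]);
    for (int i = 0; i < n; i++)
        printf("ref %d: stack distance %d\n", refs[i], stack_distance(refs, i));
    return 0;
}
```

Under LRU with k frames, a reference is a hit exactly when its stack distance is less than k, which is another way of seeing why LRU always evicts the page with the largest stack distance.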
Least Recently Used (LRU) is a page replacement algorithm that, when a page fault occurs and no free frames are available, selects the page that has not been accessed for the longest time—the page whose most recent access is furthest in the past.
Formal Definition:
Let M be the set of pages currently in memory frames, and let last_access(p) denote the time of the most recent reference to page p. When a page replacement is necessary:
$$\text{victim} = \arg\min_{p \in M} \text{last\_access}(p)$$
In words: choose the page p in memory M whose last_access time is minimal (oldest).
The Core Assumption:
LRU makes a single assumption: the past predicts the future. Specifically, recent access patterns will continue into the near future. A page not used for a long time is assumed to not be needed soon.
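As a sketch of the formal rule, the following fragment scans the resident set and returns the index of the page with the smallest last_access timestamp. The struct layout, the linear scan, and the sample timestamps (mirroring the table below) are illustrative assumptions, not a real kernel's data structures.

```c
#include <stdio.h>
#include <stddef.h>

/* Illustrative sketch of victim = argmin over p in M of last_access(p).
 * A real OS does not keep resident pages in a flat array like this. */
struct page {
    int           id;
    unsigned long last_access;   /* logical time of most recent reference */
};

size_t lru_victim(const struct page *mem, size_t nframes) {
    size_t victim = 0;
    for (size_t i = 1; i < nframes; i++)
        if (mem[i].last_access < mem[victim].last_access)
            victim = i;          /* older last_access => better eviction candidate */
    return victim;
}

int main(void) {
    /* Timestamps mirror the example table that follows */
    struct page mem[] = { {1, 1000}, {2, 995}, {3, 980}, {4, 900}, {5, 500} };
    size_t v = lru_victim(mem, sizeof mem / sizeof mem[0]);
    printf("evict page P%d (last access t=%lu)\n", mem[v].id, mem[v].last_access);
    return 0;
}
```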
| Page in Memory | Last Access Time | Time Since Access | LRU Priority |
|---|---|---|---|
| Page P₁ | t = 1000 (current) | 0 time units | Keep (most recently used) |
| Page P₂ | t = 995 | 5 time units | Keep (recently used) |
| Page P₃ | t = 980 | 20 time units | Consider for eviction |
| Page P₄ | t = 900 | 100 time units | Strong eviction candidate |
| Page P₅ | t = 500 | 500 time units | EVICT (least recently used) |
LRU (Least Recently Used) considers recency—when was the page last accessed? LFU (Least Frequently Used) considers frequency—how many times has the page been accessed? These are different metrics. A page accessed 1000 times in the past but not at all recently might be evicted under LRU but kept under LFU. Each has different strengths for different workloads.
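A tiny sketch (with invented numbers, purely illustrative) shows how the two metrics can disagree: ranking the same resident pages by recency and by frequency yields different victims.

```c
#include <stdio.h>

/* Hypothetical per-page metadata: the same resident set gives different
 * victims under LRU (min last_access) and LFU (min access_count). */
struct page_stats {
    int           id;
    unsigned long last_access;   /* recency: when was it last referenced? */
    unsigned long access_count;  /* frequency: how often has it been referenced? */
};

int main(void) {
    struct page_stats mem[] = {
        { 3, 100, 1000 },        /* heavily used long ago, idle now         */
        { 9, 995,    2 },        /* barely used, but touched very recently  */
        { 5, 980,  300 },
    };
    int n = (int)(sizeof mem / sizeof mem[0]), lru = 0, lfu = 0;
    for (int i = 1; i < n; i++) {
        if (mem[i].last_access  < mem[lru].last_access)  lru = i;
        if (mem[i].access_count < mem[lfu].access_count) lfu = i;
    }
    printf("LRU would evict page %d; LFU would evict page %d\n",
           mem[lru].id, mem[lfu].id);   /* page 3 vs. page 9 */
    return 0;
}
```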
The Optimal (OPT/MIN) algorithm achieves the minimum possible page faults by evicting the page that won't be used for the longest time in the future. This requires knowing the future—impossible in practice.
LRU uses the past to approximate the future. This is the key insight:
| Optimal Algorithm | LRU Algorithm |
|---|---|
| Looks at future references | Looks at past references |
| Evicts page used farthest in future | Evicts page used farthest in past |
| Perfect decisions | Decisions based on locality assumption |
| Impossible to implement | Implementable (with challenges) |
Why does this approximation work? Because of temporal locality: if a page was used recently, it will likely be used again soon; if a page hasn't been used for a long time, it likely won't be needed soon. The recent past serves as a workable mirror of the near future.
Imagine the reference string as a timeline. OPT looks forward from the current moment; LRU looks backward. If memory access patterns are symmetric around the present (which they approximately are, due to locality), both strategies make similar decisions. This 'mirror property' explains why LRU performs close to optimal for most real workloads.
Mathematical Perspective: Stack Algorithms
LRU belongs to a class called stack algorithms. A page replacement algorithm is a stack algorithm if, for every reference string, the set of pages in memory with n frames is always a subset of the pages in memory with n+1 frames.
Formally, if S_n(t) is the set of pages in memory at time t with n frames:
$$S_n(t) \subseteq S_{n+1}(t) \text{ for all } t$$
This has a crucial implication: stack algorithms never exhibit Bélády's anomaly. More frames always mean fewer or equal page faults—never more.
Both OPT and LRU are stack algorithms. FIFO is not. This mathematical property is why LRU is fundamentally superior to FIFO for general workloads.
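The inclusion property can be checked empirically. The sketch below (our own illustration, using the reference string from the next section) simulates LRU with 3 and 4 frames in parallel and verifies that the 3-frame resident set is always a subset of the 4-frame resident set.

```c
#include <stdio.h>

/* Sketch: timestamp-based LRU simulation used to check the stack-algorithm
 * inclusion property S_n(t) ⊆ S_{n+1}(t). Array sizes and page-id range
 * are assumptions chosen for this small example. */
#define MAXPAGES 16

static void lru_step(int page, int nframes, int t,
                     int resident[MAXPAGES], int stamp[MAXPAGES]) {
    if (!resident[page]) {
        int count = 0, victim = -1;
        for (int p = 0; p < MAXPAGES; p++)
            if (resident[p]) {
                count++;
                if (victim < 0 || stamp[p] < stamp[victim]) victim = p;
            }
        if (count >= nframes) resident[victim] = 0;   /* evict the LRU page */
        resident[page] = 1;
    }
    stamp[page] = t;                                   /* update recency on every access */
}

int main(void) {
    int refs[] = {7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1};
    int n = (int)(sizeof refs / sizeof refs[0]);
    int res3[MAXPAGES] = {0}, st3[MAXPAGES] = {0};
    int res4[MAXPAGES] = {0}, st4[MAXPAGES] = {0};
    int ok = 1;
    for (int t = 0; t < n; t++) {
        lru_step(refs[t], 3, t + 1, res3, st3);
        lru_step(refs[t], 4, t + 1, res4, st4);
        for (int p = 0; p < MAXPAGES; p++)
            if (res3[p] && !res4[p]) ok = 0;           /* would violate inclusion */
    }
    printf("inclusion property holds: %s\n", ok ? "yes" : "no");
    return 0;
}
```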
Let's trace through LRU with a concrete example. Consider the reference string:
7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1
With 3 frames available, let's track each reference, noting LRU order (leftmost = least recently used, must be evicted next).
| Reference | Frame 0 | Frame 1 | Frame 2 | LRU Order | Fault? |
|---|---|---|---|---|---|
| 7 | 7 | | | [7] | Yes ✗ |
| 0 | 7 | 0 | | [7,0] | Yes ✗ |
| 1 | 7 | 0 | 1 | [7,0,1] | Yes ✗ |
| 2 | 2 | 0 | 1 | [0,1,2] | Yes ✗ (evict 7, LRU) |
| 0 | 2 | 0 | 1 | [1,2,0] | No (0 in memory, update recency) |
| 3 | 2 | 0 | 3 | [2,0,3] | Yes ✗ (evict 1, LRU) |
| 0 | 2 | 0 | 3 | [2,3,0] | No (update 0's recency) |
| 4 | 4 | 0 | 3 | [3,0,4] | Yes ✗ (evict 2, LRU) |
| 2 | 4 | 0 | 2 | [0,4,2] | Yes ✗ (evict 3, LRU) |
| 3 | 4 | 3 | 2 | [4,2,3] | Yes ✗ (evict 0, LRU) |
| 0 | 0 | 3 | 2 | [2,3,0] | Yes ✗ (evict 4, LRU) |
| 3 | 0 | 3 | 2 | [2,0,3] | No (update 3's recency) |
| 2 | 0 | 3 | 2 | [0,3,2] | No (update 2's recency) |
| 1 | 1 | 3 | 2 | [3,2,1] | Yes ✗ (evict 0, LRU) |
| 2 | 1 | 3 | 2 | [3,1,2] | No (update 2's recency) |
| 0 | 1 | 0 | 2 | [1,2,0] | Yes ✗ (evict 3, LRU) |
| 1 | 1 | 0 | 2 | [2,0,1] | No (update 1's recency) |
| 7 | 1 | 0 | 7 | [0,1,7] | Yes ✗ (evict 2, LRU) |
| 0 | 1 | 0 | 7 | [1,7,0] | No (update 0's recency) |
| 1 | 1 | 0 | 7 | [7,0,1] | No (update 1's recency) |
Analysis Results: out of 20 references, LRU incurs 12 page faults and 8 hits, a fault rate of 60%.
Comparison with other algorithms on the same string (3 frames):
| Algorithm | Page Faults | Fault Rate |
|---|---|---|
| OPT | 9 | 45% |
| LRU | 12 | 60% |
| FIFO | 15 | 75% |
LRU provides 3 fewer faults than FIFO (20% improvement) and is only 3 faults more than optimal. For this workload, LRU is a good approximation of optimal.
The key to LRU is maintaining the recency order. Every time a page is accessed (hit or fault), it moves to the 'most recently used' position. The page at the 'least recently used' position is the eviction candidate. This ordering changes with every memory reference, which is why LRU is expensive to implement exactly.
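The following sketch implements exactly that bookkeeping for the reference string above: an array ordered from least to most recently used, where every access (hit or fault) moves the page to the MRU end. Running it reproduces the 12 faults from the trace. The array representation is for clarity only; it is not how real systems store recency.

```c
#include <stdio.h>

/* Sketch: explicit recency list. order[0] is the LRU page, order[used-1]
 * is the MRU page. Every access moves the referenced page to the MRU end. */
int main(void) {
    int refs[] = {7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1};
    int nrefs = (int)(sizeof refs / sizeof refs[0]);
    enum { FRAMES = 3 };
    int order[FRAMES], used = 0, faults = 0;

    for (int i = 0; i < nrefs; i++) {
        int page = refs[i], pos = -1;
        for (int j = 0; j < used; j++)
            if (order[j] == page) { pos = j; break; }
        if (pos < 0) {                       /* page fault */
            faults++;
            if (used == FRAMES) {            /* evict LRU: drop order[0], shift left */
                for (int j = 1; j < used; j++) order[j - 1] = order[j];
                used--;
            }
            order[used++] = page;            /* insert at the MRU end */
        } else {                             /* hit: move page to the MRU end */
            for (int j = pos + 1; j < used; j++) order[j - 1] = order[j];
            order[used - 1] = page;
        }
    }
    printf("LRU faults with %d frames: %d\n", FRAMES, faults);  /* prints 12 */
    return 0;
}
```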
LRU is not perfect. Its effectiveness depends on programs exhibiting temporal locality. When this assumption breaks, LRU can perform poorly—sometimes as bad as random replacement or worse.
Pathological Case 1: Sequential Scans Larger Than Memory
```c
// A database scans a 5GB table with only 4GB of memory
// The scan accesses pages: 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, ...

// With 4 pages of memory and a 5-page working set:
// Access 1: [1, _, _, _] - FAULT
// Access 2: [1, 2, _, _] - FAULT
// Access 3: [1, 2, 3, _] - FAULT
// Access 4: [1, 2, 3, 4] - FAULT
// Access 5: [5, 2, 3, 4] - FAULT (evict 1, LRU)
// Access 1: [5, 1, 3, 4] - FAULT (evict 2, LRU) - 1 was JUST evicted!
// Access 2: [5, 1, 2, 4] - FAULT (evict 3, LRU) - 2 was JUST evicted!
// ... every single access is a fault!

// LRU achieves a 0% hit rate on this sequential scan
// This is called "LRU pollution" or "scan resistance failure"
```

Pathological Case 2: Loop Larger Than Memory
A loop that cycles through more pages than available frames causes every single access to be a fault under LRU. The "least recently used" page is always the one we're about to need.
Real systems address these weaknesses with algorithms like ARC (Adaptive Replacement Cache), 2Q, LIRS, and CLOCK-Pro. These maintain multiple lists or use frequency information alongside recency. Linux's page cache uses a variant called 'Multi-Generation LRU' (MGLRU) that addresses scan resistance by distinguishing between generations of page ages.
LRU's elegance as an algorithm contrasts sharply with the difficulty of implementing it efficiently. The core challenge:
Every memory access must update the recency ordering.
Consider a modern system: the CPU issues memory references at a rate of billions per second, and each one touches some page. For each of these accesses, a pure LRU implementation must locate the page's entry in the recency structure, remove it from its current position, and reinsert it at the most-recently-used position.
The Cost Analysis:
| Approach | Update Cost | Search Cost | Space Cost | Practical? |
|---|---|---|---|---|
| Linked List | O(n) to find + O(1) to move | O(n) | O(n) pointers | No (too slow) |
| Counter per Page | O(1) increment | O(n) to find min | O(n) counters | Maybe (search slow) |
| Clock/Hardware Counter | O(1) reference | N/A | O(1) | Only if HW supports |
| Stack (software) | O(n) manipulation | O(1) bottom | O(n) | No (too slow) |
| Hardware TLB Aging | O(1) | O(n) for MMU | Embedded | Impractical |
Why can't we just update on every reference?
At billions of memory references per second, even a handful of extra instructions per reference to maintain an exact recency ordering would dominate execution time and make every memory access dramatically more expensive.
The practical reality: no modern operating system implements pure LRU. Instead, they use approximations that rely on hardware reference bits, sample recency information periodically rather than on every access, and trade exact ordering for negligible per-reference overhead.
The subsequent pages in this module explore the exact implementations: counters, stacks, and ultimately the practical approximations that real systems use.
Most architectures include a 'reference bit' in each page table entry. The MMU sets this bit automatically when a page is accessed. The OS can periodically sample these bits to approximate recency. This hardware assist is what makes LRU-like behavior practical. We'll explore this in detail when we discuss the Clock algorithm and LRU approximations.
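As a preview, here is a simplified sketch of one such approximation, reference-bit "aging": periodically, the OS shifts each page's age counter right and ORs the reference bit into the top bit, then clears the bit, so pages with the smallest age value are the least recently referenced. The field names, the 8-bit counter width, and the software-visible reference bit are illustrative assumptions, not any specific kernel's code.

```c
#include <stdint.h>

/* Simplified sketch of reference-bit aging, an LRU approximation.
 * Field names and counter width are illustrative assumptions. */
struct frame_info {
    int     page;
    uint8_t referenced;   /* set by the MMU on access (modeled here in software) */
    uint8_t age;          /* aging counter: higher => more recently referenced */
};

/* Called once per sampling interval by the OS. */
void aging_tick(struct frame_info *frames, int nframes) {
    for (int i = 0; i < nframes; i++) {
        frames[i].age = (uint8_t)((frames[i].age >> 1) |
                                  (frames[i].referenced ? 0x80 : 0x00));
        frames[i].referenced = 0;             /* clear the bit for the next interval */
    }
}

/* Pick the frame whose page was referenced least recently (smallest age). */
int aging_victim(const struct frame_info *frames, int nframes) {
    int victim = 0;
    for (int i = 1; i < nframes; i++)
        if (frames[i].age < frames[victim].age)
            victim = i;
    return victim;
}
```

The key design point: the per-reference cost is paid by hardware (setting a single bit), while the OS only does O(n) work at each sampling interval, which is exactly the trade-off pure LRU cannot make.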
The Origins of LRU
LRU emerged in the 1960s as researchers at IBM, MIT, and other institutions grappled with the newly invented concept of virtual memory. The algorithm was formalized in the context of the Atlas computer (1962) and later refined through work on the Multics and IBM System/360 systems.
Key milestones include Bélády's 1966 study of replacement algorithms (which also defined the optimal MIN policy), Denning's 1968 working set model, and the 1970 Mattson, Gecsei, Slutz, and Traiger paper that formalized stack algorithms and stack-distance analysis.
Every major operating system uses LRU or LRU approximations: Linux (Multi-Generation LRU), Windows (working set manager with LRU-like aging), macOS (unified buffer cache with LRU), and all major database systems (buffer pool managers). The principle has stood for 60+ years as the foundation of practical memory management.
We've established the foundational understanding of LRU. To consolidate the key insights: LRU evicts the page whose most recent access is oldest, betting on temporal locality; it approximates the optimal algorithm by looking backward in time instead of forward; as a stack algorithm it never exhibits Bélády's anomaly; it degrades badly on scans and loops larger than memory; and exact LRU is too expensive to maintain on every reference, which is why real systems rely on approximations.
What's Next:
Now that we understand the LRU principle, the next page examines Recency-Based Decision Making in depth—how recency is precisely defined, how it relates to the concept of stack distance, and the mathematical models that predict LRU performance.
You now understand the foundational principle of LRU: use past reference behavior, specifically recency, to predict future needs. LRU's elegance lies in this simple insight, grounded in the empirical observation of temporal locality that pervades all program execution.