On the previous page, we established that LRU evicts the "least recently used" page. But what exactly does "recently" mean? How do we measure it? And how does recency translate into practical eviction decisions?
This page dives deep into the mechanics of recency-based decision making. We'll formalize the concept with mathematical precision, explore the stack distance model that underlies LRU's theoretical analysis, and understand exactly how an LRU system determines which page to evict at any given moment.
These foundations are crucial—they will inform our understanding of the implementation challenges and approximation strategies that follow.
By the end of this page, you will understand: (1) formal definitions of recency, (2) the LRU stack model and stack distance, (3) how LRU uses recency to order pages, (4) the relationship between stack distance and page faults, and (5) how recency decisions propagate through the memory hierarchy.
Recency in the context of page replacement has a precise definition. Let's formalize it.
Definition 1: Last Access Time
For a page p, let last_access(p) denote the logical timestamp of the most recent memory reference to page p. If p has never been accessed, last_access(p) = -∞.
Definition 2: Recency Value
The recency value of a page at logical time t is: $$recency(p, t) = t - last_access(p)$$
A larger recency value means the page is "older" (less recently used). A smaller value means the page is "fresher" (more recently used).
Definition 3: LRU Victim Selection
At time t, when eviction is necessary from memory set M: $$victim = \arg\max_{p \in M} recency(p, t) = \arg\min_{p \in M} last_access(p)$$
| Page | Last Access Time | Recency Value | Interpretation |
|---|---|---|---|
| P₁ | 98 | 100 - 98 = 2 | Very fresh (2 units ago) |
| P₂ | 95 | 100 - 95 = 5 | Fresh (5 units ago) |
| P₃ | 80 | 100 - 80 = 20 | Stale (20 units ago) |
| P₄ | 50 | 100 - 50 = 50 | Old (50 units ago) |
| P₅ | 10 | 100 - 10 = 90 | Most stale → LRU victim |
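To ground these definitions, here is a minimal sketch (the `select_victim` helper and `Page` struct are illustrative, not taken from any particular kernel) that applies Definition 3 to the table above by scanning for the minimum last access time, which is equivalent to the maximum recency:

```c
#include <stdio.h>

typedef struct {
    int  page_number;
    long last_access;   /* logical timestamp of most recent reference */
} Page;

/* Definition 3: the victim is the page with the minimum last_access. */
int select_victim(const Page pages[], int count) {
    int victim = 0;
    for (int i = 1; i < count; i++) {
        if (pages[i].last_access < pages[victim].last_access)
            victim = i;
    }
    return victim;
}

int main(void) {
    /* The example from the table above, at logical time t = 100. */
    Page memory[] = { {1, 98}, {2, 95}, {3, 80}, {4, 50}, {5, 10} };
    long t = 100;
    int v = select_victim(memory, 5);
    printf("Victim: P%d (recency = %ld)\n",
           memory[v].page_number, t - memory[v].last_access);   /* P5, recency 90 */
    return 0;
}
```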
LRU typically uses logical time—the count of memory references—rather than wall-clock time. A reference string like [A, B, C, A, D] has logical timestamps 1, 2, 3, 4, 5. This is independent of how long each access takes in real time. Logical time captures the sequence of program events, not durations.
The most elegant way to understand LRU behavior is through the LRU stack model, introduced by researchers in the 1970s. This model provides a unified view of recency ordering across all possible memory sizes.
The LRU Stack
Imagine an infinite stack containing all pages ever referenced. The stack is always ordered by recency:
Key Property: When a page is accessed, it moves to position 1 (top). All pages that were above it shift down by one position. Pages below it don't move.
LRU Stack Behavior for Reference String: A, B, C, B, A, D, C

| Step | Access | Stack After (top → bottom) | Effect |
|---|---|---|---|
| 1 | A | [A] | A at position 1 |
| 2 | B | [B, A] | B at top, A moves down |
| 3 | C | [C, B, A] | C at top |
| 4 | B | [B, C, A] | B moves to top, C shifts down |
| 5 | A | [A, B, C] | A moves to top, B shifts down |
| 6 | D | [D, A, B, C] | D at top |
| 7 | C | [C, D, A, B] | C moves from position 4 to position 1 |

The stack always maintains the recency order: position 1 is the most recently used page, and higher positions hold progressively less recently used pages.

The Inclusion Property
A crucial insight from the stack model: if you have n frames of memory, the pages in memory are exactly the top n pages of the LRU stack.
This is exactly why LRU is a stack algorithm: $$S_n(t) \subseteq S_{n+1}(t)$$
The pages in memory with n frames are always a subset of pages in memory with n+1 frames. This is why Bélády's anomaly is impossible for LRU.
The stack model lets us analyze LRU behavior for ALL memory sizes simultaneously. Process the reference string once to build the stack, then for any memory size n, you can instantly determine hits/faults by checking stack positions.
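As an illustrative sketch of this one-pass analysis (not production code), the following program maintains the LRU stack for the reference string from the trace above and prints each access's stack distance; the hit/fault outcome for any memory size n then follows from whether the distance is at most n:

```c
#include <stdio.h>

#define MAX_PAGES 64

int main(void) {
    char refs[] = {'A', 'B', 'C', 'B', 'A', 'D', 'C'};
    int n_refs = sizeof(refs) / sizeof(refs[0]);

    char stack[MAX_PAGES];   /* stack[0] is the MRU position */
    int depth = 0;           /* number of distinct pages seen so far */

    for (int t = 0; t < n_refs; t++) {
        char p = refs[t];

        /* Find p's current position (1-based); -1 if never referenced. */
        int pos = -1;
        for (int i = 0; i < depth; i++) {
            if (stack[i] == p) { pos = i + 1; break; }
        }

        if (pos == -1) {
            printf("t=%d  ref %c  stack distance = INF (first reference)\n", t + 1, p);
            pos = ++depth;   /* p will be inserted at the top; the rest shifts down */
        } else {
            printf("t=%d  ref %c  stack distance = %d\n", t + 1, p, pos);
        }

        /* Move p to the top: shift pages above its old position down by one. */
        for (int i = pos - 1; i > 0; i--) stack[i] = stack[i - 1];
        stack[0] = p;
    }
    return 0;
}
```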
Stack distance is the key metric for analyzing LRU performance. It connects recency to cache behavior.
Definition: Stack Distance
When page p is accessed at time t, its stack distance d(p, t) is its position in the LRU stack just before the access.
The Fundamental Theorem of LRU:
With n frames of memory, a page access results in a hit if and only if d ≤ n.
In other words: if a page's stack distance is at most the number of frames, the page is still resident and the access is a hit; if the distance exceeds the frame count (or the page is being referenced for the first time), the access is a fault.
| Time | Reference | Stack Before | Stack Distance | Hit when n ≥ | Stack After |
|---|---|---|---|---|---|
| 1 | A | [empty] | ∞ (first) | ∞ | [A] |
| 2 | B | [A] | ∞ (first) | ∞ | [B, A] |
| 3 | C | [B, A] | ∞ (first) | ∞ | [C, B, A] |
| 4 | B | [C, B, A] | 2 | 2 | [B, C, A] |
| 5 | A | [B, C, A] | 3 | 3 | [A, B, C] |
| 6 | D | [A, B, C] | ∞ (first) | ∞ | [D, A, B, C] |
| 7 | C | [D, A, B, C] | 4 | 4 | [C, D, A, B] |
| 8 | B | [C, D, A, B] | 4 | 4 | [B, C, D, A] |
| 9 | A | [B, C, D, A] | 4 | 4 | [A, B, C, D] |
Reading the Analysis:
For 9 accesses with various memory sizes:
| Memory Size | Hits (stack dist ≤ n) | Faults | Hit Rate |
|---|---|---|---|
| n = 1 | 0 | 9 | 0% |
| n = 2 | 1 (t=4) | 8 | 11% |
| n = 3 | 2 (t=4,5) | 7 | 22% |
| n = 4 | 5 (t=4,5,7,8,9) | 4 | 56% |
| n = 5+ | 5 | 4 | 56% |
Notice: more memory never increases faults (stack algorithm property). Each additional frame captures references at exactly one additional stack distance.
Real programs have characteristic stack distance distributions. Programs with strong locality have many references with small stack distances. The stack distance histogram reveals how much memory a program needs—if 90% of references have stack distance ≤ 100, then 100 frames will achieve 90% hit rate.
While the LRU stack is a theoretical model, a practical implementation typically uses a recency list—a doubly linked list of pages ordered by recency.
Data Structure: Doubly Linked List + Hash Table
```c
// LRU Recency List Data Structure
//
// Assumes a HashMap type with hashmap_get/put/remove helpers is provided
// by a separate hash table implementation.
#include <stdbool.h>
#include <stdlib.h>

typedef struct PageNode {
    int page_number;
    struct PageNode* prev;   // More recently used
    struct PageNode* next;   // Less recently used
} PageNode;

typedef struct LRUCache {
    PageNode* head;          // Most recently used (MRU)
    PageNode* tail;          // Least recently used (LRU) - eviction candidate
    HashMap*  page_map;      // page_number -> PageNode* for O(1) lookup
    int capacity;            // Maximum number of frames
    int size;                // Current number of pages in memory
} LRUCache;

/*
 * List Organization:
 *
 *   head (MRU)                                            tail (LRU)
 *      ↓                                                      ↓
 *  [Page A] ←→ [Page B] ←→ [Page C] ←→ [Page D] ←→ [Page E]
 *      ↑                                                      ↑
 *  Most recent                                         Evict this one
 *    access                                            (oldest access)
 */
```

Operations on the Recency List:
1. Access a page already in memory (Hit): look the page up in the hash table, unlink its node from its current position, and re-insert it at the head (MRU position). Recency is updated in O(1).
2. Access a page not in memory (Fault): if memory is full, evict first; then create a node for the new page, insert it at the head, and add it to the hash table.
3. Eviction: remove the tail node (the least recently used page), update the tail pointer, and delete the entry from the hash table.

The code below implements all three operations.
```c
// Move page to MRU position (head)
void move_to_head(LRUCache* cache, PageNode* node) {
    if (cache->head == node) return;   // Already the MRU page

    // Unlink from current position
    if (node->prev) node->prev->next = node->next;
    if (node->next) node->next->prev = node->prev;

    // Update tail if we're moving the tail
    if (cache->tail == node) cache->tail = node->prev;

    // Insert at head
    node->prev = NULL;
    node->next = cache->head;
    if (cache->head) cache->head->prev = node;
    cache->head = node;

    // Update tail if list was empty
    if (!cache->tail) cache->tail = node;
}

// Evict LRU page (tail); returns the evicted page number, or -1 if empty
int evict_lru(LRUCache* cache) {
    if (!cache->tail) return -1;       // Empty cache

    PageNode* victim = cache->tail;
    int evicted_page = victim->page_number;

    // Update tail pointer
    cache->tail = victim->prev;
    if (cache->tail) cache->tail->next = NULL;
    else cache->head = NULL;           // Cache now empty

    // Remove from hash map and free
    hashmap_remove(cache->page_map, evicted_page);
    free(victim);
    cache->size--;

    return evicted_page;
}

// Access a page (returns true if hit, false if fault)
bool access_page(LRUCache* cache, int page_number) {
    PageNode* node = hashmap_get(cache->page_map, page_number);

    if (node) {
        // HIT: Move to head (update recency)
        move_to_head(cache, node);
        return true;
    }

    // FAULT: Need to bring page into memory
    if (cache->size >= cache->capacity) {
        evict_lru(cache);              // Evict least recently used
    }

    // Create new node at head
    PageNode* new_node = malloc(sizeof(PageNode));
    new_node->page_number = page_number;
    new_node->prev = NULL;             // Must be initialized before linking
    new_node->next = NULL;
    move_to_head(cache, new_node);
    hashmap_put(cache->page_map, page_number, new_node);
    cache->size++;

    return false;
}
```

Without the hash table, finding a page in the list would take O(n) time, making every access linear in the number of pages. The hash map provides O(1) lookup, and all list manipulations are O(1) with doubly linked lists. The combination achieves O(1) for all operations.
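As a quick usage sketch (the `lru_create` constructor is hypothetical, assumed to allocate the structure and hash map and set the capacity), replaying the reference string A, B, C, B, A, D, C from earlier with 3 frames should produce 5 faults, matching the stack-distance analysis:

```c
#include <stdio.h>

int main(void) {
    LRUCache* cache = lru_create(3);        // hypothetical constructor: 3 frames
    int refs[] = {0, 1, 2, 1, 0, 3, 2};     // A=0, B=1, C=2, D=3
    int faults = 0;

    for (int i = 0; i < 7; i++) {
        if (!access_page(cache, refs[i]))
            faults++;
    }

    printf("Faults: %d\n", faults);         // Expect 5: A, B, C, D, and the re-fetch of C
    return 0;
}
```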
The recency list gives us O(1) operations, but there's a deeper problem: who performs these updates, and when?
The Challenge:
Every memory reference—every instruction fetch, every data load, every store—is a reference that should update the recency ordering. In a system running at 3 GHz, that can mean billions of references per second, each of which would have to update the recency list.
Even with O(1) operations, performing software updates for every memory reference would catastrophically slow the system.
| Scenario | Updates/Second | Overhead/Update | Total Overhead | CPU Time Left |
|---|---|---|---|---|
| Software LRU tracking | 3 billion | 50 cycles | 150 billion cycles | 0% (impossible) |
| Hardware TLB only | 3 billion | N/A | N/A | Would need per-access HW |
| Periodic sampling (1ms) | 1 thousand | 50 cycles | 50K cycles | ~99.998% |
| Page fault time only | ~10K (depends) | 50 cycles | 500K cycles | ~99.98% |
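A back-of-the-envelope check of these figures (illustrative arithmetic only; the 50-cycle update cost is the table's assumption, not a measured number):

```c
#include <stdio.h>

int main(void) {
    double cycles_per_second = 3e9;          // 3 GHz core
    double cost_per_update   = 50.0;         // assumed cycles per recency update

    double updates_per_second[] = { 3e9, 1e3, 1e4 };
    const char* names[] = { "Per-reference software LRU",
                            "Periodic sampling (1 ms)",
                            "Page-fault time only" };

    for (int i = 0; i < 3; i++) {
        double overhead_cycles = updates_per_second[i] * cost_per_update;
        double fraction = overhead_cycles / cycles_per_second;
        printf("%-28s %.0f cycles/s  (%.4f%% of the CPU)\n",
               names[i], overhead_cycles, fraction * 100.0);
    }
    return 0;
}
```

The first line demands far more cycles than the core has, which is the table's "0% CPU time left (impossible)" entry; the sampling and fault-time scenarios cost a tiny fraction of a percent.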
The Two Fundamental Approaches:
1. True LRU with Hardware Assistance: dedicated hardware records recency on every memory reference (for example, per-page counters or a hardware-maintained stack), so software pays nothing per access, but the hardware cost is substantial.
2. Approximate LRU with Periodic Sampling: hardware sets a single reference bit per page on access; the OS periodically samples and clears these bits, reconstructing an approximate recency ordering at a tiny fraction of the cost.
The next two pages explore exact implementations (counters and stacks), followed by the practical approximations used in real operating systems.
Perfect LRU requires tracking every memory access. Practical systems can't do this in software. The choice is: (1) expensive hardware for true LRU, (2) approximate LRU with acceptable overhead, or (3) simpler algorithms like CLOCK. Real systems choose approximation.
Recency-based replacement isn't unique to page replacement—it appears throughout the memory hierarchy. Understanding this reveals a unifying principle.
LRU at Every Level:
| Level | Unit | Replacement Policy | Who Updates Recency? | Time Scale |
|---|---|---|---|---|
| CPU Registers | Individual registers | Compiler allocation | Compiler (compile time) | N/A |
| L1 Cache | Cache lines (64B) | LRU or pseudo-LRU | Hardware (every access) | Nanoseconds |
| L2/L3 Cache | Cache lines (64B) | Pseudo-LRU/SRRIP | Hardware (every access) | Nanoseconds |
| TLB | Page table entries | LRU or FIFO | Hardware (every translation) | Nanoseconds |
| Page Cache | Pages (4KB) | LRU approximation | OS + hardware bits | Milliseconds |
| Disk Buffer | Disk blocks | LRU variants | Disk controller | Milliseconds |
Key Insight: Hardware Implements LRU for Caches
CPU caches can afford true (or near-true) LRU because a cache set holds only a handful of lines (typically 4 to 16 ways), so the recency state is just a few bits per set, and those bits are updated by dedicated circuitry in parallel with the access itself, at no cost in software.

Why the OS Can't Do This for Pages: main memory holds millions of pages, the recency state would be far larger, and the OS runs in software; it cannot intervene on every load and store, so the MMU records only a single reference bit per page on its behalf.

The result: the page cache must use approximations, while CPU caches use near-exact LRU implemented in silicon. A simplified sketch of the kind of hardware logic involved follows below.
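To make the hardware side concrete, here is a hedged sketch (in C, purely illustrative; real processors implement this as a few gates per set, and the exact scheme varies) of tree pseudo-LRU for a single 4-way cache set. Three bits per set are enough to pick a near-LRU victim:

```c
#include <stdbool.h>

// Tree pseudo-LRU state for one 4-way set: 3 bits.
// b0 selects the half containing the next victim (false = ways 0-1),
// b1/b2 select the victim within each half.
typedef struct {
    bool b0;
    bool b1;   // within left half:  false -> way 0, true -> way 1
    bool b2;   // within right half: false -> way 2, true -> way 3
} PLRUSet;

// On an access, point every bit on the path AWAY from the way just used.
void plru_touch(PLRUSet* s, int way) {
    if (way < 2) { s->b0 = true;  s->b1 = (way == 0); }   // used left half
    else         { s->b0 = false; s->b2 = (way == 2); }   // used right half
}

// Follow the bits to the (approximately) least recently used way.
int plru_victim(const PLRUSet* s) {
    if (!s->b0) return s->b1 ? 1 : 0;
    return s->b2 ? 3 : 2;
}
```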
The TLB (Translation Lookaside Buffer) uses LRU-like replacement for its entries. When the OS page replacement evicts a page, its TLB entry must also be invalidated. TLB misses after poor page replacement decisions compound the performance impact.
LRU uses recency—when was the page last used? An alternative approach uses frequency—how often is the page used? Understanding this distinction is crucial for appreciating LRU's strengths and weaknesses.
Comparison:

| Aspect | LRU (recency) | LFU (frequency) |
|---|---|---|
| Question asked | When was the page last used? | How many times has the page been used? |
| Adapts to phase changes | Quickly | Slowly (old counts linger) |
| Protects hot pages during one-time bursts | Weakly | Strongly |
Scenario: LRU Wins
A program runs phase A (using pages 1-10) for 1 million accesses, then switches to phase B (using pages 11-20). With LFU, pages 1-10 have massive frequency counts and will never be evicted, even though the program has moved on. LRU immediately adapts—pages 11-20 become most recent as they're accessed.
Scenario: LFU Wins
A popular page is accessed frequently but happened to be idle for a brief moment when a burst of one-time accesses occurs. LRU might evict the popular page. LFU would keep it due to its high frequency count.
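A minimal sketch of the contrast (hypothetical struct and function names; the two policies differ only in which per-page statistic they minimize when picking a victim):

```c
typedef struct {
    int  page_number;
    long last_access;    // recency metric: logical time of last reference
    long access_count;   // frequency metric: total references so far
} PageStats;

// LRU: evict the page whose last access is oldest.
int lru_victim(const PageStats p[], int n) {
    int v = 0;
    for (int i = 1; i < n; i++)
        if (p[i].last_access < p[v].last_access) v = i;
    return v;
}

// LFU: evict the page referenced the fewest times, regardless of when.
int lfu_victim(const PageStats p[], int n) {
    int v = 0;
    for (int i = 1; i < n; i++)
        if (p[i].access_count < p[v].access_count) v = i;
    return v;
}
```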
Modern Hybrid: ARC and LIRS
Advanced algorithms like ARC (Adaptive Replacement Cache) and LIRS (Low Inter-reference Recency Set) combine both metrics, maintaining separate lists for recency and frequency, and adaptively balancing between them based on workload characteristics.
For most workloads, LRU's quick adaptation to changing patterns is more valuable than LFU's historical memory. Phase changes in programs are common, and LRU handles them gracefully. This is why LRU (and its approximations) dominate in practice.
Competitive Analysis
How does LRU compare to the optimal algorithm? This question is answered by competitive analysis.
Theorem (LRU Competitive Ratio):
For any reference string, with k frames of memory: $$\text{Faults}_{\text{LRU}} \leq k \cdot \text{Faults}_{\text{OPT}}$$
LRU is k-competitive: its fault count is at most k times the optimal. This sounds bad (k could be large), but it's actually a strong guarantee, because it holds for any reference string, including adversarial ones. The standard argument partitions the reference string into phases that each touch k distinct pages: LRU faults at most k times within a phase, while even the optimal algorithm must fault at least once per phase after the first.
In practice, LRU is much closer to optimal because real programs have locality. The competitive ratio is a worst-case bound that rarely occurs.
| Workload Type | LRU Faults | OPT Faults | Ratio | Notes |
|---|---|---|---|---|
| Compiler (GCC) | 12,450 | 10,200 | 1.22 | Strong locality |
| Database (OLTP) | 8,320 | 7,050 | 1.18 | Working set behavior |
| Web server | 15,600 | 13,400 | 1.16 | Good locality |
| Sequential scan | 100,000 | 10,000 | 10.0 | Pathological case |
| Scientific (array) | 45,000 | 42,000 | 1.07 | Excellent locality |
The Stack Distance Model for Prediction
Given a program's stack distance distribution D, where D(d) is the probability of a reference having stack distance d, the expected hit rate with n frames is:
$$\text{Hit Rate}(n) = \sum_{d=1}^{n} D(d)$$
This formula lets us predict LRU performance for any memory size from a single stack distance profile. It's the foundation of cache simulation techniques used in performance analysis.
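A tiny sketch of this calculation (the histogram values are made up for illustration; in practice they come from a one-pass stack simulation like the one earlier on this page):

```c
#include <stdio.h>

int main(void) {
    // histogram[d] = number of references observed with stack distance d (d >= 1)
    long histogram[] = { 0, 500, 300, 120, 50, 20, 10 };   // index 0 unused
    int  max_dist    = 6;
    long total_refs  = 1050;   // includes 50 cold (first-reference) misses

    long cumulative = 0;
    for (int n = 1; n <= max_dist; n++) {
        cumulative += histogram[n];
        printf("n = %d frames: predicted hit rate = %.1f%%\n",
               n, 100.0 * cumulative / total_refs);
    }
    return 0;
}
```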
By measuring a program's stack distance distribution once, you can predict its LRU performance for any memory size. This technique is used by architects to size caches, by researchers to evaluate new algorithms, and by systems engineers to capacity plan memory allocation.
We've explored the precise mechanics of how LRU makes eviction decisions based on recency. To consolidate: recency is the logical time since a page's last access, and LRU evicts the page with the maximum recency; the LRU stack model keeps every page ordered by recency and yields the inclusion property, so more memory never hurts; stack distance determines hits and faults for every memory size from a single pass over the reference string; a doubly linked list plus hash table implements the recency list with O(1) operations; and the real obstacle is not algorithmic complexity but the cost of updating recency on every memory reference.
What's Next:
We've established how recency-based decisions work in theory. The next page explores the implementation challenges of LRU—specifically why maintaining exact recency ordering is prohibitively expensive, and the hardware/software trade-offs involved.
You now understand the mathematical foundations of recency-based page replacement: the formal definitions, the stack model, stack distance analysis, and the recency list data structure. This precision is essential for understanding both the elegance of LRU and the challenges of implementing it.