The counter implementation associates a number with each page to encode recency. There's another approach: maintain an explicit ordering of pages. Instead of asking "what's this page's timestamp?", we ask "what's this page's position in the recency order?"
This stack implementation (also called the stack model or recency list) arranges all resident pages in a linear structure where position encodes recency: top = most recently used, bottom = least recently used. Finding the LRU victim is trivial (it's at the bottom), and the challenge shifts to efficiently maintaining the ordering as pages are accessed.
The stack approach is the theoretical foundation for analyzing all LRU variants and underlies practical implementations like the Linux active/inactive list system. Understanding it deeply is essential for mastering page replacement.
By the end of this page, you will understand: (1) the stack data structure for LRU, (2) the move-to-top operation and its cost, (3) hardware stack implementations, (4) software stack optimizations, (5) doubly linked list with hash map implementation, (6) stack vs. counter trade-offs, and (7) how the stack model informs practical algorithms.
In the stack implementation, we maintain a linear structure (conceptually a stack) of all pages, ordered by recency:
Definition: LRU Stack
An LRU stack S for n pages is an ordered sequence (p₁, p₂, ..., pₙ) where p₁ is the most recently used page, pₙ is the least recently used, and in general pᵢ was accessed more recently than pᵢ₊₁.
The Update Rule:
When page p is accessed:
- If p is already in the stack at position k, it moves to position 1 (the top). Pages previously at positions 1 through k−1 each shift down one position; pages below position k are unchanged.
- If p is not in the stack (a fault), it is pushed onto the top; if the stack already holds n pages, the page at position n (the LRU page) is evicted.
```
LRU STACK VISUALIZATION
=======================

Stack before access to page C:

Position 1 (MRU):  [A]  ← Most recently used
Position 2:        [B]
Position 3:        [C]  ← This is accessed
Position 4:        [D]
Position 5 (LRU):  [E]  ← Eviction candidate

Stack after access to page C (move-to-top):

Position 1 (MRU):  [C]  ← Now most recent
Position 2:        [A]  ← Shifted down
Position 3:        [B]  ← Shifted down
Position 4:        [D]  (unchanged)
Position 5 (LRU):  [E]  (unchanged)

Key observations:
- C moved from position 3 to position 1
- A and B (previously above C) shifted down by 1
- D and E (previously below C) didn't move
- Eviction still happens at position 5
```
It's called a stack because accessing a page is like pushing it onto the top. Unlike a true stack (LIFO), we can access elements in the middle and move them to the top. A more accurate name might be "move-to-front list" or "recency list", but "LRU stack" is the historical term from theoretical analysis.
The critical operation is move-to-top (also called move-to-front). Let's analyze its complexity with different data structures.
Array Implementation:
```c
// LRU Stack using Array (INEFFICIENT)

int lru_stack[MAX_PAGES];   // stack[0] = MRU, stack[n-1] = LRU
int stack_size;

void move_to_top_array(int page) {
    // Step 1: Find page's current position - O(n)
    int pos = -1;
    for (int i = 0; i < stack_size; i++) {
        if (lru_stack[i] == page) {
            pos = i;
            break;
        }
    }

    if (pos == -1) {
        // Page not in stack - new entry or page fault
        // Shift everything down - O(n); if the stack is full, the bottom
        // (LRU) entry is overwritten, i.e. evicted
        int last = (stack_size < MAX_PAGES) ? stack_size : MAX_PAGES - 1;
        for (int i = last; i > 0; i--) {
            lru_stack[i] = lru_stack[i-1];
        }
        lru_stack[0] = page;
        if (stack_size < MAX_PAGES) stack_size++;
    } else if (pos > 0) {
        // Page in stack at position pos - move to top
        // Shift positions 0..pos-1 down - O(pos) = O(n) worst case
        int temp = lru_stack[pos];
        for (int i = pos; i > 0; i--) {
            lru_stack[i] = lru_stack[i-1];
        }
        lru_stack[0] = temp;
    }
    // If pos == 0, page is already at top, nothing to do
}

// Time complexity: O(n) for every access
// Space complexity: O(n) for the array
```
The array implementation is O(n) per access because of the linear search to locate the page and the element shifting required to move it to the top.
Can we do better?
| Data Structure | Find Page | Remove | Insert at Top | Total | Notes |
|---|---|---|---|---|---|
| Unsorted Array | O(n) | O(n) | O(n) | O(n) | All operations require shifting |
| Singly Linked List | O(n) | O(1)* | O(1) | O(n) | *Still need to find predecessor |
| Doubly Linked List | O(n) | O(1) | O(1) | O(n) | Fast remove, but must find first |
| DLL + Hash Map | O(1) | O(1) | O(1) | O(1) | Hash map provides instant lookup |
With a doubly linked list and hash table combination, all LRU operations become O(1): finding the victim (tail), moving an accessed page to front, and inserting new pages. This is the theoretically optimal data structure for software LRU.
The canonical software LRU implementation combines a doubly linked list for O(1) insertion/removal with a hash map for O(1) lookup.
Data Structure Design:
```c
// OPTIMAL LRU STACK: Doubly Linked List + Hash Map

typedef struct Node {
    int page_number;
    struct Node* prev;   // Toward MRU
    struct Node* next;   // Toward LRU
} Node;

typedef struct LRUStack {
    Node* head;          // MRU (front)
    Node* tail;          // LRU (back) - eviction candidate
    HashMap* lookup;     // page_number -> Node*
    int size;
    int capacity;
} LRUStack;

/*
 * Structure visualization:
 *
 *   head (MRU)                          tail (LRU)
 *     ↓                                    ↓
 *   NULL ← [A] ↔ [B] ↔ [C] ↔ [D] ↔ [E] → NULL
 *           ↑                       ↑
 *      Most recent              Evict this
 *
 * HashMap: { A->NodeA, B->NodeB, C->NodeC, D->NodeD, E->NodeE }
 */
```
Core Operations:
```c
// === MOVE TO FRONT (on page access) ===
void move_to_front(LRUStack* stack, Node* node) {
    if (node == stack->head) return;   // Already at front

    // Unlink from current position
    if (node->prev) node->prev->next = node->next;
    if (node->next) node->next->prev = node->prev;
    if (node == stack->tail) stack->tail = node->prev;

    // Insert at front
    node->prev = NULL;
    node->next = stack->head;
    if (stack->head) stack->head->prev = node;
    stack->head = node;
    if (!stack->tail) stack->tail = node;
}

// === ACCESS PAGE (returns true if hit, false if miss) ===
bool access_page(LRUStack* stack, int page) {
    Node* node = hashmap_get(stack->lookup, page);   // O(1)

    if (node) {
        // HIT: Move to front
        move_to_front(stack, node);                   // O(1)
        return true;
    }

    // MISS: Need to bring in new page
    if (stack->size >= stack->capacity) {
        // Evict LRU (tail)
        Node* victim = stack->tail;
        int evicted = victim->page_number;

        stack->tail = victim->prev;
        if (stack->tail) stack->tail->next = NULL;
        else stack->head = NULL;

        hashmap_remove(stack->lookup, evicted);       // O(1)
        free(victim);
        stack->size--;
    }

    // Insert new page at front
    Node* new_node = malloc(sizeof(Node));
    new_node->page_number = page;
    new_node->prev = NULL;
    new_node->next = stack->head;
    if (stack->head) stack->head->prev = new_node;
    stack->head = new_node;
    if (!stack->tail) stack->tail = new_node;

    hashmap_put(stack->lookup, page, new_node);       // O(1)
    stack->size++;

    return false;
}

// === GET LRU VICTIM ===
int get_lru_victim(LRUStack* stack) {
    return stack->tail ? stack->tail->page_number : -1;   // O(1)
}
```

| Operation | Time | Space | Notes |
|---|---|---|---|
| Access (hit) | O(1) | | Hash lookup + pointer manipulation |
| Access (miss) | O(1) | O(1) | May evict + insert new node |
| Find LRU victim | O(1) | | Just return tail |
| Total space | | O(n) | n nodes + hash table |
Doubly linked list + hash map is the theoretically optimal software implementation: O(1) time for all operations, O(n) space. It is the structure behind common caching utilities (Python's functools.lru_cache, Java's LinkedHashMap in access order) and application-level caches. The problem remains: we can't call access_page() on every memory reference.
For very small sets (like CPU cache ways), hardware can implement true LRU using specialized circuits. Let's examine how.
The N×N Matrix Method:
For N entries, maintain an N×N bit matrix. Row i, column j indicates whether entry i was accessed more recently than entry j.
```
MATRIX-BASED HARDWARE LRU (for 4-way cache)
============================================

Maintain a 4x4 bit matrix M where:
  M[i][j] = 1 if entry i was accessed more recently than entry j
  M[i][j] = 0 if entry j was accessed more recently than entry i

The matrix is always anti-symmetric: M[i][j] = !M[j][i]

Initial state (all entries equally old; the diagonal is unused):

       Entry 0  Entry 1  Entry 2  Entry 3
  0       X        1        1        1
  1       0        X        1        1
  2       0        0        X        1
  3       0        0        0        X

  LRU = row with all 0s (except diagonal) = Entry 3

After access to Entry 2:
  Set entire row 2 to 1s    (2 is more recent than everyone)
  Set entire column 2 to 0s (everyone is older than 2)

       Entry 0  Entry 1  Entry 2  Entry 3
  0       X        1        0        1
  1       0        X        0        1
  2       1        1        X        1   ← All 1s (most recent)
  3       0        0        0        X   ← All 0s (LRU victim)

  LRU = Entry 3 (row with all zeros)

Hardware cost:
- N² - N bits storage (diagonal doesn't count)
- For 4-way:      12 bits   = trivial
- For 8-way:      56 bits   = acceptable
- For 16-way:     240 bits  = getting expensive
- For 64 entries: 4032 bits = impractical
```
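To see the update rule in action, here is a small software model of the matrix scheme. It simulates the circuit for an assumed 4-way set (the names are illustrative); it is not something an OS would run per access.

```c
#include <stdbool.h>
#include <stdio.h>

#define WAYS 4

// recency[i][j] == true means entry i was accessed more recently than entry j
static bool recency[WAYS][WAYS];

// On access to entry e: set row e (e is newer than everyone),
// clear column e (everyone is older than e)
static void matrix_touch(int e) {
    for (int j = 0; j < WAYS; j++) recency[e][j] = true;
    for (int i = 0; i < WAYS; i++) recency[i][e] = false;
}

// LRU victim = the entry whose row is all zeros (ignoring the diagonal)
static int matrix_victim(void) {
    for (int i = 0; i < WAYS; i++) {
        bool all_zero = true;
        for (int j = 0; j < WAYS; j++)
            if (j != i && recency[i][j]) { all_zero = false; break; }
        if (all_zero) return i;
    }
    return 0;   // unreachable once every entry has been touched at least once
}

int main(void) {
    for (int e = 0; e < WAYS; e++) matrix_touch(e);      // touch 0,1,2,3 in order
    matrix_touch(2);                                     // then re-touch entry 2
    printf("LRU victim: entry %d\n", matrix_victim());   // expect entry 0
    return 0;
}
```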
The log₂(N!) Bit Vector Method:

With N entries, there are N! possible orderings. To encode any ordering, we need ⌈log₂(N!)⌉ bits.
For 4 entries: log₂(24) ≈ 4.6 → 5 bits
For 8 entries: log₂(40320) ≈ 15.3 → 16 bits
This is theoretically minimal but requires complex encoding/decoding circuits.
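If you want to check those figures yourself, a quick sketch (purely illustrative, compile with -lm) computes ⌈log₂(N!)⌉ for small N:

```c
#include <math.h>
#include <stdio.h>

// Minimum bits needed to distinguish all N! recency orderings
static int ordering_bits(int n) {
    double bits = 0.0;
    for (int k = 2; k <= n; k++)
        bits += log2((double)k);        // log2(N!) = sum of log2(k) for k = 2..N
    return (int)ceil(bits);
}

int main(void) {
    for (int n = 2; n <= 16; n *= 2)
        printf("%2d entries: %2d bits\n", n, ordering_bits(n));  // 4 -> 5, 8 -> 16
    return 0;
}
```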
The Pseudo-LRU Alternative:
Most CPUs use pseudo-LRU instead of true LRU for caches:
```
TREE-BASED PSEUDO-LRU (PLRU) for 4-way
=======================================

Use 3 bits arranged as a binary tree:

            b0
           /  \
         b1    b2
        /  \  /  \
      E0  E1 E2  E3

Bit interpretation:
- b0=0: LRU is in left subtree (E0 or E1)
- b0=1: LRU is in right subtree (E2 or E3)
- b1=0: among E0,E1, LRU is E0
- b1=1: among E0,E1, LRU is E1
- b2=0: among E2,E3, LRU is E2
- b2=1: among E2,E3, LRU is E3

To find LRU: follow bits down the tree
  If b0=0, b1=1 → LRU = E1

On access to Ek:
  Set bits on the path to point AWAY from Ek
  (so the next LRU search won't find Ek)

Hardware cost for 8-way:  7 bits  (vs 56 for true LRU)
Hardware cost for 16-way: 15 bits (vs 240 for true LRU)

NOT true LRU, but very close in practice (>90% of decisions match)
```
True hardware LRU is practical only for small associativity (2-8 ways). Beyond that, pseudo-LRU or random replacement is used. For page replacement (millions of entries), neither hardware true LRU nor pseudo-LRU is feasible. The reference bit is the only practical hardware assistance.
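Before leaving hardware LRU, here is a minimal software model of the 4-way tree-PLRU logic above. The bit packing and function names are assumptions made for illustration, not any particular CPU's encoding.

```c
#include <stdio.h>

// 4-way tree-PLRU state: bits b0 (root), b1 (left pair), b2 (right pair)
// packed into one byte: bit 0 = b0, bit 1 = b1, bit 2 = b2
typedef unsigned char plru4_t;

// Follow the bits down the tree to find the pseudo-LRU way (0..3)
static int plru4_victim(plru4_t s) {
    if (((s >> 0) & 1) == 0)            // b0=0: LRU in left subtree (E0/E1)
        return ((s >> 1) & 1) ? 1 : 0;  // b1 selects E1 or E0
    else                                // b0=1: LRU in right subtree (E2/E3)
        return ((s >> 2) & 1) ? 3 : 2;  // b2 selects E3 or E2
}

// On access to 'way', flip the bits on its path to point AWAY from it
static plru4_t plru4_touch(plru4_t s, int way) {
    if (way <= 1) {
        s |= 1;                   // b0=1: LRU now considered to be on the right
        if (way == 0) s |= 2;     // b1=1: among E0/E1, LRU is E1
        else          s &= ~2;    // b1=0: among E0/E1, LRU is E0
    } else {
        s &= ~1;                  // b0=0: LRU now considered to be on the left
        if (way == 2) s |= 4;     // b2=1: among E2/E3, LRU is E3
        else          s &= ~4;    // b2=0: among E2/E3, LRU is E2
    }
    return s;
}

int main(void) {
    plru4_t s = 0;
    s = plru4_touch(s, 0);
    s = plru4_touch(s, 2);
    printf("victim after touching E0 then E2: E%d\n", plru4_victim(s));  // expect E1
    return 0;
}
```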
The stack implementation connects directly to stack distance, the key metric for LRU analysis.
Definition Reminder:
The stack distance of a page reference is its position in the LRU stack before the access. If page p is at position k, its stack distance is k.
Why Stack Distance Matters:
```
STACK DISTANCE AND HIT/MISS DECISION
=====================================

Given:
- LRU stack with current n pages in memory
- Access to page P at stack position k (stack distance = k)

Decision:
- If k ≤ n: PAGE IS IN MEMORY → HIT
- If k > n: PAGE IS NOT IN MEMORY → MISS (page fault)
- If k = ∞: FIRST ACCESS TO THIS PAGE → Compulsory miss

Example with 4 frames:

  Stack: [A, B, C, D]   (4 pages in memory, A=MRU, D=LRU)

  Access to B (position 2):
    Stack distance = 2 ≤ 4 → HIT
    New stack: [B, A, C, D]

  Access to E (position ∞, not in stack):
    Stack distance = ∞ > 4 → MISS
    Evict D (LRU), load E
    New stack: [E, B, A, C]

  Access to C (position 4):
    Stack distance = 4 ≤ 4 → HIT
    New stack: [C, E, B, A]
```
Stack Distance Histogram:
By tracking stack distances for a reference string, we get a histogram that predicts LRU performance for any memory size:
| Stack Distance | Count | Cumulative | Meaning |
|---|---|---|---|
| 1 | 5000 | 5000 | Repeated immediate access (very hot) |
| 2-5 | 3000 | 8000 | Very recently used |
| 6-10 | 1000 | 9000 | Recently used |
| 11-50 | 500 | 9500 | Somewhat recent |
| 51-100 | 200 | 9700 | Getting stale |
| 101-1000 | 200 | 9900 | Old |
| >1000 or ∞ | 100 | 10000 | Very old or compulsory |
Interpreting the Histogram:
The histogram tells us exactly how memory size translates to hit rate under LRU: with m frames, every reference whose stack distance is at most m is a hit, so the predicted hit rate is simply the cumulative count at distance m divided by the total number of references. From the table above, 10 frames would give 9000/10000 = 90% hits, while 100 frames would give 97%. This is why stack distance profiling is a powerful tool for memory system analysis.
A single stack distance trace (computed from a reference string) can predict LRU hit rates for ANY memory size. This is the power of stack algorithms: the inclusion property means more memory only captures additional stack distances, never causing previously-hit pages to miss.
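To make that concrete, here is a small sketch that turns one stack-distance trace into predicted hit rates for several memory sizes. The trace values and the convention that 0 marks a compulsory (first-touch) miss are illustrative assumptions.

```c
#include <stddef.h>
#include <stdio.h>

// Predict the LRU hit rate for a machine with 'frames' frames from a single
// stack-distance trace. distances[i] is the stack distance of reference i;
// 0 encodes "infinite" (first access, compulsory miss) in this sketch.
static double predicted_hit_rate(const size_t *distances, size_t n, size_t frames) {
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (distances[i] != 0 && distances[i] <= frames)  // inclusion: d <= m is a hit
            hits++;
    return n ? (double)hits / (double)n : 0.0;
}

int main(void) {
    // An illustrative trace of stack distances (0 = compulsory miss)
    size_t trace[] = {0, 0, 0, 2, 0, 3, 1, 0, 4, 2};
    size_t n = sizeof(trace) / sizeof(trace[0]);

    // The same trace predicts hit rates for every memory size
    for (size_t m = 1; m <= 4; m++)
        printf("%zu frames -> predicted hit rate %.2f\n", m,
               predicted_hit_rate(trace, n, m));
    return 0;
}
```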
Both stack and counter implementations achieve exact LRU, but with different trade-offs. Let's compare systematically.
| Aspect | Stack (DLL + HashMap) | Counter (Timestamps) |
|---|---|---|
| Find LRU victim | O(1) - tail pointer | O(n) - scan for minimum* |
| Update on access | O(1) - move to front | O(1) - write timestamp |
| Space per page | ~24 bytes (node overhead) | 8 bytes (64-bit counter) |
| Total space | O(n) with ~3× overhead | O(n) with ~1× overhead |
| Cache behavior | Pointer chasing (poor) | Sequential scan (better)* |
| Hardware support | Not feasible for large n | Not feasible for large n |
| Concurrent access | Complex (list ordering) | Simple (timestamp write) |
*With min-heap, counter find-min becomes O(1) but update becomes O(log n).
Key Insight: They're Equivalent for Exact LRU
Both approaches face the same fundamental problem: every memory access must update the data structure. Whether you're moving a node to the front of a list or writing a timestamp, the update frequency (billions/second) is the constraint, not the O(1) cost.
When to Use Each:
- Stack (DLL + hash map): when O(1) victim selection matters and the access rate is low enough to update a list on every hit, as in application-level caches.
- Counter (timestamps): when lower per-entry space and simpler concurrent updates matter more than fast victim lookup.
Linux maintains LRU lists (stack-like) but uses reference bits (counter-like sampling) to approximate recency. The kernel's active/inactive page lists are essentially two stacks, with pages moving between them based on reference bit observation. This hybrid captures benefits of both approaches.
The stack model inspires several practical algorithms. We can't maintain the exact stack, but we can maintain partitions of it.
The Two-List Approximation (Linux):
```
LINUX ACTIVE/INACTIVE LIST APPROACH
====================================

Instead of one precise stack, maintain two lists:

ACTIVE LIST:   Recently accessed pages (protected from eviction)
INACTIVE LIST: Candidates for eviction (older pages)

       Active List                     Inactive List
       (protected)                     (evictable)
  ←──────────────────────────────────────────────────→
  [P1] [P2] [P3] [P4]   |   [P5] [P6] [P7] [P8] [P9] [P10]
                        ↑
                   The divide

Promotion: Page on inactive list accessed → move to active
Demotion:  Pages age from active to inactive (periodically)
Eviction:  Select from inactive list (preferably cold pages)

This is "good enough" LRU:
- Truly hot pages stay in active list
- Cold pages migrate to inactive and get evicted
- No per-access tracking needed!

Update frequency:
- Promotion: only for accessed inactive pages (subset of accesses)
- Demotion: periodic batch operation (not per-access)
- Result: manageable overhead
```
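A toy model of the two-list scheme makes the promotion and demotion mechanics concrete. This is an illustrative sketch, not the kernel's actual data structures; all names here are invented for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

// Toy model: each page carries a software-visible "referenced" flag standing
// in for the hardware reference bit.
typedef struct Page { struct Page *prev, *next; bool referenced; int id; } Page;
typedef struct { Page *head, *tail; size_t len; } List;

static void list_add_head(List *l, Page *p) {
    p->prev = NULL; p->next = l->head;
    if (l->head) l->head->prev = p; else l->tail = p;
    l->head = p; l->len++;
}

static Page *list_remove_tail(List *l) {
    Page *p = l->tail;
    if (!p) return NULL;
    l->tail = p->prev;
    if (l->tail) l->tail->next = NULL; else l->head = NULL;
    p->prev = p->next = NULL; l->len--;
    return p;
}

// Periodic scan of the inactive list: referenced pages are promoted to the
// active list; the first unreferenced page found becomes the eviction victim.
static Page *scan_inactive(List *active, List *inactive) {
    size_t to_scan = inactive->len;
    while (to_scan--) {
        Page *p = list_remove_tail(inactive);
        if (p->referenced) {
            p->referenced = false;        // consume the reference
            list_add_head(active, p);     // promotion
        } else {
            return p;                     // cold page: evict this one
        }
    }
    return NULL;                          // nothing evictable this pass
}

// Periodic demotion keeps the inactive list stocked with candidates.
static void refill_inactive(List *active, List *inactive, size_t count) {
    while (count-- && active->len > 0)
        list_add_head(inactive, list_remove_tail(active));
}

int main(void) {
    List active = {0}, inactive = {0};
    Page pages[4] = {{0}};
    for (int i = 0; i < 4; i++) { pages[i].id = i; list_add_head(&inactive, &pages[i]); }
    pages[0].referenced = true;                   // page 0 was touched recently
    Page *victim = scan_inactive(&active, &inactive);
    printf("victim: page %d\n", victim ? victim->id : -1);  // expect page 1
    refill_inactive(&active, &inactive, 1);       // demote one page back
    return 0;
}
```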
The Clock Algorithm (Second-Chance FIFO):

The Clock algorithm approximates LRU by using a circular list with reference bits:
```
CLOCK ALGORITHM (Stack Approximation via FIFO + Reference Bits)
===============================================================

Circular buffer of pages with reference bit R:

     Hand→
       ↓
  [P0,R=1] → [P1,R=0] → [P2,R=1] → [P3,R=0] → [P4,R=1]
      ↑                                           |
      └───────────────────────────────────────────┘

Eviction process:
1. Examine page at hand
2. If R=1: Clear R to 0, advance hand (give second chance)
3. If R=0: Evict this page, advance hand
4. Repeat until victim found

Why this approximates LRU:
- Recently accessed pages have R=1, survive
- Long-untouched pages have R=0, get evicted
- "Age" is encoded in how many passes since R was set
- Worst case: one full rotation to find victim

Not true LRU, but captures the essence:
- Hot pages get second chances
- Cold pages (R=0 for long time) are evicted
- No need to track every access!
```
Understanding the exact stack model is essential because all approximations (Clock, 2-List, MGLRU) are designed to capture the stack's properties—hot pages near the top, cold pages near the bottom—without the impossible overhead of per-access updates. The module on LRU Approximations will explore these algorithms in full detail.
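For a taste of how small the Clock eviction loop is, here is a minimal sketch of the victim-selection step described above, assuming a simple frame array with software-visible reference bits; the names are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>

// Illustrative frame table for the Clock algorithm: one entry per frame,
// with the reference bit the hardware would set on access.
typedef struct {
    int  page;         // page currently in this frame
    bool referenced;   // "R" bit: set on access, cleared by the clock hand
} Frame;

// Advance the clock hand until a frame with R=0 is found; frames with
// R=1 get a second chance (bit cleared, hand moves on).
size_t clock_select_victim(Frame *frames, size_t nframes, size_t *hand) {
    for (;;) {
        Frame *f = &frames[*hand];
        if (!f->referenced) {
            size_t victim = *hand;             // cold frame: evict it
            *hand = (*hand + 1) % nframes;     // leave the hand just past the victim
            return victim;
        }
        f->referenced = false;                 // second chance
        *hand = (*hand + 1) % nframes;
    }
}
```

Even if every frame has R=1, the first full rotation clears all the bits, so the loop terminates within two passes.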
Let's see a complete, production-quality LRU cache implementation using the stack approach. This is suitable for application-level caches (not OS page replacement).
```c
// PRODUCTION-QUALITY LRU CACHE (Stack Implementation)

#include <stdlib.h>
#include <stdint.h>
#include <stdbool.h>

#define HASH_SIZE 1024   // Adjust based on expected cache size

typedef struct Node {
    uint64_t key;
    void* value;
    struct Node* prev;
    struct Node* next;
    struct Node* hash_next;   // For hash table chaining
} Node;

typedef struct LRUCache {
    Node* head;          // MRU
    Node* tail;          // LRU
    Node** hash_table;   // Array of linked list heads
    size_t size;
    size_t capacity;
    // Stats
    uint64_t hits;
    uint64_t misses;
} LRUCache;

// Hash function
static size_t hash(uint64_t key) {
    return key % HASH_SIZE;
}

// Create new cache
LRUCache* lru_create(size_t capacity) {
    LRUCache* cache = malloc(sizeof(LRUCache));
    cache->head = cache->tail = NULL;
    cache->hash_table = calloc(HASH_SIZE, sizeof(Node*));
    cache->size = 0;
    cache->capacity = capacity;
    cache->hits = cache->misses = 0;
    return cache;
}

// Internal: remove node from list
static void unlink_node(LRUCache* cache, Node* node) {
    if (node->prev) node->prev->next = node->next;
    else cache->head = node->next;

    if (node->next) node->next->prev = node->prev;
    else cache->tail = node->prev;
}

// Internal: add node to front
static void push_front(LRUCache* cache, Node* node) {
    node->prev = NULL;
    node->next = cache->head;
    if (cache->head) cache->head->prev = node;
    cache->head = node;
    if (!cache->tail) cache->tail = node;
}

// Get value for key (returns NULL if not found)
void* lru_get(LRUCache* cache, uint64_t key) {
    size_t h = hash(key);
    Node* node = cache->hash_table[h];

    while (node) {
        if (node->key == key) {
            // HIT: move to front
            unlink_node(cache, node);
            push_front(cache, node);
            cache->hits++;
            return node->value;
        }
        node = node->hash_next;
    }

    cache->misses++;
    return NULL;   // MISS
}

// Put key-value pair
void lru_put(LRUCache* cache, uint64_t key, void* value) {
    // Check if key exists
    size_t h = hash(key);
    Node* node = cache->hash_table[h];

    while (node) {
        if (node->key == key) {
            // Update value, move to front
            node->value = value;
            unlink_node(cache, node);
            push_front(cache, node);
            return;
        }
        node = node->hash_next;
    }

    // Need to insert new node
    if (cache->size >= cache->capacity) {
        // Evict LRU (tail)
        Node* victim = cache->tail;
        unlink_node(cache, victim);

        // Remove from hash table
        size_t vh = hash(victim->key);
        Node** pp = &cache->hash_table[vh];
        while (*pp != victim) pp = &(*pp)->hash_next;
        *pp = victim->hash_next;

        free(victim);
        cache->size--;
    }

    // Create and insert new node
    node = malloc(sizeof(Node));
    node->key = key;
    node->value = value;
    push_front(cache, node);

    node->hash_next = cache->hash_table[h];
    cache->hash_table[h] = node;
    cache->size++;
}

// Get hit rate
double lru_hit_rate(LRUCache* cache) {
    uint64_t total = cache->hits + cache->misses;
    return total ? (double)cache->hits / total : 0.0;
}
```
This is the kind of implementation you'd find in application-level object caches and database buffer pools. (Redis, for its part, deliberately approximates LRU by sampling keys rather than maintaining an exact list.) The key insight is that at the application level (not OS level), access frequency is low enough that per-access updates are practical.
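A quick usage sketch of the cache above; the keys, values, and capacity are arbitrary.

```c
#include <stdio.h>

int main(void) {
    LRUCache* cache = lru_create(2);            // room for two entries

    lru_put(cache, 1, "page one");
    lru_put(cache, 2, "page two");
    lru_get(cache, 1);                          // hit: key 1 becomes MRU
    lru_put(cache, 3, "page three");            // evicts key 2 (the LRU entry)

    printf("key 2: %s\n", lru_get(cache, 2) ? "present" : "evicted");
    printf("hit rate: %.2f\n", lru_hit_rate(cache));  // 1 hit out of 2 lookups = 0.50
    return 0;
}
```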
We've thoroughly explored the stack-based approach to LRU implementation: the stack orders resident pages by recency with the victim at the bottom, a doubly linked list plus hash map makes every operation O(1), hardware true LRU scales only to a handful of entries, and stack distance analysis lets a single trace predict hit rates for any memory size.
Module Complete: LRU Algorithm
You now have a comprehensive understanding of the LRU page replacement algorithm: why temporal locality justifies evicting the least recently used page, how recency and stack distance are defined, how counter- and stack-based structures implement exact LRU, and why per-access bookkeeping makes exact LRU impractical for OS page replacement.
The next module explores LRU Approximations—the Clock algorithm, Second-Chance, Enhanced Second-Chance, Aging, and modern multi-generation schemes—that make LRU-like behavior practical in real operating systems.
Congratulations! You've mastered the LRU algorithm in depth: from its theoretical foundations in temporal locality, through the mathematics of recency and stack distance, to the practical considerations that make exact implementation impossible and approximations necessary. This knowledge is essential for understanding modern operating system memory management.