We've explored the fundamental building blocks of LRU approximation: reference bits, the basic Clock algorithm, Enhanced Second-Chance with dirty bit consideration, and NRU classification. But production operating systems don't use these textbook algorithms directly—they employ sophisticated variants optimized for real-world workloads.
This capstone page explores advanced Clock variants that address limitations of the basic algorithm: the two-handed clock (smoother clearing), GCLOCK (frequency counters), Clock-Pro (scan resistance), and CAR (workload adaptation).
We'll also examine how Linux, FreeBSD, and Windows implement page replacement, seeing how textbook concepts translate to production code.
By the end of this page, you will understand: the limitations of basic Clock that motivate advanced variants, how two-handed clock smooths operation, how Clock-Pro handles scan-resistant workloads, how CAR adapts to changing access patterns, and how real operating systems implement page replacement. This synthesis prepares you for understanding modern memory management systems.
Before exploring advanced variants, let's precisely identify what problems they solve. The basic Clock algorithm has several limitations:
1. No Frequency Information
A page accessed once per interval looks identical (R=1) to a page accessed thousands of times. The reference bit is binary—it can't distinguish frequency patterns.
Example: Page A (code loop, 10,000 accesses/sec) and Page B (logging, 1 access/sec) both have R=1. If both survive one clearing interval, they're treated equivalently.
2. Scan Resistance Failure
When an application scans through a large file (larger than memory), the scan pages fill memory and evict useful working set pages. Basic Clock has no defense against this.
Example: A database doing a table scan for a report. Scan pages are used exactly once but evict hot index pages that will be needed again.
| Limitation | Symptom | Impact | Solution Approach |
|---|---|---|---|
| Binary reference | Can't distinguish frequency | Hot pages get same treatment as barely-used | Counters (GCLOCK) |
| Scan vulnerability | Sequential access pollutes cache | Working set evicted by one-time scans | Multi-list (Clock-Pro) |
| Fixed policy | One-size-fits-all replacement | Suboptimal for varying workloads | Adaptation (CAR) |
| Clearing overhead | TLB flush on each clear | Expensive on multi-core systems | Two-handed clock |
| No aging | Only 'recent' vs 'not recent' | Coarse time resolution | Multi-bit history |
3. Fixed Time Resolution
With periodic clearing every N seconds, we can only distinguish "accessed in the last N seconds" from "not accessed." Pages accessed 0.1 seconds ago vs. 0.9 seconds ago (within the same interval) are indistinguishable.
4. Clearing Overhead
Clearing all reference bits requires walking every resident page's page-table entry, clearing its R bit, and invalidating the corresponding TLB entries so the hardware will set the bit again on the next access.
This overhead is acceptable every few seconds but limits how frequently we can clear.
5. No Workload Adaptation
Different workloads benefit from different policies: recency-dominated workloads want recently touched pages protected, while frequency-dominated workloads want long-term hot pages protected even when they haven't been touched in the most recent interval.
Basic Clock uses a fixed policy regardless of workload characteristics.
Each limitation could be addressed with additional complexity—more bits per page, more sophisticated data structures, adaptive algorithms. The challenge is balancing improvement against overhead. Advanced variants carefully choose which limitations to address based on target workloads.
The Two-Handed Clock variant uses two pointers ("hands") that sweep through memory at a fixed distance apart. This separates reference bit clearing from victim selection, providing smoother operation and better separation of concerns.
How it works:

- The leading clear hand sweeps through frames, clearing reference bits
- The trailing evict hand follows at a fixed gap, evicting frames whose R bit is still 0
- A page re-referenced after the clear hand passes has R=1 again when the evict hand arrives, and survives
The gap gives pages time to be re-referenced between the clear hand passing and the evict hand arriving. This effectively creates a survival window.
```c
/*
 * Two-Handed Clock Algorithm
 * Separates clearing from eviction for smoother operation
 */

#define NUM_FRAMES 4096
#define GAP_PERCENT 25   // 25% of frames between hands

typedef struct {
    uint32_t clear_hand;   // Leading hand - clears R bits
    uint32_t evict_hand;   // Trailing hand - evicts R=0 pages
    uint32_t gap_frames;   // Fixed distance between hands
} two_handed_clock_t;

two_handed_clock_t clock;

void init_two_handed_clock(void) {
    clock.clear_hand = 0;
    clock.gap_frames = (NUM_FRAMES * GAP_PERCENT) / 100;
    // Evict hand trails by gap_frames
    clock.evict_hand = NUM_FRAMES - clock.gap_frames;
}

/*
 * Advance clear hand by one position
 * Called periodically (faster than eviction rate)
 */
void advance_clear_hand(void) {
    frame_entry_t *frame = &frames[clock.clear_hand];

    if (frame->occupied) {
        // Clear reference bit without evicting
        clear_reference_bit(frame);
    }

    clock.clear_hand = (clock.clear_hand + 1) % NUM_FRAMES;
}

/*
 * Find and evict a victim page
 */
uint32_t two_handed_clock_evict(void) {
    uint32_t checked = 0;

    while (checked < NUM_FRAMES) {
        frame_entry_t *frame = &frames[clock.evict_hand];

        if (frame->occupied && !is_referenced(frame)) {
            // Found victim: R=0, not referenced since clear hand passed
            uint32_t victim = clock.evict_hand;
            clock.evict_hand = (clock.evict_hand + 1) % NUM_FRAMES;
            return victim;
        }

        // If R=1, page was re-referenced after clearing
        // This is unusual but possible if evict hand catches up
        if (frame->occupied && is_referenced(frame)) {
            // Give second chance: clear and move on
            clear_reference_bit(frame);
        }

        clock.evict_hand = (clock.evict_hand + 1) % NUM_FRAMES;
        checked++;

        // Maintain gap: if evict hand catches up to clear hand,
        // advance clear hand too
        if (clock.evict_hand == clock.clear_hand) {
            advance_clear_hand();
        }
    }

    // Fallback: evict current position
    return clock.evict_hand;
}

/*
 * Background thread: continuously advance clear hand
 */
void clear_hand_sweeper(void) {
    while (true) {
        for (int i = 0; i < CLEAR_BATCH_SIZE; i++) {
            advance_clear_hand();
        }
        sleep_ms(CLEAR_INTERVAL_MS);
    }
}
```

Advantages of two-handed clock:

- Clearing and eviction run at independent rates, smoothing out bursts
- Reference bits are cleared incrementally rather than in one expensive global pass
- The fixed gap gives every page a predictable survival window to prove reuse
- The evict hand mostly encounters pre-cleared (R=0) frames, so victim search is short
Tuning the gap:
| Gap Size | Effect | Best For |
|---|---|---|
| Small (10%) | Quick eviction, short survival window | Memory-constrained, rapid turnover |
| Medium (25%) | Balanced survival time | General-purpose workloads |
| Large (50%) | Long survival window | Applications with irregular access patterns |
FreeBSD's VM system uses a variant of two-handed clock called the 'pageout daemon.' The clear hand equivalent runs as part of the vm_pageout thread, continuously scanning and clearing reference bits while a separate path handles actual eviction decisions.
GCLOCK (Generalized Clock) replaces the single reference bit with a counter per page, enabling frequency-aware replacement. Pages accessed more frequently have higher counters and survive longer.
How it works:

- On access, the page's counter is incremented (capped at a maximum)
- As the clock hand sweeps, it decrements each nonzero counter instead of clearing a bit
- A page becomes a victim only when its counter reaches zero
This creates a frequency-weighted aging mechanism where frequently accessed pages accumulate "survival credits."
```c
/*
 * GCLOCK - Generalized Clock with reference counters
 * Provides frequency-aware page replacement
 */

#define MAX_COUNT 7            // 3-bit counter (0-7)
#define INCREMENT_ON_ACCESS 1
#define DECREMENT_ON_SWEEP 1

typedef struct {
    uint32_t page_number;
    uint8_t ref_count;   // 0 to MAX_COUNT
    bool dirty;
    bool occupied;
} gclock_frame_t;

gclock_frame_t frames[NUM_FRAMES];
uint32_t clock_hand = 0;

/*
 * Called when a page is accessed
 * Can be triggered by software (page fault handler sets R bit,
 * periodic scan converts R bit to counter increment)
 */
void gclock_page_accessed(uint32_t frame_num) {
    gclock_frame_t *frame = &frames[frame_num];

    // Increment counter, capped at MAX_COUNT
    if (frame->ref_count < MAX_COUNT) {
        frame->ref_count += INCREMENT_ON_ACCESS;
    }
}

/*
 * Periodic scan: convert hardware R bits to counter values
 */
void gclock_scan_reference_bits(void) {
    for (uint32_t i = 0; i < NUM_FRAMES; i++) {
        if (!frames[i].occupied) continue;

        uint64_t *pte = get_pte_for_frame(i);
        if (*pte & REFERENCE_BIT) {
            // Page was accessed - increment counter
            gclock_page_accessed(i);
            // Clear R bit
            *pte &= ~REFERENCE_BIT;
            invalidate_tlb_entry_for_frame(i);
        }
    }
}

/*
 * Find victim using GCLOCK algorithm
 */
uint32_t gclock_find_victim(void) {
    while (true) {
        gclock_frame_t *frame = &frames[clock_hand];

        if (frame->occupied) {
            if (frame->ref_count == 0) {
                // Found victim - counter exhausted
                uint32_t victim = clock_hand;
                clock_hand = (clock_hand + 1) % NUM_FRAMES;
                return victim;
            }
            // Decrement counter, giving page more time
            frame->ref_count -= DECREMENT_ON_SWEEP;
        }

        clock_hand = (clock_hand + 1) % NUM_FRAMES;
        // Note: even if a full sweep finds all counters positive, the
        // decrements guarantee counters eventually reach 0, so the loop
        // terminates after at most MAX_COUNT sweeps.
    }
}

/*
 * Enhanced GCLOCK with dirty-bit consideration
 * Prefers clean pages when counters are equal
 */
uint32_t gclock_find_victim_enhanced(void) {
    uint32_t best_clean = UINT32_MAX;
    uint32_t best_dirty = UINT32_MAX;
    uint8_t min_clean_count = MAX_COUNT + 1;
    uint8_t min_dirty_count = MAX_COUNT + 1;
    uint32_t start = clock_hand;
    uint32_t swept = 0;

    // First pass: find lowest-count clean and dirty pages
    while (swept < NUM_FRAMES) {
        gclock_frame_t *frame = &frames[clock_hand];

        if (frame->occupied) {
            if (frame->ref_count == 0) {
                // Immediate eviction candidate
                if (!frame->dirty) {
                    return clock_hand;   // Perfect: count 0, clean
                }
                if (best_dirty == UINT32_MAX) {
                    best_dirty = clock_hand;   // Count 0, but dirty
                }
            } else {
                // Track lowest counts for later
                if (!frame->dirty && frame->ref_count < min_clean_count) {
                    min_clean_count = frame->ref_count;
                    best_clean = clock_hand;
                }
                if (frame->dirty && frame->ref_count < min_dirty_count) {
                    min_dirty_count = frame->ref_count;
                    best_dirty = clock_hand;
                }
            }

            // Decrement all counters as we sweep
            if (frame->ref_count > 0) {
                frame->ref_count--;
            }
        }

        clock_hand = (clock_hand + 1) % NUM_FRAMES;
        swept++;
    }

    // Select victim: prefer clean, then dirty
    if (best_clean != UINT32_MAX) return best_clean;
    if (best_dirty != UINT32_MAX) return best_dirty;

    // Fallback (shouldn't happen with occupied frames)
    return start;
}
```

Counter semantics:
| Counter Value | Interpretation | Survival |
|---|---|---|
| 0 | Unused, evict immediately | None |
| 1 | Used once, nearly expired | 1 more sweep |
| 4 | Moderately used | 4 more sweeps |
| 7 (MAX) | Heavily used | 7+ more sweeps |
Frequency sensitivity:

GCLOCK captures access frequency because:

- Each access adds to the counter (up to MAX_COUNT), so hot pages bank survival credits
- Each sweep subtracts only one unit, so a page survives as many sweeps as its count
- A page keeps a high count only if it is re-accessed faster than the hand sweeps
This provides scan resistance: a sequential scan only increments each page's counter to 1, while hot pages maintain high counts.
GCLOCK trades memory (2-8 bits per page vs. 1 bit) for frequency tracking. With millions of pages, this adds megabytes of overhead. Also, converting hardware R bits to counter increments requires periodic scanning, adding CPU overhead. The benefit is significantly better handling of frequency-varying workloads.
Clock-Pro is a sophisticated LRU approximation that distinguishes between pages based on their access history, specifically identifying hot pages (high reuse probability) and cold pages (low reuse probability). It provides scan resistance and adapts to changing access patterns.
Key concepts:

- Hot pages: pages with demonstrated short reuse distances; protected from eviction
- Cold pages: pages accessed once or with long reuse distances; the primary eviction candidates
- Non-resident pages: recently evicted pages whose metadata is still tracked, so a quick re-fault can be recognized as evidence of reuse
The key insight:
Clock-Pro tracks the reuse distance (time between first and second access) of pages. Pages with short reuse distances are promoted to hot; those with long distances (or no second access) remain cold. Hot pages are protected from eviction; cold pages are the primary eviction candidates.
Algorithm mechanics:

- Hot, cold, and non-resident pages all live on one circular list swept by clock hands
- The cold hand evicts unreferenced cold pages; referenced cold pages are promoted to hot
- The hot hand demotes hot pages that have not been referenced in a full cycle
- The target cold-set size adapts: it shrinks when cold pages prove useful and grows when evicted pages are faulted back in
Scan resistance mechanism:

Sequential scan pages: enter as cold, are never re-accessed, and are evicted quickly without ever displacing the hot set.

Working set pages: are re-accessed while still cold, are promoted to hot, and remain protected for the duration of the scan.
```c
/*
 * Clock-Pro Simplified Implementation
 * Tracks hot/cold status for scan resistance
 */

typedef enum { COLD, HOT } page_temp_t;

typedef struct clock_pro_page {
    uint32_t page_id;
    page_temp_t temperature;
    bool referenced;   // Reference bit
    bool resident;     // Is page in memory?
    struct clock_pro_page *next, *prev;
} clock_pro_page_t;

// Circular list of pages (both resident and non-resident)
clock_pro_page_t *clock_hand_hot;    // For demoting hot pages
clock_pro_page_t *clock_hand_cold;   // For evicting cold pages

// Target sizes (adaptive)
uint32_t target_cold_size;   // Target number of cold pages

/*
 * Page accessed - handle based on current temperature
 */
void clock_pro_access(uint32_t page_id) {
    clock_pro_page_t *page = find_page(page_id);

    if (page == NULL) {
        // Page not tracked - handle fault elsewhere
        return;
    }

    if (page->resident) {
        // Resident page access
        page->referenced = true;

        if (page->temperature == COLD) {
            // Cold resident page accessed -> promote to hot
            page->temperature = HOT;
            // Reduce target cold size (cold pages are valuable)
            if (target_cold_size > MIN_COLD_SIZE) {
                target_cold_size--;
            }
        }
    } else {
        // Non-resident page accessed (page fault on tracked page)
        // This page was evicted but needed again!
        // Will be loaded as HOT since it demonstrated reuse
        // Increase target cold size (we evicted something useful)
        if (target_cold_size < MAX_COLD_SIZE) {
            target_cold_size++;
        }
    }
}

/*
 * Find cold page to evict
 */
clock_pro_page_t* clock_pro_find_victim(void) {
    while (true) {
        clock_pro_page_t *page = clock_hand_cold;

        if (page->temperature == COLD && page->resident) {
            if (!page->referenced) {
                // Cold, not referenced - evict
                return page;
            }
            // Referenced cold page - promote to hot
            page->temperature = HOT;
            page->referenced = false;
        }

        // Advance past hot pages and move to next cold
        clock_hand_cold = clock_hand_cold->next;
    }
}

/*
 * Run hot hand - demote idle hot pages to cold
 */
void clock_pro_demote_hot(void) {
    clock_pro_page_t *page = clock_hand_hot;

    while (count_hot_pages() > target_hot_size) {
        if (page->temperature == HOT) {
            if (!page->referenced) {
                // Hot page not accessed in full cycle - demote
                page->temperature = COLD;
            } else {
                // Clear reference bit, give another chance
                page->referenced = false;
            }
        }
        clock_hand_hot = clock_hand_hot->next;
        page = clock_hand_hot;
    }
}
```

Linux's page cache doesn't use pure Clock-Pro, but incorporates similar ideas. The inactive and active lists function similarly to cold and hot sets. Pages must be accessed twice while on the inactive list to move to active. This provides scan resistance for I/O-heavy workloads.
CAR (Clock with Adaptive Replacement) combines the efficiency of the Clock algorithm with the adaptivity of ARC (Adaptive Replacement Cache). It automatically tunes between recency-favoring and frequency-favoring behavior based on workload characteristics.
The key innovation:
CAR maintains two clocks:

- T1: pages seen once recently (captures recency)
- T2: pages seen at least twice (captures frequency)
An adaptive parameter p determines how many pages come from T1 vs. T2. If the workload benefits from recency, p increases to enlarge T1. If frequency matters more, p decreases to favor T2.
History tracking:
Like Clock-Pro, CAR tracks recently evicted pages (B1 for pages evicted from T1, B2 for pages evicted from T2). Faults on these "ghost" entries indicate which policy would have helped, triggering adaptation.
| Component | Contents | Purpose |
|---|---|---|
| T1 | Pages accessed once (resident) | Capture recency |
| T2 | Pages accessed 2+ times (resident) | Capture frequency |
| B1 | Ghost entries: recently evicted from T1 | Track recency misses |
| B2 | Ghost entries: recently evicted from T2 | Track frequency misses |
| p | Adaptive target size for T1 | Balance recency/frequency |
Adaptation mechanism:
On fault for page in B1 (was evicted from T1):
// Recency would have helped - increase p (more T1 space)
p = min(p + delta1, cache_size)
On fault for page in B2 (was evicted from T2):
// Frequency would have helped - decrease p (more T2 space)
p = max(p - delta2, 0)
delta calculation: the adjustment step scales with the relative ghost-list sizes. A hit in B1 uses delta = max(|B2| / |B1|, 1); a hit in B2 uses delta = max(|B1| / |B2|, 1). Hits on the smaller ghost list therefore move p more aggressively.
This creates a feedback loop: the cache learns from its mistakes and adjusts accordingly.
```
// CAR Adaptive Replacement Logic (Simplified)

state:
    T1, T2 = circular lists of resident pages
    B1, B2 = lists of ghost (evicted) page metadata
    p = adaptive target for |T1|, initially cache_size/2

function handle_page_fault(page_id):
    if page_id in T1 or page_id in T2:
        // Cache hit (shouldn't be a fault, but for completeness)
        promote_in_place(page_id)
        return

    if page_id in B1:
        // Ghost hit in B1 - recency miss
        // If we had kept this T1 page, we'd have hit
        delta = max(|B2| / |B1|, 1)
        p = min(p + delta, cache_size)
        // Load as T2 page (it's now frequently accessed)
        evict_and_load_to_T2(page_id)
        return

    if page_id in B2:
        // Ghost hit in B2 - frequency miss
        // If we had kept this T2 page, we'd have hit
        delta = max(|B1| / |B2|, 1)
        p = max(p - delta, 0)
        // Load as T2 page
        evict_and_load_to_T2(page_id)
        return

    // Complete miss - not in any list
    // Load as T1 page (seen once)
    evict_and_load_to_T1(page_id)

function evict():
    if |T1| > p:
        // T1 larger than target - evict from T1
        victim = clock_sweep_T1()
        move_to_B1(victim)
    else:
        // Evict from T2
        victim = clock_sweep_T2()
        move_to_B2(victim)
    return victim.frame
```

CAR advantages:

- Self-tuning: no manually chosen recency/frequency balance
- Clock-based scanning: low per-access cost, no list reordering on every hit
- Ghost lists provide both scan resistance and the feedback signal for adaptation
Comparison with ARC:
CAR is to ARC what Clock is to LRU—an approximation with better efficiency:
| Aspect | ARC | CAR |
|---|---|---|
| Base algorithm | Pure LRU lists | Clock scanning |
| Per-access cost | O(1) with complex bookkeeping | O(1) simpler |
| Implementation | Complex (list management) | Moderate |
| Performance | Reference adaptive policy | Close to ARC in practice |
ARC was patented by IBM, which has led to licensing complications for some open-source projects. CAR, developed separately, provides similar benefits. When implementing adaptive cache replacement, check current patent status and licensing requirements.
Linux uses a sophisticated multi-queue page replacement system that incorporates many of the ideas we've discussed. Let's examine how Linux manages page reclamation.
Linux's LRU Lists:
Linux maintains several LRU lists per memory zone and per memory cgroup:

- active_file / inactive_file: file-backed (page cache) pages
- active_anon / inactive_anon: anonymous (swap-backed) pages
- unevictable: pages that cannot be reclaimed, such as mlock()'d memory
The two-list model provides scan resistance: Pages must be accessed while on the inactive list to move to active. This is similar to Clock-Pro's cold→hot promotion.
```
// Simplified Linux Page Reclaim Logic
// Key functions in mm/vmscan.c

function shrink_inactive_list(zone, nr_to_scan):
    pages_to_free = []

    for each page in zone.inactive_list (up to nr_to_scan):
        if page.referenced:
            // Page was accessed while inactive
            if should_activate(page):
                // Promote to active list
                move_to_active_list(page)
            else:
                // Keep on inactive, clear reference
                clear_referenced(page)
                rotate_to_tail(page)
        else:
            // Not referenced - candidate for reclaim
            if page.dirty and not page.writeback:
                // Start writeback, try again later
                start_writeback(page)
                rotate_to_tail(page)
            elif page.writeback:
                // Wait for writeback to complete
                rotate_to_tail(page)
            else:
                // Clean, not referenced - reclaim!
                pages_to_free.append(page)

    free_pages_bulk(pages_to_free)

function shrink_active_list(zone, nr_to_scan):
    // Move cold pages from active to inactive
    for each page in zone.active_list (up to nr_to_scan):
        if page.referenced:
            // Still active - keep it
            clear_referenced(page)
            // (stays on active, but reference cleared)
        else:
            // Not referenced - demote to inactive
            move_to_inactive_list(page)

// Reclaim orchestration
function kswapd_reclaim(zone):
    // Balance between file and anonymous pages
    // based on swappiness parameter (0-100)
    file_pressure = calculate_file_pressure()
    anon_pressure = calculate_anon_pressure()

    // Scan inactive lists
    shrink_inactive_list(zone.inactive_file, scan_count)
    shrink_inactive_list(zone.inactive_anon, scan_count * anon_ratio)

    // If still not enough, shrink active lists
    if free_pages < target:
        shrink_active_list(zone.active_file, scan_count)
        shrink_active_list(zone.active_anon, scan_count * anon_ratio)
```

Key Linux mechanisms:
kswapd: Background kernel thread that reclaims pages when memory is low (low watermark)
Direct reclaim: Synchronous reclaim when kswapd can't keep up (high memory pressure)
Swappiness: Tunable (0-100) controlling preference for file cache vs. anonymous pages
Memory cgroups: Separate LRU lists per cgroup for container isolation
Refault detection: Tracks pages that were evicted then reloaded (like B1/B2 in CAR)
The vm.swappiness parameter:
| Value | Behavior |
|---|---|
| 0 | Strongly prefer keeping anonymous pages; swap only under OOM pressure |
| 1-30 | Prefer anonymous pages; swap is last resort |
| 60 (default) | Balance between file cache and anonymous |
| 100 | Reclaim anonymous pages and file cache with equal preference |
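The parameter can be inspected and changed at runtime through the standard procfs/sysctl interface (setting it requires root; the value chosen here is only an example):

```shell
# Read the current value
cat /proc/sys/vm/swappiness

# Set it for the running system (not persistent across reboot)
sudo sysctl -w vm.swappiness=10

# Persist it by adding a line to /etc/sysctl.conf or a drop-in file:
#   vm.swappiness = 10
```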
Linux 6.1+ includes Multi-Gen LRU (MGLRU), a significant improvement over the traditional LRU lists. MGLRU uses generational tracking (multiple age buckets instead of active/inactive) and improved scanning heuristics. It reduces page fault rates by up to 40% on some workloads while using less CPU for scanning.
Let's briefly examine how other major operating systems implement page replacement.
FreeBSD VM System:
FreeBSD uses a multi-queue system with pages classified into queues based on their state:

- Active: pages recently referenced and in use
- Inactive: reclamation candidates, possibly dirty
- Laundry: dirty pages queued for writeback before they can be freed
- Free: pages ready for immediate allocation
The pagedaemon (similar to Linux's kswapd) runs as vm_pageout and scans pages using a clock-like mechanism. FreeBSD explicitly uses a two-handed approach for the inactive queue.
| Aspect | Linux | FreeBSD | Windows |
|---|---|---|---|
| Primary algorithm | Dual LRU lists | Clock-based queues | Working Set + Standby |
| Background daemon | kswapd | pagedaemon | Memory Manager |
| Scan trigger | Low watermark | Free page threshold | Working set limits |
| File vs anon | Separate lists + swappiness | Unified with bias | MemoryPriority |
| Dirty handling | Writeback + flush | Laundry queue | Modified page writer |
| Tuning | vm.swappiness, watermarks | vm.v_free_target, etc. | Registry settings |
Windows Memory Manager:
Windows uses a fundamentally different approach based on working sets (per-process limits) combined with global standby/modified lists:
Per-process working set: Each process has a target working set size. Pages beyond this limit are candidates for trimming.
Trimming: When a process exceeds its working set limit or system memory is low, pages are removed from the process's working set.
Standby List: Clean trimmed pages go to the standby list. They remain in memory and can be quickly reclaimed if the process faults on them.
Modified List: Dirty trimmed pages go to the modified list for eventual writeback.
Zero Page List: Pages zeroed and ready for new allocation.
Windows' standby list acts as a second-chance mechanism: Trimmed pages aren't immediately discarded. If re-accessed before reuse, they're quickly restored to the working set (soft fault). This is similar to Clock-Pro's non-resident tracking.
Windows includes Superfetch (now called SysMain), which proactively loads pages based on predicted access patterns. It uses standby list pages, so it doesn't reduce available memory—the preloaded pages are immediately reclaimable if needed for other purposes.
With so many algorithms available, how do you choose? Here's a decision framework:
Workload characteristics to consider:
| Workload | Key Characteristic | Recommended Algorithm |
|---|---|---|
| Interactive desktop | Recency matters, varied apps | Two-list (Linux default) |
| Database server | Large scans + hot set | Clock-Pro or CAR |
| Streaming/analytics | Sequential access | Simple Clock (pages used once) |
| Embedded/RT | Predictability, simplicity | Basic Clock or NRU |
| General server | Mixed workloads | Adaptive (CAR) or Linux hybrid |
| Cache (Redis, etc.) | Pure cache semantics | LRU or LFU (not OS algorithm) |
Implementation considerations:
Memory overhead: How many bits per page can you afford?
CPU overhead: How much scanning can you tolerate? Periodic R-bit scans, counter maintenance, and TLB invalidations all consume cycles.

Complexity: Can the algorithm be implemented, tested, and verified within your constraints? Simpler policies fail more predictably.

Tunability: Do operators need explicit knobs (like vm.swappiness), or should the policy adapt on its own (like CAR)?
This capstone synthesizes the full spectrum of LRU approximation algorithms, from simple reference bits to sophisticated adaptive schemes.
The complete LRU approximation toolkit:
| Algorithm | Key Feature | Best For |
|---|---|---|
| Basic Clock | Simple, amortized O(1) | General use, embedded |
| Enhanced Clock | I/O aware (R+M) | Disk-backed systems |
| NRU | Stateless, random selection | Simplicity, verification |
| GCLOCK | Frequency counters | Frequency-varying workloads |
| Clock-Pro | Hot/cold distinction | Scan resistance |
| CAR | Self-tuning | Mixed/changing workloads |
Module complete! You now have a comprehensive understanding of LRU approximation algorithms—from the fundamental reference bit through advanced production implementations. These concepts form the foundation for understanding memory management in any modern operating system.
Congratulations! You've mastered LRU approximation algorithms—the practical heart of page replacement in real operating systems. From the simple elegance of the reference bit to the sophisticated adaptation of CAR, you now understand how systems make intelligent replacement decisions without the overhead of true LRU.