We've explored the fundamental building blocks of LRU approximation: reference bits, the basic Clock algorithm, Enhanced Second-Chance with dirty bit consideration, and NRU classification. But production operating systems don't use these textbook algorithms directly—they employ sophisticated variants optimized for real-world workloads.
This capstone page explores advanced Clock variants that address limitations of the basic algorithm: the two-handed clock (smoother clearing), GCLOCK (frequency counters), Clock-Pro (scan resistance), and CAR (workload adaptation).
We'll also examine how Linux, FreeBSD, and Windows implement page replacement, seeing how textbook concepts translate to production code.
By the end of this page, you will understand: the limitations of basic Clock that motivate advanced variants, how two-handed clock smooths operation, how Clock-Pro handles scan-resistant workloads, how CAR adapts to changing access patterns, and how real operating systems implement page replacement. This synthesis prepares you for understanding modern memory management systems.
Before exploring advanced variants, let's precisely identify what problems they solve. The basic Clock algorithm has several limitations:
1. No Frequency Information
A page accessed once per interval looks identical (R=1) to a page accessed thousands of times. The reference bit is binary—it can't distinguish frequency patterns.
Example: Page A (code loop, 10,000 accesses/sec) and Page B (logging, 1 access/sec) both have R=1. If both survive one clearing interval, they're treated equivalently.
2. Scan Resistance Failure
When an application scans through a large file (larger than memory), the scan pages fill memory and evict useful working set pages. Basic Clock has no defense against this.
Example: A database doing a table scan for a report. Scan pages are used exactly once but evict hot index pages that will be needed again.
| Limitation | Symptom | Impact | Solution Approach |
|---|---|---|---|
| Binary reference | Can't distinguish frequency | Hot pages get same treatment as barely-used | Counters (GCLOCK) |
| Scan vulnerability | Sequential access pollutes cache | Working set evicted by one-time scans | Multi-list (Clock-Pro) |
| Fixed policy | One-size-fits-all replacement | Suboptimal for varying workloads | Adaptation (CAR) |
| Clearing overhead | TLB flush on each clear | Expensive on multi-core systems | Two-handed clock |
| No aging | Only 'recent' vs 'not recent' | Coarse time resolution | Multi-bit history |
3. Fixed Time Resolution
With periodic clearing every N seconds, we can only distinguish "accessed in the last N seconds" from "not accessed." Pages accessed 0.1 seconds ago vs. 0.9 seconds ago (within the same interval) are indistinguishable.
4. Clearing Overhead
Clearing all reference bits requires walking every resident page's page-table entry, clearing its R bit, and invalidating the corresponding TLB entries so the hardware will set the bit again on the next access.
This overhead is acceptable every few seconds but limits how frequently we can clear.
5. No Workload Adaptation
Different workloads benefit from different policies: recency-dominated workloads want recently touched pages protected, while frequency-dominated workloads want long-term hot pages protected even when they haven't been touched in the most recent interval.
Basic Clock uses a fixed policy regardless of workload characteristics.
Each limitation could be addressed with additional complexity—more bits per page, more sophisticated data structures, adaptive algorithms. The challenge is balancing improvement against overhead. Advanced variants carefully choose which limitations to address based on target workloads.
The Two-Handed Clock variant uses two pointers ("hands") that sweep through memory at a fixed distance apart. This separates reference bit clearing from victim selection, providing smoother operation and better separation of concerns.
How it works:

- The leading clear hand sweeps through frames, clearing reference bits
- The trailing evict hand follows at a fixed gap, evicting frames whose R bit is still 0
- A page re-referenced after the clear hand passes has R=1 again when the evict hand arrives, and survives
The gap gives pages time to be re-referenced between the clear hand passing and the evict hand arriving. This effectively creates a survival window.
```c
/*
 * Two-Handed Clock Algorithm
 * Separates clearing from eviction for smoother operation
 */

#define NUM_FRAMES 4096
#define GAP_PERCENT 25   // 25% of frames between hands

typedef struct {
    uint32_t clear_hand;   // Leading hand - clears R bits
    uint32_t evict_hand;   // Trailing hand - evicts R=0 pages
    uint32_t gap_frames;   // Fixed distance between hands
} two_handed_clock_t;

two_handed_clock_t clock;

void init_two_handed_clock(void) {
    clock.clear_hand = 0;
    clock.gap_frames = (NUM_FRAMES * GAP_PERCENT) / 100;
    // Evict hand trails by gap_frames
    clock.evict_hand = NUM_FRAMES - clock.gap_frames;
}

/*
 * Advance clear hand by one position
 * Called periodically (faster than eviction rate)
 */
void advance_clear_hand(void) {
    frame_entry_t *frame = &frames[clock.clear_hand];

    if (frame->occupied) {
        // Clear reference bit without evicting
        clear_reference_bit(frame);
    }

    clock.clear_hand = (clock.clear_hand + 1) % NUM_FRAMES;
}

/*
 * Find and evict a victim page
 */
uint32_t two_handed_clock_evict(void) {
    uint32_t checked = 0;

    while (checked < NUM_FRAMES) {
        frame_entry_t *frame = &frames[clock.evict_hand];

        if (frame->occupied && !is_referenced(frame)) {
            // Found victim: R=0, not referenced since clear hand passed
            uint32_t victim = clock.evict_hand;
            clock.evict_hand = (clock.evict_hand + 1) % NUM_FRAMES;
            return victim;
        }

        // If R=1, page was re-referenced after clearing
        // This is unusual but possible if evict hand catches up
        if (frame->occupied && is_referenced(frame)) {
            // Give second chance: clear and move on
            clear_reference_bit(frame);
        }

        clock.evict_hand = (clock.evict_hand + 1) % NUM_FRAMES;
        checked++;

        // Maintain gap: if evict hand catches up to clear hand,
        // advance clear hand too
        if (clock.evict_hand == clock.clear_hand) {
            advance_clear_hand();
        }
    }

    // Fallback: evict current position
    return clock.evict_hand;
}

/*
 * Background thread: continuously advance clear hand
 */
void clear_hand_sweeper(void) {
    while (true) {
        for (int i = 0; i < CLEAR_BATCH_SIZE; i++) {
            advance_clear_hand();
        }
        sleep_ms(CLEAR_INTERVAL_MS);
    }
}
```

Advantages of two-handed clock:

- Clearing and eviction run at independent rates, smoothing out bursts
- Reference bits are cleared incrementally rather than in one expensive global pass
- The fixed gap gives every page a predictable survival window to prove reuse
- The evict hand mostly encounters pre-cleared (R=0) frames, so victim search is short
Tuning the gap:
| Gap Size | Effect | Best For |
|---|---|---|
| Small (10%) | Quick eviction, short survival window | Memory-constrained, rapid turnover |
| Medium (25%) | Balanced survival time | General-purpose workloads |
| Large (50%) | Long survival window | Applications with irregular access patterns |
FreeBSD's VM system uses a variant of two-handed clock called the 'pageout daemon.' The clear hand equivalent runs as part of the vm_pageout thread, continuously scanning and clearing reference bits while a separate path handles actual eviction decisions.
GCLOCK (Generalized Clock) replaces the single reference bit with a counter per page, enabling frequency-aware replacement. Pages accessed more frequently have higher counters and survive longer.
How it works:

- On access, the page's counter is incremented (capped at a maximum)
- As the clock hand sweeps, it decrements each nonzero counter instead of clearing a bit
- A page becomes a victim only when its counter reaches zero
This creates a frequency-weighted aging mechanism where frequently accessed pages accumulate "survival credits."
```c
/*
 * GCLOCK - Generalized Clock with reference counters
 * Provides frequency-aware page replacement
 */

#define MAX_COUNT 7            // 3-bit counter (0-7)
#define INCREMENT_ON_ACCESS 1
#define DECREMENT_ON_SWEEP 1

typedef struct {
    uint32_t page_number;
    uint8_t ref_count;   // 0 to MAX_COUNT
    bool dirty;
    bool occupied;
} gclock_frame_t;

gclock_frame_t frames[NUM_FRAMES];
uint32_t clock_hand = 0;

/*
 * Called when a page is accessed
 * Can be triggered by software (page fault handler sets R bit,
 * periodic scan converts R bit to counter increment)
 */
void gclock_page_accessed(uint32_t frame_num) {
    gclock_frame_t *frame = &frames[frame_num];

    // Increment counter, capped at MAX_COUNT
    if (frame->ref_count < MAX_COUNT) {
        frame->ref_count += INCREMENT_ON_ACCESS;
    }
}

/*
 * Periodic scan: convert hardware R bits to counter values
 */
void gclock_scan_reference_bits(void) {
    for (uint32_t i = 0; i < NUM_FRAMES; i++) {
        if (!frames[i].occupied) continue;

        uint64_t *pte = get_pte_for_frame(i);
        if (*pte & REFERENCE_BIT) {
            // Page was accessed - increment counter
            gclock_page_accessed(i);
            // Clear R bit
            *pte &= ~REFERENCE_BIT;
            invalidate_tlb_entry_for_frame(i);
        }
    }
}

/*
 * Find victim using GCLOCK algorithm
 */
uint32_t gclock_find_victim(void) {
    while (true) {
        gclock_frame_t *frame = &frames[clock_hand];

        if (frame->occupied) {
            if (frame->ref_count == 0) {
                // Found victim - counter exhausted
                uint32_t victim = clock_hand;
                clock_hand = (clock_hand + 1) % NUM_FRAMES;
                return victim;
            }
            // Decrement counter, giving page more time
            frame->ref_count -= DECREMENT_ON_SWEEP;
        }

        clock_hand = (clock_hand + 1) % NUM_FRAMES;
        // Note: even if a full sweep finds all counters positive, the
        // decrements guarantee counters eventually reach 0, so the loop
        // terminates after at most MAX_COUNT sweeps.
    }
}

/*
 * Enhanced GCLOCK with dirty-bit consideration
 * Prefers clean pages when counters are equal
 */
uint32_t gclock_find_victim_enhanced(void) {
    uint32_t best_clean = UINT32_MAX;
    uint32_t best_dirty = UINT32_MAX;
    uint8_t min_clean_count = MAX_COUNT + 1;
    uint8_t min_dirty_count = MAX_COUNT + 1;
    uint32_t start = clock_hand;
    uint32_t swept = 0;

    // First pass: find lowest-count clean and dirty pages
    while (swept < NUM_FRAMES) {
        gclock_frame_t *frame = &frames[clock_hand];

        if (frame->occupied) {
            if (frame->ref_count == 0) {
                // Immediate eviction candidate
                if (!frame->dirty) {
                    return clock_hand;   // Perfect: count 0, clean
                }
                if (best_dirty == UINT32_MAX) {
                    best_dirty = clock_hand;   // Count 0, but dirty
                }
            } else {
                // Track lowest counts for later
                if (!frame->dirty && frame->ref_count < min_clean_count) {
                    min_clean_count = frame->ref_count;
                    best_clean = clock_hand;
                }
                if (frame->dirty && frame->ref_count < min_dirty_count) {
                    min_dirty_count = frame->ref_count;
                    best_dirty = clock_hand;
                }
            }

            // Decrement all counters as we sweep
            if (frame->ref_count > 0) {
                frame->ref_count--;
            }
        }

        clock_hand = (clock_hand + 1) % NUM_FRAMES;
        swept++;
    }

    // Select victim: prefer clean, then dirty
    if (best_clean != UINT32_MAX) return best_clean;
    if (best_dirty != UINT32_MAX) return best_dirty;

    // Fallback (shouldn't happen with occupied frames)
    return start;
}
```

Counter semantics:
| Counter Value | Interpretation | Survival |
|---|---|---|
| 0 | Unused, evict immediately | None |
| 1 | Used once, nearly expired | 1 more sweep |
| 4 | Moderately used | 4 more sweeps |
| 7 (MAX) | Heavily used | 7+ more sweeps |
Frequency sensitivity:

GCLOCK captures access frequency because:

- Each access adds to the counter (up to MAX_COUNT), so hot pages bank survival credits
- Each sweep subtracts only one unit, so a page survives as many sweeps as its count
- A page keeps a high count only if it is re-accessed faster than the hand sweeps
This provides scan resistance: a sequential scan only increments each page's counter to 1, while hot pages maintain high counts.
GCLOCK trades memory (2-8 bits per page vs. 1 bit) for frequency tracking. With millions of pages, this adds megabytes of overhead. Also, converting hardware R bits to counter increments requires periodic scanning, adding CPU overhead. The benefit is significantly better handling of frequency-varying workloads.
Clock-Pro is a sophisticated LRU approximation that distinguishes between pages based on their access history, specifically identifying hot pages (high reuse probability) and cold pages (low reuse probability). It provides scan resistance and adapts to changing access patterns.
Key concepts:

- Hot pages: pages with demonstrated short reuse distances; protected from eviction
- Cold pages: pages accessed once or with long reuse distances; the primary eviction candidates
- Non-resident pages: recently evicted pages whose metadata is still tracked, so a quick re-fault can be recognized as evidence of reuse
The key insight:
Clock-Pro tracks the reuse distance (time between first and second access) of pages. Pages with short reuse distances are promoted to hot; those with long distances (or no second access) remain cold. Hot pages are protected from eviction; cold pages are the primary eviction candidates.
Algorithm mechanics:

- Hot, cold, and non-resident pages all live on one circular list swept by clock hands
- The cold hand evicts unreferenced cold pages; referenced cold pages are promoted to hot
- The hot hand demotes hot pages that have not been referenced in a full cycle
- The target cold-set size adapts: it shrinks when cold pages prove useful and grows when evicted pages are faulted back in
Scan resistance mechanism:

Sequential scan pages: enter as cold, are never re-accessed, and are evicted quickly without ever displacing the hot set.

Working set pages: are re-accessed while still cold, are promoted to hot, and remain protected for the duration of the scan.
```c
/*
 * Clock-Pro Simplified Implementation
 * Tracks hot/cold status for scan resistance
 */

typedef enum { COLD, HOT } page_temp_t;

typedef struct clock_pro_page {
    uint32_t page_id;
    page_temp_t temperature;
    bool referenced;   // Reference bit
    bool resident;     // Is page in memory?
    struct clock_pro_page *next, *prev;
} clock_pro_page_t;

// Circular list of pages (both resident and non-resident)
clock_pro_page_t *clock_hand_hot;    // For demoting hot pages
clock_pro_page_t *clock_hand_cold;   // For evicting cold pages

// Target sizes (adaptive)
uint32_t target_cold_size;   // Target number of cold pages

/*
 * Page accessed - handle based on current temperature
 */
void clock_pro_access(uint32_t page_id) {
    clock_pro_page_t *page = find_page(page_id);

    if (page == NULL) {
        // Page not tracked - handle fault elsewhere
        return;
    }

    if (page->resident) {
        // Resident page access
        page->referenced = true;

        if (page->temperature == COLD) {
            // Cold resident page accessed -> promote to hot
            page->temperature = HOT;
            // Reduce target cold size (cold pages are valuable)
            if (target_cold_size > MIN_COLD_SIZE) {
                target_cold_size--;
            }
        }
    } else {
        // Non-resident page accessed (page fault on tracked page)
        // This page was evicted but needed again!
        // Will be loaded as HOT since it demonstrated reuse
        // Increase target cold size (we evicted something useful)
        if (target_cold_size < MAX_COLD_SIZE) {
            target_cold_size++;
        }
    }
}

/*
 * Find cold page to evict
 */
clock_pro_page_t* clock_pro_find_victim(void) {
    while (true) {
        clock_pro_page_t *page = clock_hand_cold;

        if (page->temperature == COLD && page->resident) {
            if (!page->referenced) {
                // Cold, not referenced - evict
                return page;
            }
            // Referenced cold page - promote to hot
            page->temperature = HOT;
            page->referenced = false;
        }

        // Advance past hot pages and move to next cold
        clock_hand_cold = clock_hand_cold->next;
    }
}

/*
 * Run hot hand - demote idle hot pages to cold
 */
void clock_pro_demote_hot(void) {
    clock_pro_page_t *page = clock_hand_hot;

    while (count_hot_pages() > target_hot_size) {
        if (page->temperature == HOT) {
            if (!page->referenced) {
                // Hot page not accessed in full cycle - demote
                page->temperature = COLD;
            } else {
                // Clear reference bit, give another chance
                page->referenced = false;
            }
        }
        clock_hand_hot = clock_hand_hot->next;
        page = clock_hand_hot;
    }
}
```

Linux's page cache doesn't use pure Clock-Pro, but incorporates similar ideas. The inactive and active lists function similarly to cold and hot sets. Pages must be accessed twice while on the inactive list to move to active. This provides scan resistance for I/O-heavy workloads.
CAR (Clock with Adaptive Replacement) combines the efficiency of the Clock algorithm with the adaptivity of ARC (Adaptive Replacement Cache). It automatically tunes between recency-favoring and frequency-favoring behavior based on workload characteristics.
The key innovation:
CAR maintains two clocks:

- T1: pages seen once recently (captures recency)
- T2: pages seen at least twice (captures frequency)
An adaptive parameter p determines how many pages come from T1 vs. T2. If the workload benefits from recency, p increases to enlarge T1. If frequency matters more, p decreases to favor T2.
History tracking:
Like Clock-Pro, CAR tracks recently evicted pages (B1 for pages evicted from T1, B2 for pages evicted from T2). Faults on these "ghost" entries indicate which policy would have helped, triggering adaptation.
| Component | Contents | Purpose |
|---|---|---|
| T1 | Pages accessed once (resident) | Capture recency |
| T2 | Pages accessed 2+ times (resident) | Capture frequency |
| B1 | Ghost entries: recently evicted from T1 | Track recency misses |
| B2 | Ghost entries: recently evicted from T2 | Track frequency misses |
| p | Adaptive target size for T1 | Balance recency/frequency |
Adaptation mechanism:
On fault for page in B1 (was evicted from T1):
// Recency would have helped - increase p (more T1 space)
p = min(p + delta1, cache_size)
On fault for page in B2 (was evicted from T2):
// Frequency would have helped - decrease p (more T2 space)
p = max(p - delta2, 0)
delta calculation: the adjustment step scales with the relative ghost-list sizes. A hit in B1 uses delta = max(|B2| / |B1|, 1); a hit in B2 uses delta = max(|B1| / |B2|, 1). Hits on the smaller ghost list therefore move p more aggressively.
This creates a feedback loop: the cache learns from its mistakes and adjusts accordingly.
```
// CAR Adaptive Replacement Logic (Simplified)

state:
    T1, T2 = circular lists of resident pages
    B1, B2 = lists of ghost (evicted) page metadata
    p = adaptive target for |T1|, initially cache_size/2

function handle_page_fault(page_id):
    if page_id in T1 or page_id in T2:
        // Cache hit (shouldn't be a fault, but for completeness)
        promote_in_place(page_id)
        return

    if page_id in B1:
        // Ghost hit in B1 - recency miss
        // If we had kept this T1 page, we'd have hit
        delta = max(|B2| / |B1|, 1)
        p = min(p + delta, cache_size)
        // Load as T2 page (it's now frequently accessed)
        evict_and_load_to_T2(page_id)
        return

    if page_id in B2:
        // Ghost hit in B2 - frequency miss
        // If we had kept this T2 page, we'd have hit
        delta = max(|B1| / |B2|, 1)
        p = max(p - delta, 0)
        // Load as T2 page
        evict_and_load_to_T2(page_id)
        return

    // Complete miss - not in any list
    // Load as T1 page (seen once)
    evict_and_load_to_T1(page_id)

function evict():
    if |T1| > p:
        // T1 larger than target - evict from T1
        victim = clock_sweep_T1()
        move_to_B1(victim)
    else:
        // Evict from T2
        victim = clock_sweep_T2()
        move_to_B2(victim)
    return victim.frame
```

CAR advantages:

- Self-tuning: no manually chosen recency/frequency balance
- Clock-based scanning: low per-access cost, no list reordering on every hit
- Ghost lists provide both scan resistance and the feedback signal for adaptation
Comparison with ARC:
CAR is to ARC what Clock is to LRU—an approximation with better efficiency:
| Aspect | ARC | CAR |
|---|---|---|
| Base algorithm | Pure LRU lists | Clock scanning |
| Per-access cost | O(1) with complex bookkeeping | O(1) simpler |
| Implementation | Complex (list management) | Moderate |
| Performance | Reference adaptive policy | Close to ARC in practice |
ARC was patented by IBM, which has led to licensing complications for some open-source projects. CAR, developed separately, provides similar benefits. When implementing adaptive cache replacement, check current patent status and licensing requirements.
Linux uses a sophisticated multi-queue page replacement system that incorporates many of the ideas we've discussed. Let's examine how Linux manages page reclamation.
Linux's LRU Lists:
Linux maintains several LRU lists per memory zone and per memory cgroup:

- active_file / inactive_file: file-backed (page cache) pages
- active_anon / inactive_anon: anonymous (swap-backed) pages
- unevictable: pages that cannot be reclaimed, such as mlock()'d memory
The two-list model provides scan resistance: Pages must be accessed while on the inactive list to move to active. This is similar to Clock-Pro's cold→hot promotion.
```
// Simplified Linux Page Reclaim Logic
// Key functions in mm/vmscan.c

function shrink_inactive_list(zone, nr_to_scan):
    pages_to_free = []

    for each page in zone.inactive_list (up to nr_to_scan):
        if page.referenced:
            // Page was accessed while inactive
            if should_activate(page):
                // Promote to active list
                move_to_active_list(page)
            else:
                // Keep on inactive, clear reference
                clear_referenced(page)
                rotate_to_tail(page)
        else:
            // Not referenced - candidate for reclaim
            if page.dirty and not page.writeback:
                // Start writeback, try again later
                start_writeback(page)
                rotate_to_tail(page)
            elif page.writeback:
                // Wait for writeback to complete
                rotate_to_tail(page)
            else:
                // Clean, not referenced - reclaim!
                pages_to_free.append(page)

    free_pages_bulk(pages_to_free)

function shrink_active_list(zone, nr_to_scan):
    // Move cold pages from active to inactive
    for each page in zone.active_list (up to nr_to_scan):
        if page.referenced:
            // Still active - keep it
            clear_referenced(page)
            // (stays on active, but reference cleared)
        else:
            // Not referenced - demote to inactive
            move_to_inactive_list(page)

// Reclaim orchestration
function kswapd_reclaim(zone):
    // Balance between file and anonymous pages
    // based on swappiness parameter (0-100)
    file_pressure = calculate_file_pressure()
    anon_pressure = calculate_anon_pressure()

    // Scan inactive lists
    shrink_inactive_list(zone.inactive_file, scan_count)
    shrink_inactive_list(zone.inactive_anon, scan_count * anon_ratio)

    // If still not enough, shrink active lists
    if free_pages < target:
        shrink_active_list(zone.active_file, scan_count)
        shrink_active_list(zone.active_anon, scan_count * anon_ratio)
```

Key Linux mechanisms:
kswapd: Background kernel thread that reclaims pages when memory is low (low watermark)
Direct reclaim: Synchronous reclaim when kswapd can't keep up (high memory pressure)
Swappiness: Tunable (0-100) controlling preference for file cache vs. anonymous pages
Memory cgroups: Separate LRU lists per cgroup for container isolation
Refault detection: Tracks pages that were evicted then reloaded (like B1/B2 in CAR)
The vm.swappiness parameter:
| Value | Behavior |
|---|---|
| 0 | Strongly prefer keeping anonymous pages; swap only under OOM pressure |
| 1-30 | Prefer anonymous pages; swap is last resort |
| 60 (default) | Balance between file cache and anonymous |
| 100 | Reclaim anonymous pages and file cache with equal preference |
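The parameter can be inspected and changed at runtime through the standard procfs/sysctl interface (setting it requires root; the value chosen here is only an example):

```shell
# Read the current value
cat /proc/sys/vm/swappiness

# Set it for the running system (not persistent across reboot)
sudo sysctl -w vm.swappiness=10

# Persist it by adding a line to /etc/sysctl.conf or a drop-in file:
#   vm.swappiness = 10
```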
Linux 6.1+ includes Multi-Gen LRU (MGLRU), a significant improvement over the traditional LRU lists. MGLRU uses generational tracking (multiple age buckets instead of active/inactive) and improved scanning heuristics. It reduces page fault rates by up to 40% on some workloads while using less CPU for scanning.
Let's briefly examine how other major operating systems implement page replacement.
FreeBSD VM System:
FreeBSD uses a multi-queue system with pages classified into queues based on their state:

- Active: pages recently referenced and in use
- Inactive: reclamation candidates, possibly dirty
- Laundry: dirty pages queued for writeback before they can be freed
- Free: pages ready for immediate allocation
The pagedaemon (similar to Linux's kswapd) runs as vm_pageout and scans pages using a clock-like mechanism. FreeBSD explicitly uses a two-handed approach for the inactive queue.
| Aspect | Linux | FreeBSD | Windows |
|---|---|---|---|
| Primary algorithm | Dual LRU lists | Clock-based queues | Working Set + Standby |
| Background daemon | kswapd | pagedaemon | Memory Manager |
| Scan trigger | Low watermark | Free page threshold | Working set limits |
| File vs anon | Separate lists + swappiness | Unified with bias | MemoryPriority |
| Dirty handling | Writeback + flush | Laundry queue | Modified page writer |
| Tuning | vm.swappiness, watermarks | vm.v_free_target, etc. | Registry settings |
Windows Memory Manager:
Windows uses a fundamentally different approach based on working sets (per-process limits) combined with global standby/modified lists:
Per-process working set: Each process has a target working set size. Pages beyond this limit are candidates for trimming.
Trimming: When a process exceeds its working set limit or system memory is low, pages are removed from the process's working set.
Standby List: Clean trimmed pages go to the standby list. They remain in memory and can be quickly reclaimed if the process faults on them.
Modified List: Dirty trimmed pages go to the modified list for eventual writeback.
Zero Page List: Pages zeroed and ready for new allocation.
Windows' standby list acts as a second-chance mechanism: Trimmed pages aren't immediately discarded. If re-accessed before reuse, they're quickly restored to the working set (soft fault). This is similar to Clock-Pro's non-resident tracking.
Windows includes Superfetch (now called SysMain), which proactively loads pages based on predicted access patterns. It uses standby list pages, so it doesn't reduce available memory—the preloaded pages are immediately reclaimable if needed for other purposes.
With so many algorithms available, how do you choose? Here's a decision framework:
Workload characteristics to consider:
| Workload | Key Characteristic | Recommended Algorithm |
|---|---|---|
| Interactive desktop | Recency matters, varied apps | Two-list (Linux default) |
| Database server | Large scans + hot set | Clock-Pro or CAR |
| Streaming/analytics | Sequential access | Simple Clock (pages used once) |
| Embedded/RT | Predictability, simplicity | Basic Clock or NRU |
| General server | Mixed workloads | Adaptive (CAR) or Linux hybrid |
| Cache (Redis, etc.) | Pure cache semantics | LRU or LFU (not OS algorithm) |
Implementation considerations:
Memory overhead: How many bits per page can you afford?
CPU overhead: How much scanning can you tolerate? Periodic R-bit scans, counter maintenance, and TLB invalidations all consume cycles.

Complexity: Can the algorithm be implemented, tested, and verified within your constraints? Simpler policies fail more predictably.

Tunability: Do operators need explicit knobs (like vm.swappiness), or should the policy adapt on its own (like CAR)?
This capstone synthesizes the full spectrum of LRU approximation algorithms, from simple reference bits to sophisticated adaptive schemes.
The complete LRU approximation toolkit:
| Algorithm | Key Feature | Best For |
|---|---|---|
| Basic Clock | Simple, amortized O(1) | General use, embedded |
| Enhanced Clock | I/O aware (R+M) | Disk-backed systems |
| NRU | Stateless, random selection | Simplicity, verification |
| GCLOCK | Frequency counters | Frequency-varying workloads |
| Clock-Pro | Hot/cold distinction | Scan resistance |
| CAR | Self-tuning | Mixed/changing workloads |
Module complete! You now have a comprehensive understanding of LRU approximation algorithms—from the fundamental reference bit through advanced production implementations. These concepts form the foundation for understanding memory management in any modern operating system.
Congratulations! You've mastered LRU approximation algorithms—the practical heart of page replacement in real operating systems. From the simple elegance of the reference bit to the sophisticated adaptation of CAR, you now understand how systems make intelligent replacement decisions without the overhead of true LRU.