In the previous module, we discovered that LRU (Least Recently Used) offers excellent page replacement performance, closely approximating the theoretically optimal algorithm. However, we also uncovered a brutal reality: true LRU requires timestamping every memory access or maintaining perfectly ordered data structures updated on every memory reference—operations that occur billions of times per second on modern systems.
This overhead is simply unacceptable. A memory reference that takes 100 nanoseconds cannot afford additional bookkeeping that might double or triple its latency. So how do real operating systems achieve LRU-like behavior without LRU-like cost?
The answer lies in an elegant hardware-software partnership: modern processors provide a single-bit indicator called the reference bit (also known as the access bit or used bit), which the hardware sets automatically on every memory access. The operating system periodically samples and clears these bits, using the accumulated information to approximate page access recency without any per-access software intervention.
By the end of this page, you will understand the reference bit at a fundamental level: its hardware implementation, how the MMU sets it during address translation, how the OS clears and samples it, and why this simple mechanism forms the foundation for all practical LRU approximation algorithms used in production operating systems.
Before diving into the reference bit, let's precisely quantify why true LRU is impractical. This understanding motivates the clever compromises that make LRU approximation both necessary and valuable.
True LRU Implementation Options:
Each approach requires significant work per memory reference:
| Approach | Per-Access Cost | Replacement Cost | Memory Overhead |
|---|---|---|---|
| Timestamp | Write 64-bit timestamp | O(n) scan all pages | 8 bytes per page |
| Stack | Remove and push (O(n) worst case) | O(1) pop from bottom | Pointers per page |
| Counter | Write counter value | O(n) scan for minimum | 4-8 bytes per page |
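To make the first row of the table concrete, here is a minimal sketch of timestamp-based true LRU (all names and sizes are illustrative): every single memory reference pays for a counter increment and a 64-bit store, and every replacement pays an O(n) scan for the minimum.

```c
#include <stdint.h>
#include <stddef.h>

#define NUM_PAGES 8

/* Hypothetical timestamp-LRU state: one 64-bit timestamp per page. */
static uint64_t last_access[NUM_PAGES];
static uint64_t logical_clock = 0;

/* Paid on EVERY memory reference: an increment and a 64-bit store. */
void on_access(size_t page)
{
    last_access[page] = ++logical_clock;
}

/* Paid at replacement time: O(n) scan for the oldest timestamp. */
size_t find_victim(void)
{
    size_t victim = 0;
    for (size_t i = 1; i < NUM_PAGES; i++)
        if (last_access[i] < last_access[victim])
            victim = i;
    return victim;
}
```

Even in this toy form, the cost structure is visible: `on_access` runs on the critical path of every reference, which is exactly what real systems cannot afford.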
The scale problem:
Consider a modern system sustaining billions of memory references per second across its cores, each reference completing in roughly 100 nanoseconds.
Adding even 10ns of overhead per access increases total memory access time by 10%, which translates directly into application slowdown. For memory-intensive workloads this overhead compounds—the CPU spends more time on bookkeeping than on actual computation.
The fundamental insight:
We don't actually need to know the exact order of page accesses. We need to know approximately which pages were used recently and which weren't. This relaxation of requirements opens the door to dramatically more efficient solutions.
LRU approximation algorithms embody the engineering principle that 80% of the benefit can often be achieved with 20% of the cost. By accepting slight inaccuracy in LRU ordering, we eliminate virtually all per-access overhead while maintaining excellent replacement decisions in practice.
The reference bit (R-bit) is a single bit associated with each page table entry (PTE) that indicates whether the page has been accessed since the bit was last cleared. This deceptively simple mechanism provides the foundation for all practical LRU approximation.
Core semantics:
The hardware sets the bit to 1 whenever the page is accessed (read or write); the OS clears it to 0 when it samples. A value of 1 therefore means "accessed at least once since the last clear," and 0 means "not accessed since the last clear."
This creates an elegant information flow: hardware records accesses for free, and software periodically harvests that record.
Key properties of the reference bit:
- Binary, not temporal: The bit only records whether access occurred, not when or how many times
- Per-page granularity: Each page has its own reference bit, enabling per-page access tracking
- Zero per-access software overhead: The hardware sets the bit as part of normal address translation—no software intervention required
- Sampling-based information: The OS must periodically sample and clear bits to convert binary information into temporal approximations
- Physical implementation: The bit lives in the page table entry, which resides in main memory and is cached in the TLB
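As a concrete (and hypothetical) model of that physical placement: treating the PTE as a 64-bit word, the OS can manipulate the reference bit with plain mask operations—here using bit 5, its position in the x86-64 PTE. The helper names are illustrative, not a real kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT  (1ULL << 0)  /* P: page is mapped */
#define PTE_ACCESSED (1ULL << 5)  /* A: set by hardware on any access */

/* What the MMU does implicitly: set A as part of address translation. */
uint64_t hw_touch(uint64_t pte)
{
    return pte | PTE_ACCESSED;
}

/* What the OS does when sampling: read the bit, then clear it. */
bool sample_and_clear(uint64_t *pte)
{
    bool was_set = (*pte & PTE_ACCESSED) != 0;
    *pte &= ~PTE_ACCESSED;  /* a real kernel must also flush the TLB entry */
    return was_set;
}
```

Note that clearing touches only one bit; the rest of the PTE (present bit, frame number, permissions) is untouched.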
The reference bit has been a standard feature of memory management units since the 1970s. IBM's System/370 (1970) included reference bits, and on the Intel side the "Accessed" bit arrived with the 80386, the first x86 processor to support paging. Today, virtually every architecture supporting virtual memory—x86, x86-64, ARM, RISC-V, POWER—provides reference bits in page table entries.
Understanding how the reference bit works at the hardware level illuminates why this mechanism is so efficient. The setting of the reference bit is integrated into the address translation process that must happen anyway for every memory access.
Address Translation with Reference Bit Setting:
When the CPU issues a memory access with a virtual address, the MMU performs the following steps:
1. Look up the page number in the TLB; on a miss, walk the page table (raising a page fault if the page is not present) and cache the resulting PTE in the TLB
2. Check the access permissions recorded in the PTE
3. Set the reference bit if it is not already set
4. On a write, set the dirty bit as well
5. Combine the frame number from the PTE with the page offset to form the physical address
The critical insight: step 3 adds negligible overhead because the PTE must be read anyway for translation. The hardware simply performs a conditional write-back as part of the same memory transaction.
```
// Pseudocode: MMU Reference Bit Setting (Hardware Logic)
function translate_address(virtual_address, access_type):
    page_number = extract_page_number(virtual_address)
    page_offset = extract_offset(virtual_address)

    // Step 1: TLB Lookup
    tlb_entry = tlb_lookup(page_number)
    if tlb_entry.valid:
        pte = tlb_entry.pte
    else:
        // TLB miss: walk page table
        pte = page_table_walk(page_number)
        if pte is NULL or not pte.present:
            raise PAGE_FAULT
        // Cache in TLB for future accesses
        tlb_insert(page_number, pte)

    // Step 2: Permission Check
    if not check_permissions(pte, access_type):
        raise PROTECTION_FAULT

    // Step 3: Set Reference Bit (Atomic Hardware Operation)
    // This happens AUTOMATICALLY with no software intervention
    if pte.reference_bit == 0:
        pte.reference_bit = 1  // Hardware atomic set
        // Note: Some architectures require TLB flush coordination
        // to ensure the PTE write is visible

    // Step 4: Set Dirty Bit if Write
    if access_type == WRITE and pte.dirty_bit == 0:
        pte.dirty_bit = 1  // Hardware atomic set

    // Step 5: Form Physical Address
    frame_number = pte.frame_number
    physical_address = (frame_number << PAGE_SHIFT) | page_offset
    return physical_address
```

x86/x86-64 Page Table Entry Format:
On x86-64 systems, the reference bit is bit 5 of the 64-bit page table entry (called the "Accessed" bit in Intel terminology):
```
 63   62:52  51:M    M-1:12   11:9   8    7    6   5    4     3     2     1    0
┌────┬──────┬──────┬────────┬──────┬───┬─────┬───┬───┬─────┬─────┬─────┬─────┬───┐
│ NX │Avail │ Rsvd │ Frame# │Avail │ G │ PAT │ D │ A │ PCD │ PWT │ U/S │ R/W │ P │
└────┴──────┴──────┴────────┴──────┴───┴─────┴───┴───┴─────┴─────┴─────┴─────┴───┘
                                                   ▲
                                                   │
                                        Reference Bit (Accessed)
```
When the reference bit is set, the update goes to the page table in memory. However, the TLB may cache a copy of the PTE without the bit set. Modern CPUs handle this by either: (1) updating both the PTE and TLB entry atomically, or (2) always reading reference/dirty bits from memory. The OS must also issue TLB flushes when clearing bits to ensure consistency.
While hardware sets the reference bit automatically, the operating system controls the sampling and clearing of these bits. This gives the OS flexibility in how it uses reference bit information for page replacement decisions.
The OS's role:
The hardware only sets bits; the OS decides when to sample each page's bit, when to clear it (which resets the tracking window), and how to fold successive samples into replacement decisions.
Clearing strategies:
| Strategy | When to Clear | Advantages | Disadvantages |
|---|---|---|---|
| Timer-based | Every N timer interrupts | Predictable intervals, low overhead | Fixed interval may not match workload |
| On page fault | When searching for victim | Only clear when necessary | May clear too infrequently |
| Hybrid | Timer + additional clears during high pressure | Adaptive to memory pressure | More complex implementation |
| Per-page rotation | Clear each page's bit at different times | Spreads information across time | Complex bookkeeping |
```c
/*
 * Reference Bit Sampling and Clearing in a Page Replacement System
 * This code illustrates OS-level manipulation of reference bits
 */

#include <stdint.h>
#include <stdbool.h>

#define NUM_FRAMES 1024
#define REFERENCE_BIT_MASK 0x20   // Bit 5 ("Accessed") in an x86 PTE
#define INVALID_PAGE 0xFFFFFFFFu  // Sentinel for an empty frame

typedef struct {
    uint64_t pte;       // The actual page table entry
    uint8_t  age;       // Age counter for the aging algorithm
    bool     in_memory; // Is this page currently in a frame?
} page_info_t;

typedef struct {
    uint32_t page_number; // Which page is in this frame
    uint32_t process_id;  // Which process owns this page
} frame_t;

page_info_t* page_table;
frame_t frames[NUM_FRAMES];

/* Provided elsewhere by the kernel */
extern void invalidate_tlb_entry(page_info_t* page);
extern page_info_t* get_page_info(uint32_t process_id, uint32_t page_number);
extern void schedule_next_process(void);
void process_reference_bits(void);

/*
 * Clear reference bit for a single page
 * Returns the previous value of the reference bit
 */
bool clear_reference_bit(page_info_t* page) {
    bool was_referenced = (page->pte & REFERENCE_BIT_MASK) != 0;

    // Clear the bit in the page table entry
    page->pte &= ~REFERENCE_BIT_MASK;

    // CRITICAL: Must invalidate TLB entry to ensure consistency
    // The TLB may have cached a copy with the bit set
    invalidate_tlb_entry(page);

    return was_referenced;
}

/*
 * Sample reference bit without clearing
 * Used when we want to read but not reset the tracking
 */
bool sample_reference_bit(page_info_t* page) {
    return (page->pte & REFERENCE_BIT_MASK) != 0;
}

/*
 * Timer interrupt handler: periodic reference bit processing
 * Called approximately every 10ms (100 Hz timer)
 */
void timer_interrupt_handler(void) {
    static uint32_t tick_count = 0;
    tick_count++;

    // Every 10 ticks (100ms), process reference bits
    if (tick_count % 10 == 0) {
        process_reference_bits();
    }

    // Standard timer housekeeping...
    schedule_next_process();
}

/*
 * Process reference bits for all resident pages
 * This implements "aging" - shift right and add reference bit at left
 */
void process_reference_bits(void) {
    for (int i = 0; i < NUM_FRAMES; i++) {
        if (frames[i].page_number == INVALID_PAGE)
            continue;

        page_info_t* page = get_page_info(
            frames[i].process_id,
            frames[i].page_number
        );

        // Shift age right (older accesses become less significant)
        page->age >>= 1;

        // If referenced, set high bit of age
        if (clear_reference_bit(page)) {
            page->age |= 0x80;  // Set most significant bit
        }
    }
}

/*
 * Find victim page for replacement
 * Select page with lowest age value (least recently used approximation)
 */
uint32_t find_victim_frame(void) {
    uint32_t victim_frame = 0;
    uint16_t min_age = 0x100;  // Above any 8-bit age, so even an
                               // all-0xFF page can be selected

    for (int i = 0; i < NUM_FRAMES; i++) {
        if (frames[i].page_number == INVALID_PAGE)
            continue;

        page_info_t* page = get_page_info(
            frames[i].process_id,
            frames[i].page_number
        );

        // Lower age = less recently/frequently used
        if (page->age < min_age) {
            min_age = page->age;
            victim_frame = i;
        }
    }

    return victim_frame;
}
```

The TLB Invalidation Imperative:
When the OS clears a reference bit in the page table entry, it must also invalidate the corresponding TLB entry. Otherwise, subsequent accesses can be satisfied entirely from the stale cached copy, the hardware may never write the bit back to the in-memory PTE, and the page will look unreferenced even while it is in active use.
This TLB invalidation adds some overhead to the clearing operation, but it's essential for correctness. Fortunately, clearing happens infrequently (e.g., 10-100 times per second) rather than on every memory access.
A single reference bit provides binary information: accessed (1) or not accessed (0). To approximate LRU, we need temporal information: when was the page last accessed, or at least a ranking of pages by recency.
The key insight is that repeated sampling over time converts binary snapshots into temporal approximations.
Single-sample information: a single reading tells us only whether the page was accessed at some point during the last interval—nothing about when within the interval or how many times.
Multi-sample information accumulation: recording the bit across successive intervals builds a per-page history whose patterns can be ranked by recency:
| History Pattern | Interpretation | LRU Approximation |
|---|---|---|
| ...1111 (always set) | Constantly accessed | Highly active, poor victim |
| ...1110 (recently not set) | Active but cooling down | Recently paused, moderate choice |
| ...0001 (just set) | Recently accessed after long idle | Just became active, poor victim |
| ...0000 (never set) | No recent access | Excellent victim candidate |
| ...1010 (periodic) | Intermittently accessed | Moderate activity, middle choice |
The Aging Algorithm (Preview):
To capture this history, operating systems often use an aging technique: at every sampling interval, shift each page's age register one bit to the right, copy the page's reference bit into the most significant bit, and then clear the reference bit for the next interval.
The result: recent accesses contribute to high-order bits (large value), while older accesses shift into low-order bits (small contribution). A page with age 0b10000000 (128) was accessed in the last interval; a page with 0b00000001 (1) was accessed 7 intervals ago and not since.
Mathematical formalization:
If Rᵢ represents the reference bit value in interval i (most recent = interval 0), then:
Age = R₀×2⁷ + R₁×2⁶ + R₂×2⁵ + R₃×2⁴ + R₄×2³ + R₅×2² + R₆×2¹ + R₇×2⁰
This makes recent accesses exponentially more significant than older ones, approximating LRU while requiring only 8 bits of storage per page.
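The weighted sum above is exactly what the shift-and-or update computes incrementally. A minimal sketch of a single 8-bit age register (helper names are illustrative):

```c
#include <stdint.h>
#include <stdbool.h>

/* One aging step: halve the old history, then record the newest
 * reference-bit sample in the most significant position. */
uint8_t age_step(uint8_t age, bool referenced)
{
    age >>= 1;
    if (referenced)
        age |= 0x80;
    return age;
}

/* Replay a reference-bit history, oldest sample first, to get the
 * final age value (equal to the weighted sum in the formula above). */
uint8_t age_from_history(const bool *refs, int n)
{
    uint8_t age = 0;
    for (int i = 0; i < n; i++)
        age = age_step(age, refs[i]);
    return age;
}
```

Replaying the examples from the text: a page referenced only in the most recent interval ends with age 0b10000000 (128), while one referenced 7 intervals ago and never since ends with 0b00000001 (1).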
Using 8 bits of age history can differentiate 256 levels of recency. Using 16 bits provides 65,536 levels. However, in practice, 8 bits prove sufficient for excellent page replacement decisions. The marginal benefit of additional history bits diminishes rapidly, while the memory overhead doubles.
While the reference bit is remarkably useful, it has inherent limitations that shape the design of LRU approximation algorithms.
Fundamental Limitations:
- Frequency blindness: one access and one million accesses in an interval both yield R = 1
- Interval granularity: accesses within a single sampling interval are indistinguishable, so recency resolution is bounded by the clearing period
- Clearing cost: every clear is a PTE write plus a TLB invalidation, which caps how frequently the OS can afford to sample
Architectural variations affecting reference bit behavior:
| Architecture | Reference Bit Support | Special Considerations |
|---|---|---|
| x86/x86-64 | Full hardware support | TLB caching requires careful invalidation |
| ARM | AF (Access Flag) bit | Hardware-managed AF updates from ARMv8.1; earlier cores set it in software via Access Flag faults |
| RISC-V | Optional (Sv39/Sv48) | May require software emulation |
| POWER | Full hardware support | Segment lookaside buffer considerations |
| MIPS | Software-managed TLB | Reference bits often emulated via TLB faults |
Multi-core complications:
On multi-core systems, multiple CPUs may access the same page simultaneously. The reference bit must be set atomically to avoid race conditions. Modern architectures handle this with atomic read-modify-write updates of the PTE (e.g., a locked operation that sets the bit only if it is still clear)—and since setting an already-set bit is idempotent, concurrent setters cannot lose each other's updates.
When the OS clears reference bits, it must issue inter-processor interrupts (IPIs) to force all cores to flush relevant TLB entries—a significant overhead on many-core systems.
On architectures without hardware reference bit support (or when more precise tracking is needed), the OS can emulate reference bits by marking pages as invalid. Any access triggers a page fault; the handler marks the page as referenced and makes it valid again. This provides flexible tracking at the cost of fault overhead for every first access.
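The fault-based emulation can be sketched as a user-space simulation (a real kernel would mark the PTE not-present and run the handler on an actual page fault; all names here are illustrative):

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_PAGES 4

typedef struct {
    bool valid;    /* if false, any access takes the "fault" path */
    bool soft_ref; /* software-maintained reference bit */
} soft_page_t;

static soft_page_t pages[NUM_PAGES];

/* Fault handler: record the reference and make the page valid again. */
static void fault_handler(size_t page)
{
    pages[page].soft_ref = true;
    pages[page].valid = true;
}

/* Every access checks validity; an invalid page pays the fault cost. */
void access_page(size_t page)
{
    if (!pages[page].valid)
        fault_handler(page); /* costly first access after each rearm */
    /* subsequent accesses run at full speed: no bookkeeping at all */
}

/* Sampling: read the soft bit, then invalidate so the NEXT access faults. */
bool sample_and_rearm(size_t page)
{
    bool was_ref = pages[page].soft_ref;
    pages[page].soft_ref = false;
    pages[page].valid = false;
    return was_ref;
}
```

This mirrors the hardware scheme exactly—set on access, sampled and cleared by the OS—except that the "set" step costs a fault instead of being free.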
The reference bit serves as the fundamental building block for all practical LRU approximation algorithms. Each algorithm represents a different approach to extracting maximum useful information from this single-bit signal.
The algorithm landscape built on reference bits: NRU, Second-Chance, Clock, Enhanced Second-Chance, and Aging all start from the same one-bit signal and differ only in how much ordering and history they layer on top of it.
Comparing information usage:
| Algorithm | Reference Bit Usage | Additional State | Complexity |
|---|---|---|---|
| NRU | Current R value + D bit | None | O(n) scan |
| Second-Chance | Clear R when page gets chance | Queue position | O(1) amortized |
| Clock | Same as second-chance | Clock hand position | O(1) amortized |
| Enhanced Second-Chance | Current R + D bits | Circular position | O(n) worst case |
| Aging | Accumulate R into history | 8-16 bits per page | O(n) scan |
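As a taste of how little state some of these algorithms need, NRU's entire per-page input is the (R, D) pair, collapsed into four classes with eviction from the lowest non-empty class. A minimal sketch (illustrative only; the detailed algorithms follow on the next pages):

```c
#include <stdbool.h>

/* NRU class: 0 = (R=0,D=0), the best victim ... 3 = (R=1,D=1), the worst. */
int nru_class(bool referenced, bool dirty)
{
    return (referenced ? 2 : 0) + (dirty ? 1 : 0);
}

/* Pick the first page in the lowest non-empty class (O(n) scan). */
int nru_pick_victim(const bool *ref, const bool *dirty, int n)
{
    int victim = -1, best = 4;
    for (int i = 0; i < n; i++) {
        int c = nru_class(ref[i], dirty[i]);
        if (c < best) {
            best = c;
            victim = i;
        }
    }
    return victim;
}
```

Note the ranking logic: an unreferenced dirty page (class 1) is preferred over a referenced clean one (class 2), because recency of use predicts the future better than the one-time cost of writing the page back.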
The following pages will explore each of these algorithms in detail, revealing how they transform the simple reference bit into effective page replacement policies.
The reference bit exemplifies elegant systems design: a minimal hardware mechanism that enables sophisticated software policies. Let's consolidate the key concepts:
- The MMU sets the bit automatically during address translation, so tracking costs nothing per access
- The OS periodically samples and clears the bits, and must pair every clear with a TLB invalidation
- Repeated sampling converts the binary signal into a temporal approximation of recency
- A few bits of accumulated history per page (as in aging) are enough to approximate LRU in practice
What's next:
With the reference bit mechanism firmly understood, we'll explore specific algorithms that use this information. The next page examines the Second-Chance (Clock) Algorithm, which provides an elegant circular-queue implementation for efficient LRU approximation.
You now understand the reference bit—the hardware foundation for practical LRU approximation. This single bit, automatically set by the MMU and periodically cleared by the OS, enables efficient page replacement decisions without the prohibitive overhead of true LRU tracking.