Imagine you need to track which books in a vast library have been touched recently. You could station an observer at each shelf, recording every time someone picks up a book—but the overhead would be astronomical. Instead, imagine each book has a small flag that automatically raises when touched. Periodically, you walk through and note which flags are up, then reset them all.
This is precisely how the reference bit (also called the accessed bit or, confusingly, sometimes the modify bit in older literature) works in modern processors. Every page table entry contains a single bit that the hardware sets automatically whenever the page is accessed—whether for reading or writing. This humble bit enables efficient approximation of LRU without the prohibitive cost of tracking exact access times.
This page provides an exhaustive exploration of the reference bit—its mechanism, role in page replacement, techniques for leveraging it, and its central importance to practical memory management.
By the end of this page, you will understand: (1) The precise hardware mechanism that sets and maintains the reference bit, (2) How reference bits enable efficient LRU approximation, (3) The Clock algorithm and its variants, (4) Reference bit scanning and clearing strategies, (5) The relationship between reference bits and working set detection, and (6) Architectures without hardware reference bits.
Different architectures and texts use different names: reference bit, accessed bit, use bit, or A bit (x86). Some older sources call it the 'modify bit' (not to be confused with the dirty bit). The Intel x86 manuals use 'Accessed' (A), set on any read or write, and 'Dirty' (D), set only on writes. We'll use 'reference bit' as the general term.
The reference bit (or accessed bit) is a single bit in each page table entry that indicates whether the corresponding page has been accessed since the bit was last cleared.
Formal Definition:
Reference Bit = 1 : Page has been accessed (read OR write) since last clear
Reference Bit = 0 : Page has NOT been accessed since last clear
Key Distinction from Dirty Bit:
| Bit | Set On | Cleared By | Purpose |
|---|---|---|---|
| Reference (A) | Any access (read or write) | OS explicitly | Track recent usage |
| Dirty (D) | Write only | OS after write-back | Track modifications |
Location in Page Table Entry (x86-64):
64-bit Page Table Entry:
(bit 63 on the left, bit 0 on the right)
┌─────┬───────┬──────────────────────┬─────┬───┬───┬───┬───┬───┬───┬───┬───┬───┐
│ NX  │ Avail │ Physical Frame (40b) │ Avl │ G │PAT│ D │ A │PCD│PWT│U/S│R/W│ P │
│  1  │  11   │      bits 51–12      │  3  │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │
└─────┴───────┴──────────────────────┴─────┴───┴───┴───┴───┴───┴───┴───┴───┴───┘
                                                     ↑   ↑
                                                     │   └── Accessed (Reference) Bit (bit 5)
                                                     └────── Dirty Bit (bit 6)
The reference bit is adjacent to the dirty bit, and both are set by hardware during address translation.
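To make the bit positions concrete, here is a minimal user-space sketch that inspects the A (bit 5) and D (bit 6) flags of a raw x86-64 PTE value. The helper names are our own, not kernel API:

```c
#include <stdint.h>

#define PTE_A ((uint64_t)1 << 5)  /* Accessed: set on any read or write */
#define PTE_D ((uint64_t)1 << 6)  /* Dirty: set on writes only */

/* Return nonzero if the accessed (reference) bit is set. */
int pte_accessed(uint64_t pte) { return (pte & PTE_A) != 0; }

/* Return nonzero if the dirty bit is set. */
int pte_dirty(uint64_t pte) { return (pte & PTE_D) != 0; }

/* Clear the accessed bit, as the OS does when aging a page. */
uint64_t pte_clear_accessed(uint64_t pte) { return pte & ~PTE_A; }
```

Note that clearing A leaves D intact: a page can be "not recently referenced" yet still dirty and in need of write-back.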
The Reference Bit Lifecycle:
┌──────────────────────────────────────────────────────────────────┐
│ REFERENCE BIT LIFECYCLE │
├──────────────────────────────────────────────────────────────────┤
│ │
│ Page Loaded Page Accessed OS Clears Bit │
│ ───────────►A=0 ───────────────►A=1 ──────────────►A=0 │
│ (automatic) (explicit) │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Time: ──────────────────────────────────────────────► │ │
│ │ │ │ │ │ │ │ │
│ │ Load Access Access Clear Access │ │
│ │ A=0 A=1 A=1 A=0 A=1 │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ Key: A=0 means unreferenced since last clear │
│ A=1 means referenced since last clear │
└──────────────────────────────────────────────────────────────────┘
The Fundamental Insight:
A reference bit of 0 means the page hasn't been accessed since the OS last cleared it. If the OS clears bits periodically, pages with A=0 are not recently used—precisely the pages suitable for eviction.
This transforms an impossible tracking problem (record every access with a timestamp) into a tractable one (periodically sample which pages have been accessed).
The reference bit provides only binary information: accessed or not accessed within a given interval. This is much coarser than true LRU (which orders all pages by exact recency), but it's sufficient for effective victim selection with dramatically lower overhead. The key is that we can identify pages that are 'cold' enough to evict safely.
Understanding exactly when and how the reference bit is set clarifies why it's so useful for memory management.
When the Reference Bit Is Set:
The reference bit is set automatically by the Memory Management Unit (MMU) during successful address translation:
Any Memory Access (Load or Store):
1. CPU executes instruction accessing memory
2. MMU walks page table to translate virtual address
3. MMU finds PTE for target page
4. MMU checks valid bit (faults if not present)
5. MMU checks permissions (faults if insufficient)
6. MMU atomically:
a. Sets Reference bit = 1
b. If write: also sets Dirty bit = 1
c. Returns physical address
7. Access completes
This happens on EVERY successful memory access.
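The seven steps above can be sketched as a toy software MMU. The names and structure are illustrative, not any real hardware interface:

```c
#include <stdint.h>

enum { PTE_PRESENT = 1 << 0, PTE_WRITE = 1 << 1, PTE_A = 1 << 5, PTE_D = 1 << 6 };

/* Attempt one access against a toy PTE; returns 0 on fault, 1 on success.
 * On success the accessed bit is set, and the dirty bit too for writes. */
int mmu_access(uint64_t *pte, int is_write) {
    if (!(*pte & PTE_PRESENT))            /* step 4: valid check */
        return 0;                         /* page fault */
    if (is_write && !(*pte & PTE_WRITE))  /* step 5: permission check */
        return 0;                         /* protection fault */
    *pte |= PTE_A;                        /* step 6a: set reference bit */
    if (is_write)
        *pte |= PTE_D;                    /* step 6b: set dirty bit */
    return 1;                             /* steps 6c-7: access completes */
}
```

A read sets only A; a later write to the same page upgrades it to A=1, D=1, mirroring the asymmetry in the table above.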
TLB and Reference Bit:
The TLB (Translation Lookaside Buffer) caches page table entries. How does this affect reference bit behavior?
With TLB Hit (Common Case):
1. Virtual address looked up in TLB
2. Entry found → translation returned
3. Reference bit in TLB entry set (if not already)
4. NO page table walk needed
Reference bit update behavior varies by architecture:
- Some set bit in TLB entry only (must flush to page table)
- Some write-through to page table (consistent view)
- Some set only on TLB miss (less precise)
Clearing the Reference Bit:
Unlike setting (automatic), clearing requires explicit OS action:
/* Clear reference bit for a page */
void clear_reference_bit(struct page *page) {
pte_t *pte = get_pte_for_page(page);
/* Clear the accessed bit in PTE */
*pte = pte_mkold(*pte); /* Architecture-specific clear */
/* CRITICAL: Must invalidate TLB entry */
/* Otherwise TLB has A=1 while memory has A=0 */
flush_tlb_page(page_to_virtual(page));
}
The TLB Flush Requirement:
Problem:
TLB caches PTE with A=1
OS clears A=0 in page table
TLB still has A=1
TLB hit → access proceeds without updating page table
OS sees A=0, thinks page unused
Evicts actively-used page!
Solution:
When clearing reference bit, flush corresponding TLB entry.
Next access causes TLB miss → page table walk → A=1 set in table.
x86 implementation:
INVLPG instruction invalidates single TLB entry
Or full TLB flush (more overhead)
For SMP: must flush on ALL CPUs (TLB shootdown)
On multi-processor systems, each CPU has its own TLB. Clearing a reference bit requires invalidating that TLB entry on ALL CPUs that might have cached it. This 'TLB shootdown' involves inter-processor interrupts and synchronization, adding significant overhead. Systems try to batch shootdowns and minimize their frequency.
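A toy model makes the stale-TLB hazard concrete: if the OS clears A in the page table without invalidating the cached copy, subsequent TLB hits never touch the PTE, and an actively-used page looks idle. All names here are illustrative:

```c
#include <stdint.h>

#define PTE_A ((uint64_t)1 << 5)

struct tlb_entry { int valid; uint64_t cached_pte; };

/* Access through a one-entry TLB: a hit skips the page-table walk,
 * so the in-memory PTE's A bit is only updated on a miss. */
void access_page(uint64_t *pte, struct tlb_entry *tlb) {
    if (tlb->valid)
        return;                /* TLB hit: PTE in memory untouched */
    *pte |= PTE_A;             /* TLB miss: walk sets A in memory */
    tlb->valid = 1;
    tlb->cached_pte = *pte;
}

/* Clear the reference bit; 'flush' says whether we also invalidate the TLB. */
void clear_reference(uint64_t *pte, struct tlb_entry *tlb, int flush) {
    *pte &= ~PTE_A;
    if (flush)
        tlb->valid = 0;        /* next access misses and re-sets A */
}
```

Without the flush, the page's A bit stays 0 in memory despite ongoing accesses; with it, the next access faults into the walker and re-sets A.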
/* Reference Bit Operations in Linux Kernel (Simplified) */

#include <asm/pgtable.h>

/*
 * Check if page has been accessed.
 * Returns true if reference/accessed bit is set.
 */
static inline int page_referenced(struct page *page) {
    int referenced = 0;
    /* Check all PTEs mapping this page (may be shared) */
    struct rmap_walk_control rwc = {
        .arg = &referenced,
        .rmap_one = page_referenced_one,
    };
    rmap_walk(page, &rwc);
    return referenced;
}

/*
 * Clear reference bit and return old value.
 * Caller must handle TLB invalidation.
 */
static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
                                         unsigned long address, pte_t *ptep) {
    int young;
    /* Atomically clear accessed bit and get old value */
    young = pte_young(*ptep);
    if (young) {
        /* Clear the accessed bit */
        set_pte_at(vma->vm_mm, address, ptep, pte_mkold(*ptep));
        /* Flush TLB to make change visible */
        flush_tlb_page(vma, address);
    }
    return young;
}

/*
 * Age a page by clearing its reference bit.
 * Used by page replacement algorithms.
 */
void page_age(struct page *page) {
    struct rmap_walk_control rwc = {
        .rmap_one = page_mkold_one,  /* Clear accessed bit in each PTE */
        .done = page_not_mapped,
    };
    /* Walk all mappings and clear accessed bit */
    rmap_walk(page, &rwc);
}

True LRU requires maintaining an exact ordering of all pages by access time—expensive both in space and time. Reference bits enable efficient approximations that capture enough recency information for effective victim selection.
Why True LRU Is Expensive:
True LRU Requirements:
- Maintain ordered list of all pages
- Move accessed page to list head on EVERY access
- Evict from list tail
Cost Analysis (with 4 million pages):
- Memory accesses per second: ~1 billion
- List updates per second: ~1 billion
- Lock contention: catastrophic on SMP
- Memory for list pointers: ~64 MB
This is completely impractical for OS memory management.
The Reference Bit Approximation:
Approximation Strategy:
- Don't track exact order
- Just distinguish "recently used" from "not recently used"
- Periodically sample via reference bits
With reference bits:
- Hardware sets bits automatically (no software overhead on access)
- OS periodically scans and clears bits
- Pages with bit=0 haven't been used since last scan
- Evict pages with bit=0 (approximately least recent)
| Aspect | True LRU | Reference Bit Approximation |
|---|---|---|
| Per-access overhead | Update data structure | None (hardware) |
| Ordering precision | Exact order of all pages | Binary: recent or not |
| Selection accuracy | Perfect LRU | ~85-95% of LRU |
| Memory overhead | 16+ bytes per page | 1 bit per page |
| SMP scalability | Poor (contention) | Excellent |
| Implementation | Complex | Simple |
Levels of Approximation:
More sophisticated approximations use reference bits in different ways:
Level 1: Simple Scan (One-Bit History)
Eviction:
1. Scan pages for reference_bit = 0
2. If found, evict
3. If not found, clear all reference bits
4. Scan again
Precision: Can only distinguish 2 categories (referenced/not)
Level 2: Additional History Bits (Multi-Bit History)
Maintain software history bits per page:
- Reference bit shifts into history register
- History register contains last N samples
- Lower history value = less recently used
Example with 8-bit history:
Page A history: 00000001 (accessed once, long ago)
Page B history: 11110000 (accessed recently, now quiet)
Page C history: 10101010 (intermittent access)
Evict: Page A (lowest value)
Precision: Can distinguish 256 categories with 8 bits
Level 3: Aging Algorithm
Periodic aging process:
1. For each page:
a. Right-shift history bits
b. OR reference bit into high bit
c. Clear reference bit
Pages with higher history values were accessed more recently.
Provides approximate ordering based on recent access density.
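The aging step is a one-liner per page. This sketch (with illustrative names) implements the shift-and-OR update; histories can then be compared directly as unsigned integers, with the smallest value being the best eviction candidate:

```c
#include <stdint.h>

/* One aging tick: shift history right, OR the current reference bit
 * into the high bit. The caller then clears the hardware R bit. */
uint8_t age_history(uint8_t history, int referenced) {
    return (uint8_t)((history >> 1) | (referenced ? 0x80 : 0x00));
}
```

A page referenced every tick saturates at 0xFF; a page idle for k ticks has its most significant k bits zero, so recency dominates the comparison automatically.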
Reference bit LRU approximations typically achieve 80-95% of true LRU's page fault performance at a fraction of the cost. The remaining 5-20% is often not worth the overhead of true LRU. For most workloads, knowing 'not recently accessed' is sufficient—you don't need to know exactly when among all unreferenced pages is 'least' recent.
The Clock algorithm (also called Second-Chance algorithm) is the most widely-used LRU approximation. It's elegant, efficient, and forms the basis of page replacement in most real operating systems.
Core Concept:
Imagine pages arranged in a circle, with a "clock hand" pointing to one page. When we need a victim:
1. Examine page at clock hand
2. If reference bit = 0:
→ Select this page as victim
→ Advance hand past it
3. If reference bit = 1:
→ Page was recently used (give "second chance")
→ Clear reference bit to 0
→ Advance hand to next page
→ Repeat from step 1
Visual Representation:
Clock Hand
↓
┌───────────────────────────────┐
│ │
┌────▼────┐ ┌─────────┐ ┌─────────┐
│ Page A │ │ Page B │ │ Page C │
│ R=1 │ │ R=0 │ │ R=1 │
└────┬────┘ └────┬────┘ └────┬────┘
│ │ │
└──────────────┴──────────────┘
Step 1: Hand at A (R=1), give second chance, set R=0, advance
Step 2: Hand at B (R=0), select B as victim
Why "Second Chance":
A page with reference bit = 1 has been accessed recently. Instead of evicting it, we clear its bit and move on. If the page really is active, it will be referenced again—re-setting its bit—before the hand cycles back around; only pages that stay quiet for a full revolution are evicted.
Clock Algorithm Properties:
Time Complexity:
- Best case: O(1) - first page checked is victim
- Worst case: O(n) - all pages recently referenced
- Average case: Depends on reference pattern, typically O(1) to O(k)
where k << n
Space Complexity:
- O(1) additional - just the clock hand position
- Reference bits are in page table (already exist)
Fairness:
- Each page gets equal opportunity
- No starvation (will eventually find victim)
Comparison to True LRU:
Consider 4 pages accessed at times: A(t=10), B(t=20), C(t=15), D(t=5)
True LRU order (most recent first): B → C → A → D
Eviction order: D → A → C → B
Clock sees only: A(R=1), B(R=1), C(R=1), D(R=1)
No ordering information! All recently used.
First pass clears all: A(R=0), B(R=0), C(R=0), D(R=0)
Second pass evicts first encountered: whichever hand reaches first.
Clock trades precision for efficiency. For most workloads, this is acceptable.
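Stripped of kernel details, victim selection over an array of reference bits fits in a few lines. This self-contained sketch (illustrative, not production code) returns the victim's index, clearing bits and advancing the hand in place:

```c
/* Select a victim frame with the Clock algorithm.
 * r[i] is frame i's reference bit; *hand is the clock hand.
 * Clears R bits as it sweeps; returns the victim's index. */
int clock_select(int *r, int n, int *hand) {
    for (;;) {
        int i = *hand;
        *hand = (*hand + 1) % n;  /* always advance past the examined frame */
        if (r[i] == 0)
            return i;             /* not recently used: victim */
        r[i] = 0;                 /* recently used: second chance */
    }
}
```

The loop always terminates: each iteration either returns or clears a bit, so after at most n clears the hand must land on a frame with R=0.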
/* Clock (Second-Chance) Algorithm Implementation */

struct clock_state {
    int hand;               /* Current position in page array */
    int num_frames;         /* Total number of frames */
    struct page **frames;   /* Array of page pointers */
};

static struct clock_state clock;

static void advance_clock_hand(void);
static int page_referenced(struct page *page);
static void clear_page_reference(struct page *page);

/*
 * Find a victim page using the Clock algorithm.
 * Returns pointer to the victim page.
 */
struct page *clock_find_victim(void) {
    /* Limit search to prevent infinite loop in pathological cases */
    int max_iterations = clock.num_frames * 2;
    int iterations = 0;

    while (iterations < max_iterations) {
        struct page *page = clock.frames[clock.hand];

        /* Skip pinned/locked pages */
        if (page_is_locked(page)) {
            advance_clock_hand();
            iterations++;
            continue;
        }

        /* Check reference bit */
        if (page_referenced(page)) {
            /* Recently used - give second chance */
            clear_page_reference(page);  /* Set R=0 */
            advance_clock_hand();
            iterations++;
        } else {
            /* Not recently used - victim found! */
            advance_clock_hand();
            return page;
        }
    }

    /* Fallback: all pages locked or referenced */
    return NULL;  /* Memory allocation will fail */
}

static void advance_clock_hand(void) {
    clock.hand = (clock.hand + 1) % clock.num_frames;
}

/*
 * Check if page was referenced (hardware reference bit).
 * Also considers software reference tracking.
 */
static int page_referenced(struct page *page) {
    pte_t *pte = page_to_pte(page);

    /* Check hardware accessed bit */
    if (pte_young(*pte))
        return 1;

    /* Optionally check software tracking */
    if (page->flags & PG_referenced)
        return 1;

    return 0;
}

static void clear_page_reference(struct page *page) {
    pte_t *pte = page_to_pte(page);

    /* Clear hardware bit */
    *pte = pte_mkold(*pte);

    /* Clear software bit */
    page->flags &= ~PG_referenced;

    /* Invalidate TLB entry */
    flush_tlb_page(page);
}

The basic Clock algorithm can be improved by incorporating additional information, particularly the dirty bit. These enhanced versions provide better victim selection with minimal additional cost.
Enhanced Clock (NRU - Not Recently Used):
Combine reference bit (R) and dirty bit (D) into four classes:
┌─────────┬─────────┬──────────────────────────────────────────────┐
│ R bit │ D bit │ Interpretation │
├─────────┼─────────┼──────────────────────────────────────────────┤
│ 0 │ 0 │ Class 0: Not referenced, not modified │
│ │ │ → Best victim: minimal cost │
├─────────┼─────────┼──────────────────────────────────────────────┤
│ 0 │ 1 │ Class 1: Not referenced, modified │
│ │ │ → Good victim: will need write │
├─────────┼─────────┼──────────────────────────────────────────────┤
│ 1 │ 0 │ Class 2: Referenced, not modified │
│ │ │ → Poor victim: recently used, but clean │
├─────────┼─────────┼──────────────────────────────────────────────┤
│ 1 │ 1 │ Class 3: Referenced, modified │
│ │ │ → Worst victim: recently used AND dirty │
└─────────┴─────────┴──────────────────────────────────────────────┘
Enhanced Clock Selection Process:
Pass 1: Look for Class 0 (R=0, D=0)
If found → victim
If not found → continue to Pass 2
Pass 2: Look for Class 1 (R=0, D=1)
While scanning, clear R bits (demote Class 2→0, Class 3→1)
If found → victim
If not found → continue to Pass 3
Pass 3: Repeat from Pass 1
Now some pages have been demoted
Will find victims in Class 0 or 1
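The class number is just 2·R + D, so the pass structure can be sketched directly. Arrays and names here are illustrative:

```c
/* NRU class: 0 = (R=0,D=0) best victim ... 3 = (R=1,D=1) worst. */
int nru_class(int r, int d) { return (r << 1) | d; }

/* Enhanced-clock victim: pass 1 wants class 0; pass 2 wants class 1
 * while clearing R bits (demoting 2->0, 3->1); a final pass over the
 * demoted pages is then guaranteed to find a victim. */
int nru_select(int *r, int *d, int n) {
    for (int i = 0; i < n; i++)            /* pass 1: R=0, D=0 */
        if (nru_class(r[i], d[i]) == 0)
            return i;
    for (int i = 0; i < n; i++) {          /* pass 2: R=0, D=1; demote others */
        if (nru_class(r[i], d[i]) == 1)
            return i;
        r[i] = 0;
    }
    for (int i = 0; i < n; i++)            /* pass 3: everyone now has R=0 */
        if (d[i] == 0)
            return i;                      /* clean victim preferred */
    return 0;                              /* all dirty: take the first */
}
```

The preference order falls out naturally: clean-and-cold beats dirty-and-cold, which beats anything recently referenced.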
Two-Handed Clock:
Use two hands instead of one for better performance:
Front Hand Back Hand
↓ ↓
┌────────────────────────────────────────────┐
│ ○───○───○───○───○───○───○───○───○───○ │
└────────────────────────────────────────────┘
Front hand: Clears reference bits (resets pages)
Back hand: Evicts pages with R=0 (selects victims)
Gap between hands = "second chance" interval
Larger gap = more time for inactive pages to be detected
Smaller gap = more aggressive reclamation
Advantages of Two-Handed Clock:
1. Continuous operation
- Back hand always finds pages with R=0
- Front hand ensures R bits are continuously cleared
2. Tunable aggression
- Gap controls how long pages have to prove they're active
- Large gap: favors long-term working set
- Small gap: favors recently accessed pages
3. Better batch behavior
- Can pre-identify victims before memory pressure
- Reduces critical path latency
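One step of the two-handed sweep can be modeled over a fixed-size frame ring, with the gap between hands acting as the grace period. This is a toy sketch with illustrative names:

```c
/* One step of a two-handed clock over n frames with a fixed gap:
 * the front hand clears the reference bit at position p, and the
 * back hand (gap frames behind) reports a victim if the bit it
 * cleared gap steps ago is still 0. Returns the victim's index,
 * or -1 if the back hand's frame was re-referenced in time. */
int two_hand_step(int *r, int n, int p, int gap) {
    int front = p % n;
    int back = ((p - gap) % n + n) % n;
    r[front] = 0;                         /* front hand: reset */
    return (r[back] == 0) ? back : -1;    /* back hand: evict if still cold */
}
```

A frame survives only if it is re-referenced within the gap, which is exactly the tunable "second chance" interval described above.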
WSClock (Working Set Clock):
Combines Clock with Working Set tracking:
For each page, track:
- Reference bit (R)
- Last use time (virtual time)
- Working set membership
Victim selection:
1. If R=1: set R=0, update last use time, skip
2. If R=0 and age > τ (working set window): evict
3. If R=0 and age ≤ τ: skip (in working set)
4. If dirty: schedule write, skip
5. If clean and old: evict immediately
Benefit: Naturally handles varying working set sizes
Linux uses a variant of enhanced Clock with LRU lists. Pages move between 'active' and 'inactive' lists based on reference bit state. The inactive list tail is the primary victim pool. Reference bits are cleared during list demotion. This combines LRU approximation with working set-like behavior.
Reference bits enable an important system capability: detecting a process's working set—the set of pages actively needed for efficient execution.
The Working Set Concept:
Working Set Definition:
W(t, Δ) = set of pages accessed in the interval [t - Δ, t]
Where:
t = current time
Δ = working set window (time interval)
Pages in W(t, Δ) are "active" and should remain resident.
Pages outside W(t, Δ) are "inactive" and can be evicted.
Detecting Working Set with Reference Bits:
Basic approach:
1. At time t, clear all reference bits
2. Let process run for interval Δ
3. At time t + Δ, check reference bits
4. Pages with R=1 are in the working set
5. Pages with R=0 are not in the working set
This gives an approximation of W(t+Δ, Δ).
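Given a snapshot of reference bits at the end of an interval, working-set membership is a direct read-off. A sketch with illustrative names:

```c
/* After interval delta, pages with R=1 form the sampled working set.
 * Copies member indices into ws[], clears bits for the next interval,
 * and returns the working-set size (WSS). */
int sample_working_set(int *r, int n, int *ws) {
    int wss = 0;
    for (int i = 0; i < n; i++) {
        if (r[i])
            ws[wss++] = i;   /* referenced this interval: in W(t, delta) */
        r[i] = 0;            /* clear for the next sample */
    }
    return wss;
}
```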
Practical Working Set Tracking:
Real systems approximate working sets continuously:
Periodic Sampling Approach:
┌────────────────────────────────────────────────────────────────┐
│ Time: ───────────────────────────────────────────────► │
│ │ │ │ │ │ │
│ T1 T2 T3 T4 T5 │
│ │ │ │ │ │ │
│ Sample Sample Sample Sample Sample │
│ & clear & clear & clear & clear & clear │
│ │
│ Pages referenced between samples → in working set │
│ Pages NOT referenced → candidates for eviction │
└────────────────────────────────────────────────────────────────┘
Sampling interval determines precision:
- Short interval: Fine-grained detection, high overhead
- Long interval: Coarse detection, low overhead
- Typical: 100ms - 1 second
Multi-bit History for Working Set:
Maintain history shift register per page:
At each sample interval:
1. Shift history right one bit
2. OR reference bit into leftmost position
3. Clear reference bit
Example with 8-bit history:
Page A: 11111111 → Accessed every interval (hot)
Page B: 11110000 → Accessed recently, now quiet (cooling)
Page C: 00000001 → Accessed once long ago (cold)
Page D: 10101010 → Accessed every other interval (periodic)
Working set ≈ Pages with history above threshold
| Condition | System State | Action |
|---|---|---|
| ∑WSS ≪ Available Memory | Under-utilized | Can run more processes |
| ∑WSS ≈ Available Memory | Optimal utilization | Stable operation |
| ∑WSS > Available Memory | Over-committed | Thrashing risk, consider swapping |
| WSS of P > P's allocation | Process undersized | Increase allocation or swap out |
By tracking working sets via reference bits, the OS can detect when the sum of all working sets exceeds physical memory—the precondition for thrashing. When detected, the OS can proactively swap out entire processes (reducing multiprogramming level) to ensure remaining processes have sufficient frames for their working sets.
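The table's over-commitment condition reduces to comparing ΣWSS against available frames; this sketch (names illustrative) is the check a load-control policy would run before deciding to swap out a process:

```c
/* Sum per-process working-set sizes and compare against available
 * frames; returns 1 if the system is over-committed (thrashing risk). */
int overcommitted(const int *wss, int nprocs, int frames) {
    int total = 0;
    for (int i = 0; i < nprocs; i++)
        total += wss[i];
    return total > frames;
}
```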
Not all processor architectures provide hardware reference bits. How do operating systems on such platforms track access patterns?
Architectures Without Hardware Reference Bits:
Notable examples:
- MIPS: No hardware R bit (software managed TLB)
- Some ARM variants
- Older embedded processors
- Software-managed TLB architectures
Challenge: How to detect page access without hardware help?
Software Simulation via Page Protection:
The clever solution uses page protection faults:
┌────────────────────────────────────────────────────────────────┐
│ SOFTWARE REFERENCE BIT SIMULATION │
├────────────────────────────────────────────────────────────────┤
│ │
│ 1. Mark all pages as NOT ACCESSIBLE (protection fault if │
│ accessed) │
│ │
│ 2. When page is accessed: │
│ a. Protection fault occurs │
│ b. Fault handler records access (sets software R bit) │
│ c. Handler marks page ACCESSIBLE │
│ d. Return from fault; access proceeds │
│ │
│ 3. Periodically: │
│ a. Check software R bits (which pages were accessed) │
│ b. Clear software R bits │
│ c. Re-mark pages as NOT ACCESSIBLE │
│ d. Next access to each page will fault again │
│ │
└────────────────────────────────────────────────────────────────┘
Implementation Details:
/* Software reference bit tracking */
struct page *pages;
uint8_t *software_reference_bits; /* Software R bits */
/* Called on protection fault */
void handle_access_fault(unsigned long address) {
struct page *page = address_to_page(address);
/* Set software reference bit */
software_reference_bits[page->index] = 1;
/* Make page accessible */
set_page_accessible(page);
/* Return and retry access */
}
/* Periodic reference bit scan */
void scan_reference_bits(void) {
for (int i = 0; i < num_pages; i++) {
if (software_reference_bits[i]) {
/* Page was accessed since last scan */
process_referenced_page(&pages[i]);
}
/* Clear software bit */
software_reference_bits[i] = 0;
/* Mark page inaccessible for next period */
set_page_inaccessible(&pages[i]);
}
}
Cost Analysis:
Overhead per unique page access (first access after scan):
- Protection fault: ~1000 CPU cycles
- Handler execution: ~500 CPU cycles
- Return from fault: ~500 CPU cycles
Total: ~2000 cycles = ~1 microsecond @ 2 GHz
With 100 pages accessed between scans:
Overhead: 100 × 1μs = 100μs per scan interval
If scan interval = 1 second: 0.01% overhead
If scan interval = 10ms: 1% overhead
Tradeoff: Longer intervals = lower overhead, coarser tracking
MIPS processors use software-managed TLBs—the OS handles TLB misses entirely. This provides flexibility: the OS can set any bit field it wants in TLB entries, including implementing reference tracking at TLB-miss time. The tradeoff is higher TLB miss cost, but complete control over paging behavior.
Optimization: Sampling
Full protection-based tracking is expensive. Sampling reduces overhead:
Sampling approach:
- Periodically select random subset of pages
- Mark only sampled pages as inaccessible
- Extrapolate from sampled pages to estimate access rates
Benefit:
- If 10% of pages sampled: ~10x lower fault overhead
- Statistical accuracy sufficient for eviction decisions
Used in some research systems and specialized environments.
We've thoroughly explored the reference bit—a simple piece of hardware support with profound implications for memory management. Let's consolidate our understanding:
Module Complete:
This concludes Module 1: Page Replacement Need. We've established:
With this foundation, you're prepared for the subsequent modules which explore specific page replacement algorithms (FIFO, OPT, LRU, Clock, and their variants), frame allocation strategies, and the pathological condition of thrashing.
You now have comprehensive understanding of why page replacement is necessary and the hardware/software mechanisms that enable efficient implementation. The dirty bit and reference bit provide the information needed to make intelligent victim selection decisions—the foundation of all practical page replacement algorithms.