We've established FIFO's theoretical foundations, implementation approach, anomalous behavior, and performance characteristics. Now we examine the operational details that make FIFO work in production systems: queue management.
The queue is FIFO's central data structure. Its implementation affects memory overhead, cache performance, concurrency safety, and integration with operating system memory management. These practical considerations often determine whether FIFO is viable in a given context.
In this final page, we explore queue management comprehensively—from low-level memory layout to high-level integration with page table management and multi-process coordination.
By the end of this page, you will understand how to implement FIFO queues efficiently in kernel space, manage queues for multiple processes, handle concurrency and synchronization, integrate with page table updates, and optimize memory layout for cache performance.
The queue for FIFO page replacement tracks which pages are resident and their arrival order. Let's examine implementation options in detail.
Option 1: Circular Buffer Implementation
The circular buffer (ring buffer) is the most efficient queue implementation for fixed-capacity scenarios like page replacement:
typedef struct {
page_t* buffer; // Array of page identifiers
uint32_t capacity; // Maximum entries (= number of frames)
uint32_t head; // Index of oldest entry (dequeue here)
uint32_t tail; // Index for next insert (enqueue here)
uint32_t count; // Current number of entries
} CircularQueue;
Key Operations:
// Enqueue: Add new page to back of queue
bool queue_enqueue(CircularQueue* q, page_t page) {
    if (q->count >= q->capacity) {
        return false; // Queue full - must dequeue first
    }
    q->buffer[q->tail] = page;
    q->tail = (q->tail + 1) % q->capacity; // Wrap around
    q->count++;
    return true;
}

// Dequeue: Remove oldest page from front of queue
page_t queue_dequeue(CircularQueue* q) {
    if (q->count == 0) {
        return INVALID_PAGE; // Queue empty
    }
    page_t page = q->buffer[q->head];
    q->head = (q->head + 1) % q->capacity; // Wrap around
    q->count--;
    return page;
}

// Peek: View oldest page without removing
page_t queue_peek(CircularQueue* q) {
    if (q->count == 0) {
        return INVALID_PAGE;
    }
    return q->buffer[q->head];
}

// Contains: Check if page is in queue (O(n) - can be optimized)
bool queue_contains(CircularQueue* q, page_t page) {
    for (uint32_t i = 0; i < q->count; i++) {
        uint32_t idx = (q->head + i) % q->capacity;
        if (q->buffer[idx] == page) {
            return true;
        }
    }
    return false;
}

Memory Layout Considerations:
| Aspect | Circular Buffer | Linked List |
|---|---|---|
| Memory overhead | Fixed (capacity × sizeof(page_t)) | Variable (n × (sizeof(page_t) + sizeof(ptr))) |
| Cache behavior | Excellent (contiguous) | Poor (scattered allocations) |
| Allocation | Single allocation | Per-element allocation |
| Resize capability | Requires new buffer + copy | Easy (just add nodes) |
| Implementation complexity | Low | Very low |
For kernel implementations where frame count is fixed at boot, circular buffers are strongly preferred.
For optimal cache performance, align the buffer to cache line boundaries (typically 64 bytes on modern x86). This ensures sequential iteration doesn't cross cache lines unnecessarily, reducing cache misses during queue operations.
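As a concrete illustration, here is a minimal initialization sketch using C11's aligned_alloc to obtain a cache-line-aligned buffer. This is a userspace sketch only (a kernel would use its own allocator, e.g. a slab cache), and queue_init is a hypothetical helper, not part of any particular API:

#include <stdlib.h> // aligned_alloc

bool queue_init(CircularQueue* q, uint32_t capacity) {
    // aligned_alloc requires the size to be a multiple of the alignment,
    // so round the buffer size up to the next 64-byte boundary
    size_t bytes = capacity * sizeof(page_t);
    size_t rounded = (bytes + 63) & ~(size_t)63;
    q->buffer = aligned_alloc(64, rounded);
    if (q->buffer == NULL) {
        return false;
    }
    q->capacity = capacity;
    q->head = q->tail = q->count = 0;
    return true;
}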
Operating systems can organize FIFO queues at different granularities:
Option A: Global Queue (System-Wide)
A single queue tracks all pages from all processes in arrival order.
[Process A, Page 5] → [Process B, Page 2] → [Process A, Page 7] → ...
Advantages:
- Simple to implement: one queue, one set of head/tail indices
- Memory-efficient: a single fixed allocation serves the whole system
- Load balancing is automatic: frames flow to whichever process is faulting

Disadvantages:
- No isolation: one process's faults evict another process's pages
- Weak fairness: a memory-hungry process can monopolize frames
- Limited scalability: every CPU contends on the one queue's lock
Option B: Per-Process Queues (Local)
Each process has its own queue tracking only its own pages.
Process A: [Page 5] → [Page 7] → [Page 2] → ...
Process B: [Page 2] → [Page 1] → [Page 4] → ...
Process C: [Page 1] → [Page 3] → ...
Advantages:
- Full isolation: a process's faults can only evict its own pages
- Strong fairness: each process's frame allocation is explicit
- Scales well: no single queue that every CPU must lock

Disadvantages:
- More complex: queues must be created, sized, and destroyed per process
- Load balancing is manual: idle processes can sit on frames they no longer need
- Requires a frame-allocation policy to decide each process's share
| Factor | Global Queue | Per-Process Queues | Hybrid |
|---|---|---|---|
| Fairness | Weak | Strong | Configurable |
| Implementation | Simple | Complex | Moderate |
| Memory efficiency | High | Moderate | Moderate |
| Isolation | None | Full | Partial |
| Load balancing | Automatic | Manual | Partial |
| Scalability | Limited (contention) | Good | Good |
| Used in practice | Rare | Linux (local replacement) | Windows (working sets) |
Modern operating systems typically use per-process or per-process-group queues with dynamic allocation adjustment. This provides fairness while allowing the system to reclaim frames from idle processes when needed. Pure global queuing is rare due to fairness concerns.
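To make the hybrid approach concrete, here is one possible shape for per-process queue state with an adjustable frame quota. This is a sketch only; fields like frame_quota are illustrative, not drawn from any particular kernel:

typedef struct {
    pid_t pid;
    CircularQueue fifo_queue; // tracks only this process's resident pages
    uint32_t frame_quota;     // frames this process may currently hold
} process_mem_t;

// Local replacement: a process at quota evicts from its own queue.
// Lowering an idle process's quota lets the system reclaim its frames.
static bool must_self_evict(const process_mem_t* p) {
    return p->fifo_queue.count >= p->frame_quota;
}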
In multi-processor systems, FIFO queue operations must be thread-safe. Multiple CPUs can trigger page faults simultaneously, requiring synchronized access to the queue.
The Concurrency Challenge:
Page faults can occur on any CPU at any time. Without synchronization:
- Two CPUs could dequeue the same victim and reuse one frame twice
- Interleaved updates can corrupt the head, tail, and count fields
- A newly loaded page could be enqueued twice or lost entirely
Synchronization Approaches:
// Approach 1: Global Lock (Simple but Contention-Prone)
typedef struct {
    CircularQueue queue;
    spinlock_t lock;
} LockedFIFOQueue;

page_t locked_dequeue(LockedFIFOQueue* q) {
    spin_lock(&q->lock);
    page_t page = queue_dequeue(&q->queue);
    spin_unlock(&q->lock);
    return page;
}

// Approach 2: Per-CPU Queues (Scalable but Complex)
typedef struct {
    CircularQueue per_cpu_queues[MAX_CPUS];
    spinlock_t global_lock; // For cross-CPU operations
} PerCPUFIFOQueues;

// Each CPU manages its own queue, reducing contention
// Global lock only needed for rebalancing

// Approach 3: Lock-Free Queue (Advanced)
typedef struct {
    atomic_uint head;
    atomic_uint tail;
    _Atomic(page_t) buffer[MAX_FRAMES];
} LockFreeCircularQueue;

bool lockfree_enqueue(LockFreeCircularQueue* q, page_t page) {
    uint32_t tail = atomic_load(&q->tail);
    uint32_t next_tail;
    // CAS loop for thread-safe insertion; on failure,
    // compare_exchange_weak reloads the current tail into 'tail'
    do {
        next_tail = (tail + 1) % MAX_FRAMES;
        if (next_tail == atomic_load(&q->head)) {
            return false; // Queue full (one slot kept empty to disambiguate)
        }
    } while (!atomic_compare_exchange_weak(&q->tail, &tail, next_tail));
    // Slot reserved; publish the payload. A production-grade queue also
    // needs a per-slot ready flag so a dequeuer can't read an unwritten slot.
    atomic_store(&q->buffer[tail], page);
    return true;
}

| Strategy | Contention | Complexity | Latency | Best For |
|---|---|---|---|---|
| Global spinlock | High | Low | Variable (high under load) | Single-CPU or low-contention |
| Per-CPU queues | Low | Moderate | Low (local access) | Multi-CPU, uneven load OK |
| Lock-free | Very low | High | Low | High-performance systems |
| RCU-based | Very low | Very high | Low | Read-heavy workloads |
When FIFO queue locks interact with page table locks, strict lock ordering is essential to prevent deadlock. A common convention: (1) Acquire page table lock, (2) Acquire FIFO queue lock, (3) Perform operation, (4) Release in reverse order. Never violate this ordering.
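A sketch of that discipline, reusing the spinlock API and LockedFIFOQueue type from the examples above (pt_lock stands in for the page table lock):

void evict_with_lock_ordering(spinlock_t* pt_lock, LockedFIFOQueue* q) {
    spin_lock(pt_lock);   // 1. page table lock first
    spin_lock(&q->lock);  // 2. then the FIFO queue lock

    page_t victim = queue_dequeue(&q->queue); // 3. perform the operation
    // ... clear the victim's page table entry while pt_lock is held ...
    (void)victim; // PTE update elided in this sketch

    spin_unlock(&q->lock); // 4. release in reverse order
    spin_unlock(pt_lock);
}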
FIFO queue management must coordinate closely with page table operations. Every queue change implies a corresponding page table change, and vice versa.
The Coordination Requirements:
On Enqueue (Page Load):
- Point the faulting page's PTE at the newly filled frame
- Set the valid bit and clear the dirty bit (the page was just loaded)
- Add the page to the back of the FIFO queue

On Dequeue (Page Eviction):
- Write the victim frame back to swap if its dirty bit is set
- Clear the victim PTE's valid bit and flush its TLB entry
- Only then reuse the frame for the incoming page

The fault handler below walks through both sequences:
// Complete page load with queue and page table coordination
int fifo_load_page(process_t* proc, vaddr_t vaddr) {
    // 1. Get or allocate a frame
    frame_t frame;
    if (free_frame_available()) {
        frame = allocate_free_frame();
    } else {
        // Must evict - coordinate queue and page table
        page_info_t victim = queue_dequeue(&proc->fifo_queue);
        frame = victim.frame;

        // Update victim's page table entry
        pte_t* victim_pte = get_pte(victim.process, victim.vaddr);
        if (pte_is_dirty(victim_pte)) {
            // Write back to swap
            disk_write(get_swap_slot(victim), frame_to_kaddr(frame));
        }
        pte_clear_valid(victim_pte);
        tlb_flush_entry(victim.vaddr);
    }

    // 2. Read the faulted page from disk
    swap_slot_t slot = get_swap_slot_for_page(proc, vaddr);
    disk_read(slot, frame_to_kaddr(frame));

    // 3. Update page table for new page
    pte_t* pte = get_pte(proc, vaddr);
    pte_set_frame(pte, frame);
    pte_set_valid(pte);
    pte_clear_dirty(pte); // Just loaded, not yet modified

    // 4. Enqueue the newly loaded page
    page_info_t new_page = { .process = proc, .vaddr = vaddr, .frame = frame };
    queue_enqueue(&proc->fifo_queue, new_page);

    // 5. TLB may cache new entry automatically on next access
    return 0; // Success
}

Critical Atomicity Requirement:
Page table and queue updates must be atomic from the perspective of the faulting process:
- No thread may observe a page marked valid in the page table but missing from the queue
- No thread may observe a dequeued victim whose PTE still appears valid
- Otherwise a concurrent fault could select the same frame twice or access stale mappings
This requires holding appropriate locks throughout the fault handler execution and ensuring disk I/O completion before releasing locks or resuming the process.
The dirty bit in the page table entry is crucial for FIFO performance. Without it, every eviction would require a disk write. With the dirty bit, read-only pages (code, read-only data) never need writeback, potentially halving eviction costs.
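To put a number on it: if each fault costs one disk read for the incoming page plus one disk write only when the victim is dirty, then a workload where half the victims are clean averages 1.5 disk transfers per fault instead of 2, and the writeback component itself is cut in half.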
Queue memory layout significantly affects kernel performance. Thoughtful layout reduces cache misses during queue operations.
Cache-Optimized Queue Structure:
// Cache line size on x86-64
#define CACHE_LINE_SIZE 64

// Page entry should be power-of-2 size for efficient indexing
// (defined first so the queue below can embed it)
typedef struct {
    uint32_t process_id;  // 4 bytes
    uint32_t page_number; // 4 bytes
    frame_t frame;        // 4 bytes
    uint32_t flags;       // 4 bytes
} page_entry_t; // 16 bytes - exactly a power of 2

// Pack hot fields together
typedef struct __attribute__((aligned(CACHE_LINE_SIZE))) {
    // Hot fields - accessed on every operation (fit in one cache line)
    uint32_t head;     // 4 bytes
    uint32_t tail;     // 4 bytes
    uint32_t count;    // 4 bytes
    uint32_t capacity; // 4 bytes
    spinlock_t lock;   // 4 bytes (typically)
    uint8_t _pad1[44]; // Pad to 64 bytes

    // Cold fields - rarely accessed (separate cache line)
    stats_t stats;         // Statistics (optional)
    uint32_t resize_count; // Debug info

    // Buffer - should be cache-line aligned
    page_entry_t buffer[] __attribute__((aligned(CACHE_LINE_SIZE)));
} OptimizedQueue;

// Efficient indexing (multiplication by power of 2 = shift)
static inline page_entry_t* get_entry(OptimizedQueue* q, uint32_t idx) {
    // idx * 16 = idx << 4, very fast
    return &q->buffer[idx];
}

Layout Principles:
| Principle | Implementation | Benefit |
|---|---|---|
| Hot/cold separation | Put head/tail/count in first cache line | Single cache miss for common ops |
| Cache line alignment | `__attribute__((aligned(64)))` | No false sharing, clean lines |
| Power-of-2 entries | Make entry size 8, 16, 32, or 64 bytes | Shift instead of multiply |
| Contiguous buffer | Allocate buffer as single block | Prefetcher works effectively |
| Avoid pointers | Use indices instead of pointers in queue | Better cache locality |
| NUMA awareness | Allocate queue on local NUMA node | Avoid cross-node latency |
False sharing occurs when unrelated data shares a cache line, causing invalidations across CPUs. In per-CPU queue designs, ensure each CPU's queue metadata occupies separate cache lines. Otherwise, one CPU's updates invalidate another CPU's cache, destroying the isolation benefit.
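A sketch of the fix, reusing CACHE_LINE_SIZE and the types from above: the alignment attribute forces each array element's size up to a cache-line multiple, so no two CPUs' metadata ever share a line.

typedef struct __attribute__((aligned(CACHE_LINE_SIZE))) {
    CircularQueue queue; // this CPU's private FIFO queue
    spinlock_t lock;     // rarely contended: taken by the local CPU only
    // struct alignment pads the element size to a multiple of 64 bytes,
    // so cpu_queues[i] and cpu_queues[i+1] occupy disjoint cache lines
} PaddedCPUQueue;

static PaddedCPUQueue cpu_queues[MAX_CPUS]; // MAX_CPUS as in the earlier example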
Robust queue management must handle edge cases that can occur in real systems:
Edge Case 1: Queue Corruption Recovery
Hardware errors or bugs might corrupt queue state. Defensive programming can detect and recover:
bool queue_validate(CircularQueue* q) {
    // Invariants that must hold
    if (q->count > q->capacity) return false;
    if (q->head >= q->capacity) return false;
    if (q->tail >= q->capacity) return false;
    // Count should match head/tail relationship
    if (q->tail > q->head) {
        return q->count == q->tail - q->head;
    } else if (q->tail < q->head) {
        return q->count == q->capacity - q->head + q->tail;
    } else {
        // head == tail is ambiguous: the queue is either empty or full
        return q->count == 0 || q->count == q->capacity;
    }
}
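One way to act on a failed validation is to check on every operation in debug builds, as the debugging table later in this page suggests. A sketch, where DEBUG_QUEUE is an assumed build flag and panic() stands in for the kernel's fatal-error path:

#ifdef DEBUG_QUEUE
#define QUEUE_CHECK(q) \
    do { if (!queue_validate(q)) panic("FIFO queue corrupted"); } while (0)
#else
#define QUEUE_CHECK(q) do { } while (0)
#endif

Calling QUEUE_CHECK at the top of enqueue and dequeue makes corruption surface at the operation that trips over it, rather than at some much later, harder-to-trace point.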
Edge Case 2: Process Termination
When a process terminates, all its pages must be removed from the queue:
void queue_remove_process(CircularQueue* q, pid_t pid) {
    // Must iterate and compact - O(n) operation
    // (assumes entries carry a process_id, as in page_entry_t above)
    uint32_t read_idx = q->head;
    uint32_t write_idx = q->head;
    uint32_t remaining = q->count;
    uint32_t kept = 0;
    while (remaining > 0) {
        page_entry_t* entry = &q->buffer[read_idx];
        if (entry->process_id != pid) {
            // Keep this entry
            if (write_idx != read_idx) {
                q->buffer[write_idx] = *entry;
            }
            write_idx = (write_idx + 1) % q->capacity;
            kept++;
        }
        // else: skip (remove) this entry
        read_idx = (read_idx + 1) % q->capacity;
        remaining--;
    }
    q->tail = write_idx;
    q->count = kept; // new count = number of entries kept
}
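In a kernel, this cleanup would run from the process-teardown path with the queue lock held so no fault handler races the compaction — a brief usage sketch, assuming the spinlock API used earlier:

void on_process_exit(CircularQueue* q, spinlock_t* q_lock, pid_t pid) {
    spin_lock(q_lock);            // exclude concurrent fault handlers
    queue_remove_process(q, pid); // O(n) compaction shown above
    spin_unlock(q_lock);
}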
Edge Case 3: Hot Unplug (NUMA/Memory)
Modern servers support memory hot-removal. Queued pages residing in the removed memory must be handled before the region goes offline:
- Migrate resident pages to frames on surviving nodes, updating both their queue entries and their PTEs
- Or evict them to swap and purge the corresponding queue entries
- Either way, the queue must never reference a frame that no longer exists
Out-of-memory handling is critical. If both physical memory AND swap are exhausted, FIFO cannot evict (nowhere to write dirty pages). Systems must have OOM policies: kill a process, refuse new allocations, or compress memory. FIFO itself cannot solve resource exhaustion.
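The policy choice can be made explicit in code. A sketch only: the enum names, kill_largest_process(), and compress_cold_pages() are hypothetical helpers, and -ENOMEM is used kernel-style as an error return:

typedef enum {
    OOM_KILL_PROCESS,      // reclaim all frames of a chosen victim process
    OOM_REFUSE_ALLOCATION, // fail the fault/allocation with an error
    OOM_COMPRESS_MEMORY    // trade CPU time for effective capacity
} oom_policy_t;

// Called when eviction is impossible: no free frames and no free swap slots
int handle_oom(oom_policy_t policy) {
    switch (policy) {
    case OOM_KILL_PROCESS:      return kill_largest_process();
    case OOM_REFUSE_ALLOCATION: return -ENOMEM;
    case OOM_COMPRESS_MEMORY:   return compress_cold_pages();
    }
    return -ENOMEM;
}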
Effective queue management requires visibility into queue behavior for debugging and performance monitoring.
Essential Metrics to Track:
typedef struct {
    // Operation counts (atomic: updated concurrently from fault handlers)
    _Atomic uint64_t enqueues; // Total pages loaded
    _Atomic uint64_t dequeues; // Total pages evicted
    _Atomic uint64_t hits;     // References to resident pages

    // Derived metrics (computed offline from the counters)
    double hit_ratio;     // hits / (hits + dequeues)
    double eviction_rate; // dequeues / time_window

    // Queue health
    _Atomic uint32_t current_size; // Pages currently resident
    uint32_t peak_size;            // Maximum observed size (approximate)

    // Timing
    _Atomic uint64_t total_dequeue_ns; // Total time in dequeue path
    _Atomic uint64_t max_dequeue_ns;   // Worst-case dequeue latency

    // Per-process breakdown
    struct process_stats {
        pid_t pid;
        uint32_t pages_resident;
        uint32_t faults_total;
    } per_process[MAX_PROCESSES];
} FIFOMetrics;

// Hook into queue operations
void metrics_on_enqueue(FIFOMetrics* m, pid_t pid) {
    (void)pid; // per-process accounting elided in this excerpt
    atomic_fetch_add(&m->enqueues, 1);
    atomic_fetch_add(&m->current_size, 1);
    // Racy peak update - acceptable for statistics
    if (m->current_size > m->peak_size) {
        m->peak_size = m->current_size;
    }
}

void metrics_on_dequeue(FIFOMetrics* m, pid_t pid, uint64_t latency_ns) {
    (void)pid; // per-process accounting elided in this excerpt
    atomic_fetch_add(&m->dequeues, 1);
    atomic_fetch_sub(&m->current_size, 1);
    atomic_fetch_add(&m->total_dequeue_ns, latency_ns);

    // Update max via CAS loop; weak CAS may fail spuriously, and on
    // failure it reloads the current value into old_max for the retry
    uint64_t old_max = atomic_load(&m->max_dequeue_ns);
    while (latency_ns > old_max) {
        if (atomic_compare_exchange_weak(&m->max_dequeue_ns, &old_max, latency_ns)) {
            break;
        }
    }
}

Debugging Strategies:
| Issue | Symptom | Debugging Approach |
|---|---|---|
| Excessive evictions | High fault rate despite memory | Check per-process page counts; identify memory hogs |
| Queue corruption | Kernel crash in queue code | Enable queue validation on each operation |
| Stale pages | Wrong data in pages | Verify TLB flush completeness after eviction |
| Memory leak | Queue never shrinks | Track process termination cleanup |
| Performance degradation | Slow queue operations | Profile lock contention, cache misses |
| Unfairness | One process dominates frames | Implement per-process quotas, monitor allocation |
In production systems, expose queue metrics via /proc or sysfs interfaces. Key alerts: hit ratio drops below threshold, per-process frame count exceeds limit, queue size approaches capacity. These indicate memory pressure or anomalous behavior.
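The formatting half of such an interface can be sketched portably; the registration side (proc_create and seq_file on Linux) varies by kernel version and is omitted. fifo_metrics_format is a hypothetical helper over the FIFOMetrics type above:

#include <stdio.h> // snprintf

int fifo_metrics_format(FIFOMetrics* m, char* buf, size_t len) {
    // Plain reads of _Atomic fields are atomic loads in C11
    uint64_t hits = m->hits;
    uint64_t evictions = m->dequeues;
    double hit_ratio =
        (hits + evictions) ? (double)hits / (double)(hits + evictions) : 0.0;
    return snprintf(buf, len,
                    "resident_pages: %u\npeak_pages: %u\n"
                    "hit_ratio: %.3f\nmax_dequeue_ns: %llu\n",
                    (unsigned)m->current_size, m->peak_size, hit_ratio,
                    (unsigned long long)m->max_dequeue_ns);
}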
We've now completed our comprehensive exploration of the FIFO page replacement algorithm. Let's consolidate everything we've learned across all five pages of this module:
When to Use FIFO:
| Scenario | FIFO Appropriate? | Reason |
|---|---|---|
| Teaching/prototyping | Yes | Simplicity aids understanding |
| Sequential workloads | Yes | Matches access pattern |
| Abundant memory | Yes | Eviction quality matters less |
| Hot-cold workloads | No | Cannot protect hot pages |
| Memory-constrained systems | No | Performance gap too large |
| Production OS paging | No | LRU approximations superior |
The Bigger Picture:
FIFO represents the simplest non-trivial page replacement algorithm. Its limitations—Belady's anomaly, inability to adapt to access patterns, poor performance under memory pressure—motivate the development of more sophisticated algorithms. Understanding FIFO deeply prepares you to appreciate why LRU, Clock, and modern adaptive algorithms make the design choices they do.
You have mastered the FIFO page replacement algorithm, from theoretical foundations through practical implementation. You understand not just how FIFO works, but why it works that way, when it succeeds, when it fails, and what its limitations teach us about algorithm design. Continue to the next module to explore the Optimal algorithm and understand the theoretical best case for page replacement.