Every time a process accesses memory, a complex dance occurs between the CPU and the Memory Management Unit (MMU). Most of the time, this dance is invisible and instantaneous—the virtual address maps to a physical frame, and execution continues. But sometimes, the requested page isn't in physical memory. At that precise instant, a page fault occurs, and the entire trajectory of the CPU changes.
Page fault detection is the foundational mechanism that enables virtual memory. Without the ability to detect when a page is absent from physical memory, the operating system would have no opportunity to bring that page in from disk. The elegance of this detection lies in its hardware-software partnership: the hardware detects the condition in nanoseconds, and the software (the OS) handles the complex recovery.
Understanding page fault detection isn't merely academic—it's essential for diagnosing performance issues, designing efficient memory management policies, and understanding why certain workloads exhibit particular behaviors. This page provides an exhaustive exploration of how page faults are detected, from the bit-level mechanisms to the architectural implications.
By the end of this page, you will understand: (1) The hardware mechanisms that detect page faults, (2) The role of the valid-invalid bit and protection bits, (3) The precise timing of fault detection during address translation, (4) How different architectures implement fault detection, and (5) The distinction between page faults and other memory exceptions.
Before we can understand page fault detection, we must understand the fundamental contract that virtual memory provides:
The Contract: A process is given a virtual address space that appears complete and contiguous. The process can reference any valid virtual address, and the system guarantees that the reference will succeed: the data will be there, whether it was already resident in physical memory or had to be brought in from backing store, and the process never observes the difference.
This contract creates an abstraction where the process doesn't know (or care) whether its pages are currently resident in physical memory. But to fulfill this contract, the system must be able to detect when a page is absent.
The Problem: Modern CPUs execute instructions at rates exceeding 3 billion per second. Each instruction may access memory multiple times (fetch the instruction, fetch operands, store results). Detecting absent pages must happen without adding significant overhead to the memory access path—ideally, it must be free for present pages and only incur cost when a page is actually absent.
| Scenario | Required Detection | Performance Constraint |
|---|---|---|
| Page present, access permitted | No fault detection needed | Must complete in ~1 cycle (TLB hit) or ~100 cycles (page table walk) |
| Page present, access denied | Protection violation detection | Must trap quickly; no disk I/O needed |
| Page absent (not in memory) | Page fault detection | Must trap and allow OS to handle; disk I/O expected |
| Invalid virtual address | Segmentation fault detection | Must trap; this is a program bug, not a recoverable condition |
When a page is present, memory access takes nanoseconds. When a page is absent, handling the page fault takes milliseconds—a difference of 6 orders of magnitude. This massive asymmetry means that detection must be optimized for the common case (page present) while still correctly identifying the rare case (page absent).
The primary mechanism for page fault detection is the valid-invalid bit (or present bit) in each page table entry (PTE). This single bit encodes whether the corresponding page is currently resident in physical memory.
How it works:
1. When the OS brings a page into a physical frame, it sets the valid (present) bit to 1 and records the frame number in the PTE.
2. When a page is not resident (never loaded, or evicted to disk), the valid bit remains 0.
3. If, during translation, the MMU encounters a PTE whose valid bit is 0, the MMU generates a page fault exception before the memory access completes.

A critical distinction: The valid-invalid bit has two different meanings depending on context:

- Invalid because not resident: the page belongs to the process's address space but is currently not in physical memory (its contents may live in swap or in a file).
- Invalid because illegal: the virtual address is not part of the process's address space at all, so the reference is a program error.
Different systems handle these two concepts differently. Some use a single bit for both meanings; others use separate mechanisms.
```c
// Typical x86-64 Page Table Entry (PTE) structure (simplified)
// The actual format is defined by the processor architecture

typedef struct {
    uint64_t present        : 1;  // Bit 0: Page is in physical memory (valid)
    uint64_t read_write     : 1;  // Bit 1: 0 = read-only, 1 = read/write
    uint64_t user_super     : 1;  // Bit 2: 0 = supervisor only, 1 = user accessible
    uint64_t write_through  : 1;  // Bit 3: Write-through caching
    uint64_t cache_disabled : 1;  // Bit 4: Disable caching
    uint64_t accessed       : 1;  // Bit 5: Page has been accessed (set by MMU)
    uint64_t dirty          : 1;  // Bit 6: Page has been written (set by MMU)
    uint64_t page_size      : 1;  // Bit 7: Page size (0 = 4KB, 1 = larger)
    uint64_t global         : 1;  // Bit 8: Global page (not flushed on context switch)
    uint64_t available      : 3;  // Bits 9-11: Available for OS use
    uint64_t frame_number   : 40; // Bits 12-51: Physical frame number
    uint64_t reserved       : 11; // Bits 52-62: Reserved
    uint64_t no_execute     : 1;  // Bit 63: No-execute bit (NX)
} PageTableEntry;

// Detection logic (conceptual - actually in hardware)
bool is_page_present(PageTableEntry pte) {
    return pte.present == 1;
}

// What the MMU does on each memory access (simplified)
PhysicalAddress translate(VirtualAddress vaddr, AccessType access_type) {
    PageTableEntry pte = walk_page_table(vaddr);

    if (!pte.present) {
        // Page fault! Transfer control to OS
        raise_exception(PAGE_FAULT, vaddr);
        // Does not return - OS handles this
    }

    if (!check_permissions(pte, access_type)) {
        // Protection violation!
        raise_exception(PROTECTION_FAULT, vaddr);
    }

    // Page is present and access is permitted
    return form_physical_address(pte.frame_number, get_offset(vaddr));
}
```

A page is either in memory or it isn't—there's no partial state. The binary nature of this condition makes a single bit both necessary and sufficient. When the bit is 0, the rest of the PTE can be used by the OS to store information about where the page resides on disk (swap space location). The hardware ignores these bits when the present bit is 0.
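To make that reuse concrete, here is a hypothetical software encoding a kernel might choose for non-present PTEs. The field layout (swap slot stored in bits 1 and up) is invented for this sketch and is not an architectural format:

```c
#include <stdint.h>

// Hypothetical encoding for a non-present PTE (present bit = 0).
// The hardware ignores the remaining bits when bit 0 is clear, so the OS
// can store its own bookkeeping there, such as a swap-slot number.
#define PTE_PRESENT      0x1ULL
#define SWAP_SLOT_SHIFT  1

static inline uint64_t make_swapped_pte(uint64_t swap_slot) {
    // Present bit left clear; swap slot packed into the unused bits.
    return swap_slot << SWAP_SLOT_SHIFT;
}

static inline int pte_is_swapped(uint64_t pte) {
    // Not present, but non-zero: the OS recorded where the page lives on disk.
    return (pte & PTE_PRESENT) == 0 && pte != 0;
}

static inline uint64_t pte_swap_slot(uint64_t pte) {
    return pte >> SWAP_SLOT_SHIFT;
}
```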
Page fault detection doesn't happen at an arbitrary time—it occurs at a precisely defined point in the instruction execution pipeline. Understanding this timing is crucial for understanding why page faults can be handled and instructions restarted.
The Address Translation Pipeline:
```
CPU generates       TLB           Page Table          Valid Bit      Physical      Memory
Virtual Address  →  Lookup   →    Walk (if miss)  →   Check      →   Address   →   Access
                      ↓                ↓                  ↓
                   TLB Hit:        ~100-400           Fault or
                   ~1 cycle         cycles            Continue
```
Critical timing property: The valid bit check occurs before any memory access is performed. If the page is not present, the access never reaches physical memory: no location is read or written, no destination register is updated, and the instruction does not complete. The CPU's architectural state is left exactly as it was before the faulting instruction began, so the instruction can simply be re-executed once the page has been brought in.
This property—called precise exceptions—is what makes page fault handling possible. Modern CPUs invest significant transistor budget to ensure that exceptions are precise, meaning the processor state when the exception is taken exactly matches what it would be if execution had stopped just before the faulting instruction.
| Stage | What's Checked | Possible Outcomes |
|---|---|---|
| Virtual Address Generation | Address calculated by instruction | May generate invalid address (NULL, out of range) |
| TLB Lookup | Is translation cached? | TLB hit (fast path) or TLB miss (initiate walk) |
| Page Table Walk | Traverse page table levels | Intermediate table entries may be invalid |
| Valid Bit Check | Is P bit = 1? | Page fault if P = 0 |
| Permission Check | R/W, U/S, NX bits | Protection fault if access denied |
| Physical Address Formation | Combine frame + offset | Ready for memory access |
| Memory Access | Actual read/write to RAM | Access completes |
The TLB Complication:
When a translation is cached in the TLB (Translation Lookaside Buffer), the valid bit check happens implicitly. Pages with valid = 0 are never cached in the TLB. A TLB entry only exists for pages that are present in physical memory. Therefore:

- A TLB hit is itself a guarantee that the page is present; no explicit valid bit check is repeated.
- A page fault can only be detected on the TLB-miss path, during the page table walk.
This design ensures that the common path (TLB hit) involves no additional checking—the presence of a TLB entry is itself proof of page validity.
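A conceptual sketch of that fast path, reusing the hypothetical helpers from the PTE example above; TlbEntry, tlb_lookup(), and tlb_insert() are likewise invented for illustration:

```c
// Conceptual only: the real TLB lookup happens in hardware, in parallel with the access.
PhysicalAddress translate_with_tlb(VirtualAddress vaddr) {
    TlbEntry *entry = tlb_lookup(vaddr);
    if (entry != NULL) {
        // TLB hit: the entry could only have been installed for a present page,
        // so its mere existence already implies present == 1. No check needed.
        return form_physical_address(entry->frame_number, get_offset(vaddr));
    }

    // TLB miss: walk the page table. This is the only path on which a
    // page fault can be detected.
    PageTableEntry pte = walk_page_table(vaddr);
    if (!pte.present) {
        raise_exception(PAGE_FAULT, vaddr);   // does not return
    }

    tlb_insert(vaddr, pte);                   // only present pages get cached
    return form_physical_address(pte.frame_number, get_offset(vaddr));
}
```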
Some complex instructions (like x86's string operations or block move instructions) may access multiple memory locations, and each access can potentially fault. The architecture must ensure either that partial progress can be saved and resumed, or that such instructions can be restarted from the beginning. CISC architectures like x86 have intricate mechanisms for this, while RISC architectures typically avoid instructions that touch multiple memory locations altogether.
The Memory Management Unit (MMU) is the hardware component responsible for address translation and, consequently, page fault detection. Modern MMUs are sophisticated pieces of circuitry that handle millions of translations per second.
MMU Components Relevant to Fault Detection:
TLB (Translation Lookaside Buffer): A small, fast cache of recent translations. Only valid pages have TLB entries.
Page Table Walker: Hardware that traverses the page table hierarchy on TLB misses. Checks each level's valid bit.
Permission Checker: Compares the access type (read/write/execute) against the page's permission bits.
Exception Generation Logic: When the valid bit is 0 or permissions are violated, this circuitry generates an exception.
The Exception Generation Process:
When the MMU detects a page fault, it:

- Aborts the current memory access before it reaches physical memory.
- Records the faulting virtual address in a dedicated register.
- Records information about the cause of the fault (read vs. write, user vs. supervisor, not-present vs. protection violation).
- Signals the CPU to raise an exception, which transfers control to the OS's page fault handler.
Architecture-Specific Details:
x86/x86-64: The page fault is exception vector 14 (0x0E). The faulting linear address is stored in CR2. An error code pushed on the stack indicates whether the fault was caused by a read or a write and whether it occurred in user or supervisor mode; its present bit distinguishes a protection violation on a present page from a true 'page not present' fault.
ARM (AArch64): Page faults generate a Synchronous Abort. The Fault Address Register (FAR) holds the faulting address. The Exception Syndrome Register (ESR) encodes the fault type.
RISC-V: Page faults are classified as load page fault, store/AMO page fault, or instruction page fault. The stval CSR holds the faulting address.
Despite architectural differences, the core detection mechanism remains the same: the MMU checks the valid bit and raises an exception if the page is absent.
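To make the x86-64 case concrete, the sketch below decodes the error code a page fault pushes on the stack. The bit positions are the architecturally defined ones; the handler entry point and the read_cr2() and handle_* helpers are assumptions made for this illustration:

```c
#include <stdbool.h>
#include <stdint.h>

// x86-64 page fault error code bits.
#define PF_PRESENT (1u << 0)  // 0 = page not present, 1 = protection violation
#define PF_WRITE   (1u << 1)  // 0 = read access,      1 = write access
#define PF_USER    (1u << 2)  // 0 = supervisor mode,  1 = user mode
#define PF_RSVD    (1u << 3)  // reserved bit set in a paging-structure entry
#define PF_INSTR   (1u << 4)  // fault occurred on an instruction fetch

// Hypothetical low-level helpers assumed to exist in the kernel.
uint64_t read_cr2(void);  // returns the faulting linear address
void handle_missing_page(uint64_t addr, bool write, bool user);
void handle_protection_fault(uint64_t addr, bool write, bool user);

void page_fault_entry(uint32_t error_code) {
    uint64_t faulting_addr = read_cr2();

    bool not_present = (error_code & PF_PRESENT) == 0;
    bool was_write   = (error_code & PF_WRITE)   != 0;
    bool from_user   = (error_code & PF_USER)    != 0;

    if (not_present) {
        // True "page not present" fault: demand paging or swap-in.
        handle_missing_page(faulting_addr, was_write, from_user);
    } else {
        // The page was present: a protection violation (possibly copy-on-write).
        handle_protection_fault(faulting_addr, was_write, from_user);
    }
}
```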
While we've focused on the classic case—accessing a page that isn't in memory—the valid-invalid bit mechanism actually detects several different conditions. The OS must distinguish among these to respond appropriately.
Classification of Page Faults:
| Fault Type | Detection Method | Typical Resolution | Performance Impact |
|---|---|---|---|
| Minor fault | Valid bit = 0, but page in memory | Map page, update PTE | ~1-10 µs |
| Major fault | Valid bit = 0, page on disk | Disk I/O to load page | ~5-15 ms (SSD), ~10-50 ms (HDD) |
| Invalid access | Address not in valid ranges | Kill process (SIGSEGV) | N/A (process terminates) |
| Protection fault | Valid = 1, permission denied | COW trigger or kill process | ~1-10 µs (COW) or terminate |
Copy-on-write (COW) cleverly repurposes the protection fault mechanism. Pages shared between parent and child after fork() are marked read-only. When either process writes, a protection fault occurs—but instead of terminating the process, the OS recognizes this as a COW trigger, makes a private copy, and resumes execution. The same detection hardware serves multiple purposes.
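A sketch of what a copy-on-write resolution path might look like, in the spirit of the conceptual handler shown later on this page; Page, pte_t, and every helper here are hypothetical stand-ins for an OS's real primitives:

```c
// Resolve a write fault on a read-only, shared (copy-on-write) page.
void handle_cow(Process *proc, VirtualAddress faulting_addr) {
    pte_t *pte = lookup_pte(proc->pgdir, faulting_addr);  // hypothetical lookup
    Page *shared = page_for_pte(pte);

    if (page_refcount(shared) == 1) {
        // We are the last user of the page: no copy needed,
        // simply restore write permission.
        set_pte_writable(pte);
    } else {
        // Still shared with another process: make a private copy.
        Page *private_copy = allocate_page();
        copy_page_contents(private_copy, shared);
        remap_pte(pte, private_copy, /*writable=*/true);
        page_refcount_dec(shared);
    }

    // Drop the stale read-only translation so the retried write succeeds.
    flush_tlb_entry(faulting_addr);
}
```

The retried store then proceeds against a writable mapping, and neither process ever observes that the page was shared.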
Modern systems use multi-level page tables (2-5 levels, depending on address space size and architecture). Page fault detection is more nuanced with multi-level tables because a fault can occur at any level.
Faults at Intermediate Levels:
When walking a multi-level page table, the MMU must check the valid bit of each intermediate table entry:
```
Level 4 (PML4) → Level 3 (PDPT) → Level 2 (PD) → Level 1 (PT) → Page
      ↓                ↓               ↓              ↓
   Valid?           Valid?          Valid?         Valid?
```
If any intermediate level entry has valid = 0, the walk terminates with a page fault. But what does this mean?
Interpretation of Intermediate Invalid Entries:

An invalid entry at an intermediate level means that the entire region of virtual address space covered by that entry has no mappings at all: on x86-64 with 4 KB pages, an invalid page-directory entry leaves a 2 MB region unmapped, an invalid PDPT entry a 1 GB region, and an invalid PML4 entry a 512 GB region. To the hardware, this is indistinguishable from a single missing page; it simply reports a page fault for the address that was accessed.
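A conceptual sketch of that per-level checking, written as software (the real walker is hardware); index widths follow the common x86-64 layout (9 bits per level, 12-bit page offset), and the helper functions are invented for illustration:

```c
#include <stdint.h>

#define LEVELS       4
#define INDEX_BITS   9
#define PAGE_SHIFT   12
#define PRESENT_BIT  0x1ULL

// Hypothetical helpers: read one 64-bit entry from a table in physical memory,
// extract the next-level table's address from an entry, and raise a fault.
uint64_t read_table_entry(uint64_t table_phys, unsigned index);
uint64_t entry_next_table(uint64_t entry);
void     raise_page_fault(uint64_t vaddr, int level);

// Returns the leaf PTE, or raises a page fault if any level is not present.
uint64_t walk_four_levels(uint64_t root_table_phys, uint64_t vaddr) {
    uint64_t table = root_table_phys;
    for (int level = LEVELS; level >= 1; level--) {
        unsigned shift = PAGE_SHIFT + (level - 1) * INDEX_BITS;
        unsigned index = (vaddr >> shift) & ((1u << INDEX_BITS) - 1);
        uint64_t entry = read_table_entry(table, index);

        if ((entry & PRESENT_BIT) == 0) {
            // A non-present entry at ANY level ends the walk with a fault;
            // at level > 1 it means a whole region has no mappings at all.
            raise_page_fault(vaddr, level);   // does not return
        }
        if (level == 1) {
            return entry;                 // leaf PTE: frame number plus flags
        }
        table = entry_next_table(entry);  // descend to the next level
    }
    return 0;  // unreachable
}
```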
The OS Perspective:
When the OS receives a page fault, it doesn't immediately know at which level the fault occurred. The OS must:

- Consult its own data structures (the process's VMA list) to decide whether the faulting address is legal at all.
- Allocate any missing intermediate page table pages.
- Resolve the leaf-level fault by mapping in the page itself, as the sketch below shows.
```c
// Conceptual example: Handling faults in a 4-level page table
// This is what the OS might do when handling a page fault

void handle_page_fault(VirtualAddress faulting_addr, AccessType access_type) {
    Process *current = get_current_process();

    // Step 1: Is this address valid for this process?
    VMA *vma = find_vma(current->mm, faulting_addr);
    if (vma == NULL) {
        // Address is not in any valid memory region
        send_signal(current, SIGSEGV);
        return;
    }

    // Step 2: Check permissions
    if (!vma_permits(vma, access_type)) {
        // Might be COW or genuine protection violation
        if (is_cow_page(vma, faulting_addr)) {
            handle_cow(current, faulting_addr);
            return;
        }
        send_signal(current, SIGSEGV);
        return;
    }

    // Step 3: Walk page table, creating intermediate levels if needed
    pte_t *pte = walk_page_table_allocating(current->pgdir, faulting_addr);

    // Step 4: Determine source of page content
    if (vma_is_anonymous(vma)) {
        // Zero-fill page (anonymous memory like heap/stack)
        Page *page = allocate_zeroed_page();
        map_page(pte, page, vma->permissions);
    } else {
        // File-backed: read from file
        Page *page = allocate_page();
        read_page_from_file(vma->file, vma_offset(vma, faulting_addr), page);
        map_page(pte, page, vma->permissions);
    }

    // Page is now mapped, process can resume
}
```

Operating systems employ lazy allocation not just for pages but for page table pages themselves. A fresh process doesn't have all 4 levels of page tables pre-allocated. When a page fault occurs in an unmapped region (but valid according to the VMA), the OS allocates the necessary intermediate page table pages on demand. This saves memory for processes with sparse address space usage.
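A sketch of what walk_page_table_allocating() from the example above might do, showing the on-demand allocation of intermediate tables. To keep the sketch self-contained, the tables here hold ordinary pointers rather than physical frame numbers, and the types are invented for this illustration:

```c
#include <stdint.h>
#include <stdlib.h>

#define ENTRIES_PER_TABLE 512

typedef uint64_t pte_t;

typedef struct PageTable {
    struct PageTable *children[ENTRIES_PER_TABLE];  // next-level tables (levels 4..2)
    pte_t entries[ENTRIES_PER_TABLE];               // leaf PTEs (level 1 only)
} PageTable;

// Index into the table at the given level (9 bits per level, 12-bit offset).
static unsigned table_index(uint64_t vaddr, int level) {
    return (vaddr >> (12 + (level - 1) * 9)) & (ENTRIES_PER_TABLE - 1);
}

// Walk the 4-level structure for vaddr, allocating any missing intermediate
// tables on the way down, and return a pointer to the leaf PTE slot.
pte_t *walk_page_table_allocating(PageTable *root, uint64_t vaddr) {
    PageTable *table = root;
    for (int level = 4; level > 1; level--) {
        unsigned index = table_index(vaddr, level);
        if (table->children[index] == NULL) {
            // Lazy allocation: the intermediate table is created only when a
            // fault first touches this region of the address space.
            table->children[index] = calloc(1, sizeof(PageTable));
        }
        table = table->children[index];
    }
    return &table->entries[table_index(vaddr, 1)];
}
```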
It's important to clearly distinguish detection from handling. This page focuses on detection—the remaining pages in this module cover handling.
Detection is a hardware operation:

- Performed by the MMU as part of every address translation.
- Consists of nothing more than checking bits in a TLB entry or PTE.
- Completes in nanoseconds and involves no software at all.
Handling is a software operation:

- Performed by the OS kernel's page fault handler.
- Involves policy decisions: is the address valid, where does the page's content come from, which frame to use, what to evict.
- Takes microseconds for a minor fault, or milliseconds when disk I/O is required.
The handoff between detection and handling:
| Step | Performed By | Time Scale | Action |
|---|---|---|---|
| 1 | MMU hardware | nanoseconds | Walk page table, check valid bit |
| 2 | MMU hardware | nanoseconds | Valid bit = 0 detected |
| 3 | CPU hardware | nanoseconds | Save state, switch to kernel mode |
| 4 | CPU + OS | nanoseconds | Jump to page fault handler |
| 5 | OS kernel | microseconds | Determine fault type and action |
| 6 | OS kernel + I/O | milliseconds | Read page from disk if needed |
| 7 | OS kernel | microseconds | Set valid bit, update mapping |
| 8 | CPU hardware | nanoseconds | Resume faulting instruction |
What detection provides to handling:
When the page fault handler begins execution, it has access to:

- The faulting virtual address (from CR2, FAR, or stval, depending on the architecture).
- The type of access that faulted: read, write, or instruction fetch.
- The privilege mode at the time of the fault: user or kernel.
- Whether the page was absent or present but protected (e.g., the present bit in the x86 error code).
- The saved processor state of the interrupted instruction, so execution can be resumed later.
This information is everything the OS needs to begin its analysis. The hardware's job is done—it has detected the condition and preserved enough state for software to handle it.
When a page fault occurs in kernel mode, the situation is more delicate. The kernel can't use its normal mechanisms (which might themselves cause page faults). Kernel page faults typically occur when accessing user-space memory or in specific, controlled circumstances. A page fault due to a kernel bug is usually fatal—the dreaded kernel panic or BSOD.
The design of fault detection has significant performance implications. Understanding these helps in optimizing memory-intensive applications and in understanding system behavior.
The Zero-Overhead Principle:
Fault detection adds no overhead to the common case (page present). This is achieved because:

- The valid bit lives in the same PTE the MMU must read anyway to obtain the frame number, so checking it costs nothing extra.
- On a TLB hit, no check is performed at all: only present pages are ever cached in the TLB.
- No additional instructions, memory references, or cycles are added to the fast path; the fault logic activates only when the bit is 0.
Overhead of Taking a Fault:
While detection itself is 'free,' taking a fault is expensive:

- The pipeline is flushed and in-flight work is discarded.
- The CPU switches to kernel mode and saves the faulting context.
- The handler's code and data displace the application's cache and TLB contents.
- After handling, state must be restored and the faulting instruction re-executed.
Even for a minor fault (no disk I/O), the overhead is typically 1,000-10,000 cycles.
On Linux, use perf stat -e page-faults,minor-faults,major-faults to monitor a process's fault behavior. High major fault rates indicate I/O-bound behavior due to memory pressure. High minor fault rates may indicate working set changes or excessive forking. Zero faults after warmup indicates optimal memory residency.
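Alongside perf, a process can read its own accumulated fault counters programmatically; here is a minimal sketch using the standard getrusage() call (ru_minflt counts faults served without I/O, ru_majflt those that required I/O):

```c
#include <stdio.h>
#include <sys/resource.h>

int main(void) {
    struct rusage usage;
    if (getrusage(RUSAGE_SELF, &usage) != 0) {
        perror("getrusage");
        return 1;
    }
    printf("minor faults (no I/O):   %ld\n", usage.ru_minflt);
    printf("major faults (disk I/O): %ld\n", usage.ru_majflt);
    return 0;
}
```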
Page fault detection is the silent sentinel that enables virtual memory's fundamental promise: the illusion of unlimited, contiguous memory for each process. Let's consolidate the key concepts:

- The valid (present) bit in each PTE is the detection mechanism: when it is 0, the MMU raises a page fault before any memory access occurs.
- Detection is pure hardware and adds no cost to the common case; only present pages are cached in the TLB, so a hit itself implies validity.
- Precise exceptions preserve processor state so the faulting instruction can be restarted once the OS resolves the fault.
- The same mechanism detects several conditions: major and minor faults, invalid accesses, and protection faults such as copy-on-write triggers.
- Detection (hardware, nanoseconds) hands off to handling (OS software, microseconds to milliseconds).
What's Next:
With detection understood, the next page explores what happens immediately after: Trap to OS. We'll examine how the CPU transitions from executing user code to executing the kernel's page fault handler, what state is saved, and how the handler begins its analysis of the fault.
You now understand how page faults are detected at the hardware level. The valid-invalid bit, checked during every address translation, is the gatekeeper that enables the OS to intercept absent pages and maintain the illusion of virtual memory. Next, we'll follow the fault into the operating system to see how the trap mechanism transfers control to the kernel.