Page Fault Handling - Learning Module


Trap to OS

Crossing the Boundary: From User Code to Kernel Handler

The instant the MMU detects an invalid page access, a remarkable transformation occurs. The CPU, which was happily executing user code at high speed, must immediately and safely transfer control to the operating system kernel. This transfer—called a trap—is one of the most critical mechanisms in computer architecture.

The trap mechanism must satisfy seemingly contradictory requirements:

Speed: The transition must be fast, because page faults are a routine event and every fault pays the cost of entering the kernel.
Safety: User code cannot be allowed to interfere with the kernel.
Completeness: Enough state must be preserved to resume execution later.
Atomicity: The transition must be indivisible—no race conditions or partial states.

This page explores the trap mechanism in exhaustive detail. You'll understand exactly what happens in the nanoseconds between fault detection and the first instruction of the page fault handler, and why this mechanism is fundamental to protected, multi-tasking operating systems.

What You Will Learn

By the end of this page, you will understand: (1) The trap mechanism and how it differs from other control transfers, (2) CPU state that must be saved during a trap, (3) The role of the Interrupt Descriptor Table (IDT) and exception vectors, (4) How the CPU switches to kernel mode and kernel stack, (5) The initial actions taken by the page fault handler entry point.

What is a Trap?

A trap is a synchronous exception that transfers control to the operating system. It can be requested deliberately (a system call) or raised by the hardware on the program's behalf (a fault). It differs from other control flow mechanisms:

Comparisons:

Mechanism	Trigger	Timing	Return
Function call	Explicit `call` instruction	Synchronous	Returns to caller
Interrupt	External device signal	Asynchronous	Returns to interrupted instruction
Trap	Internal condition (syscall, fault)	Synchronous	May return to same instruction
Abort	Unrecoverable error	Synchronous	Does not return

Key characteristics of page fault traps:

Synchronous: The trap occurs as a direct result of the executing instruction, not some external event.
Precise: The processor state when the trap is taken corresponds exactly to having stopped before the faulting instruction completed.
Restartable: The faulting instruction can be re-executed after the fault is handled.
Privileged Transition: Control passes from user mode (ring 3) to kernel mode (ring 0) with elevated privileges.

Exception Classification in x86 Architecture
Category	Examples	Behavior	Use in Page Faults
Fault	Page fault, Divide by zero	Return to faulting instruction	This is a fault — instruction is restarted after handling
Trap	INT 3 (breakpoint), syscall	Return to next instruction	Not used for page faults
Abort	Machine check, Double fault	Cannot reliably return	Only if delivering the page fault itself fails (double fault)

Terminology Clarification

Confusingly, 'trap' is used in two different ways: (1) As a general term for any exception that transfers control to the OS, and (2) As a specific exception category where the saved instruction pointer points to the next instruction. Page faults are technically 'faults' (returning to the same instruction) but the mechanism is commonly called 'trapping' to the OS.
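The difference between the two categories comes down to which address the CPU saves. The sketch below models that rule in a simplified way; `exception_kind` and `saved_rip` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum exception_kind { EXC_KIND_FAULT, EXC_KIND_TRAP };

/* Saved instruction pointer for an exception raised by the instruction at
 * `insn_addr`, which is `insn_len` bytes long (simplified model). */
static uint64_t saved_rip(enum exception_kind kind, uint64_t insn_addr,
                          unsigned insn_len) {
    assert(insn_len >= 1 && insn_len <= 15);  /* x86 instructions are 1 to 15 bytes */
    /* Fault: re-run the same instruction. Trap: continue after it. */
    return kind == EXC_KIND_FAULT ? insn_addr : insn_addr + insn_len;
}
```

For a page fault on a 3-byte load at 0x401000 the saved value is 0x401000, so the load is retried. For a 1-byte INT 3 at the same address it is 0x401001, so execution continues past the breakpoint.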

CPU State Preservation: Saving the Critical Context

When a page fault occurs, the CPU must preserve enough state to later resume the faulting process as if nothing happened. The first part of this preservation is performed by hardware, because no handler code can run until the return address, flags, and stack pointer are safely stored.

State Saved Automatically by Hardware:

On x86-64, the CPU pushes the following onto the kernel stack when a page fault occurs:

+------------------+
| SS               |  Stack Segment (always pushed in 64-bit mode)
+------------------+
| RSP              |  Stack Pointer (always pushed in 64-bit mode)
+------------------+
| RFLAGS           |  CPU flags register
+------------------+
| CS               |  Code Segment
+------------------+
| RIP              |  Instruction Pointer (address of faulting instruction)
+------------------+
| Error Code       |  Page fault specific info  ← Top of stack
+------------------+

This layout is dictated by the processor architecture and cannot be changed.

State NOT Saved Automatically:

General-purpose registers (RAX, RBX, RCX, etc.) are not saved by hardware. The page fault handler must save them if it needs to preserve them. This is typically done immediately upon handler entry.

exception_frame.h
#include <stdint.h>
#include <stdio.h>

// x86-64 Exception Stack Frame
// This structure matches what the CPU pushes on page fault
 
struct ExceptionFrame {
    // Error code (pushed by CPU for certain exceptions including page faults)
    uint64_t error_code;
    
    // These are pushed by CPU for all exceptions
    uint64_t rip;     // Instruction pointer - points to faulting instruction
    uint64_t cs;      // Code segment selector
    uint64_t rflags;  // CPU flags (interrupt flag, direction flag, etc.)
    uint64_t rsp;     // Stack pointer at the time of the fault
    uint64_t ss;      // Stack segment selector
};
 
// Page Fault Error Code Bits
#define PF_PRESENT   (1 << 0)  // 0 = not-present page, 1 = protection violation
#define PF_WRITE     (1 << 1)  // 0 = read access, 1 = write access
#define PF_USER      (1 << 2)  // 0 = supervisor mode, 1 = user mode
#define PF_RESERVED  (1 << 3)  // 1 = reserved bit set in page table entry
#define PF_INSTR     (1 << 4)  // 1 = instruction fetch (NX violation)
 
// Example: Decoding the error code
void decode_page_fault_error(uint64_t error_code) {
    printf("Page Fault Analysis:\n");
    printf("  %s page\n", (error_code & PF_PRESENT) ? "Protection violation on present" : "Non-present");
    printf("  %s access\n", (error_code & PF_WRITE) ? "Write" : "Read");
    printf("  %s mode\n", (error_code & PF_USER) ? "User" : "Supervisor");
    if (error_code & PF_RESERVED) printf("  Reserved bit violation\n");
    if (error_code & PF_INSTR) printf("  Instruction fetch (NX violation)\n");
}
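As a worked example of the error-code bits above, the following sketch decodes an error code into a small struct instead of printing it. The names `PFEC_*`, `pf_info`, and `pf_decode` are hypothetical; only the bit positions come from the architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit positions as the PF_* macros above, under different names
 * so this sketch can sit next to them without clashing. */
enum {
    PFEC_PRESENT  = 1u << 0,
    PFEC_WRITE    = 1u << 1,
    PFEC_USER     = 1u << 2,
    PFEC_RESERVED = 1u << 3,
    PFEC_INSTR    = 1u << 4,
};
static_assert((PFEC_PRESENT | PFEC_WRITE | PFEC_USER |
               PFEC_RESERVED | PFEC_INSTR) == 0x1F,
              "the five bits decoded here occupy bits 0-4");

struct pf_info {
    bool protection;   /* page was present; the access violated its permissions */
    bool write;        /* access was a write */
    bool user;         /* access came from user mode */
    bool reserved;     /* reserved bit set in a paging-structure entry */
    bool instr_fetch;  /* access was an instruction fetch */
};

static struct pf_info pf_decode(uint64_t error_code) {
    struct pf_info info = {
        .protection  = (error_code & PFEC_PRESENT)  != 0,
        .write       = (error_code & PFEC_WRITE)    != 0,
        .user        = (error_code & PFEC_USER)     != 0,
        .reserved    = (error_code & PFEC_RESERVED) != 0,
        .instr_fetch = (error_code & PFEC_INSTR)    != 0,
    };
    return info;
}
```

Under this decoding, error code 0x6 is a user-mode write to a non-present page, while 0x7 is a user-mode write that violated the permissions of a present page.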

Why RIP Points to the Faulting Instruction

The saved RIP points to the instruction that caused the fault, not the next instruction. This is essential for page fault handling: after the OS loads the page into memory, the CPU will retry the same instruction and this time it will succeed. This 'retry semantics' is what makes page faults transparent to the application.
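The retry behaviour can be imitated in ordinary user-space C. This is only a toy model, not kernel code: `toy_mmu`, `toy_handle_fault`, and `toy_load` are made-up names, and the loop stands in for the CPU re-executing the instruction at the saved RIP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGES 4

struct toy_mmu {
    bool present[TOY_PAGES];  /* is the page currently "in memory"? */
    int  data[TOY_PAGES];     /* one value per page, for simplicity */
    int  faults;              /* number of simulated page faults taken */
};

/* Simulated handler: make the page present, as the OS would after loading it. */
static void toy_handle_fault(struct toy_mmu *m, size_t page) {
    m->present[page] = true;
    m->faults++;
}

/* Simulated load: after each fault the access starts over from the beginning,
 * mirroring how the saved RIP makes the CPU re-execute the instruction. */
static int toy_load(struct toy_mmu *m, size_t page) {
    assert(page < TOY_PAGES);
    for (;;) {
        if (m->present[page])
            return m->data[page];   /* the "instruction" completes */
        toy_handle_fault(m, page);  /* "trap", then loop around to retry */
    }
}
```

The caller of `toy_load` never sees the fault: the first access to a page takes one fault and then succeeds, and later accesses succeed immediately.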

The Interrupt Descriptor Table (IDT)

How does the CPU know where to transfer control when a page fault occurs? The answer is the Interrupt Descriptor Table (IDT)—a table of 256 entries established by the OS at boot time.

IDT Structure:

Each IDT entry (called a 'gate descriptor') contains:

The address of the handler function
The code segment selector for the handler
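The descriptor privilege level (DPL), which limits which privilege levels may invoke the gate with a software INT instruction (hardware-raised exceptions such as page faults are not subject to this check)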
The gate type (interrupt gate, trap gate, etc.)
Optional Interrupt Stack Table (IST) index

Page Fault Vector:

Page faults are exception number 14 (0x0E). When a page fault occurs, the CPU:

Reads IDT entry 14
Loads CS and RIP from that entry
Begins executing the handler

The LIDT Instruction:

The OS tells the CPU where the IDT is located using the LIDT (Load IDT Register) instruction. This instruction is privileged—only kernel code can execute it.

idt_setup.c
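#include <stdint.h>

#define KERNEL_CS 0x08   // Kernel code segment selector (example value; depends on the GDT layout)

// Assembly entry stub, defined in page_fault_entry.S
extern void page_fault_handler_entry(void);

// IDT Gate Descriptor (x86-64 format)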
struct IDTGateDescriptor {
    uint16_t offset_low;      // Handler address, bits 0-15
    uint16_t segment;         // Code segment selector (typically kernel CS)
    uint8_t  ist;             // Interrupt Stack Table index (0 = no IST)
    uint8_t  type_attr;       // Gate type and attributes
    uint16_t offset_mid;      // Handler address, bits 16-31
    uint32_t offset_high;     // Handler address, bits 32-63
    uint32_t reserved;        // Reserved, must be zero
} __attribute__((packed));
 
// Gate types
#define IDT_INTERRUPT_GATE  0x8E  // Present, DPL=0, 64-bit interrupt gate
#define IDT_TRAP_GATE       0x8F  // Present, DPL=0, 64-bit trap gate
 
// Exception vector numbers
#define VECTOR_DIVIDE_ERROR        0
#define VECTOR_DEBUG               1
#define VECTOR_NMI                 2
#define VECTOR_BREAKPOINT          3
#define VECTOR_GENERAL_PROTECTION 13
#define VECTOR_PAGE_FAULT         14   // <-- Page fault handler lives here
 
// IDT Register structure
struct IDTRegister {
    uint16_t limit;    // Size of IDT - 1
    uint64_t base;     // Linear address of IDT
} __attribute__((packed));
 
static struct IDTGateDescriptor idt[256];
static struct IDTRegister idtr;
 
// Set up a single IDT entry
void set_idt_gate(int vector, void (*handler)(void), uint8_t type) {
    uint64_t addr = (uint64_t)handler;
    
    idt[vector].offset_low  = addr & 0xFFFF;
    idt[vector].offset_mid  = (addr >> 16) & 0xFFFF;
    idt[vector].offset_high = (addr >> 32) & 0xFFFFFFFF;
    idt[vector].segment     = KERNEL_CS;  // Kernel code segment
    idt[vector].ist         = 0;          // No IST
    idt[vector].type_attr   = type;
    idt[vector].reserved    = 0;
}
 
// Initialize the IDT
void init_idt(void) {
    // Set up exception handlers
    set_idt_gate(VECTOR_PAGE_FAULT, page_fault_handler_entry, IDT_INTERRUPT_GATE);
    // ... other exception handlers ...
    
    // Load the IDT
    idtr.limit = sizeof(idt) - 1;
    idtr.base = (uint64_t)&idt;
    asm volatile("lidt %0" : : "m"(idtr));
}
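The split of the handler address across three fields is easy to get wrong, so here is a small user-space sketch that packs an address into a gate and reassembles it, which is effectively what the CPU does when it reads IDT entry 14. `gate_desc`, `gate_pack`, and `gate_handler_address` are hypothetical names; the struct repeats the field layout of the descriptor above and assumes a GCC or Clang style `packed` attribute.

```c
#include <assert.h>
#include <stdint.h>

/* Same field layout as the gate descriptor above, renamed for this sketch. */
struct gate_desc {
    uint16_t offset_low;   /* handler address, bits 0-15  */
    uint16_t segment;
    uint8_t  ist;
    uint8_t  type_attr;
    uint16_t offset_mid;   /* handler address, bits 16-31 */
    uint32_t offset_high;  /* handler address, bits 32-63 */
    uint32_t reserved;
} __attribute__((packed));

static_assert(sizeof(struct gate_desc) == 16, "64-bit IDT gates are 16 bytes");

static struct gate_desc gate_pack(uint64_t handler, uint16_t cs, uint8_t type_attr) {
    struct gate_desc g = {
        .offset_low  = (uint16_t)(handler & 0xFFFF),
        .segment     = cs,
        .ist         = 0,
        .type_attr   = type_attr,
        .offset_mid  = (uint16_t)((handler >> 16) & 0xFFFF),
        .offset_high = (uint32_t)(handler >> 32),
        .reserved    = 0,
    };
    return g;
}

/* Reassemble the 64-bit handler address from its three pieces. */
static uint64_t gate_handler_address(const struct gate_desc *g) {
    return (uint64_t)g->offset_low |
           ((uint64_t)g->offset_mid << 16) |
           ((uint64_t)g->offset_high << 32);
}
```

A round trip through `gate_pack` and `gate_handler_address` should return the original address unchanged, which makes a convenient unit test for any real IDT setup code.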

Interrupt Gate vs Trap Gate

Page fault handlers typically use an 'interrupt gate' rather than a 'trap gate'. The difference: an interrupt gate clears the IF flag on entry, so maskable interrupts stay disabled until the handler re-enables them, while a trap gate leaves IF unchanged. This is important because the page fault handler must finish its early work, such as capturing CR2 and updating kernel data structures, before it can safely re-enable interrupts.
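The two gate types differ only in the low bits of the type/attribute byte. The helpers below, whose names are made up for this sketch, pick that byte apart: bit 7 is the present flag, bits 5-6 hold the DPL, and the low four bits hold the gate type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum gate_kind { GATE_INTERRUPT = 0xE, GATE_TRAP = 0xF };

static_assert((0x8E & 0xF) == GATE_INTERRUPT, "0x8E encodes an interrupt gate");
static_assert((0x8F & 0xF) == GATE_TRAP, "0x8F encodes a trap gate");

static bool     gate_present(uint8_t type_attr) { return ((type_attr >> 7) & 1) != 0; }
static unsigned gate_dpl(uint8_t type_attr)     { return (type_attr >> 5) & 0x3; }
static unsigned gate_type(uint8_t type_attr)    { return type_attr & 0xF; }

/* Only interrupt gates clear IF on entry; trap gates leave it unchanged. */
static bool gate_clears_if(uint8_t type_attr) {
    return gate_type(type_attr) == GATE_INTERRUPT;
}
```

With these helpers, 0x8E decodes as present, DPL 0, IF cleared on entry, and 0x8F as the same gate without the IF change. A value such as 0xEE would be an interrupt gate with DPL 3, which is the kind of gate user code may invoke with a software INT instruction.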

Stack Switching: Entering the Kernel Safely

One of the most critical aspects of the trap mechanism is stack switching. When a page fault occurs in user mode, the CPU cannot continue using the user stack—it's untrusted and potentially compromised. The CPU must switch to a kernel stack.

Why Stack Switching is Essential:

Security: The user stack is in user-writable memory. Malicious code could manipulate it.
Safety: The user stack might itself be inaccessible (imagine a page fault while pushing to the stack).
Isolation: Kernel operations shouldn't depend on user memory state.

How Stack Switching Works (x86-64):

The CPU uses the Task State Segment (TSS) to find the kernel stack. Each CPU core has a TSS that contains:

RSP0: The kernel stack pointer for privilege level 0
RSP1, RSP2: Stack pointers for levels 1 and 2 (rarely used in modern OS)
IST1-IST7: Interrupt Stack Table pointers for special exceptions

When a page fault occurs in user mode:

CPU reads RSP0 from the current TSS
CPU switches to this kernel stack
CPU pushes the interrupted state (SS, RSP, RFLAGS, CS, RIP, error code)
CPU begins executing the handler

The OS sets up this kernel stack pointer before the thread runs, typically as part of the context switch. Each thread typically has its own kernel stack.

tss_setup.c
// Task State Segment (x86-64 format, relevant portions)
#include <stdint.h>

#define MAX_CPUS   64     // Illustrative limit; a real kernel sizes this itself
#define PAGE_SIZE  4096   // 4 KB pages

struct TaskStateSegment {
    uint32_t reserved0;
    
    // Stack pointers for privilege level transitions
    uint64_t rsp0;    // Kernel stack pointer (used on page faults from user mode)
    uint64_t rsp1;    // Not used in modern OS
    uint64_t rsp2;    // Not used in modern OS
    
    uint64_t reserved1;
    
    // Interrupt Stack Table pointers (for special exceptions)
    uint64_t ist1;    // E.g., double fault stack
    uint64_t ist2;    // E.g., NMI stack
    uint64_t ist3;
    uint64_t ist4;
    uint64_t ist5;
    uint64_t ist6;
    uint64_t ist7;
    
    uint64_t reserved2;
    uint16_t reserved3;
    uint16_t io_map_base;
} __attribute__((packed));

static struct TaskStateSegment tss[MAX_CPUS];

// Each thread has its own kernel stack
#define KERNEL_STACK_SIZE 16384  // 16 KB typical

typedef struct Thread {
    void *kernel_stack;       // Lowest address of the kernel stack allocation
    void *kernel_stack_top;   // One past the highest address; the stack grows down
} Thread;

// Helpers assumed to be provided elsewhere in the kernel
int get_current_cpu(void);
Thread *allocate_thread_struct(void);
void *allocate_pages(unsigned long count);

// Set up the kernel stack for the current CPU
void set_kernel_stack(int cpu_id, void *stack_top) {
    tss[cpu_id].rsp0 = (uint64_t)stack_top;
}

// During context switch, update the TSS with new thread's kernel stack
void context_switch_to(Thread *new_thread) {
    int cpu = get_current_cpu();
    
    // Set up kernel stack for new thread
    // If new thread triggers page fault, CPU will use this stack
    set_kernel_stack(cpu, new_thread->kernel_stack_top);
    
    // ... rest of context switch ...
}

Thread *create_thread(void (*entry)(void)) {
    Thread *t = allocate_thread_struct();
    (void)entry;  // Entry point setup is omitted in this excerpt
    
    // Allocate kernel stack for this thread
    t->kernel_stack = allocate_pages(KERNEL_STACK_SIZE / PAGE_SIZE);
    // RSP0 starts at the top; cast because arithmetic on void* is not standard C
    t->kernel_stack_top = (uint8_t *)t->kernel_stack + KERNEL_STACK_SIZE;
    
    return t;
}
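Because the CPU reads the TSS by fixed offsets, a layout mistake silently sends it to the wrong stack. The sketch below, using the hypothetical names `tss64` and `tss_stack_for_user_fault`, checks the layout at compile time and models how the IST index in a gate selects a stack for a fault that arrives from user mode. It stores the seven IST slots as an array, which has the same memory layout as seven separate fields, and assumes a GCC or Clang style `packed` attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct tss64 {
    uint32_t reserved0;
    uint64_t rsp0, rsp1, rsp2;
    uint64_t reserved1;
    uint64_t ist[7];        /* IST1..IST7 */
    uint64_t reserved2;
    uint16_t reserved3;
    uint16_t io_map_base;
} __attribute__((packed));

static_assert(sizeof(struct tss64) == 104, "the 64-bit TSS is 104 bytes");
static_assert(offsetof(struct tss64, rsp0) == 4, "RSP0 sits at offset 4");
static_assert(offsetof(struct tss64, ist) == 36, "IST1 sits at offset 36");
static_assert(offsetof(struct tss64, io_map_base) == 102, "I/O map base sits at offset 102");

/* Stack the CPU loads for an exception taken from user mode:
 * IST index 0 means "use RSP0", 1..7 select the matching IST slot. */
static uint64_t tss_stack_for_user_fault(const struct tss64 *t, unsigned ist_index) {
    assert(ist_index <= 7);
    return ist_index == 0 ? t->rsp0 : t->ist[ist_index - 1];
}
```

The page fault gate shown earlier uses IST index 0, so a fault from user mode lands on the RSP0 stack of whichever thread is running.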

Why Each Thread Has Its Own Kernel Stack

Each thread needs its own kernel stack because a thread might be in the middle of a system call when it's preempted. The kernel stack holds the return path back to user space. With per-thread kernel stacks, preemption and resumption work correctly even when threads are executing kernel code.

Privilege Level Transition: Entering Ring 0

The transition from user mode to kernel mode involves changing the CPU's privilege level. On x86, this is represented by the Current Privilege Level (CPL), stored in the low two bits of the CS register.

Privilege Levels (Rings):

Ring	CPL	Usage	Capabilities
Ring 0	0	Kernel	Full access to all instructions and memory
Ring 1	1	(Historical)	Device drivers (rarely used today)
Ring 2	2	(Historical)	Device drivers (rarely used today)
Ring 3	3	User Mode	Restricted access, no privileged instructions
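To make the "low two bits of CS" concrete, the sketch below splits a segment selector into its fields, using the example selector values 0x2B (user CS) and 0x08 (kernel CS) that this page uses. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Selector layout: bits 0-1 privilege level, bit 2 table indicator, bits 3-15 index. */
static unsigned selector_rpl(uint16_t sel)   { return sel & 0x3; }
static unsigned selector_ti(uint16_t sel)    { return (sel >> 2) & 0x1; }  /* 0 = GDT, 1 = LDT */
static unsigned selector_index(uint16_t sel) { return sel >> 3; }

/* Build a selector from its fields (inverse of the helpers above). */
static uint16_t selector_make(unsigned index, unsigned ti, unsigned rpl) {
    assert(index < 8192 && ti <= 1 && rpl <= 3);
    return (uint16_t)((index << 3) | (ti << 2) | rpl);
}
```

When the selector is the one currently loaded in CS, the privilege field is the CPL: 0x2B gives 3 (user mode, GDT entry 5) and 0x08 gives 0 (kernel mode, GDT entry 1).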

What Changes During Privilege Transition:

CPL: Changes from 3 to 0 (loaded from IDT gate's CS selector)
Stack: Switches from user stack to kernel stack (from TSS)
Instruction Restrictions: Privileged instructions become available
Memory Access: Kernel-only pages become accessible
Interrupt State: Interrupts may be disabled (interrupt gate)

Hardware Enforcement:

The privilege transition is enforced entirely by hardware. User code cannot:

Modify CS directly to change CPL
Jump directly into kernel code
Access kernel memory pages (marked supervisor-only)

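Kernel mode can only be entered through entry points the kernel itself has registered with the hardware: IDT gates for interrupts and exceptions, and the system-call entry address used by instructions such as SYSCALL.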

State Changes During User → Kernel Transition
Aspect	Before (User Mode)	After (Kernel Mode)
CPL	3 (user)	0 (supervisor)
Code Segment	User CS (e.g., 0x2B)	Kernel CS (e.g., 0x08)
Stack	User stack in user memory	Kernel stack in kernel memory
Privileged Ops	Cause #GP exception	Execute normally
Kernel Memory	Access causes #PF	Accessible
I/O Ports	Controlled by IOPL	Full access
Interrupts	As before	Disabled (interrupt gate)

The Security Guarantee

The privilege transition mechanism is the foundation of operating system security. No matter how clever user code is, it cannot bypass this transition. Every entry into kernel mode goes through hardware-controlled gates with well-defined semantics. This is why operating systems can safely run untrusted code—the hardware enforces the boundary.

The Handler Entry Point: First Kernel Instructions

When the CPU begins executing the page fault handler, it lands at an entry point—typically a small piece of assembly code that completes the state saving and then calls a C handler function.

Entry Point Responsibilities:

Save remaining registers: The CPU only saved the interrupt frame. General-purpose registers must be saved by software.
Set up kernel data segment: Ensure DS, ES are set to kernel data selectors.
Build a canonical stack frame: Create a well-defined structure that C code can use.
Read CR2: Capture the faulting address before it could potentially be overwritten.
Call C handler: Invoke the main handler with pointers to saved state.
On return: Restore registers and use iretq to return.

page_fault_entry.S
# Page Fault Handler Entry Point (x86-64 Assembly)
# This is the actual entry point registered in the IDT
 
.global page_fault_handler_entry
.type page_fault_handler_entry, @function
 
page_fault_handler_entry:
    # At this point, CPU has already:
    # - Switched to kernel stack
    # - Pushed SS, RSP, RFLAGS, CS, RIP, error_code
    # - Cleared IF (interrupts disabled)
    
    # Save all general-purpose registers (build trap frame)
    pushq %r15
    pushq %r14
    pushq %r13
    pushq %r12
    pushq %r11
    pushq %r10
    pushq %r9
    pushq %r8
    pushq %rbp
    pushq %rdi
    pushq %rsi
    pushq %rdx
    pushq %rcx
    pushq %rbx
    pushq %rax
    
    # Read CR2 (faulting address) BEFORE it could be overwritten
    # by a nested page fault (shouldn't happen, but be safe)
    movq %cr2, %r12    # Save in callee-saved register
    
    # Set up kernel data segments
    movw $KERNEL_DS, %ax
    movw %ax, %ds
    movw %ax, %es
    
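    # Call C handler
    # Arguments: rdi = pointer to trap frame, rsi = faulting address
    movq %rsp, %rdi    # First arg: trap frame pointer
    movq %r12, %rsi    # Second arg: faulting address (CR2)
    # The CPU aligned the stack to 16 bytes before pushing its 6 quadwords;
    # our 15 pushes left it 8 bytes off, so pad to the alignment the ABI expects
    subq $8, %rsp
    call page_fault_handler   # C function
    addq $8, %rsp      # Drop the alignment padding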
    
    # Handler returns here (if fault was handled successfully)
    
    # Restore general-purpose registers
    popq %rax
    popq %rbx
    popq %rcx
    popq %rdx
    popq %rsi
    popq %rdi
    popq %rbp
    popq %r8
    popq %r9
    popq %r10
    popq %r11
    popq %r12
    popq %r13
    popq %r14
    popq %r15
    
    # Skip error code (it was pushed by CPU, we need to remove it)
    addq $8, %rsp
    
    # Return from interrupt
    # This restores RIP, CS, RFLAGS, RSP, SS from stack
    # and returns to user mode (or kernel mode if fault was there)
    iretq

Reading CR2 Early is Critical

The faulting address is stored in CR2, but CR2 can be overwritten by subsequent page faults. While nested page faults during handler entry are unusual (the handler code should be resident), defensive programming dictates reading CR2 as early as possible and saving it in a register or on the stack. Some architectures push the faulting address on the stack automatically, avoiding this issue.
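The pointer that the entry stub passes in `rdi` addresses the registers in the reverse of the order they were pushed, followed by the hardware frame. A C view of that memory might look like the sketch below. The name `stub_trap_frame` is hypothetical, and the layout describes the stub shown on this page only; other kernels push registers in a different order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct stub_trap_frame {
    /* Pushed by the entry stub; rax was pushed last, so it has the lowest address. */
    uint64_t rax, rbx, rcx, rdx, rsi, rdi, rbp;
    uint64_t r8, r9, r10, r11, r12, r13, r14, r15;
    /* Pushed by the CPU before the stub ran. */
    uint64_t error_code;
    uint64_t rip, cs, rflags, rsp, ss;
};

static_assert(offsetof(struct stub_trap_frame, error_code) == 15 * 8,
              "15 software-saved registers precede the error code");
static_assert(sizeof(struct stub_trap_frame) == 21 * 8,
              "15 software-saved plus 6 hardware-pushed quadwords");

/* The low two bits of the saved CS give the privilege level that faulted. */
static int fault_from_user(const struct stub_trap_frame *f) {
    return (f->cs & 0x3) == 3;
}
```

A C handler can then read `frame->error_code` and `frame->rip` directly, and use `fault_from_user` to tell a user-mode fault from one raised by kernel code.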

The Full Trap Frame: Complete Saved State

After the entry point completes its register saves, the kernel stack contains a complete trap frame—a snapshot of the CPU's state at the moment of the page fault. This frame is essential for:

Fault analysis: The error code and faulting address inform the handler.
State restoration: When returning, all registers are restored.
Debugging: If the fault is fatal, the trap frame provides diagnostic information.
Context switching: If the handler decides to switch to another process, this frame is saved with the process.

Linux's pt_regs Structure:

Linux defines the trap frame as struct pt_regs:

pt_regs.h
// Linux x86-64 pt_regs structure (simplified)
// This represents the complete trap frame on the kernel stack
 
struct pt_regs {
    // Saved by software (entry point assembly)
    unsigned long r15;
    unsigned long r14;
    unsigned long r13;
    unsigned long r12;
    unsigned long rbp;
    unsigned long rbx;
    
    // Saved by software, arguments or scratch
    unsigned long r11;
    unsigned long r10;
    unsigned long r9;
    unsigned long r8;
    unsigned long rax;
    unsigned long rcx;
    unsigned long rdx;
    unsigned long rsi;
    unsigned long rdi;
    
    // Syscall number on system-call entry; on a CPU exception that pushes
    // an error code, this slot holds that error code
    unsigned long orig_rax;
    
    // Saved by hardware (CPU pushes these)
    unsigned long rip;       // Faulting instruction address
    unsigned long cs;        // Code segment (includes CPL)
    unsigned long eflags;    // CPU flags
    unsigned long rsp;       // User stack pointer
    unsigned long ss;        // Stack segment
};
 
// For page faults, error code is accessed separately
struct page_fault_info {
    unsigned long error_code;  // Page fault specific error code
    unsigned long cr2;         // Faulting virtual address
    struct pt_regs *regs;      // Pointer to saved registers
};
 
// The C page fault handler receives this information
void page_fault_handler(struct pt_regs *regs, unsigned long address) {
    // get_error_code() stands in for however the entry code exposes the error code
    unsigned long error_code = get_error_code(regs);
    
    // Now we have everything needed to analyze and handle the fault:
    // - address: the virtual address that faulted
    // - error_code: why it faulted (present?, write?, user?)
    // - regs->rip: what instruction caused the fault
    // - regs->{all registers}: complete CPU state
}

What the Trap Frame Enables

• Transparent Resume: After handling, iretq restores everything, so the process never knows a fault occurred.
• Debugging: regs->rip points to the exact instruction that faulted. Stack traces can be built from regs->rbp.
• Signal Delivery: If the fault should become a signal (SIGSEGV), the trap frame provides the context for the signal handler.
• Core Dumps: Fatal faults can trigger core dumps with full register state from the trap frame.
• Auditing: Security logging can capture exact fault circumstances for forensic analysis.

Nested and Double Faults

What happens if a page fault occurs while handling a page fault? Or worse, what if the page fault handler itself causes another page fault? These scenarios require special handling.

Controllable Nested Faults:

Some page faults during kernel execution are intentional and expected:

Accessing user memory: The kernel might copy data from a user buffer. If that buffer isn't resident, a page fault occurs. This is normal and handled like any other page fault.
Demand-paged kernel modules: Some systems demand-page kernel modules. Page faults can occur accessing module code.

These are handled by re-entering the page fault handler, which works fine as long as the original fault's state is properly preserved.

Problematic Nested Faults:

Faulting in critical handler code: If the page fault handler itself isn't in memory, we have a problem.
Faulting on the kernel stack: If pushing to the kernel stack causes a fault, we can't save state.
Infinite recursion: A bug that causes the handler to repeatedly fault.

Double Fault (x86):

When certain exception combinations occur (e.g., couldn't push to stack during exception handling), the CPU generates a double fault (exception 8). The double fault handler is normally given its own dedicated stack (through the TSS's IST entries), which the OS keeps mapped so the handler can run even when the regular kernel stack is unusable.

If the CPU hits yet another fault while trying to invoke the double fault handler, a triple fault occurs, which resets the CPU, the equivalent of a forced reboot.

Fault Escalation Scenarios
Scenario	Result	Recovery
Page fault on user memory access	Normal re-entrant handling	Handled, continues
Page fault on kernel stack during trap	Double fault	Use IST stack, likely kernel panic
Exception while delivering the double fault	Triple fault	CPU reset (reboot)
NMI during page fault handling	NMI takes priority, then returns	IST stack for NMI ensures safety
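The "certain exception combinations" follow a fixed rule on x86: exceptions fall into benign, contributory (for example general protection or stack faults), and page fault classes, and a second exception raised while the first is being delivered escalates only for particular pairs. The sketch below encodes that rule; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum exc_class { EXC_BENIGN, EXC_CONTRIBUTORY, EXC_PAGE_FAULT };

/* True when `second`, raised while the CPU is still delivering `first`,
 * turns into a double fault instead of being handled on its own. */
static bool escalates_to_double_fault(enum exc_class first, enum exc_class second) {
    switch (first) {
    case EXC_BENIGN:
        return false;
    case EXC_CONTRIBUTORY:
        return second == EXC_CONTRIBUTORY;
    case EXC_PAGE_FAULT:
        return second == EXC_CONTRIBUTORY || second == EXC_PAGE_FAULT;
    }
    assert(0 && "unknown exception class");
    return false;
}
```

This is why a page fault on the kernel stack during page fault delivery becomes a double fault, while a page fault raised during delivery of a general protection fault is simply handled as a page fault.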

The Triple Fault: System Reset

A triple fault is unrecoverable—the CPU has no more fallback positions. This is why operating systems are extremely careful about kernel stack validity, handler code residency, and avoiding recursive fault scenarios. Any driver bug that corrupts the kernel stack can lead straight to triple fault and unexpected reboot.

Architecture Comparisons: ARM and RISC-V

While we've focused on x86-64, other architectures implement similar trap mechanisms with different details.

ARM AArch64:

ARM uses a different terminology and structure:

Exception Levels (EL): EL0 (user), EL1 (kernel), EL2 (hypervisor), EL3 (secure monitor)
Exception Vectors: Located at a configurable base address (VBAR_EL1)
Syndrome Register (ESR): Encodes exception type and details
Fault Address Register (FAR): Contains the faulting address (like x86's CR2)

When a page fault (called a Translation Fault or Permission Fault) occurs:

CPU switches to EL1 (or higher)
Saved state goes into ELR_EL1 (return address), SPSR_EL1 (saved status)
Execution begins at the appropriate vector offset
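On AArch64 the handler learns what happened from the exception class (EC) field in bits 31:26 of the syndrome register. The sketch below extracts that field and recognises the four abort classes relevant to page faults; the helper names are hypothetical, and decoding of the finer-grained fault status bits is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Exception class values for memory aborts. */
enum {
    ESR_EC_IABT_LOWER_EL = 0x20,  /* instruction abort from a lower exception level */
    ESR_EC_IABT_SAME_EL  = 0x21,  /* instruction abort from the current level */
    ESR_EC_DABT_LOWER_EL = 0x24,  /* data abort from a lower exception level */
    ESR_EC_DABT_SAME_EL  = 0x25,  /* data abort from the current level */
};
static_assert((ESR_EC_DABT_SAME_EL & ~0x3F) == 0, "EC is a 6-bit field");

static unsigned esr_ec(uint64_t esr) { return (unsigned)((esr >> 26) & 0x3F); }

static bool esr_is_memory_abort(uint64_t esr) {
    unsigned ec = esr_ec(esr);
    return ec == ESR_EC_IABT_LOWER_EL || ec == ESR_EC_IABT_SAME_EL ||
           ec == ESR_EC_DABT_LOWER_EL || ec == ESR_EC_DABT_SAME_EL;
}

/* For an exception taken to EL1, "lower level" means the fault came from EL0. */
static bool esr_abort_from_lower_el(uint64_t esr) {
    unsigned ec = esr_ec(esr);
    return ec == ESR_EC_IABT_LOWER_EL || ec == ESR_EC_DABT_LOWER_EL;
}
```

This plays the role that the user/supervisor and instruction-fetch bits of the error code play on x86.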

RISC-V:

RISC-V takes a minimalist approach:

Privilege Modes: U (user), S (supervisor), M (machine)
CSRs: Control and Status Registers hold exception info
- scause: Exception cause code
- stval: Faulting address or instruction
- sepc: Exception program counter (return address)
- stvec: Trap vector base address

Page faults are classified as Load Page Fault (code 13), Store Page Fault (code 15), or Instruction Page Fault (code 12).
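A supervisor trap handler can turn those cause codes into the kind of access that faulted. The following is a minimal sketch assuming a 64-bit system, where the top bit of scause distinguishes interrupts from exceptions; the enum and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum pf_access { PF_ACCESS_NONE, PF_ACCESS_FETCH, PF_ACCESS_LOAD, PF_ACCESS_STORE };

/* On RV64 the most significant bit of scause is set for interrupts. */
#define SCAUSE_INTERRUPT (UINT64_C(1) << 63)
static_assert(SCAUSE_INTERRUPT == 0x8000000000000000ull,
              "interrupt flag is the top bit on RV64");

/* Map an scause value to the faulting access type, or PF_ACCESS_NONE
 * when the trap is not a page fault at all. */
static enum pf_access scause_page_fault_access(uint64_t scause) {
    if (scause & SCAUSE_INTERRUPT)
        return PF_ACCESS_NONE;
    switch (scause) {
    case 12: return PF_ACCESS_FETCH;  /* instruction page fault */
    case 13: return PF_ACCESS_LOAD;   /* load page fault */
    case 15: return PF_ACCESS_STORE;  /* store page fault */
    default: return PF_ACCESS_NONE;
    }
}
```

The faulting address itself would then be read from stval, much as an x86 handler reads CR2.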

Despite architectural differences, all systems share the core concepts: save state, identify the fault, transfer to handler, eventually restore and return.

Trap Mechanism Comparison Across Architectures
Aspect	x86-64	ARM AArch64	RISC-V
Vector table name	IDT	Exception Vector Table	Trap Vector (stvec)
Faulting address register	CR2	FAR_EL1	stval
Cause/type register	Error code on stack	ESR_EL1	scause
Return address	RIP on stack	ELR_EL1	sepc
Saved flags/status	RFLAGS on stack	SPSR_EL1	sstatus (partial)
Privilege levels	Ring 0-3 (CPL)	EL0-EL3	U/S/M modes
Stack switching	TSS → RSP0	SP_ELn per level	Software managed

Conceptual Unity, Implementation Diversity

While the specifics differ, the fundamental concepts are universal: the hardware must detect the fault, save sufficient state for later restoration, identify the fault type and address, and transfer control to a predefined handler location. Understanding these concepts on one architecture transfers readily to others.

Summary: The Bridge to Kernel Space

The trap mechanism is the precisely-engineered bridge between user and kernel space. When a page fault is detected, this mechanism ensures a safe, complete, and reversible transfer of control. Let's consolidate the key concepts:

Key Takeaways

• A trap is a synchronous exception that transfers control to the OS in response to an instruction's execution (here, a memory access that faults).
• The CPU automatically saves critical state: instruction pointer, stack pointer, flags, and segment registers. The saved RIP points to the faulting instruction for retry.
• The IDT maps exception numbers to handlers. Page faults are exception 14, and the IDT entry specifies the handler address and privilege level.
• Stack switching protects the kernel. The CPU loads the kernel stack pointer from the TSS, ensuring user code can't corrupt kernel stack operations.
• Privilege transition is hardware-enforced. Moving from ring 3 to ring 0 is only possible through kernel-registered entry points such as IDT gates, never by direct manipulation.
• The entry point completes state saving. Assembly code saves general-purpose registers and reads CR2 before calling the C handler.
• The trap frame captures complete CPU state. This enables transparent resume, debugging, signal delivery, and core dumps.
• Double and triple faults handle escalation. When fault handling itself fails, special handlers or CPU reset provide last-resort responses.

What's Next:

With control now in the page fault handler and complete state information available, the OS must determine what action to take. The next page explores Find Page on Disk—how the OS determines where the page's content resides and initiates the I/O to retrieve it.

Page Complete

You now understand the trap mechanism that transfers control from a faulting instruction to the kernel's page fault handler. This precisely-engineered hardware/software handoff is the critical path that enables virtual memory, process isolation, and protected operating systems. Next, we'll explore how the OS locates the page content that must be loaded.