Every virtual memory access requires translating through the page table—but the page table itself must be stored somewhere in memory. This creates a fascinating bootstrapping question: How do we access the page table if we need the page table to access memory?
The answer lies in hardwired CPU registers, carefully designed memory layouts, and a fundamental asymmetry: page tables are stored at physical addresses that the CPU can access directly, bypassing the translation mechanism entirely for this specific purpose.
Understanding page table location is essential for OS kernel development, hypervisor implementation, and deep performance optimization. It connects abstract virtual memory concepts to concrete hardware mechanisms.
By the end of this page, you will understand CR3 and equivalent registers, how page tables are allocated and managed in kernel memory, the relationship between page table physical addresses and virtual mappings, and strategies for efficient page table memory management.
Every architecture has a dedicated register that holds the physical address of the root page table. The MMU reads this register on each memory access (or TLB miss) to begin the page table walk.
Architecture-Specific Registers:
| Architecture | Register | Width | Contents | Additional Features |
|---|---|---|---|---|
| x86 (32-bit) | CR3 | 32 bits | Page Directory physical address | PWT, PCD flags for caching |
| x86-64 | CR3 | 64 bits | PML4 physical address (bits 12-51) | PCID (12 bits) if CR4.PCIDE=1 |
| ARM AArch64 | TTBR0_EL1 / TTBR1_EL1 | 64 bits | Translation table base | Separate tables for user/kernel |
| RISC-V | SATP | 64 bits (Sv39/48) | Sv39: mode + ASID + PPN | Mode selects page table format |
| MIPS | EntryHi/Context | 32/64 bits | Part of TLB management | Software-managed TLB |
x86-64 CR3 Register Format:

```
Without PCID (CR4.PCIDE = 0):
┌─────────────┬────────────────────────────────────────┬─────────┐
│  63 ... 52  │ 51                                  12 │ 11    0 │
│  Reserved   │      PML4 Table Physical Address       │  Flags  │
│   (MBZ)     │          (40-bit frame number)         │ PWT,PCD │
└─────────────┴────────────────────────────────────────┴─────────┘

With PCID (CR4.PCIDE = 1):
┌─────────────┬────────────────────────────────────────┬─────────┐
│  63 ... 52  │ 51                                  12 │ 11    0 │
│  Reserved   │      PML4 Table Physical Address       │  PCID   │
│   (MBZ)     │          (40-bit frame number)         │(12 bits)│
└─────────────┴────────────────────────────────────────┴─────────┘

Key Points:
• Bits 0-11 contain flags or the PCID (not part of the address)
• The PML4 must be 4KB-aligned, so bits 0-11 of its address are zero anyway
• Hardware uses bits 12-51 as the physical address
• Writing to CR3 triggers a TLB flush (unless PCID is in use and the
  noflush bit, bit 63, is set on the write)

Example: CR3 = 0x0000000001234000 (PCID disabled)

This means:
- The PML4 table is at physical address 0x1234000
- That's 19,087,360 in decimal, about 18MB into physical memory
- The PML4 table occupies physical addresses 0x1234000-0x1234FFF (4KB)
```

Critical Property: Physical Address
The value in CR3 is a physical address, not a virtual address. This is essential—if it were virtual, we'd need a page table to translate CR3 to find the page table, creating infinite regress.
The MMU accesses CR3's contents directly through the memory bus, bypassing its own translation logic for this one read. All subsequent page table entries also contain physical addresses for the same reason.
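To make the field layout concrete, here is a minimal sketch of extracting the hardware-relevant fields from a CR3 value. The mask and function names are our own; the bit positions follow the format above.

```c
#include <assert.h>
#include <stdint.h>

/* Bits 12-51 hold the PML4 physical address; bits 0-11 hold the PCID
 * when CR4.PCIDE=1 (otherwise flags). Mask names are illustrative. */
#define CR3_ADDR_MASK  0x000FFFFFFFFFF000ULL  /* bits 12-51 */
#define CR3_PCID_MASK  0x0000000000000FFFULL  /* bits 0-11  */

static uint64_t cr3_pml4_phys(uint64_t cr3) { return cr3 & CR3_ADDR_MASK; }
static uint16_t cr3_pcid(uint64_t cr3)      { return (uint16_t)(cr3 & CR3_PCID_MASK); }
```

For example, a CR3 value of 0x1234005 decodes to a PML4 at physical address 0x1234000 with PCID 5.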
ARM AArch64 uses two separate base registers: TTBR0_EL1 for user space (lower addresses) and TTBR1_EL1 for kernel space (upper addresses). This allows completely independent user and kernel page tables without the upper-level entries needing to match. It's the hardware foundation for KPTI-like separation.
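The selection between the two base registers is determined purely by the top bits of the virtual address. A sketch of the rule, assuming 48-bit address spaces (TCR_EL1.T0SZ = T1SZ = 16; the type and function names are ours):

```c
#include <assert.h>
#include <stdint.h>

/* AArch64 picks the translation table by the upper address bits:
 * all-zero top bits select TTBR0 (user), all-one select TTBR1 (kernel);
 * anything in between is a non-canonical address and faults. */
typedef enum { TTBR0, TTBR1, FAULT } ttbr_sel_t;

static ttbr_sel_t select_ttbr(uint64_t va) {
    uint64_t top = va >> 48;           /* bits 63..48 */
    if (top == 0x0000) return TTBR0;   /* low half: user space */
    if (top == 0xFFFF) return TTBR1;   /* high half: kernel space */
    return FAULT;                      /* translation fault */
}
```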
Page tables are themselves allocated from physical memory. The kernel must manage this allocation carefully—page tables can't be swapped (that would make accessing swap impossible!), and their physical addresses must be known.
Key Requirements:
```c
/* Linux kernel page table allocation (simplified) */

/* Allocate a single page table (one page of PTEs) */
pte_t *alloc_pte_table(void)
{
    struct page *page;
    pte_t *pte;

    /* Allocate from the kernel's page allocator.
     * GFP_KERNEL: may sleep, normal allocation.
     * __GFP_ZERO: zero the page (security: don't leak stale PTEs).
     */
    page = alloc_page(GFP_KERNEL | __GFP_ZERO);
    if (!page)
        return NULL;

    /* Get a virtual address the kernel can use to manipulate the table */
    pte = (pte_t *)page_address(page);

    /* Track that this page is a page table */
    __SetPageTable(page);

    return pte;
}

/* Free a page table */
void free_pte_table(pte_t *pte)
{
    struct page *page = virt_to_page(pte);

    __free_page(page);   /* takes a struct page *, unlike free_page() */
}

/* Create a complete page table hierarchy for a new process */
pgd_t *create_page_table(void)
{
    pgd_t *pgd;

    /* Allocate the top-level table (PGD/PML4) */
    pgd = (pgd_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
    if (!pgd)
        return NULL;

    /* Copy kernel mappings from init_mm's page table:
     * all processes share the same kernel mapping.
     */
    memcpy(pgd + KERNEL_PGD_BOUNDARY,
           init_mm.pgd + KERNEL_PGD_BOUNDARY,
           (PTRS_PER_PGD - KERNEL_PGD_BOUNDARY) * sizeof(pgd_t));

    /* The user portion starts empty and is populated on demand */
    return pgd;
}

/* Get the physical address to load into CR3 */
unsigned long pgd_to_cr3(pgd_t *pgd)
{
    /* Convert a kernel virtual address to a physical address */
    return __pa(pgd);   /* __pa = "physical address of" */
}

/* Switch to a different page table */
void switch_page_table(struct mm_struct *mm)
{
    unsigned long cr3_value = pgd_to_cr3(mm->pgd);

    /* On PCID systems, include the PCID and possibly the noflush bit */
    if (cpu_has_pcid) {
        cr3_value |= mm->context.pcid;
        if (can_skip_flush)
            cr3_value |= X86_CR3_PCID_NOFLUSH;
    }

    write_cr3(cr3_value);   /* atomic write to CR3 */
}
```

Memory Pool for Page Tables:
Linux uses the regular page allocator for page tables, but some systems maintain a dedicated pool:
Page tables collectively can consume significant memory. A process with 4GB of fully mapped memory might need roughly 2,048 PTE tables (8MB of PTEs), plus 4 PMD tables, one PUD, and one PGD: about 8MB of page-table memory in total.
In practice, due to multi-level sparsity, overhead is typically 0.1-1% of mapped size.
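Assuming a fully dense, contiguous 4-level mapping with 4KB pages and 512 entries per table, the overhead can be computed with a small sketch (the helper names are ours):

```c
#include <assert.h>
#include <stdint.h>

static uint64_t div_round_up(uint64_t n, uint64_t d) { return (n + d - 1) / d; }

/* Count the 4KB table pages needed to map `bytes` of contiguous memory */
static uint64_t table_pages(uint64_t bytes)
{
    uint64_t ptes = div_round_up(bytes, 4096); /* 4KB pages mapped        */
    uint64_t pt   = div_round_up(ptes, 512);   /* PTE tables (2MB each)   */
    uint64_t pmd  = div_round_up(pt, 512);     /* PMD tables (1GB each)   */
    uint64_t pud  = div_round_up(pmd, 512);    /* PUD tables (512GB each) */
    return pt + pmd + pud + 1;                 /* plus one PGD/PML4       */
}
```

For 4GB this yields 2,054 table pages, about 8MB, or roughly 0.2% of the mapped size, consistent with the range above.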
Page tables must remain in physical memory. If we swapped out a page table, we'd need the page table to find it in swap—circular dependency. This makes page table memory overhead a 'real' cost that can't be reclaimed under memory pressure, unlike user pages which can be evicted.
While page table entries contain physical addresses, the kernel often needs to manipulate page tables using virtual addresses. This creates an interesting requirement: the kernel must have a way to access any physical page through virtual memory.
Common Approaches:
1. Direct Mapping (Linux, FreeBSD): The kernel maintains a direct (identity-like) mapping of all physical memory:
Virtual = Physical + PAGE_OFFSET
For example, physical address 0x1234000 might be accessible at virtual address 0xFFFF888001234000. Any physical address can be accessed by adding a constant.
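A minimal self-contained model of these conversions; the PAGE_OFFSET value is the Linux x86-64 default, and the helpers mirror the kernel's __pa/__va macros (treat the names as illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_OFFSET 0xFFFF888000000000ULL  /* base of the direct map */

/* Direct-map conversions: a constant offset in each direction */
static uint64_t virt_to_phys(uint64_t va) { return va - PAGE_OFFSET; } /* __pa */
static uint64_t phys_to_virt(uint64_t pa) { return pa + PAGE_OFFSET; } /* __va */
```

So physical address 0x1234000 is reachable at virtual address 0xFFFF888001234000, and the conversion round-trips exactly.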
Linux x86-64 Virtual Memory Layout (simplified):

```
Virtual Address        Purpose
─────────────────────────────────────────────────────────────────────
0xFFFFFFFFFFFFFFFF ┌────────────────┐
                   │ Unused/Guard   │
0xFFFFFFFF80000000 ├────────────────┤
                   │ Kernel Text    │  Kernel code, loaded here
                   │ (.text)        │
0xFFFFFFFF00000000 ├────────────────┤
                   │ Kernel Module  │  Loadable modules
                   │ Space          │
0xFFFFFFFE80000000 ├────────────────┤
                   │ ...            │
0xFFFF888000000000 ├────────────────┤
                   │ DIRECT MAP     │ ← All of physical memory mapped here!
                   │ (up to 64TB)   │
                   │                │   Virtual = Physical + 0xFFFF888000000000
0xFFFF880000000000 ├────────────────┤
                   │ ...vmalloc...  │
0x0000800000000000 ├────────────────┤ ← Non-canonical hole
0x00007FFFFFFFFFFF ├────────────────┤
                   │ User Space     │  Per-process user mappings
                   │ (~128 TB)      │
0x0000000000000000 └────────────────┘
```

Page Table Access via the Direct Map:

```c
pgd_t *pgd = current->mm->pgd;      /* virtual address (in direct map) */

/* To get the physical address for CR3: */
phys_addr_t pgd_phys = __pa(pgd);   /* = pgd - PAGE_OFFSET */

/* To access a physical address as virtual: */
void *virt = __va(phys);            /* = phys + PAGE_OFFSET */
```

2. Temporary Mappings (if no direct map): On systems without full direct mapping (e.g., 32-bit with >4GB RAM):
```c
void *kmap(struct page *page);   /* Create a temporary mapping */
void kunmap(struct page *page);  /* Remove the mapping */
```
3. Recursive Page Table Mapping: A clever trick: make one entry in the top-level table point to itself!
PML4[self_entry] = physical_address_of_PML4
This creates a virtual address range where the MMU's table walk equals the tables themselves.
Why Direct Mapping is Preferred:
With recursive mapping, accessing virtual address 0xFFFFFFFFFFFFF000 (with self-entry at index 511) causes the MMU to use PML4[511]→PML4[511]→PML4[511]→PML4[511]→PML4[0], giving you access to the first entry of your own PML4! This was common in educational OS implementations.
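The address arithmetic behind this trick can be verified with a short sketch: build a canonical virtual address from its four table indices and confirm that putting the self-entry index in every slot lands exactly on the address quoted above (the helper is ours, assuming 4-level x86-64 translation):

```c
#include <assert.h>
#include <stdint.h>

/* Assemble a canonical x86-64 virtual address from 4-level indices */
static uint64_t va_from_indices(uint64_t pml4, uint64_t pdpt,
                                uint64_t pd, uint64_t pt, uint64_t off)
{
    uint64_t va = (pml4 << 39) | (pdpt << 30) | (pd << 21) | (pt << 12) | off;
    if (va & (1ULL << 47))            /* sign-extend bit 47 into bits 48-63 */
        va |= 0xFFFF000000000000ULL;
    return va;
}
```

With the self-entry at index 511 in all four slots and offset 0, this produces 0xFFFFFFFFFFFFF000: the four-level walk resolves to the PML4 page itself.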
Most operating systems map the kernel into every process's virtual address space. This raises important questions about layout, sharing, and the isolation required after attacks like Meltdown.
The Traditional Model (pre-KPTI):
Each process has one page table. The upper portion (kernel space) is identical across all processes—same virtual addresses map to same physical frames. The lower portion (user space) is unique per process.
Traditional Virtual Address Space Layout:

```
Process A's Page Table:            Process B's Page Table:
┌────────────────────────┐         ┌────────────────────────┐
│ Kernel Space (shared)  │  ====   │ Kernel Space (shared)  │
│ PML4[256-511] →        │ same    │ PML4[256-511] →        │
│ (all point to same     │ PTs     │ (all point to same     │
│  physical kernel)      │         │  physical kernel)      │
├────────────────────────┤         ├────────────────────────┤
│ User Space (unique)    │         │ User Space (unique)    │
│ PML4[0-255] →          │ separate│ PML4[0-255] →          │
│ (Process A's mappings) │ PTs     │ (Process B's mappings) │
└────────────────────────┘         └────────────────────────┘

The PML4 itself is per-process, but entries 256-511 are copies that
point to the SAME physical kernel page tables.

Context Switch (A → B):
1. Save A's state
2. write_cr3(B's PML4 physical address)
3. TLB flushed (non-global entries)
4. Kernel entries may be marked Global (stay in TLB)
5. Resume B's execution
```

Kernel Page Table Isolation (KPTI):
After the Meltdown attack (2018), operating systems implemented KPTI. The idea: when running user code, the kernel pages should be completely unmapped, not just marked supervisor-only.
KPTI Model:
Two page tables per process:
- A kernel table that maps everything (kernel and user), used while executing in the kernel
- A user table that maps only user space plus the tiny trampoline needed to enter the kernel
Switching on syscall/interrupt: the entry trampoline switches CR3 from the user table to the kernel table, and the exit path switches back before returning to user mode.
Cost: Extra CR3 writes, potential TLB flushes
Mitigation: PCID allows keeping both tables' entries in TLB
| Aspect | Traditional | KPTI |
|---|---|---|
| Tables per process | 1 | 2 (user table + kernel table) |
| Kernel visible when in user mode | Yes (U/S=0) | No (not mapped) |
| CR3 changes on syscall | No | Yes |
| Meltdown vulnerable | Yes | No |
| Performance impact | Baseline | 1-5% overhead typical |
ARM's dual TTBR design (TTBR0 for user, TTBR1 for kernel) naturally supports KPTI-like separation without duplicating full page tables. The kernel simply doesn't map its pages in TTBR0. This is one reason ARM systems were less affected by Meltdown-class vulnerabilities.
When the operating system switches from one process to another, it changes CR3 to point to the new process's page table. This seemingly simple operation has profound implications for performance and correctness.
What Happens on CR3 Write:
```c
/* Linux context switch (simplified from arch/x86/mm/tlb.c) */

void switch_mm(struct mm_struct *prev, struct mm_struct *next,
               struct task_struct *tsk)
{
    unsigned long cr3;

    if (prev == next)
        return;   /* Same address space, no switch needed */

    /* Prepare the CR3 value */
    cr3 = __sme_pa(next->pgd);   /* Physical address of the new PGD */

    if (static_cpu_has(X86_FEATURE_PCID)) {
        /* With PCID: tag entries instead of flushing */
        u16 pcid = next->context.ctx_id;

        if (next->context.is_fresh) {
            /* First time using this PCID - must flush old entries */
            cr3 |= pcid;
        } else {
            /* Reusing the PCID - cached entries may still be valid */
            cr3 |= pcid | X86_CR3_PCID_NOFLUSH;
        }
    }

    /* The actual switch */
    write_cr3(cr3);

    /* Track that we're now in a different address space */
    this_cpu_write(cpu_tlbstate.loaded_mm, next);
    this_cpu_write(cpu_tlbstate.loaded_mm_asid, next->context.ctx_id);
}

/*
 * Cost analysis of a CR3 write:
 *
 * WITHOUT PCID:
 *  - ~150-300 cycles for the CR3 write itself
 *  - TLB completely flushed (except global entries)
 *  - First memory accesses after the switch cause TLB misses
 *  - Each miss: 4 memory accesses (4-level page walk)
 *  - Effective cost can be thousands of cycles
 *
 * WITH PCID (and NOFLUSH):
 *  - ~50-100 cycles for the CR3 write
 *  - TLB entries from the previous use of this PCID may still be valid!
 *  - Dramatically reduced miss rate after the switch
 *  - Only helps while cycling through a limited set of processes
 */
```

PCID Strategy:
With only 4096 PCIDs available, the OS must manage them carefully:
Simple rotation: Assign PCIDs 0, 1, 2, ... to new processes. When exhausted, reuse from 0 with a global flush.
Per-CPU pools: Each CPU tracks which PCIDs are in use. Context switches on same CPU can reuse.
Priority-based: Hot processes (high-priority, frequently scheduled) get dedicated PCIDs.
Lazy invalidation: Don't flush immediately on free; just mark stale. Flush when PCID is reassigned.
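A minimal sketch of the simple-rotation strategy above (all names are ours; a real kernel tracks per-CPU state, handles concurrency, and reserves PCID 0):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PCID_COUNT 4096          /* PCIDs are 12 bits wide */

static uint16_t next_pcid = 1;   /* PCID 0 often reserved */

/* Hand out PCIDs in order; on wraparound, signal that the caller must
 * flush TLB entries still tagged with the PCID being reused. */
static uint16_t alloc_pcid(bool *need_flush)
{
    *need_flush = false;
    if (next_pcid == PCID_COUNT) {   /* exhausted: wrap and flush */
        next_pcid = 1;
        *need_flush = true;
    }
    return next_pcid++;
}
```

The first allocation returns PCID 1 with no flush required; only after all 4,095 usable PCIDs have been handed out does reuse force a flush.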
Kernel/User Transitions with KPTI:
With KPTI, CR3 switches happen not just on context switch but on every syscall:
User calls read():
1. SYSCALL instruction
2. Trampoline code in the user page table
3. Switch CR3 to the kernel page table
4. Execute sys_read() in the kernel
5. Switch CR3 back to the user page table
6. SYSRET to user
This doubles CR3 switches, making PCID even more critical for performance.
Kernel threads often don't need user-space mappings. Linux uses 'lazy TLB' mode: when a CPU is running kernel-only work, it doesn't switch CR3 to the new process's page table. It uses the previous process's tables but only accesses kernel addresses. This saves TLB flush overhead.
Page tables are allocated from kernel memory, but their layout matters for performance. Cache effects, NUMA placement, and memory fragmentation all impact translation speed.
Cache Considerations:
A 4-level page table walk with a cold cache requires four dependent memory accesses, one per level; at roughly 100ns per DRAM access, a fully uncached walk can cost on the order of 400ns.
To improve cache behavior, systems rely on several effects: page table entries are cached like ordinary data, so hot tables stay in the data caches; the MMU's paging-structure caches let walks skip upper levels; huge pages remove a level from the walk entirely; and allocating page tables on the process's NUMA node keeps walk latency local.
Page Table Cache Behavior Analysis:

```
4KB Page Table Structure:
┌──────────────────────────────────────────────────┐
│ PTE[0]   PTE[1]   PTE[2]   ...  PTE[7]           │ ← Cache line 0 (64 bytes, 8 PTEs)
├──────────────────────────────────────────────────┤
│ PTE[8]   PTE[9]   PTE[10]  ...  PTE[15]          │ ← Cache line 1
├──────────────────────────────────────────────────┤
│ ...                                              │
├──────────────────────────────────────────────────┤
│ PTE[504] PTE[505] PTE[506] ...  PTE[511]         │ ← Cache line 63
└──────────────────────────────────────────────────┘

Each cache line holds 8 PTEs (8 bytes × 8 = 64 bytes).
Each PTE covers one 4KB page.
Each cache line covers 8 × 4KB = 32KB of virtual addresses.

Implications:
• Accesses within a 32KB range likely hit the same page-table cache line
• Sequential access benefits from prefetching/spatial locality
• Random access across large ranges thrashes page-table cache entries

NUMA Considerations:
• Page tables for a process should be on the same NUMA node as the process
• Poor placement: PT on node 0, data on node 1 → every walk crosses the interconnect
• Linux: MPOL_BIND or numactl can control placement
```

Huge Page Impact on Page Tables:
Using 2MB or 1GB pages dramatically reduces page table depth and entry count:
| Page Size | Offset Bits | PT Levels | Entries per GB |
|---|---|---|---|
| 4KB | 12 | 4 | 262,144 |
| 2MB | 21 | 3 | 512 |
| 1GB | 30 | 2 | 1 |
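The entries-per-GB column follows directly from the page size, as a one-line sketch shows:

```c
#include <assert.h>
#include <stdint.h>

/* Leaf entries needed to map 1GB at a given page size */
static uint64_t entries_per_gb(uint64_t page_size)
{
    return (1ULL << 30) / page_size;
}
```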
With 2MB pages, the walk stops at the PMD level: one entry maps 2MB directly, the PTE level disappears, a fully mapped 1GB region needs 512 PMD entries instead of 262,144 PTEs, and each TLB entry covers 512 times more address space.
Page Table Fragmentation:
Over time, physical memory for page tables can become fragmented: tables are allocated and freed as processes map and unmap memory, leaving page-table pages scattered across physical memory. Unlike user pages, they cannot easily be migrated or compacted, because their physical addresses are embedded in parent-level entries and in CR3.
Under extreme memory pressure, page table memory becomes a real constraint. A fork-heavy workload might create thousands of processes, each needing ~20KB minimum for page tables. With 10,000 processes, that's 200MB just for page tables. The kernel must track and potentially limit page table memory consumption.
When a computer boots, paging is initially disabled (CR0.PG=0). The CPU runs in real mode or protected mode without paging, accessing physical addresses directly. The bootloader and early kernel must set up initial page tables before enabling paging.
Boot Sequence (x86-64 Linux, simplified):
Linux x86-64 Early Page Tables (arch/x86/kernel/head_64.S):

```asm
/*
 * Build initial page tables to map:
 *  - Identity map for the first few megabytes (for the transition)
 *  - Kernel at its linked virtual address (__START_KERNEL_map)
 *
 * Tables are built at compile time at known physical locations:
 * level4_pgt, level3_kernel_pgt, level2_kernel_pgt, etc.
 */
SYM_DATA_START_PAGE_ALIGNED(level4_pgt)
    .quad  level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
    .fill  510, 8, 0
    .quad  level3_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
SYM_DATA_END(level4_pgt)

/*
 * Enabling paging (simplified):
 *
 *  movq  $level4_pgt - __START_KERNEL_map, %rax  # Physical addr of PML4
 *  movq  %rax, %cr3                              # Load into CR3
 *
 *  movq  %cr0, %rax
 *  orq   $CR0_PG, %rax                           # Set the paging bit
 *  movq  %rax, %cr0                              # Enable paging!
 *
 *  # IMMEDIATELY after this, all addresses are virtual!
 *  # We're still executing from identity-mapped low addresses.
 *
 *  # Jump to the high (kernel) virtual address:
 *  movabs $1f, %rax
 *  jmp    *%rax
 * 1:
 *  # Now running at the proper kernel address
 */
```

The Transition Moment:
The instant CR0.PG is set, all instruction fetches go through paging. But the CPU is still executing code at a low physical address. The solution: the early page tables identity-map the low region the CPU is currently executing from, and also map the kernel at its high virtual address; the code then performs an absolute jump to the high address, after which the identity mapping can be discarded.
This is one of the most delicate moments in OS bootstrap—any mistake means triple fault and reset.
UEFI firmware can set up initial paging before transferring to the OS (UEFI runs in long mode with paging enabled). The OS must understand and potentially modify these tables rather than building from scratch. This makes UEFI boot simpler in some ways but requires handling existing page tables.
Page table bugs are among the most difficult to diagnose—symptoms may be subtle or catastrophic, and the debugging tools themselves need working page tables.
Common Page Table Bugs:
- Stale TLB entries: a PTE is modified but the TLB is not flushed, so old translations linger
- Storing a virtual address where a physical address belongs (or vice versa)
- Use-after-free of a page table page that a parent entry still references
- Incorrect permission bits, such as leaving a mapping both writable and executable (W+X)
```shell
# Debugging page tables in Linux

# 1. Read current CR3 and page tables (requires a debugger or crash dump)
$ crash vmlinux vmcore
crash> p $cr3
CR3: 0x0000000123456789

crash> vtop 0x7fffc3400000          # Virtual-to-physical translation
VIRTUAL          PHYSICAL
0x7fffc3400000 -> 0x456789000

# 2. Walk page tables manually in crash/gdb
crash> rd -64 0xFFFF888123456000 8  # Read PML4 entries

# 3. Check /proc for process mappings
$ cat /proc/$(pidof myprogram)/maps
00400000-00452000 r-xp 00000000 08:01 1234567   /usr/bin/myprogram
7fffc3200000-7fffc3221000 rw-p 00000000 00:00 0 [stack]

# 4. Check /proc for page table info
$ cat /proc/$(pidof myprogram)/smaps | grep -A 5 "^7fffc"
7fffc3200000-7fffc3221000 rw-p 00000000 00:00 0 [stack]
Size:                132 kB
Rss:                  16 kB
Pss:                  16 kB

# 5. Use page walk debugging (kernel config)
# CONFIG_PAGE_TABLE_CHECK=y adds validation of page table operations

# 6. Hardware performance counters for TLB statistics
$ perf stat -e dTLB-loads,dTLB-load-misses,iTLB-loads,iTLB-load-misses ./myprogram

 Performance counter stats:
     1,234,567,890   dTLB-loads
        12,345,678   dTLB-load-misses   # High miss rate = lots of page walks
```

Hardware Debug Support:
Software Hardening:
```
// Linux: Page table integrity checking
CONFIG_DEBUG_WX=y          // Warn on W+X mappings
CONFIG_PAGE_TABLE_CHECK=y  // Validate page table operations
CONFIG_DEBUG_PAGEALLOC=y   // Detect use-after-free of pages
```
If a page fault handler itself faults, that's a double fault. If the double fault handler faults, that's a triple fault—the CPU resets. Page table bugs during early boot or in fault handlers often manifest as mysterious system resets with no error message.
Page tables must be stored somewhere, and understanding that 'somewhere'—physically in memory, logically organized, and found via hardware registers—is essential for OS implementation. To consolidate: a hardwired register (CR3, TTBRn_EL1, SATP) holds the physical address of the root table; the kernel allocates tables from unswappable physical memory and manipulates them through the direct map; each process gets its own table (or two, under KPTI); and the very first tables are built by hand at boot, before paging is enabled.
Module Complete:
With this page, we've completed our deep dive into Page Tables—from overall structure through PTEs, valid bits, protection bits, and finally location. You now understand the data structures that enable paging-based virtual memory, forming the foundation for address translation, protection, and efficient memory management.
The next module covers Address Translation—the actual process of converting virtual addresses to physical addresses using these page table structures.
You now understand where page tables live in memory, how the hardware finds them, how the kernel manages them, and the bootstrap process that creates the first page tables. This knowledge is essential for kernel development, hypervisor implementation, and understanding memory-related security mechanisms.