When you launch a modern web browser, it springs to life in under a second despite being hundreds of megabytes in size. When you open a 10 GB video file in an editor, it appears instantly rather than making you wait for the entire file to load. These experiences—so fundamental to modern computing that we take them for granted—are enabled by demand paging (and historically, demand segmentation).
Demand paging is the mechanism that makes virtual memory practical. Rather than loading an entire program and all its data into memory before execution can begin, the operating system loads pages on demand—only when they're actually accessed. This lazy loading strategy is the key that unlocks virtual memory's power to exceed physical memory limits while providing responsive program startup.
This page provides a comprehensive examination of demand paging and its segmentation counterpart, covering the mechanisms, variants, design decisions, and practical implications that every systems engineer should understand.
By the end of this page, you will deeply understand how demand paging works, the difference between pure demand paging and prepaging, how demand segmentation differs from demand paging, the role of valid-invalid bits, and the complete life cycle of a demand-paged memory access.
Demand loading is a resource management strategy based on a simple principle: don't load something until you actually need it. This principle applies broadly in computing (lazy evaluation, just-in-time compilation), but in memory management, it takes a specific form.
The alternative: Eager loading
Eager (or anticipatory) loading means loading all resources before they're used. For memory, this would mean:

- Reading the entire executable and every shared library from disk at launch
- Allocating physical frames for every page of the address space up front
- Completing all of this work before the first instruction executes
This approach is simple but wasteful—most programs never touch the majority of their address space.
| Aspect | Eager Loading | Demand Loading |
|---|---|---|
| Startup Time | Slow (load everything) | Fast (load almost nothing) |
| Memory Usage | High (everything resident) | Low (only working set resident) |
| First Access | Fast (already loaded) | Slow (triggers page fault) |
| Unused Code | Wastes memory | Never loaded |
| Implementation | Simple | Requires page fault handling |
| Predictability | Consistent (no faults) | Variable (faults on new access) |
The demand paging strategy:
Demand paging makes the opposite choice:

- A process starts with few or no pages in physical memory
- A page is loaded only when it is first accessed, with the page fault as the trigger
- Pages stay resident until memory pressure forces them out
This strategy trades consistent performance for better average-case performance and dramatically lower memory usage.
The demand loading principle appears throughout computing: lazy initialization (create objects when first used), JavaScript module loading (import modules on first reference), database query materialization (fetch rows on iteration). Memory demand paging is just one instance of this powerful pattern.
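As a tiny illustration of the same principle outside the kernel, here is lazy initialization in C. This is a generic sketch, not tied to any particular codebase; the `Table` type and `build_table()` function are invented for the example.

```c
#include <stdlib.h>

/* Hypothetical expensive resource, built only when first needed. */
typedef struct { int *data; } Table;

static Table *table;                     /* NULL until first use */

static Table *build_table(void) {        /* the "expensive load" */
    Table *t = malloc(sizeof *t);
    t->data = calloc(1024, sizeof *t->data);
    return t;
}

Table *get_table(void) {
    if (table == NULL)
        table = build_table();           /* pay the cost on first access only */
    return table;                        /* later calls are cheap */
}
```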
Let's trace through exactly what happens when a process accesses a page that isn't currently in physical memory. This is the core mechanism of demand paging.
Prerequisites for demand paging:

- An MMU that checks a present (valid) bit on every translation and raises a page fault when the bit is clear
- A page fault exception mechanism that saves enough state to restart the faulting instruction
- A backing store (executable file, swap space) from which page contents can be read
- Restartable instructions, so the faulting access can be transparently re-executed
```
Demand Paging Step-by-Step:
═══════════════════════════

STEP 1: Memory Reference
─────────────────────────
Process executes:  mov eax, [0x7fff0040]
        ↓
CPU needs to read from virtual address 0x7fff0040

STEP 2: TLB Lookup
──────────────────
CPU checks TLB for VPN = 0x7fff0
  → TLB MISS (not cached)

STEP 3: Page Table Walk
───────────────────────
MMU walks page table hierarchy:
  CR3 → PML4[...] → PDPT[...] → PD[...] → PT[...]
        ↓
  Page Table Entry for VPN 0x7fff0:
  ┌──────────────────────────────────────────┐
  │ Present = 0 │ Swap Location = Block 42   │
  └──────────────────────────────────────────┘
        ↓
  Present bit is 0!  ← This is a PAGE FAULT

STEP 4: Page Fault Exception
────────────────────────────
CPU pushes state to kernel stack:
  - Faulting address (CR2 = 0x7fff0040)
  - Error code (read access, not present)
  - Return address (instruction to restart)
        ↓
Jumps to page fault handler (interrupt vector 14 on x86)

STEP 5: Kernel Page Fault Handler
─────────────────────────────────
a) Read CR2 to get faulting address: 0x7fff0040

b) Look up address in process's memory map:
   → Found: valid area, file-backed/anonymous, permissions OK
   → If NOT found: send SIGSEGV (segmentation fault)

c) Find a free physical frame:
   → If none free, run page replacement algorithm
   → Evict a victim page (possibly to swap)

d) Determine page source:
   → If anonymous and never written: zero-fill frame
   → If file-backed: read from file at offset
   → If in swap: read from swap block 42

e) Read page content (blocking I/O):
   → Initiate disk read
   → Block current process
   → Schedule another process
   → When I/O completes, wake this process

STEP 6: Update Page Table
─────────────────────────
Set page table entry:
  ┌──────────────────────────────────────────┐
  │ Present = 1 │ Frame = 0x1234 │ Access... │
  └──────────────────────────────────────────┘
Update TLB (or let hardware refill on next access)

STEP 7: Return from Exception
─────────────────────────────
IRET instruction returns to saved address:
  → Re-executes: mov eax, [0x7fff0040]
  → This time: TLB/PT has valid mapping
  → Access completes normally!
```

Crucially, the process is unaware that a page fault occurred. From its perspective, the memory access just took a bit longer than usual. This transparency is essential—existing programs work without modification under demand paging.
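The handler logic in Step 5 maps naturally to code. Below is a minimal, hypothetical sketch in C-style pseudocode; the helper functions (`find_vma`, `alloc_frame`, `read_from_file`, and so on) are invented for illustration and do not correspond to any particular kernel's API.

```c
/* Hypothetical page fault handler sketch. All helper names are invented. */
void page_fault_handler(uintptr_t fault_addr, int error_code) {
    struct vm_area *vma = find_vma(current_process, fault_addr);
    if (vma == NULL || !permissions_ok(vma, error_code)) {
        send_signal(current_process, SIGSEGV);  /* genuine invalid access */
        return;
    }

    frame_t frame = alloc_frame();              /* may run page replacement */

    if (vma_is_anonymous(vma) && !page_ever_written(vma, fault_addr)) {
        zero_fill(frame);                       /* fresh stack/heap page */
    } else if (vma_is_file_backed(vma)) {
        read_from_file(frame, vma->file, file_offset(vma, fault_addr));
    } else {
        read_from_swap(frame, swap_slot(current_process, fault_addr));
    }                                           /* reads block the process */

    map_page(current_process, fault_addr, frame, vma->prot); /* Present = 1 */
    /* Returning from the exception re-executes the faulting instruction. */
}
```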
The valid-invalid bit (also called the present bit) is a single bit in each page table entry that indicates whether the page is currently in physical memory. It's the fundamental trigger for demand paging.
Terminology clarification:
Different systems use different terms:

- Valid-invalid bit (common in operating systems textbooks)
- Present bit (x86 architecture manuals)
- Resident bit (some older systems and documentation)
These all refer to the same concept: does the page table entry point to a physical frame that currently holds this page's data?
```
x86-64 Page Table Entry (PTE) Structure:
════════════════════════════════════════

Bit    Name           Meaning
────────────────────────────────────────────────────────────────
0      P (Present)    1 = in physical memory, 0 = not present
1      R/W            1 = writable, 0 = read-only
2      U/S            1 = user accessible, 0 = supervisor only
3      PWT            Page-Level Write-Through
4      PCD            Page-Level Cache Disable
5      A (Accessed)   Hardware sets to 1 when page is read/written
6      D (Dirty)      Hardware sets to 1 when page is written
7      PS             Page Size (for huge pages in PD entry)
8      G              Global (don't flush from TLB on CR3 change)
9-11   AVL            Available for OS use
12-51  PFN            Physical Frame Number (40 bits, supporting
                      52-bit physical addresses)
59-62  PK             Protection Key (if PKE enabled)
63     NX             No-Execute bit (if enabled)

When P = 0, other bits can be repurposed:
─────────────────────────────────────────
The OS can store auxiliary information:
  - Swap location (which swap slot)
  - File offset (for file-backed pages)
  - Copy-on-write indicator
  - Page state flags
```

What happens when P = 0:
When the CPU's MMU encounters a page table entry with P = 0 during address translation:

- Translation is aborted; no memory access takes place
- The CPU raises a page fault exception, recording the faulting address and an error code
- Control transfers to the OS's page fault handler, which decides what happens next
Why P = 0 is not always an error:
A page with P = 0 might be:

- Never loaded yet (pure demand paging, or a zero-fill region such as BSS or stack)
- Swapped out to the backing store after earlier use
- Part of a file mapping whose contents haven't been read in yet
- Genuinely invalid: outside every mapped region of the address space
The OS must examine its own data structures to determine which case applies.
A page marked 'invalid' (P = 0) is not necessarily an error. It means the page isn't in RAM right now. The OS must check if the address is within a valid memory region—if so, it's a demand paging opportunity; if not, it's a genuine memory access violation (segfault).
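As an illustration, here is how an OS might encode and test these bits in C. The bit positions follow the x86-64 format above; the swap-slot encoding for non-present entries is a hypothetical scheme, since each OS chooses its own.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pte_t;

#define PTE_PRESENT  (1ULL << 0)            /* P bit: page is in RAM */
#define PTE_WRITABLE (1ULL << 1)            /* R/W bit */
#define PTE_USER     (1ULL << 2)            /* U/S bit */
#define PTE_PFN_MASK 0x000FFFFFFFFFF000ULL /* bits 51:12 hold frame number */

static bool pte_present(pte_t pte) { return pte & PTE_PRESENT; }

static uint64_t pte_frame(pte_t pte) { return (pte & PTE_PFN_MASK) >> 12; }

/* Hypothetical: when P = 0, stash a swap slot number in the upper bits.
   Real kernels use their own, more elaborate encodings. */
static pte_t pte_make_swapped(uint64_t swap_slot) { return swap_slot << 1; }
static uint64_t pte_swap_slot(pte_t pte) { return pte >> 1; }
```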
Pure demand paging is the extreme form of lazy loading: no pages are loaded until they generate a fault. When a process starts under pure demand paging, it has zero pages in memory—even the first instruction of the program must be faulted in.
How pure demand paging works at startup:
```
Pure Demand Paging Process Lifecycle:
══════════════════════════════════════

1. PROCESS CREATION (exec syscall):
   ─────────────────────────────────
   • Parse executable header (ELF, PE, Mach-O)
   • Create virtual memory regions:
     - Text segment: virtual 0x400000-0x4ff000 → file offset 0x1000
     - Data segment: virtual 0x500000-0x510000 → file offset 0x100000
     - BSS:          virtual 0x510000-0x520000 → zero-fill on demand
     - Stack:        virtual 0x7fff0000-0x80000000 → zero-fill on demand
   • All pages marked NOT PRESENT in page table
   • NO ACTUAL LOADING OCCURS

2. SET INSTRUCTION POINTER:
   ─────────────────────────
   Set initial PC to program entry point (e.g., 0x400100)
   Transfer control to user mode
        ↓
3. FIRST INSTRUCTION FAULT:
   ─────────────────────────
   CPU tries to fetch instruction at 0x400100
   Page table lookup: 0x400xxx → NOT PRESENT!
   PAGE FAULT #1!
   Handler loads page containing 0x400100 from executable file
   Returns to 0x400100, now instruction executes

4. SUBSEQUENT FAULTS:
   ───────────────────
   mov rax, [0x500000]   ; Access data segment
     → PAGE FAULT #2 (load data page from file)
   call printf           ; Call into libc
     → PAGE FAULT #3 (load libc text page)
   push rbp              ; Use stack
     → PAGE FAULT #4 (zero-fill stack page)
   ... and so on until working set stabilizes

5. STEADY STATE:
   ──────────────
   After initial burst of faults, most accessed pages are resident
   New faults only for:
     • First access to previously-unused code paths
     • Growing stack/heap
     • After page eviction under memory pressure
```

Advantages of pure demand paging:

- Minimal startup cost: exec() does no loading, so the process begins executing almost immediately
- Minimal memory footprint: only pages that are actually touched ever occupy RAM
- No wasted I/O: code paths and data that are never used are never read from disk
Disadvantages of pure demand paging:

- A storm of page faults at startup, each with handler overhead and possible disk I/O
- Unpredictable latency: the first access to any page is far slower than later accesses
- Many small disk reads instead of a few large sequential ones
When pure demand paging makes sense:

- Memory-constrained systems where every resident page counts
- Short-lived processes that touch only a small fraction of their address space
- Fast backing stores (such as SSDs) where the per-fault I/O penalty is modest
Use tools like 'perf record -e page-faults' on Linux to observe the initial burst of page faults when a program starts under pure demand paging. You'll see hundreds or thousands of faults in the first few milliseconds, then the rate drops dramatically as the working set becomes resident.
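You can also observe this from inside a program. The sketch below uses only standard POSIX APIs: it maps a large anonymous region (which is demand-paged) and touches one byte per page, so each first touch is a fault, visible in the counters reported by getrusage().

```c
#include <stdio.h>
#include <sys/mman.h>
#include <sys/resource.h>

int main(void) {
    const size_t len = 64UL * 1024 * 1024;          /* 64 MB, mapped lazily */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return 1;

    struct rusage before, after;
    getrusage(RUSAGE_SELF, &before);

    for (size_t i = 0; i < len; i += 4096)          /* touch one byte/page */
        p[i] = 1;                                   /* each first touch faults */

    getrusage(RUSAGE_SELF, &after);
    printf("minor faults: %ld, major faults: %ld\n",
           after.ru_minflt - before.ru_minflt,
           after.ru_majflt - before.ru_majflt);
    return 0;
}
```

Anonymous zero-fill pages need no disk I/O, so these show up as minor faults; the point is that none of the 64 MB occupied RAM until it was touched.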
Pure demand paging, while theoretically elegant, is rarely used in practice. Real systems employ various optimizations to reduce the initial fault storm while retaining demand paging's benefits.
Prepaging (Anticipatory Paging):
Prepaging loads some pages before they're accessed, anticipating that they'll be needed soon. This trades some memory usage for reduced fault rate.
| Strategy | Description | Use Case |
|---|---|---|
| Startup prepaging | Load entry point region at exec() | Reduce initial fault storm |
| Clustered paging | On fault, load adjacent pages too | Exploit spatial locality |
| Working set prepaging | Load saved working set from previous run | Reduce warm-up time |
| Read-ahead | Prefetch sequential pages for sequential access | File I/O optimization |
| Swap prefetch | Load pages from swap proactively | Anticipate reuse |
Clustered paging (fault clustering):
When a page fault occurs, the OS loads not just the faulted page but also several adjacent pages. The rationale: if a program is accessing page N, it likely will access pages N+1, N+2, etc. (spatial locality).
Linux's readahead mechanism is a form of clustered paging for file-backed pages:
Default cluster size: 128-256 KB (32-64 pages for 4K pages)
Adaptive: Increases cluster size if sequential access detected
Benefit: Single I/O operation loads many pages
Risk: Wastes memory if prediction wrong
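In kernel terms, fault clustering might look like the following hypothetical pseudocode sketch. The helper names (`page_present`, `alloc_frame`, `read_from_backing_store`, and so on) are invented; real readahead logic, such as Linux's, is considerably more adaptive.

```c
/* Hypothetical fault-clustering sketch: on a fault for page vpn,
   also pull in up to CLUSTER adjacent pages that are not yet present. */
#define CLUSTER 8

void handle_fault_with_clustering(struct process *proc, uint64_t vpn) {
    for (uint64_t v = vpn; v < vpn + CLUSTER; v++) {
        if (page_present(proc, v) || !in_valid_region(proc, v))
            break;                     /* stop at mapped pages or region end */
        frame_t f = alloc_frame();
        read_from_backing_store(f, proc, v);  /* ideally one batched I/O */
        map_page_vpn(proc, v, f);
    }
}
```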
Working set prepaging:
Some systems track each process's working set and save it when the process is swapped out or hibernated. On resume, the entire working set is preloaded, eliminating the warm-up period.
Windows SuperFetch/Prefetch and macOS's similar features use this approach: they record which pages an application touched during previous launches and preload them at the next launch, before the process would otherwise fault on them.
```bash
# View current readahead settings
$ cat /sys/block/sda/queue/read_ahead_kb
128

# Set readahead to 256 KB (64 pages)
$ echo 256 > /sys/block/sda/queue/read_ahead_kb

# Per-file readahead hints with posix_fadvise():
#   POSIX_FADV_SEQUENTIAL - Increase readahead
#   POSIX_FADV_RANDOM     - Disable readahead

# In code:
posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
```

Prepaging is speculative—it guesses which pages will be needed. If the guess is right, it saves time. If wrong, it wastes memory and I/O bandwidth. Systems must balance aggressive prepaging (better hit rate, more waste) against conservative prepaging (less waste, more faults).
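Applications can also request prepaging explicitly. For memory-mapped files, madvise() plays the role that posix_fadvise() plays for file descriptors. A minimal example using standard POSIX calls (the file name is illustrative):

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("data.bin", O_RDONLY);      /* illustrative file name */
    if (fd < 0) return 1;
    struct stat st;
    fstat(fd, &st);

    char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) return 1;

    /* Ask the kernel to prepage the whole mapping. This is a hint,
       not a guarantee; MADV_SEQUENTIAL/MADV_RANDOM tune readahead instead. */
    madvise(p, st.st_size, MADV_WILLNEED);

    /* ... accesses through p[] are now less likely to take major faults ... */
    munmap(p, st.st_size);
    close(fd);
    return 0;
}
```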
While demand paging is the dominant technique today, an alternative approach—demand segmentation—played an important historical role and still informs understanding of virtual memory.
What is demand segmentation?
Demand segmentation applies the lazy loading principle to segments rather than pages. Instead of dividing memory into fixed-size pages, segmentation divides it into variable-size logical units (code segment, data segment, stack segment, etc.).
Under demand segmentation:

- Each segment table entry carries a valid bit, just as page table entries do
- A reference to a non-resident segment triggers a segment fault
- The OS loads the entire segment from the backing store, which first requires finding a contiguous region of memory large enough to hold it
| Aspect | Demand Paging | Demand Segmentation |
|---|---|---|
| Unit of transfer | Fixed-size page (4 KB typical) | Variable-size segment (KB to MB) |
| Granularity | Fine (single page) | Coarse (entire segment) |
| Internal fragmentation | Up to page_size-1 per allocation | None within segment |
| External fragmentation | None | Significant (variable sizes) |
| Loading time | Fast (small unit) | Slow (large segment) |
| Memory utilization | Precise | Coarse (all or nothing per segment) |
| Sharing | Page-level | Segment-level |
| Modern usage | Universal | Rare (mostly obsolete) |
Why demand paging won:
Demand paging has largely replaced demand segmentation for several reasons:

- Fixed-size pages make allocation trivial: any free frame fits any page, so there is no external fragmentation
- Transfer units are small and uniform, so a fault costs one predictable I/O rather than loading a potentially huge segment
- Replacement is simpler: evicting one page frees a known amount of memory
- Fine granularity means RAM holds only what is used, not whole segments that are mostly untouched
Where segmentation still matters:
While pure demand segmentation is obsolete, segmentation concepts persist:

- x86 retains segment registers; modern systems use FS/GS for thread-local storage
- Executables are still organized into logical segments (text, data, BSS, stack)
- Protection violations are still reported as 'segmentation faults', a naming legacy
Multics (1969) famously combined segmentation and paging. Each segment was divided into pages, providing both the logical structure of segments and the physical efficiency of pages. This 'segmented paging' influenced later designs, though modern systems mostly use 'paged segmentation' where paging is primary.
Not all page faults are equal. Understanding the different types helps in performance analysis and system tuning.
Classification by severity:
| Type | Also Called | Description | Typical Latency |
|---|---|---|---|
| Minor fault | Soft fault | Page is in RAM but not mapped to this process | ~1 μs |
| Major fault | Hard fault | Page must be loaded from disk | ~100 μs - 10 ms |
| Invalid fault | Segfault | Genuinely invalid access (unmapped, protection) | N/A (signal/exception) |
Minor (soft) faults:
A minor fault occurs when the page exists in physical memory (perhaps in the page cache or another process's mapping) but isn't yet mapped in the faulting process's page table. The handler just needs to update the page table—no disk I/O required.
Common causes:

- First write to a copy-on-write page (e.g., after fork())
- First access to a shared library page already resident via another process
- First access to a file page already present in the page cache
- Stack or heap growth satisfied by an already-zeroed frame
Major (hard) faults:
A major fault requires disk I/O to bring the page into memory. This is the expensive case—potentially 10,000× slower than a minor fault.
Common causes:

- Accessing a page that was swapped out under memory pressure
- First access to a memory-mapped file region not yet in the page cache
- Demand-loading code or data from the executable on a cold start
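The latency gap in the table above can be observed directly. This sketch uses standard POSIX APIs (the file name is illustrative): it times the first touch of each page of a freshly mapped file. On a cold page cache these touches are major faults; rerunning immediately, with the file cached, makes them cheap minor faults, and the average drops accordingly.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <time.h>

int main(void) {
    int fd = open("big.bin", O_RDONLY);          /* illustrative file name */
    if (fd < 0) return 1;
    struct stat st;
    fstat(fd, &st);

    volatile char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if ((void *)p == MAP_FAILED) return 1;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    long sum = 0;
    for (off_t i = 0; i < st.st_size; i += 4096)
        sum += p[i];                             /* first touch of each page */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("avg per-page touch: %.0f ns (sum=%ld)\n",
           ns / (st.st_size / 4096.0), sum);
    return 0;
}
```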
Monitoring page faults:
```bash
# Per-process page faults (min_flt = minor, maj_flt = major)
$ ps -o pid,min_flt,maj_flt,cmd $(pgrep -f myapp)
  PID MINFL MAJFL CMD
12345  5243   122 myapp --arg

# System-wide page fault rate
$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system--
 r  b   swpd    free   buff  cache   si   so   bi   bo   in   cs
 1  0  10240  500000  50000 200000    0    0   52   10  200  150
                                      ↑    ↑
                                      │    └─ Pages swapped out
                                      │       (may cause future major faults)
                                      └─ Pages swapped in (major faults resolved)

# Detailed page fault statistics
$ /usr/bin/time -v ls
...
  Minor (reclaiming a frame) page faults: 93
  Major (requiring I/O) page faults: 0
...

# Using perf to trace page faults
$ perf stat -e page-faults,minor-faults,major-faults ./myapp
  1,234 page-faults
  1,200 minor-faults
     34 major-faults
```

For performance-sensitive applications, monitor the major fault rate closely. A high major fault rate indicates insufficient RAM for the working set, excessive swap usage, or poor locality. Minor faults are usually acceptable—they're just bookkeeping.
For demand paging to work, the hardware must support restartable instructions: after a page fault is resolved, the faulting instruction must execute as if nothing happened. This seemingly simple requirement has subtle implications.
The restart requirement:
When a page fault occurs mid-instruction, the CPU must:

- Save a precise architectural state, including the address of the faulting instruction itself
- Ensure no partial side effects of the instruction remain visible (or that they are safely resumable)
- Re-execute (or resume) the instruction after the handler returns, as if the fault never happened
Problematic instruction scenarios:
```
Challenging Cases for Instruction Restart:
═══════════════════════════════════════════

1. BLOCK MOVE INSTRUCTIONS (e.g., REP MOVSB on x86)
   ─────────────────────────────────────────────────
   rep movsb          ; Copy ECX bytes from [ESI] to [EDI]

   Problem:  What if fault occurs after copying 50 of 100 bytes?
   Solution: CPU tracks partial progress in registers; ECX/ESI/EDI
             updated per byte, so restart continues from where it
             left off.

2. INSTRUCTIONS THAT MODIFY OPERANDS (auto-increment)
   ──────────────────────────────────────────────────
   mov eax, [ebx]++   ; (hypothetical: load and increment pointer)

   Problem:  If ebx is incremented before the load faults, restart
             would load from wrong address.
   Solution: Modern ISAs avoid such designs, or precisely define when
             side-effects occur relative to potential faults.

3. INSTRUCTIONS WITH MULTIPLE MEMORY ACCESSES
   ──────────────────────────────────────────
   push [address]     ; Read from memory, then write to stack

   Problem:  If read succeeds but stack write faults, must restart
             the read too.
   Solution: CPU checks for potential faults before committing, or
             can restart the entire instruction cleanly.

4. COMPARE-AND-SWAP / ATOMIC OPERATIONS
   ─────────────────────────────────────
   lock cmpxchg [mem], ebx

   Problem:  Atomicity must be preserved across restart.
   Solution: If fault occurs, no memory modification has happened;
             entire atomic operation restarts.
```

Historical challenges:
Some historical architectures (like certain models of the Motorola 68000) had instructions that couldn't be restarted cleanly. This made demand paging impossible or required special handling.
The Motorola 68010 added the ability to save and restore instruction execution state mid-flight, fixing this issue. Modern architectures are designed with restartability in mind.
The x86 approach:
x86 instructions are designed to be restartable. The processor:

- Validates translations and permissions before committing any memory write
- Delivers the page fault before the instruction produces architecturally visible side effects (string instructions being the resumable exception)
- Saves the address of the faulting instruction itself, not the next one, so IRET re-executes it
This design ensures that after a page fault is handled, re-executing the same instruction produces the correct result.
Modern out-of-order CPUs speculate past memory operations. When a page fault occurs, all speculative state must be discarded and the CPU resumed from a precise architectural state. This rollback capability, needed for exceptions anyway, also enables demand paging.
Demand paging is the mechanism that transforms virtual memory from an addressing abstraction into a powerful resource management system. Let's consolidate what we've learned:

- Pages are loaded lazily: a cleared present bit turns the first access into a page fault, and the handler loads the page and restarts the instruction
- The faulting process never notices; this transparency is what lets unmodified programs run under demand paging
- Pure demand paging starts from zero resident pages; real systems temper the initial fault storm with prepaging techniques such as clustering and readahead
- Minor faults are cheap bookkeeping; major faults cost disk I/O and are the metric to watch
- The whole scheme rests on hardware support: a present bit, a precise page fault exception, and restartable instructions
What's next:
We've seen how pages are loaded on demand. The next page examines the benefits of virtual memory holistically—not just the ability to exceed physical memory, but also memory protection, simplified programming, efficient use of RAM, and enabling advanced features like copy-on-write and memory-mapped files.
You now have a comprehensive understanding of demand paging and demand segmentation—the lazy loading mechanisms that make virtual memory practical. These mechanisms are the foundation for all the advanced virtual memory features we'll explore in subsequent pages.