In our exploration of lazy loading, we established that loading pages only when needed is generally beneficial. But how far should we take this principle? Pure demand paging represents the extreme interpretation: load absolutely nothing until the CPU explicitly generates a memory reference that requires a page.
Under pure demand paging, a process begins execution with zero pages in memory. Not a single byte of code, not a single byte of data. The very first instruction the CPU attempts to execute causes a page fault. That page is loaded, and execution continues—only to likely fault again for data or stack accesses. Gradually, as the program runs, the pages it actually uses accumulate in memory.
This approach embodies a philosophical commitment: never do work that might not be needed. It's the logical conclusion of lazy loading, but like many extremes, it comes with trade-offs that make it more of a conceptual ideal than a universally optimal strategy.
By the end of this page, you will understand pure demand paging as a strategy: how it works, its behavior during process startup, the conditions where it excels, and the situations where its strict no-prefetch policy becomes a liability. You'll also see how pure demand paging relates to real-world operating system implementations.
Pure demand paging is a memory management strategy with a simple rule: never load a page into memory until a CPU access explicitly requires it. No speculation, no prefetching, no read-ahead—just reactive loading in response to faults.
Formal Definition:
For all pages P in process address space:
load(P) occurs if and only if
∃ instruction I where I accesses address A ∧ A ∈ P ∧ P ∉ RAM
In plain terms: a page is loaded if and only if an instruction tries to access an address within that page and the page isn't already in physical memory.
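To make the rule concrete, here is a small, self-contained C simulation of it (a user-space sketch, not kernel code; the page size, page count, and reference trace are illustrative assumptions). It walks a trace of addresses and "loads" a page only on first touch, counting the resulting faults:

```c
/* Sketch: the pure-demand rule applied to a reference trace.
 * A page becomes resident only when an access finds it absent. */
#include <stdio.h>
#include <stdbool.h>

#define PAGE_SIZE 4096   /* illustrative 4 KB pages */
#define NUM_PAGES 64

int main(void) {
    bool resident[NUM_PAGES] = { false };  /* initially, P ∉ RAM for all P */
    unsigned long trace[] = { 0x401000, 0x401008, 0x7ff000, 0x403000, 0x401010 };
    size_t n = sizeof trace / sizeof trace[0];
    int faults = 0;

    for (size_t i = 0; i < n; i++) {
        size_t page = (trace[i] / PAGE_SIZE) % NUM_PAGES;
        if (!resident[page]) {      /* load(P) iff accessed and not in RAM */
            resident[page] = true;  /* stands in for the disk read */
            faults++;
        }
    }
    printf("%d faults for %zu references\n", faults, n);  /* 3 faults for 5 */
    return 0;
}
```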
Characteristics of Pure Demand Paging:
- A process starts with zero resident pages; even the first instruction faults.
- Pages are loaded one at a time, strictly in response to faults.
- No speculation of any kind: no prefetching, no read-ahead, no preloaded libraries.
- Physical memory holds exactly the pages the process has actually touched.
Comparison: Pure vs. Modified Demand Paging
| Aspect | Pure Demand Paging | Modified (Practical) Demand Paging |
|---|---|---|
| Initial pages | Zero | May preload entry point, libraries |
| Read-ahead | Never | When sequential access detected |
| Startup time | Slow (many faults) | Fast (fewer initial faults) |
| Memory efficiency | Optimal (only accessed pages) | Near-optimal (some speculation) |
| I/O scheduling | Many small reads | Fewer, larger reads |
| First instruction | Always faults | May not fault |
Pure demand paging is primarily a theoretical concept used as a baseline for analysis. Real operating systems implement modified demand paging that retains the core benefit (load pages reactively) while adding optimizations (prefetching, read-ahead) that improve practical performance.
The most distinctive aspect of pure demand paging is how processes start. Let's trace through the startup sequence in detail.
The exec() Sequence:
When a process calls exec() to load a new program:
1. The kernel tears down the old address space and builds a new page table in which every page is marked non-present.
2. The executable and its libraries are mapped, not loaded: the page table entries simply point at the backing files.
3. The instruction pointer is set to the program's entry point (e.g., _start) and control returns to user mode.

The First Fault Cascade:
Time →
0 ms: CPU jumps to entry point address (e.g., 0x401000)
→ Page fault #1: Code page containing entry point
→ Load page from executable file
→ ~10 ms on an HDD (roughly 100 μs on an SSD; this trace assumes HDD-class latency)
10 ms: First instruction executes (e.g., PUSH RBP)
→ Page fault #2: Stack page
→ Allocate zero-filled frame
→ ~1 ms (minor fault)
11 ms: Next instruction accesses global data
→ Page fault #3: Data page
→ Load page from executable file
→ ~10 ms
21 ms: Program calls a library function (printf)
→ Page fault #4: PLT/GOT page
→ ~10 ms
31 ms: Library code executes
→ Page fault #5: libc code page
→ ~10 ms
...and so on for every new page touched
A simple "Hello, World" program under pure demand paging might experience 10-20 page faults before producing any output. Larger programs can experience hundreds or thousands of faults during initialization.
Program: int main() { printf("Hello\n"); return 0; }

Approximately 50-100 major page faults before output appears.

Breakdown of faults:
- Entry point code (1-2 pages) - crt0 startup code
- Main function code (1 page)
- printf implementation (3-5 pages) - formatting, buffering
- Memory allocation for I/O buffers (1-2 pages)
- Global libc data structures (2-3 pages)
- Dynamic linker code (if dynamically linked) (5-10 pages)
- Symbol resolution tables (2-5 pages)
- Stack pages (1-3 pages)
- Additional library dependencies (10-50 pages)
With pure demand paging, each of these is a separate page fault. On an SSD with 100μs random read latency, 100 faults = 10ms minimum startup time just for I/O—not including CPU processing.
Pure demand paging creates terrible startup performance for complex applications. A browser or IDE might touch thousands of pages during initialization. With pure demand paging, that's thousands of serial disk operations, turning what could be a sub-second startup into multiple seconds of waiting.
After the startup storm of page faults, the process enters a steadier state as its working set becomes established in memory. Pure demand paging has interesting implications for this phase.
Working Set Definition:
The working set W(t, Δ) is the set of pages referenced during the time interval [t - Δ, t]. Under pure demand paging, the physical pages present in memory are exactly those that have been accessed (minus any that were evicted due to memory pressure).
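As an illustration of the definition (not an OS mechanism), the following C sketch computes |W(t, Δ)| over a page-reference string; the trace and the window Δ are made-up values:

```c
/* Sketch: working-set size |W(t, Δ)| along a page-reference trace,
 * counting distinct pages among the last DELTA references. */
#include <stdio.h>

#define NUM_PAGES 16
#define DELTA 4   /* window size Δ, measured in references */

int main(void) {
    int trace[] = { 1, 2, 1, 3, 1, 2, 7, 7, 1, 2 };  /* illustrative */
    int n = sizeof trace / sizeof trace[0];

    for (int t = 0; t < n; t++) {
        int seen[NUM_PAGES] = { 0 }, size = 0;
        int start = (t - DELTA + 1 < 0) ? 0 : t - DELTA + 1;
        for (int i = start; i <= t; i++)   /* scan the window [t-Δ+1, t] */
            if (!seen[trace[i]]) { seen[trace[i]] = 1; size++; }
        printf("t=%d  |W|=%d\n", t, size);
    }
    return 0;
}
```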
Working Set Building Phase:
Fault Rate Over Time (pure demand paging):
Faults/sec
▲
1000 │▓▓
│▓▓▓▓
500 │▓▓▓▓▓▓▓
│▓▓▓▓▓▓▓▓▓▓
100 │▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
│ ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓───────
10 │ Steady state
│ (new pages rare)
0 └────────────────────────────────────────────────────────────▶
1s 5s 10s 30s Time
←───────────────────→ Working set establishment
Phase 1: High Fault Rate (0-10s) - nearly every reference touches a page not yet resident; the working set is being built one fault at a time.
Phase 2: Decreasing Fault Rate (10-30s) - the frequently used pages are resident; faults occur only on less common code paths.
Phase 3: Steady State (30s+) - the working set is established; faults are rare and occur only when the program touches genuinely new pages.
Locality Determines Steady-State Behavior:
Programs with good locality stabilize quickly: their working set is small, and once it is resident nearly every reference hits a page already in memory, so the fault rate drops toward zero.
Programs with poor locality never stabilize: they keep referencing pages they haven't touched before (or pages that were evicted), so the fault rate stays high indefinitely.
Pure demand paging's effectiveness depends entirely on locality. Programs with strong locality experience a brief startup penalty, then run smoothly. Programs with poor locality experience continuous page faults. Understanding your application's access patterns is crucial for predicting demand paging behavior.
Pure demand paging achieves optimal memory utilization in a specific sense: it never loads pages that aren't accessed. Let's analyze this efficiency in detail.
Memory Efficiency Theorem:
Under pure demand paging with no eviction (unlimited memory):
|Pages in Memory| = |Unique Pages Accessed|
This is optimal because any page loaded was definitively needed. No memory is wasted on speculative loads that turn out to be unnecessary.
Comparison with Eager Loading:
| Scenario | Executable Size | Pages Accessed | Pure Demand | Eager Load |
|---|---|---|---|---|
| Simple CLI tool | 1 MB | ~100 KB | 25 pages | 250 pages |
| Office suite | 500 MB | ~50 MB | 12,500 pages | 125,000 pages |
| Web browser | 200 MB | ~30 MB | 7,500 pages | 50,000 pages |
| IDE | 400 MB | ~60 MB | 15,000 pages | 100,000 pages |
| Database server | 100 MB | ~80 MB (hot) | 20,000 pages | 25,000 pages |
Memory Savings Calculation:
For a typical office suite:

With eager loading: 500 MB per instance
With pure demand: 50 MB per instance

If running 3 instances: eager loading consumes 1.5 GB, while pure demand consumes about 150 MB - a saving of roughly 1.35 GB (before accounting for page sharing).
The cost of this efficiency: Startup time and initial response latency.
When Memory Efficiency Matters Most:
- Memory-constrained systems (embedded devices, small VMs)
- Many concurrent processes competing for RAM
- Overcommitted environments (containers, virtual machines), where every unloaded page translates directly into extra capacity
In practice, shared pages (libraries, code segments) mean that multiple processes using the same library share physical frames. This reduces the advantage of pure demand paging because those shared pages are typically loaded once (when the first process needs them) and then shared without additional faults.
Pure demand paging has distinctive I/O characteristics that significantly impact performance, particularly on mechanical storage.
I/O Pattern Under Pure Demand Paging:

Each fault triggers a single page-sized (typically 4 KB) read at whatever file offset the faulting page happens to occupy. Because faults follow the program's control flow rather than the file's layout, the resulting disk access pattern is effectively random, and the reads are serial: one fault must complete before execution continues and triggers the next.
Comparison of I/O Patterns:
| Metric | Pure Demand | With Read-Ahead | Eager Loading |
|---|---|---|---|
| Transfer size | 4 KB | 16-128 KB | Entire file |
| I/O operations | Many small | Fewer larger | Few very large |
| Access pattern | Random | Sequential+random | Sequential |
| HDD friendliness | Poor | Good | Excellent |
| SSD suitability | Acceptable | Good | Good |
| Total bytes read | Minimal | Exceeds accessed | Maximum |
The HDD Problem:
Mechanical hard drives have a significant random access penalty:
HDD Performance:
- Sequential read: ~100-150 MB/s
- Random read (4 KB): ~100 IOPS → 400 KB/s
Loading 10 MB of random pages:
- Sequential: 10 MB / 100 MB/s = 100 ms
- Random 4 KB: 2500 pages / 100 IOPS = 25 seconds (!)
Pure demand paging on HDDs can be 250x slower than sequential loading for random access patterns. This is why read-ahead is so important for HDD-based systems.
SSD Improvements:
SSD Performance:
- Sequential read: ~500-3000 MB/s (NVMe)
- Random read (4 KB): ~50,000-500,000 IOPS → 200-2000 MB/s
Loading 10 MB of random pages:
- Random 4 KB: 2500 pages / 100,000 IOPS = 25 ms
SSDs make pure demand paging far more practical, but read-ahead still helps by exploiting the even higher sequential throughput.
The viability of pure demand paging depends heavily on storage technology. On HDDs, it's often unacceptably slow. On NVMe SSDs, it's practical. In-memory or RAM-disk scenarios eliminate the I/O penalty entirely, making pure demand paging very efficient.
Despite its drawbacks, there are scenarios where the pure demand paging approach (or something close to it) is optimal.
Scenario 1: Large Programs with Small Working Sets
Consider an IDE with 100 MB of code but only 10 MB active at any time:

Eager loading: 100 MB read and resident per instance, 90 MB of it never touched.
Pure demand: only the ~10 MB actually used is ever read from disk.
The key insight: if usage is sparse relative to size, demand paging wins.
Scenario 2: Rarely-Run Utilities
Command-line tool invoked once:
Eager loading:
  Load 5 MB  → 500 ms
  Execute    →  10 ms
  Exit       →  ~0 ms
  Total: 510 ms (5 MB loaded, ~1 MB used)

Pure demand:
  Load ~1 MB of pages → 100 ms
  Execute             →  10 ms
  Exit                →  ~0 ms
  Total: 110 ms (1 MB loaded)
For short-lived processes, pure demand paging can actually be faster because it loads less data.
Scenario 3: Fork and Exec Patterns
Traditional Unix pattern:
fork() → Create child with copy of parent
exec() → Replace child with new program
With pure demand paging:
fork():
- No copying! Just create page table with COW
- Instantaneous (no memory touched)
exec():
- Discard all COW references
- Set up new page table (everything non-present)
- Child's parent-inherited pages were never loaded
Pure demand paging + COW makes fork+exec extremely efficient. The child never pays for pages it will immediately discard.
The Unix fork-exec pattern was designed with demand paging in mind. A process forks, creating a child with (virtually) the same memory, then immediately execs a new program. With demand paging and COW, both operations are nearly free because no pages are actually copied or loaded until accessed.
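A minimal C sketch of the pattern (the command the child runs is arbitrary): fork() copies no pages thanks to COW, and exec() discards those mappings before the child ever faults them in.

```c
/* The classic fork+exec pattern. Under demand paging + COW,
 * fork() copies no pages and exec() loads none up front. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    pid_t pid = fork();            /* child shares pages COW; nothing copied */
    if (pid < 0) { perror("fork"); exit(1); }
    if (pid == 0) {
        /* exec discards the COW mappings the child never touched */
        execlp("echo", "echo", "hello from child", (char *)NULL);
        perror("execlp");          /* only reached if exec fails */
        _exit(127);
    }
    waitpid(pid, NULL, 0);         /* parent waits for the child */
    return 0;
}
```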
There are equally important scenarios where pure demand paging performs poorly, and practical systems must use alternatives.
Scenario 1: Long-Running Startup-Intensive Applications
Web browser startup:
Pure demand paging:
1000+ page faults during initialization
User stares at blank window for 5+ seconds
Every click during warmup causes new faults
With prefetching:
Predict common startup pages
Stream 50 MB during splash screen display
Application responsive in 1-2 seconds
Scenario 2: Predictable Access Patterns
Sequential file processing:
Pure demand paging:
Read page 0: fault, load, process
Read page 1: fault, load, process
Read page 2: fault, load, process
... (one fault per page, I/O latency on each)
With read-ahead:
Read page 0: fault, load pages 0-15, process page 0
Read page 1: already loaded, process immediately
Read pages 2-15: already loaded
Read page 16: fault, load pages 16-31
... (one fault per 16 pages)
Sequential access with read-ahead is an order of magnitude faster.
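For read()-based sequential processing (the mmap equivalent, madvise, appears later on this page), POSIX offers posix_fadvise() to request this kind of read-ahead explicitly. A sketch, with a hypothetical file name:

```c
/* Sketch: hinting sequential access for read()-based processing.
 * POSIX_FADV_SEQUENTIAL asks the kernel for aggressive read-ahead,
 * so most reads hit pages already in the page cache. */
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fd = open("data.bin", O_RDONLY);  /* "data.bin" is illustrative */
    if (fd < 0) { perror("open"); return 1; }

    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);  /* len 0 = whole file */

    char buf[4096];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        ;                                 /* process buf[0..n) here */
    close(fd);
    return 0;
}
```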
Scenario 3: Real-Time Audio/Video Processing
Audio processing requirements:
- Buffer size: 128 samples
- Sample rate: 48 kHz
- Deadline: 128/48000 = 2.7 ms per buffer
Page fault during processing:
- SSD latency: 50-200 μs (acceptable... barely)
- HDD latency: 5-15 ms (BUFFER UNDERRUN! Audio glitch!)
Solution: Wire pages in memory (mlock) to prevent any faults
Real-time systems cannot tolerate the unpredictability of demand faults.
For interactive and real-time applications, pure demand paging's problem isn't average performance—it's variance. Most memory accesses are instantaneous, but occasionally one takes milliseconds. This unpredictability creates a poor user experience and can cause real-time failures.
No production operating system uses pure demand paging exactly as described. However, understanding the pure model helps us appreciate what optimizations real systems add.
Linux Approach:
Linux uses demand paging as the foundation but adds significant optimizations:
- Read-ahead: when sequential access is detected, nearby pages are loaded in the same I/O operation.
- Fault-around: on a fault, a small cluster of surrounding pages is mapped in, amortizing fault overhead.
- madvise() hints: applications can tell the kernel their expected access pattern (see the code below).
Windows Approach:
Windows uses a more aggressive prefetching strategy:
- The Prefetcher records which pages an application touches during startup and preloads them on subsequent launches.
- SuperFetch goes further, preloading commonly used applications based on usage history.
- A working set manager trims and grows each process's resident pages over time.
```c
/* Using madvise to guide demand paging behavior */
#include <sys/mman.h>

void process_file(void *mapped, size_t length) {
    /* Tell kernel we'll access sequentially */
    madvise(mapped, length, MADV_SEQUENTIAL);
    /* Kernel will use aggressive read-ahead */

    /* Or: tell kernel we'll access randomly */
    madvise(mapped, length, MADV_RANDOM);
    /* Kernel disables read-ahead (closer to pure demand) */

    /* Request immediate prefetch */
    madvise(mapped, length, MADV_WILLNEED);
    /* Kernel initiates background loading */

    /* Signal we're done with this region */
    madvise(mapped, length, MADV_DONTNEED);
    /* Kernel can evict these pages preferentially */
}

void lock_critical_pages(void *addr, size_t length) {
    /* Wire pages in memory - no demand faults allowed */
    mlock(addr, length);

    /* For real-time: lock everything */
    mlockall(MCL_CURRENT | MCL_FUTURE);
}
```

| Operating System | Base Strategy | Key Optimizations |
|---|---|---|
| Linux | Demand paging | Read-ahead, fault-around, madvise hints |
| Windows | Demand paging | Prefetcher, SuperFetch, working set manager |
| macOS | Demand paging | Dynamic pager, compressed memory |
| FreeBSD | Demand paging | Clustered page fault handling |
| Real-time OS | Wired pages | No demand faults for determinism |
Application developers can use madvise() to communicate access patterns to the kernel. This allows applications to approach pure demand paging behavior (MADV_RANDOM) when appropriate or request aggressive prefetching (MADV_SEQUENTIAL, MADV_WILLNEED) when access patterns are predictable.
Pure demand paging represents one extreme on a spectrum of paging strategies. Understanding its trade-offs helps us make informed decisions.
The Core Trade-off:
Memory Efficiency ◄─────────────────────► Startup Performance
        ▲                                            ▲
        │                                            │
   Pure Demand                                Eager Loading
     Paging                                    + Prefetch
Each position on this spectrum has costs and benefits:
| Strategy | Memory Use | Startup Time | Steady State | Predictability |
|---|---|---|---|---|
| Pure Demand | Optimal | Poor | Excellent | Variable |
| Moderate Prefetch | Good | Fair | Excellent | Good |
| Aggressive Prefetch | Fair | Good | Excellent | Good |
| Full Eager Load | Poor | Fair | Excellent | Excellent |
| Always Wired | Poor | Good | Excellent | Excellent |
Decision Framework:
Choose more toward pure demand when:
- Memory is scarce and must not be wasted on speculative loads
- Access is sparse or unpredictable (working set much smaller than program size)
- Processes are short-lived (fork/exec patterns, CLI tools)
- Storage handles random reads well (NVMe SSD, RAM-backed)

Choose more toward prefetching when:
- Access patterns are sequential or otherwise predictable
- Startup latency is user-visible (interactive applications)
- Storage penalizes random reads (HDDs)

Choose wired pages when:
- Hard real-time deadlines make any fault latency unacceptable
- Predictability matters more than memory efficiency
Don't assume pure demand paging is the problem. Profile your application: measure major fault rate, working set size, and access patterns. The answer might be that demand paging is already optimal, or that simple madvise hints can solve your issues without architectural changes.
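One concrete way to start on POSIX systems is getrusage(), which reports the cumulative minor and major fault counts for the calling process; a minimal sketch:

```c
/* Sketch: reading a process's own fault counters via getrusage().
 * Minor faults need no I/O; major faults required a disk read. */
#include <stdio.h>
#include <sys/resource.h>

int main(void) {
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) == 0)
        printf("minor faults: %ld, major faults: %ld\n",
               ru.ru_minflt, ru.ru_majflt);
    return 0;
}
```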
We've examined pure demand paging—the extreme lazy loading strategy. Let's consolidate the key insights:
- Pure demand paging loads a page only when an instruction actually references it; a process starts with zero resident pages, so even the first instruction faults.
- Startup is fault-dominated: simple programs take dozens of faults before producing output, and large applications can take thousands.
- Once the working set is resident, fault rates drop to near zero for programs with good locality; programs with poor locality never reach a steady state.
- Memory use is optimal (only accessed pages are ever loaded) at the cost of many small, random, serial I/O operations.
- Storage technology determines viability: often unacceptably slow on HDDs, practical on SSDs.
- Real systems (Linux, Windows, macOS) keep the demand-paging foundation but add prefetching, read-ahead, and hints like madvise().
What's Next:
Having explored the extreme of pure demand paging, we'll now examine its counterweight: prepaging. Prepaging attempts to preemptively load pages before they're needed, trading memory efficiency for reduced fault rates and faster initial performance. Understanding both extremes helps us appreciate the practical middle ground that real systems occupy.
You now understand pure demand paging as a theoretical baseline and its practical implications. This knowledge enables you to reason about memory system behavior, understand OS design choices, and make informed decisions about when lazy loading is beneficial versus when prefetching might be preferred.