In our exploration of lazy loading, we established that loading pages only when needed is generally beneficial. But how far should we take this principle? Pure demand paging represents the extreme interpretation: load absolutely nothing until the CPU explicitly generates a memory reference that requires a page.
Under pure demand paging, a process begins execution with zero pages in memory. Not a single byte of code, not a single byte of data. The very first instruction the CPU attempts to execute causes a page fault. That page is loaded, and execution continues—only to likely fault again for data or stack accesses. Gradually, as the program runs, the pages it actually uses accumulate in memory.
This approach embodies a philosophical commitment: never do work that might not be needed. It's the logical conclusion of lazy loading, but like many extremes, it comes with trade-offs that make it more of a conceptual ideal than a universally optimal strategy.
By the end of this page, you will understand pure demand paging as a strategy: how it works, its behavior during process startup, the conditions where it excels, and the situations where its strict no-prefetch policy becomes a liability. You'll also see how pure demand paging relates to real-world operating system implementations.
Pure demand paging is a memory management strategy with a simple rule: never load a page into memory until a CPU access explicitly requires it. No speculation, no prefetching, no read-ahead—just reactive loading in response to faults.
Formal Definition:
For all pages P in process address space:
load(P) occurs if and only if
∃ instruction I where I accesses address A ∧ A ∈ P ∧ P ∉ RAM
In plain terms: a page is loaded if and only if an instruction tries to access an address within that page and the page isn't already in physical memory.
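To make the rule concrete, here is a small, self-contained C simulation of it (a user-space sketch, not kernel code; the page size, page count, and reference trace are illustrative assumptions). It walks a trace of addresses and "loads" a page only on first touch, counting the resulting faults:

```c
/* Sketch: the pure-demand rule applied to a reference trace.
 * A page becomes resident only when an access finds it absent. */
#include <stdio.h>
#include <stdbool.h>

#define PAGE_SIZE 4096   /* illustrative 4 KB pages */
#define NUM_PAGES 64

int main(void) {
    bool resident[NUM_PAGES] = { false };  /* initially, P ∉ RAM for all P */
    unsigned long trace[] = { 0x401000, 0x401008, 0x7ff000, 0x403000, 0x401010 };
    size_t n = sizeof trace / sizeof trace[0];
    int faults = 0;

    for (size_t i = 0; i < n; i++) {
        size_t page = (trace[i] / PAGE_SIZE) % NUM_PAGES;
        if (!resident[page]) {      /* load(P) iff accessed and not in RAM */
            resident[page] = true;  /* stands in for the disk read */
            faults++;
        }
    }
    printf("%d faults for %zu references\n", faults, n);  /* 3 faults for 5 */
    return 0;
}
```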
Characteristics of Pure Demand Paging:
- A process starts with zero resident pages; even the first instruction faults.
- Pages are loaded one at a time, strictly in response to faults.
- No speculation of any kind: no prefetching, no read-ahead, no preloaded libraries.
- Physical memory holds exactly the pages the process has actually touched.
Comparison: Pure vs. Modified Demand Paging
| Aspect | Pure Demand Paging | Modified (Practical) Demand Paging |
|---|---|---|
| Initial pages | Zero | May preload entry point, libraries |
| Read-ahead | Never | When sequential access detected |
| Startup time | Slow (many faults) | Fast (fewer initial faults) |
| Memory efficiency | Optimal (only accessed pages) | Near-optimal (some speculation) |
| I/O scheduling | Many small reads | Fewer, larger reads |
| First instruction | Always faults | May not fault |
Pure demand paging is primarily a theoretical concept used as a baseline for analysis. Real operating systems implement modified demand paging that retains the core benefit (load pages reactively) while adding optimizations (prefetching, read-ahead) that improve practical performance.
The most distinctive aspect of pure demand paging is how processes start. Let's trace through the startup sequence in detail.
The exec() Sequence:
When a process calls exec() to load a new program:
1. The kernel tears down the old address space and builds a new page table in which every page is marked non-present.
2. The executable and its libraries are mapped, not loaded: the page table entries simply point at the backing files.
3. The instruction pointer is set to the program's entry point (e.g., _start) and control returns to user mode.

The First Fault Cascade:
Time →
0 ms: CPU jumps to entry point address (e.g., 0x401000)
→ Page fault #1: Code page containing entry point
→ Load page from executable file
→ ~10 ms on an HDD (roughly 100 μs on an SSD; this trace assumes HDD-class latency)
10 ms: First instruction executes (e.g., PUSH RBP)
→ Page fault #2: Stack page
→ Allocate zero-filled frame
→ ~1 ms (minor fault)
11 ms: Next instruction accesses global data
→ Page fault #3: Data page
→ Load page from executable file
→ ~10 ms
21 ms: Program calls a library function (printf)
→ Page fault #4: PLT/GOT page
→ ~10 ms
31 ms: Library code executes
→ Page fault #5: libc code page
→ ~10 ms
...and so on for every new page touched
A simple "Hello, World" program under pure demand paging might experience 10-20 page faults before producing any output. Larger programs can experience hundreds or thousands of faults during initialization.
Program: int main() { printf("Hello\n"); return 0; }

Approximately 50-100 major page faults before output appears.

Breakdown of faults:
- Entry point code (1-2 pages) - crt0 startup code
- Main function code (1 page)
- printf implementation (3-5 pages) - formatting, buffering
- Memory allocation for I/O buffers (1-2 pages)
- Global libc data structures (2-3 pages)
- Dynamic linker code (if dynamically linked) (5-10 pages)
- Symbol resolution tables (2-5 pages)
- Stack pages (1-3 pages)
- Additional library dependencies (10-50 pages)
With pure demand paging, each of these is a separate page fault. On an SSD with 100μs random read latency, 100 faults = 10ms minimum startup time just for I/O—not including CPU processing.
Pure demand paging creates terrible startup performance for complex applications. A browser or IDE might touch thousands of pages during initialization. With pure demand paging, that's thousands of serial disk operations, turning what could be a sub-second startup into multiple seconds of waiting.
After the startup storm of page faults, the process enters a steadier state as its working set becomes established in memory. Pure demand paging has interesting implications for this phase.
Working Set Definition:
The working set W(t, Δ) is the set of pages referenced during the time interval [t - Δ, t]. Under pure demand paging, the physical pages present in memory are exactly those that have been accessed (minus any that were evicted due to memory pressure).
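As an illustration of the definition (not an OS mechanism), the following C sketch computes |W(t, Δ)| over a page-reference string; the trace and the window Δ are made-up values:

```c
/* Sketch: working-set size |W(t, Δ)| along a page-reference trace,
 * counting distinct pages among the last DELTA references. */
#include <stdio.h>

#define NUM_PAGES 16
#define DELTA 4   /* window size Δ, measured in references */

int main(void) {
    int trace[] = { 1, 2, 1, 3, 1, 2, 7, 7, 1, 2 };  /* illustrative */
    int n = sizeof trace / sizeof trace[0];

    for (int t = 0; t < n; t++) {
        int seen[NUM_PAGES] = { 0 }, size = 0;
        int start = (t - DELTA + 1 < 0) ? 0 : t - DELTA + 1;
        for (int i = start; i <= t; i++)   /* scan the window [t-Δ+1, t] */
            if (!seen[trace[i]]) { seen[trace[i]] = 1; size++; }
        printf("t=%d  |W|=%d\n", t, size);
    }
    return 0;
}
```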
Working Set Building Phase:
Fault Rate Over Time (pure demand paging):
Faults/sec
▲
1000 │▓▓
│▓▓▓▓
500 │▓▓▓▓▓▓▓
│▓▓▓▓▓▓▓▓▓▓
100 │▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
│ ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓───────
10 │ Steady state
│ (new pages rare)
0 └────────────────────────────────────────────────────────────▶
1s 5s 10s 30s Time
←───────────────────→ Working set establishment
Phase 1: High Fault Rate (0-10s) - nearly every reference touches a page not yet resident; the working set is being built one fault at a time.
Phase 2: Decreasing Fault Rate (10-30s) - the frequently used pages are resident; faults occur only on less common code paths.
Phase 3: Steady State (30s+) - the working set is established; faults are rare and occur only when the program touches genuinely new pages.
Locality Determines Steady-State Behavior:
Programs with good locality stabilize quickly: their working set is small, and once it is resident nearly every reference hits a page already in memory, so the fault rate drops toward zero.
Programs with poor locality never stabilize: they keep referencing pages they haven't touched before (or pages that were evicted), so the fault rate stays high indefinitely.
Pure demand paging's effectiveness depends entirely on locality. Programs with strong locality experience a brief startup penalty, then run smoothly. Programs with poor locality experience continuous page faults. Understanding your application's access patterns is crucial for predicting demand paging behavior.
Pure demand paging achieves optimal memory utilization in a specific sense: it never loads pages that aren't accessed. Let's analyze this efficiency in detail.
Memory Efficiency Theorem:
Under pure demand paging with no eviction (unlimited memory):
|Pages in Memory| = |Unique Pages Accessed|
This is optimal because any page loaded was definitively needed. No memory is wasted on speculative loads that turn out to be unnecessary.
Comparison with Eager Loading:
| Scenario | Executable Size | Pages Accessed | Pure Demand | Eager Load |
|---|---|---|---|---|
| Simple CLI tool | 1 MB | ~100 KB | 25 pages | 250 pages |
| Office suite | 500 MB | ~50 MB | 12,500 pages | 125,000 pages |
| Web browser | 200 MB | ~30 MB | 7,500 pages | 50,000 pages |
| IDE | 400 MB | ~60 MB | 15,000 pages | 100,000 pages |
| Database server | 100 MB | ~80 MB (hot) | 20,000 pages | 25,000 pages |
Memory Savings Calculation:
For a typical office suite:

With eager loading: 500 MB per instance
With pure demand: 50 MB per instance

If running 3 instances: eager loading consumes 1.5 GB, while pure demand consumes about 150 MB - a saving of roughly 1.35 GB (before accounting for page sharing).
The cost of this efficiency: Startup time and initial response latency.
When Memory Efficiency Matters Most:
- Memory-constrained systems (embedded devices, small VMs)
- Many concurrent processes competing for RAM
- Overcommitted environments (containers, virtual machines), where every unloaded page translates directly into extra capacity
In practice, shared pages (libraries, code segments) mean that multiple processes using the same library share physical frames. This reduces the advantage of pure demand paging because those shared pages are typically loaded once (when the first process needs them) and then shared without additional faults.
Pure demand paging has distinctive I/O characteristics that significantly impact performance, particularly on mechanical storage.
I/O Pattern Under Pure Demand Paging:

Each fault triggers a single page-sized (typically 4 KB) read at whatever file offset the faulting page happens to occupy. Because faults follow the program's control flow rather than the file's layout, the resulting disk access pattern is effectively random, and the reads are serial: one fault must complete before execution continues and triggers the next.
Comparison of I/O Patterns:
| Metric | Pure Demand | With Read-Ahead | Eager Loading |
|---|---|---|---|
| Transfer size | 4 KB | 16-128 KB | Entire file |
| I/O operations | Many small | Fewer larger | Few very large |
| Access pattern | Random | Sequential+random | Sequential |
| HDD friendliness | Poor | Good | Excellent |
| SSD suitability | Acceptable | Good | Good |
| Total bytes read | Minimal | Exceeds accessed | Maximum |
The HDD Problem:
Mechanical hard drives have a significant random access penalty:
HDD Performance:
- Sequential read: ~100-150 MB/s
- Random read (4 KB): ~100 IOPS → 400 KB/s
Loading 10 MB of random pages:
- Sequential: 10 MB / 100 MB/s = 100 ms
- Random 4 KB: 2500 pages / 100 IOPS = 25 seconds (!)
Pure demand paging on HDDs can be 250x slower than sequential loading for random access patterns. This is why read-ahead is so important for HDD-based systems.
SSD Improvements:
SSD Performance:
- Sequential read: ~500-3000 MB/s (NVMe)
- Random read (4 KB): ~50,000-500,000 IOPS → 200-2000 MB/s
Loading 10 MB of random pages:
- Random 4 KB: 2500 pages / 100,000 IOPS = 25 ms
SSDs make pure demand paging far more practical, but read-ahead still helps by exploiting the even higher sequential throughput.
The viability of pure demand paging depends heavily on storage technology. On HDDs, it's often unacceptably slow. On NVMe SSDs, it's practical. In-memory or RAM-disk scenarios eliminate the I/O penalty entirely, making pure demand paging very efficient.
Despite its drawbacks, there are scenarios where the pure demand paging approach (or something close to it) is optimal.
Scenario 1: Large Programs with Small Working Sets
Consider an IDE with 100 MB of code but only 10 MB active at any time:

Eager loading: 100 MB read and resident per instance, 90 MB of it never touched.
Pure demand: only the ~10 MB actually used is ever read from disk.
The key insight: if usage is sparse relative to size, demand paging wins.
Scenario 2: Rarely-Run Utilities
Command-line tool invoked once:
Eager loading:
  Load 5 MB  → 500 ms
  Execute    →  10 ms
  Exit       →  ~0 ms
  Total: 510 ms (5 MB loaded, ~1 MB used)

Pure demand:
  Load ~1 MB of pages → 100 ms
  Execute             →  10 ms
  Exit                →  ~0 ms
  Total: 110 ms (1 MB loaded)
For short-lived processes, pure demand paging can actually be faster because it loads less data.
Scenario 3: Fork and Exec Patterns
Traditional Unix pattern:
fork() → Create child with copy of parent
exec() → Replace child with new program
With pure demand paging:
fork():
- No copying! Just create page table with COW
- Instantaneous (no memory touched)
exec():
- Discard all COW references
- Set up new page table (everything non-present)
- Child's parent-inherited pages were never loaded
Pure demand paging + COW makes fork+exec extremely efficient. The child never pays for pages it will immediately discard.
The Unix fork-exec pattern was designed with demand paging in mind. A process forks, creating a child with (virtually) the same memory, then immediately execs a new program. With demand paging and COW, both operations are nearly free because no pages are actually copied or loaded until accessed.
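A minimal C sketch of the pattern (the command the child runs is arbitrary): fork() copies no pages thanks to COW, and exec() discards those mappings before the child ever faults them in.

```c
/* The classic fork+exec pattern. Under demand paging + COW,
 * fork() copies no pages and exec() loads none up front. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    pid_t pid = fork();            /* child shares pages COW; nothing copied */
    if (pid < 0) { perror("fork"); exit(1); }
    if (pid == 0) {
        /* exec discards the COW mappings the child never touched */
        execlp("echo", "echo", "hello from child", (char *)NULL);
        perror("execlp");          /* only reached if exec fails */
        _exit(127);
    }
    waitpid(pid, NULL, 0);         /* parent waits for the child */
    return 0;
}
```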
There are equally important scenarios where pure demand paging performs poorly, and practical systems must use alternatives.
Scenario 1: Long-Running Startup-Intensive Applications
Web browser startup:
Pure demand paging:
1000+ page faults during initialization
User stares at blank window for 5+ seconds
Every click during warmup causes new faults
With prefetching:
Predict common startup pages
Stream 50 MB during splash screen display
Application responsive in 1-2 seconds
Scenario 2: Predictable Access Patterns
Sequential file processing:
Pure demand paging:
Read page 0: fault, load, process
Read page 1: fault, load, process
Read page 2: fault, load, process
... (one fault per page, I/O latency on each)
With read-ahead:
Read page 0: fault, load pages 0-15, process page 0
Read page 1: already loaded, process immediately
Read pages 2-15: already loaded
Read page 16: fault, load pages 16-31
... (one fault per 16 pages)
Sequential access with read-ahead is an order of magnitude faster.
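For read()-based sequential processing (the mmap equivalent, madvise, appears later on this page), POSIX offers posix_fadvise() to request this kind of read-ahead explicitly. A sketch, with a hypothetical file name:

```c
/* Sketch: hinting sequential access for read()-based processing.
 * POSIX_FADV_SEQUENTIAL asks the kernel for aggressive read-ahead,
 * so most reads hit pages already in the page cache. */
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fd = open("data.bin", O_RDONLY);  /* "data.bin" is illustrative */
    if (fd < 0) { perror("open"); return 1; }

    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);  /* len 0 = whole file */

    char buf[4096];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        ;                                 /* process buf[0..n) here */
    close(fd);
    return 0;
}
```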
Scenario 3: Real-Time Audio/Video Processing
Audio processing requirements:
- Buffer size: 128 samples
- Sample rate: 48 kHz
- Deadline: 128/48000 = 2.7 ms per buffer
Page fault during processing:
- SSD latency: 50-200 μs (acceptable... barely)
- HDD latency: 5-15 ms (BUFFER UNDERRUN! Audio glitch!)
Solution: Wire pages in memory (mlock) to prevent any faults
Real-time systems cannot tolerate the unpredictability of demand faults.
For interactive and real-time applications, pure demand paging's problem isn't average performance—it's variance. Most memory accesses are instantaneous, but occasionally one takes milliseconds. This unpredictability creates a poor user experience and can cause real-time failures.
No production operating system uses pure demand paging exactly as described. However, understanding the pure model helps us appreciate what optimizations real systems add.
Linux Approach:
Linux uses demand paging as the foundation but adds significant optimizations:
- Read-ahead: when sequential access is detected, nearby pages are loaded in the same I/O operation.
- Fault-around: on a fault, a small cluster of surrounding pages is mapped in, amortizing fault overhead.
- madvise() hints: applications can tell the kernel their expected access pattern (see the code below).
Windows Approach:
Windows uses a more aggressive prefetching strategy:
- The Prefetcher records which pages an application touches during startup and preloads them on subsequent launches.
- SuperFetch goes further, preloading commonly used applications based on usage history.
- A working set manager trims and grows each process's resident pages over time.
```c
/* Using madvise to guide demand paging behavior */
#include <sys/mman.h>

void process_file(void *mapped, size_t length) {
    /* Tell kernel we'll access sequentially */
    madvise(mapped, length, MADV_SEQUENTIAL);
    /* Kernel will use aggressive read-ahead */

    /* Or: tell kernel we'll access randomly */
    madvise(mapped, length, MADV_RANDOM);
    /* Kernel disables read-ahead (closer to pure demand) */

    /* Request immediate prefetch */
    madvise(mapped, length, MADV_WILLNEED);
    /* Kernel initiates background loading */

    /* Signal we're done with this region */
    madvise(mapped, length, MADV_DONTNEED);
    /* Kernel can evict these pages preferentially */
}

void lock_critical_pages(void *addr, size_t length) {
    /* Wire pages in memory - no demand faults allowed */
    mlock(addr, length);

    /* For real-time: lock everything */
    mlockall(MCL_CURRENT | MCL_FUTURE);
}
```

| Operating System | Base Strategy | Key Optimizations |
|---|---|---|
| Linux | Demand paging | Read-ahead, fault-around, madvise hints |
| Windows | Demand paging | Prefetcher, SuperFetch, working set manager |
| macOS | Demand paging | Dynamic pager, compressed memory |
| FreeBSD | Demand paging | Clustered page fault handling |
| Real-time OS | Wired pages | No demand faults for determinism |
Application developers can use madvise() to communicate access patterns to the kernel. This allows applications to approach pure demand paging behavior (MADV_RANDOM) when appropriate or request aggressive prefetching (MADV_SEQUENTIAL, MADV_WILLNEED) when access patterns are predictable.
Pure demand paging represents one extreme on a spectrum of paging strategies. Understanding its trade-offs helps us make informed decisions.
The Core Trade-off:
Memory Efficiency ◄─────────────────────► Startup Performance
        ▲                                            ▲
        │                                            │
   Pure Demand                                Eager Loading
     Paging                                    + Prefetch
Each position on this spectrum has costs and benefits:
| Strategy | Memory Use | Startup Time | Steady State | Predictability |
|---|---|---|---|---|
| Pure Demand | Optimal | Poor | Excellent | Variable |
| Moderate Prefetch | Good | Fair | Excellent | Good |
| Aggressive Prefetch | Fair | Good | Excellent | Good |
| Full Eager Load | Poor | Fair | Excellent | Excellent |
| Always Wired | Poor | Good | Excellent | Excellent |
Decision Framework:
Choose more toward pure demand when:
- Memory is scarce and must not be wasted on speculative loads
- Access is sparse or unpredictable (working set much smaller than program size)
- Processes are short-lived (fork/exec patterns, CLI tools)
- Storage handles random reads well (NVMe SSD, RAM-backed)

Choose more toward prefetching when:
- Access patterns are sequential or otherwise predictable
- Startup latency is user-visible (interactive applications)
- Storage penalizes random reads (HDDs)

Choose wired pages when:
- Hard real-time deadlines make any fault latency unacceptable
- Predictability matters more than memory efficiency
Don't assume pure demand paging is the problem. Profile your application: measure major fault rate, working set size, and access patterns. The answer might be that demand paging is already optimal, or that simple madvise hints can solve your issues without architectural changes.
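One concrete way to start on POSIX systems is getrusage(), which reports the cumulative minor and major fault counts for the calling process; a minimal sketch:

```c
/* Sketch: reading a process's own fault counters via getrusage().
 * Minor faults need no I/O; major faults required a disk read. */
#include <stdio.h>
#include <sys/resource.h>

int main(void) {
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) == 0)
        printf("minor faults: %ld, major faults: %ld\n",
               ru.ru_minflt, ru.ru_majflt);
    return 0;
}
```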
We've examined pure demand paging—the extreme lazy loading strategy. Let's consolidate the key insights:
- Pure demand paging loads a page only when an instruction actually references it; a process starts with zero resident pages, so even the first instruction faults.
- Startup is fault-dominated: simple programs take dozens of faults before producing output, and large applications can take thousands.
- Once the working set is resident, fault rates drop to near zero for programs with good locality; programs with poor locality never reach a steady state.
- Memory use is optimal (only accessed pages are ever loaded) at the cost of many small, random, serial I/O operations.
- Storage technology determines viability: often unacceptably slow on HDDs, practical on SSDs.
- Real systems (Linux, Windows, macOS) keep the demand-paging foundation but add prefetching, read-ahead, and hints like madvise().
What's Next:
Having explored the extreme of pure demand paging, we'll now examine its counterweight: prepaging. Prepaging attempts to preemptively load pages before they're needed, trading memory efficiency for reduced fault rates and faster initial performance. Understanding both extremes helps us appreciate the practical middle ground that real systems occupy.
You now understand pure demand paging as a theoretical baseline and its practical implications. This knowledge enables you to reason about memory system behavior, understand OS design choices, and make informed decisions about when lazy loading is beneficial versus when prefetching might be preferred.