Paging elegantly solves the external fragmentation problem of contiguous allocation. By dividing both virtual and physical memory into fixed-size pages and frames, the operating system can place any page in any available frame, achieving flexible, non-contiguous memory allocation.
But as virtual address spaces grew from 16 bits to 32 bits to 64 bits, a fundamental problem emerged: page tables became unmanageably large.
Consider a straightforward 32-bit system with 4KB pages. The virtual address space spans 2³² bytes (4GB), divided into 2³²/2¹² = 2²⁰ pages—over one million pages. Each page requires a page table entry (PTE), typically 4 bytes. That means every process needs a 4MB page table, even if the process only uses a tiny fraction of its address space. With hundreds or thousands of processes, page tables alone could consume all available memory.
A single-level page table for a 32-bit address space with 4KB pages requires 4MB per process. If 100 processes run simultaneously, page tables alone consume 400MB—a significant fraction of physical memory on many systems. For 64-bit systems, the situation is exponentially worse: a flat page table would require petabytes of memory.
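To make the arithmetic concrete, here is a small throwaway C program (not part of any OS, just the calculation above) that reproduces the flat-table sizes for the configurations discussed on this page:

```c
// Flat page table size arithmetic for several address widths.
// A minimal sketch; parameters mirror the comparison table below.
#include <stdio.h>
#include <stdint.h>

static void flat_table_size(int addr_bits, uint64_t page_size, uint64_t pte_size) {
    uint64_t num_pages   = (1ULL << addr_bits) / page_size;  // one PTE per page
    uint64_t table_bytes = num_pages * pte_size;
    printf("%2d-bit, %lluKB pages: %llu entries, %llu MB per process\n",
           addr_bits, (unsigned long long)(page_size / 1024),
           (unsigned long long)num_pages,
           (unsigned long long)(table_bytes >> 20));
}

int main(void) {
    flat_table_size(32, 4096, 4);   // 2^20 entries -> 4MB
    flat_table_size(32, 8192, 4);   // 2^19 entries -> 2MB
    flat_table_size(48, 4096, 8);   // 2^36 entries -> 512GB
    return 0;
}
```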
The insight that changed everything:
Most processes use only a small portion of their virtual address space. A typical application might use:

- A few megabytes of code near the bottom of the address space
- A heap of a few megabytes that grows upward
- A small stack near the top that grows downward

The gigabytes in between sit completely unused.
Allocating page table entries for the unused region wastes memory. The solution? Make the page table itself hierarchical. Instead of a flat array with an entry for every possible page, we create a tree of smaller tables, allocating only the branches we actually need. This is the essence of multi-level paging.
By the end of this page, you will understand the two-level paging scheme: how the virtual address is decomposed, how the outer and inner page tables work together, how address translation proceeds through the hierarchy, and the precise memory savings this structure achieves.
Before diving into two-level paging, let's precisely understand what a single-level (flat) page table looks like and why it becomes problematic.
Structure of a flat page table:
In a single-level paging system, the page table is a contiguous array of page table entries (PTEs). Each entry maps one virtual page to a physical frame. The virtual page number (VPN), extracted from the virtual address, directly indexes into this array.
```c
// Single-Level (Flat) Page Table Structure
// 32-bit virtual address, 4KB pages

#define PAGE_SIZE  4096       // 4KB = 2^12 bytes
#define PAGE_SHIFT 12         // log2(PAGE_SIZE)
#define NUM_PAGES  (1 << 20)  // 2^20 = 1,048,576 pages

// Page Table Entry (PTE) - 32 bits
typedef struct {
    uint32_t present     : 1;  // Page is in physical memory
    uint32_t writable    : 1;  // Page is writable
    uint32_t user        : 1;  // Page accessible from user mode
    uint32_t accessed    : 1;  // Page has been read
    uint32_t dirty       : 1;  // Page has been written
    uint32_t unused      : 7;  // Available for OS use
    uint32_t frame_number: 20; // Physical frame number
} __attribute__((packed)) PTE;

// The entire page table: 4MB contiguous array
PTE page_table[NUM_PAGES];  // 1,048,576 entries × 4 bytes = 4MB

// Address translation
uint32_t translate(uint32_t virtual_addr) {
    // Extract virtual page number (top 20 bits)
    uint32_t vpn = virtual_addr >> PAGE_SHIFT;

    // Extract page offset (bottom 12 bits)
    uint32_t offset = virtual_addr & (PAGE_SIZE - 1);

    // Look up the VPN in the page table
    PTE entry = page_table[vpn];

    if (!entry.present) {
        // Page fault - page not in physical memory
        trigger_page_fault(virtual_addr);
    }

    // Combine frame number with offset
    uint32_t physical_addr = (entry.frame_number << PAGE_SHIFT) | offset;
    return physical_addr;
}
```

The problem visualized:
Imagine a simple process with 10MB of code, 5MB of heap, and 1MB of stack. Its virtual address space layout might look like:
Virtual Address Space (4GB total):
┌─────────────────────────────────────────┐ 0xFFFFFFFF (4GB)
│ │
│ Stack (1MB) │
│ │
├─────────────────────────────────────────┤ ~0xFFF00000
│ │
│ │
│ │
│ UNUSED REGION (~3.98GB) │
│ │
│ │
│ │
├─────────────────────────────────────────┤ ~0x01000000
│ Heap (5MB) │
├─────────────────────────────────────────┤ ~0x00B00000
│ Code (10MB) │
├─────────────────────────────────────────┤ ~0x00100000
│ Reserved/OS (1MB) │
└─────────────────────────────────────────┘ 0x00000000
The process uses only about 17MB of its 4GB address space—less than 0.5%. Yet a flat page table allocates entries for all 4GB. We're storing over a million entries to describe fewer than 5,000 used pages.
| Address Space | Page Size | Number of PTEs | PTE Size | Page Table Size |
|---|---|---|---|---|
| 32-bit (4GB) | 4KB | 2²⁰ = 1M | 4 bytes | 4MB per process |
| 32-bit (4GB) | 8KB | 2¹⁹ = 512K | 4 bytes | 2MB per process |
| 48-bit (256TB) | 4KB | 2³⁶ = 64G | 8 bytes | 512GB per process |
| 64-bit (16EB) | 4KB | 2⁵² = 4P | 8 bytes | 32PB per process |
A 64-bit flat page table would require 32 petabytes per process, vastly more than the physical memory of any real machine. Even for 32-bit systems, dedicating 4MB per process becomes prohibitive when running hundreds of processes. We need a fundamentally different approach.
Two-level paging (also called hierarchical paging or multi-level paging) addresses the page table size problem by breaking the flat table into a tree structure with two levels:
Outer Page Table (Page Directory): A single table that contains pointers to inner page tables. Each entry points to an inner table or is marked invalid.
Inner Page Tables (Page Tables): Multiple smaller tables, each covering a portion of the address space. Only the inner tables needed for the process's actual memory regions are allocated.
The key insight:
If a region of the virtual address space is unused, we don't allocate an inner page table for it. The outer table entry for that region is simply marked invalid. This transforms the space requirement from O(address space size) to O(used memory size).
```c
// Two-Level Page Table Structure
// 32-bit virtual address, 4KB pages

#define PAGE_SIZE         4096  // 4KB
#define PAGE_SHIFT        12
#define ENTRIES_PER_TABLE 1024  // 2^10 entries per table
#define OUTER_SHIFT       22    // Bits 22-31 for outer index
#define INNER_SHIFT       12    // Bits 12-21 for inner index

// Page Directory Entry (PDE) - points to inner page table
typedef struct {
    uint32_t present    : 1;   // Inner table is allocated
    uint32_t writable   : 1;   // Region is writable
    uint32_t user       : 1;   // Accessible from user mode
    uint32_t unused     : 9;   // Available bits
    uint32_t table_addr : 20;  // Physical address of inner table (aligned)
} __attribute__((packed)) PDE;

// Page Table Entry (PTE) - maps to physical frame
typedef struct {
    uint32_t present     : 1;  // Page is in memory
    uint32_t writable    : 1;  // Page is writable
    uint32_t user        : 1;  // Accessible from user mode
    uint32_t accessed    : 1;  // Page has been accessed
    uint32_t dirty       : 1;  // Page has been modified
    uint32_t unused      : 7;  // Available bits
    uint32_t frame_number: 20; // Physical frame number
} __attribute__((packed)) PTE;

// Each inner page table is one page (4KB)
typedef PTE InnerPageTable[ENTRIES_PER_TABLE];  // 1024 × 4 = 4KB

// The outer page table (page directory) is also one page
typedef PDE OuterPageTable[ENTRIES_PER_TABLE];  // 1024 × 4 = 4KB

// Per-process page directory
OuterPageTable* cr3;  // CR3 register points to page directory
```

The elegance of page-sized tables:
Notice that both the outer and inner tables are exactly 4KB—the same size as a page. This is not coincidental: a page-sized table can be allocated and freed with the same frame-granularity machinery the OS already uses for data pages, and it never straddles a frame boundary.
With 1024 entries in the outer table, and each inner table covering 1024 pages × 4KB = 4MB, we get:
Outer table coverage: 1024 inner tables × 4MB each = 4GB ✓
The entire 32-bit address space is covered by a 4KB outer table plus any needed 4KB inner tables.
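A quick way to sanity-check this coverage arithmetic is with compile-time assertions. The sketch below simply encodes the multiplications above (it is illustrative, not from the original listing):

```c
// Compile-time check that the 10/10/12 split covers the full 32-bit space.
#include <assert.h>

#define PAGE_SIZE         4096u
#define ENTRIES_PER_TABLE 1024u

// One inner table maps 1024 pages × 4KB = 4MB of virtual space.
static_assert(ENTRIES_PER_TABLE * PAGE_SIZE == 4u * 1024 * 1024,
              "inner table covers 4MB");

// The directory's 1024 entries × 4MB each = 4GB = the full 2^32 space.
static_assert((unsigned long long)ENTRIES_PER_TABLE
              * ENTRIES_PER_TABLE * PAGE_SIZE == (1ULL << 32),
              "directory covers the full 32-bit address space");
```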
The 32-bit address is divided into three fields: 10 bits for the outer index, 10 bits for the inner index, and 12 bits for the page offset. This gives 2¹⁰ = 1024 entries at each level and 2¹² = 4096 bytes per page. The 10-10 split works out neatly because it makes both table levels exactly one page in size.
In two-level paging, the virtual address is split into three distinct fields, each serving a specific purpose in the translation process:
┌────────────────────────────────────────────────────────────────┐
│ 32-bit Virtual Address │
├─────────────┬─────────────┬────────────────────────────────────┤
│ Bits 31-22│ Bits 21-12│ Bits 11-0 │
│ (10 bits) │ (10 bits) │ (12 bits) │
├─────────────┼─────────────┼────────────────────────────────────┤
│ P1 │ P2 │ D (Offset) │
│ (Outer idx) │ (Inner idx) │ (Page Offset) │
└─────────────┴─────────────┴────────────────────────────────────┘
Field breakdown:
P1 (Outer Index / Directory Index): Selects one of 1024 entries in the page directory. This entry points to an inner page table.
P2 (Inner Index / Page Table Index): Selects one of 1024 entries in the inner page table. This entry contains the physical frame number.
D (Offset): The position within the 4KB page. This is copied directly to the physical address without translation.
```c
// Extracting address components from a 32-bit virtual address

#define OFFSET_MASK 0x00000FFF  // Bits 0-11: page offset
#define P2_MASK     0x003FF000  // Bits 12-21: inner table index
#define P1_MASK     0xFFC00000  // Bits 22-31: outer table index

#define PAGE_SHIFT 12
#define P2_SHIFT   12
#define P1_SHIFT   22

// Extract page offset (12 bits)
static inline uint32_t get_offset(uint32_t vaddr) {
    return vaddr & OFFSET_MASK;          // Mask: 0x00000FFF
}

// Extract inner table index (10 bits)
static inline uint32_t get_p2(uint32_t vaddr) {
    return (vaddr >> P2_SHIFT) & 0x3FF;  // Shift right 12, mask 10 bits
}

// Extract outer table index (10 bits)
static inline uint32_t get_p1(uint32_t vaddr) {
    return (vaddr >> P1_SHIFT) & 0x3FF;  // Shift right 22, mask 10 bits
}

// Example: Virtual address 0x12345678
// Binary: 0001 0010 0011 0100 0101 0110 0111 1000
//         ├─────────┤├─────────┤├───────────────┤
//          P1 (10b)   P2 (10b)    Offset (12b)
//
// P1 = 0001001000   = 72   (outer index)
// P2 = 1101000101   = 837  (inner index)
// D  = 011001111000 = 1656 (offset within page)

void decompose_example() {
    uint32_t vaddr = 0x12345678;
    printf("Virtual Address: 0x%08X\n", vaddr);
    printf("P1 (Directory Index): %u\n", get_p1(vaddr));
    printf("P2 (Table Index):     %u\n", get_p2(vaddr));
    printf("Offset:               %u\n", get_offset(vaddr));
}
```

| Field | Bits | Range | Purpose |
|---|---|---|---|
| P1 (Outer) | 22-31 | 0-1023 | Index into page directory to find inner table pointer |
| P2 (Inner) | 12-21 | 0-1023 | Index into inner page table to find frame number |
| D (Offset) | 0-11 | 0-4095 | Byte position within the 4KB page (unchanged) |
Visualizing the decomposition:
Consider accessing memory at virtual address 0xC0123456. Let's decompose this address:
0xC0123456 = 1100000000 0100100011 010001010110
             └───P1───┘ └───P2───┘ └──Offset──┘
               = 768      = 291       = 1110
              (0x300)    (0x123)     (0x456)
This address refers to: page directory entry 768 selects the inner page table, entry 291 of that table supplies the physical frame number, and byte 1110 within that frame is the target location.
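To verify these field values mechanically, the following snippet checks them with the same shifts and masks used by get_p1, get_p2, and get_offset above (an illustrative check, not part of the original listing):

```c
// Verify the worked decomposition of 0xC0123456.
#include <assert.h>
#include <stdint.h>

int main(void) {
    uint32_t vaddr = 0xC0123456;
    assert(((vaddr >> 22) & 0x3FF) == 768);   // P1 = 0x300
    assert(((vaddr >> 12) & 0x3FF) == 291);   // P2 = 0x123
    assert((vaddr & 0xFFF) == 1110);          // D  = 0x456
    return 0;
}
```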
The hierarchical lookup transforms a single flat array access into two sequential table lookups, but enables dramatic memory savings.
The complete translation process in two-level paging involves accessing two separate tables in sequence. Let's trace through the entire process:
```c
// Complete Two-Level Address Translation

typedef uint32_t* PageDirectory;
typedef uint32_t* PageTable;

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1 << PAGE_SHIFT)
#define ENTRY_MASK 0x3FF            // 10-bit mask for P1/P2

#define PDE_PRESENT    0x001        // Directory entry present
#define PTE_PRESENT    0x001        // Page table entry present
#define PDE_ADDR_MASK  0xFFFFF000   // Physical address of inner table
#define PTE_FRAME_MASK 0xFFFFF000   // Physical frame number

// Hardware CR3 register (page directory base)
extern PageDirectory cr3;

// Translate virtual address to physical address
// Returns physical address, or triggers page fault
uint32_t translate_two_level(uint32_t virtual_address) {
    // STEP 1: Decompose virtual address into components
    uint32_t p1     = (virtual_address >> 22) & ENTRY_MASK;  // Outer index
    uint32_t p2     = (virtual_address >> 12) & ENTRY_MASK;  // Inner index
    uint32_t offset = virtual_address & (PAGE_SIZE - 1);     // Page offset

    // STEP 2: Read page directory entry (first memory access)
    uint32_t pde = cr3[p1];

    // STEP 3: Check if inner page table exists
    if (!(pde & PDE_PRESENT)) {
        // Inner table not allocated - page fault
        // OS will either allocate page table or terminate process
        page_fault(virtual_address, FAULT_NOT_PRESENT);
        return 0;  // Not reached
    }

    // STEP 4: Extract physical address of inner page table
    uint32_t inner_table_phys = pde & PDE_ADDR_MASK;
    PageTable inner_table = physical_to_virtual(inner_table_phys);

    // STEP 5: Read page table entry (second memory access)
    uint32_t pte = inner_table[p2];

    // STEP 6: Check if page is present
    if (!(pte & PTE_PRESENT)) {
        // Page not in memory - page fault
        // OS will load from disk or allocate new frame
        page_fault(virtual_address, FAULT_PAGE_NOT_PRESENT);
        return 0;  // Not reached
    }

    // Additional checks (simplified)
    check_protection(pte, access_type);
    update_accessed_bit(&inner_table[p2]);

    // STEP 7: Extract frame number from PTE
    uint32_t frame_number = pte & PTE_FRAME_MASK;

    // STEP 8: Form physical address
    uint32_t physical_address = frame_number | offset;
    return physical_address;
}
```

Memory access overhead:
The critical observation is that two-level paging requires two memory accesses to translate a single virtual address (compared to one for a flat page table):

1. Read the page directory entry (PDE) from the directory pointed to by CR3.
2. Read the page table entry (PTE) from the inner table the PDE points to.
Only after both reads can we access the actual data at a third memory location. This 3× memory access overhead would be devastating for performance if not mitigated by the Translation Lookaside Buffer (TLB), which caches recently used translations.
Without TLB caching, every memory access would require three physical memory reads. With 100ns RAM latency, a single memory access would take 300ns—a 3× slowdown. The TLB is essential for making multi-level paging viable. We cover TLB mechanics in detail in Module 4.
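As a rough illustration of why the TLB rescues performance, the sketch below computes the effective access time. It uses the 100ns RAM latency from the callout above; the 99% hit ratio and 1ns TLB latency are assumed values for illustration only:

```c
// Back-of-the-envelope effective access time (EAT) with a TLB.
#include <stdio.h>

int main(void) {
    double t_mem = 100.0;  // ns per physical memory access
    double t_tlb = 1.0;    // ns per TLB lookup (assumed)
    double hit   = 0.99;   // TLB hit ratio (assumed)

    // Hit:  TLB lookup + 1 data access.
    // Miss: TLB lookup + 2 table walks (PDE, PTE) + 1 data access.
    double eat = hit * (t_tlb + t_mem)
               + (1 - hit) * (t_tlb + 3 * t_mem);
    printf("EAT = %.1f ns (vs. 300 ns with no TLB)\n", eat);  // ~103 ns
    return 0;
}
```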
Let's rigorously analyze the memory savings achieved by two-level paging for realistic workloads.
Baseline: Flat Page Table

A flat table always occupies 2²⁰ entries × 4 bytes = 4MB per process, no matter how little of the address space is used.

Two-Level Page Table: Sparse Process

Consider a process using:

- 10MB of code starting at 0x00100000
- 5MB of heap just above the code
- 1MB of stack at the top of the address space

Total memory used: 16MB = 4,096 pages
```python
# Two-Level Paging Memory Analysis

# System parameters
ADDRESS_BITS = 32
PAGE_SIZE = 4096            # 4KB pages
PDE_SIZE = 4                # 4 bytes per directory entry
PTE_SIZE = 4                # 4 bytes per page table entry
ENTRIES_PER_TABLE = 1024    # 2^10 entries

# Each inner table covers 4MB of virtual address space
INNER_TABLE_COVERAGE = ENTRIES_PER_TABLE * PAGE_SIZE  # 4MB

def calculate_two_level_overhead(memory_regions):
    """
    Calculate page table memory for given memory regions.
    Each region is (start_addr, size_bytes).
    """
    # Always need the page directory
    outer_table_size = ENTRIES_PER_TABLE * PDE_SIZE  # 4KB

    # Count which inner tables are needed
    inner_tables_needed = set()
    for start_addr, size_bytes in memory_regions:
        end_addr = start_addr + size_bytes - 1
        # Calculate which 4MB regions are touched
        start_p1 = start_addr // INNER_TABLE_COVERAGE
        end_p1 = end_addr // INNER_TABLE_COVERAGE
        for p1 in range(start_p1, end_p1 + 1):
            inner_tables_needed.add(p1)

    inner_tables_size = len(inner_tables_needed) * ENTRIES_PER_TABLE * PTE_SIZE
    total_overhead = outer_table_size + inner_tables_size

    return outer_table_size, len(inner_tables_needed), inner_tables_size, total_overhead

# Example: Typical process layout
regions = [
    (0x00100000, 10 * 1024 * 1024),  # 10MB code
    (0x00B00000, 5 * 1024 * 1024),   # 5MB heap (just above the code)
    (0xFFF00000, 1 * 1024 * 1024),   # 1MB stack
]

directory, count, inner, total = calculate_two_level_overhead(regions)

print("Two-Level Page Table Analysis")
print("=" * 40)
print(f"Process memory used: 16MB")
print(f"Page directory: {directory:,} bytes (4KB)")
print(f"Inner tables needed: {count}")
print(f"Inner tables total: {inner:,} bytes ({inner // 1024}KB)")
print(f"Total overhead: {total:,} bytes ({total // 1024}KB)")
print()
print(f"Flat table would need: 4,194,304 bytes (4MB)")
print(f"Savings: {(4194304 - total):,} bytes ({(1 - total/4194304)*100:.1f}%)")
```

| Memory Layout | Flat Table | Two-Level Tables | Savings |
|---|---|---|---|
| 16MB sparse (code/heap/stack) | 4MB | ~24KB (1 dir + 5 inner) | 99.4% |
| 100MB contiguous | 4MB | ~104KB (1 dir + 25 inner) | 97.5% |
| 512MB contiguous | 4MB | ~516KB (1 dir + 128 inner) | 87.4% |
| 2GB contiguous | 4MB | ~2MB (1 dir + 512 inner) | 49.9% |
| 4GB fully mapped | 4MB | ~4.01MB (all tables) | -0.1% |
Key observations:
Sparse processes save dramatically: The typical case—small code, small heap, small stack—achieves 99%+ savings.
Savings scale with sparsity: The more unused virtual address space, the greater the benefit.
Dense usage reduces benefit: If a process uses most of its address space, two-level paging adds slight overhead due to the extra directory.
The worst case is rare: Few processes actually use 4GB of virtual address space.
The critical insight:
Two-level paging trades time for space. We add an extra memory access during translation, but we dramatically reduce the total memory devoted to page tables. This trade-off is overwhelmingly favorable because:

- The TLB absorbs the extra lookup on the vast majority of accesses, so the time cost is rarely paid.
- The space saving applies to every process on the system, all the time.
- The dense worst case (a fully mapped 4GB address space) is rare in practice.
On a system running 500 processes, each using 20MB of RAM: Flat tables would consume 2GB just for page tables. Two-level tables might use only 15-20MB total—a 100× reduction. This is why multi-level paging is universally adopted in modern systems.
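The arithmetic behind that claim, with the per-process table count as an explicit assumption (here, 1 directory plus 9 inner tables each), looks like this:

```c
// Rough aggregate arithmetic for the 500-process scenario above.
// The 10 tables per process is an assumed figure for illustration.
#include <stdio.h>

int main(void) {
    long long procs      = 500;
    long long flat_bytes = procs * 4 * 1024 * 1024;  // 4MB flat table each
    long long two_level  = procs * 10 * 4096;        // ~40KB of tables each
    printf("Flat:      %lld MB\n", flat_bytes >> 20);        // 2000 MB
    printf("Two-level: %lld MB\n", two_level >> 20);         // ~19 MB
    printf("Reduction: ~%lldx\n", flat_bytes / two_level);   // ~100x
    return 0;
}
```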
The Page Directory Entry (PDE) in a two-level system serves as a pointer to an inner page table. Its structure must encode both the physical location of the inner table and control information for the entire 4MB region that table covers.
```c
// Page Directory Entry (PDE) - 32-bit x86 Format
//
// Bit layout:
// ┌──────────────────────┬───────┬───┬────┬───┬───┬─────┬─────┬─────┬─────┬───┐
// │ 31                12 │ 11  9 │ 8 │ 7  │ 6 │ 5 │  4  │  3  │  2  │  1  │ 0 │
// │  Page Table Address  │ Avail │ G │ PS │ 0 │ A │ PCD │ PWT │ U/S │ R/W │ P │
// └──────────────────────┴───────┴───┴────┴───┴───┴─────┴─────┴─────┴─────┴───┘

typedef struct {
    union {
        uint32_t raw;
        struct {
            uint32_t present        : 1;  // P:   Entry is valid
            uint32_t read_write     : 1;  // R/W: 0=read-only, 1=read-write
            uint32_t user_supervisor: 1;  // U/S: 0=supervisor only, 1=user accessible
            uint32_t write_through  : 1;  // PWT: Write-through caching
            uint32_t cache_disable  : 1;  // PCD: Page caching disabled
            uint32_t accessed       : 1;  // A:   Entry has been used for translation
            uint32_t reserved       : 1;  // (Reserved, must be 0)
            uint32_t page_size      : 1;  // PS:  0=4KB pages, 1=4MB page (PSE)
            uint32_t global         : 1;  // G:   Global entry (TLB behavior)
            uint32_t available      : 3;  // Available for OS use
            uint32_t table_address  : 20; // Physical frame number of inner table
        } fields;
    };
} PDE;

// PDE operations
#define PDE_PRESENT    0x001
#define PDE_WRITE      0x002
#define PDE_USER       0x004
#define PDE_ACCESSED   0x020
#define PDE_LARGE_PAGE 0x080
#define PDE_ADDR_MASK  0xFFFFF000

// Create a PDE pointing to an inner page table
PDE make_pde(uint32_t table_phys_addr, uint32_t flags) {
    PDE pde;
    // Table address must be 4KB aligned
    pde.raw = (table_phys_addr & PDE_ADDR_MASK) | (flags & 0xFFF);
    return pde;
}

// Get physical address of inner page table from PDE
uint32_t pde_get_table_addr(PDE pde) {
    return pde.raw & PDE_ADDR_MASK;
}

// Example: Create a present, user-accessible, writable PDE
// Inner page table at physical address 0x00345000
void example_pde() {
    uint32_t flags = PDE_PRESENT | PDE_WRITE | PDE_USER;
    PDE pde = make_pde(0x00345000, flags);
    // pde.raw = 0x00345007
}
```

Protection bits in the PDE apply to the entire 4MB region. Inner PTEs can only further restrict access—never grant more. For example, if the PDE is read-only, no PTE in that inner table can make a page writable. This hierarchical protection simplifies permission management.
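One way to picture this hierarchical rule is as a bitwise AND of the two levels' permission bits. The helper below is a hypothetical sketch of that idea, not x86-mandated behavior in every detail:

```c
// Effective permission = AND of PDE and PTE permission bits (sketch).
#include <stdint.h>
#include <stdbool.h>

#define FLAG_PRESENT 0x001
#define FLAG_WRITE   0x002

// True only if both levels allow a write to the page.
bool can_write(uint32_t pde_flags, uint32_t pte_flags) {
    bool mapped = (pde_flags & FLAG_PRESENT) && (pte_flags & FLAG_PRESENT);
    bool write  = (pde_flags & FLAG_WRITE)   && (pte_flags & FLAG_WRITE);
    return mapped && write;  // a read-only PDE forces the whole 4MB region read-only
}
```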
The true power of two-level paging emerges in handling sparse address spaces—where a process uses small, scattered regions of a large virtual address space.
How sparse mappings work:
When a process is created, the only page tables allocated are:

- The page directory itself (one 4KB page, with all entries initially invalid)
- An inner table for each 4MB region the process actually maps at startup (typically a handful for code, data, and stack)
Vast regions of unused address space require no memory at all—only an invalid entry in the directory.
```c
// Managing Sparse Address Space with Two-Level Paging

// Process creation: minimal page table allocation
void create_process_page_tables(Process* proc) {
    // Allocate ONLY the page directory (4KB)
    proc->page_directory = allocate_physical_page();
    memset(proc->page_directory, 0, PAGE_SIZE);  // All entries invalid

    // No inner tables allocated yet!
    // Total initial overhead: 4KB (just the directory)
}

// Map a virtual address on demand
int map_page(Process* proc, uint32_t vaddr, uint32_t paddr, uint32_t flags) {
    uint32_t p1 = (vaddr >> 22) & 0x3FF;
    uint32_t p2 = (vaddr >> 12) & 0x3FF;

    PDE* pde = &proc->page_directory[p1];

    // Check if inner page table exists
    if (!pde->present) {
        // Allocate new inner table on demand
        uint32_t new_table = allocate_physical_page();
        memset(phys_to_virt(new_table), 0, PAGE_SIZE);

        // Point directory entry to new table
        pde->raw = new_table | PDE_PRESENT | PDE_WRITE | PDE_USER;
        // Now we've allocated 8KB total (directory + 1 inner table)
    }

    // Get inner table
    PTE* inner_table = phys_to_virt(pde_get_table_addr(*pde));

    // Create mapping
    inner_table[p2] = (paddr & 0xFFFFF000) | flags | PTE_PRESENT;

    // Invalidate TLB entry for this address
    invlpg(vaddr);

    return 0;
}

// Example: Load segments of an executable
void load_executable(Process* proc, ELF_Header* elf) {
    // Code segment at 0x00400000 (4MB-aligned for simplicity)
    // Uses directory entry 1, allocates 1 inner table
    for (int i = 0; i < code_pages; i++) {
        map_page(proc, 0x00400000 + i * PAGE_SIZE,
                 alloc_frame(), PTE_USER);  // Read-only code
    }

    // Data segment at 0x00800000
    // Uses directory entry 2, allocates 1 inner table
    for (int i = 0; i < data_pages; i++) {
        map_page(proc, 0x00800000 + i * PAGE_SIZE,
                 alloc_frame(), PTE_USER | PTE_WRITE);
    }

    // Stack just below 0xC0000000 (grows down)
    // Uses directory entry 767, allocates 1 inner table
    for (int i = 0; i < INITIAL_STACK_PAGES; i++) {
        map_page(proc, 0xC0000000 - (i + 1) * PAGE_SIZE,
                 alloc_frame(), PTE_USER | PTE_WRITE);
    }

    // Total: 4KB directory + 3 × 4KB inner tables = 16KB
    // Flat table would be 4MB regardless of actual memory used
}
```

Dynamic growth with lazy allocation:
The heap and stack grow dynamically. With two-level paging:
Heap growth — When malloc() exhausts the current heap and calls sbrk() or mmap(), the kernel maps new pages. If the new pages fall in an already-allocated inner table, only new PTEs are filled. If they cross into a new 4MB region, a new inner table is allocated.
Stack growth — When the stack pointer moves below currently-mapped pages, a page fault occurs. The kernel allocates a new frame and creates a PTE. New inner tables are allocated only when the stack grows into a new 4MB region.
This lazy allocation strategy minimizes memory overhead by allocating page table structures only when they're actually needed.
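A sketch of the stack-growth path might look like the following. It reuses the map_page() helper from the listing above; is_stack_region and terminate_process are hypothetical helpers, not from a real kernel:

```c
// Demand allocation on a stack-growth page fault (illustrative sketch).
void handle_page_fault(Process* proc, uint32_t fault_addr) {
    if (is_stack_region(proc, fault_addr)) {
        // Allocate a fresh frame and map it; map_page() creates a new
        // inner table automatically if the fault crossed into a new
        // 4MB region (a new directory entry).
        map_page(proc, fault_addr & ~(PAGE_SIZE - 1),
                 alloc_frame(), PTE_USER | PTE_WRITE);
        return;  // Retry the faulting instruction
    }
    // Not a growable region: deliver a fault to the process
    terminate_process(proc, fault_addr);
}
```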
Memory-mapped files and shared libraries often map at high virtual addresses (e.g., 0x7F000000 on some systems). Each shared library might touch a different 4MB region, requiring its own inner page table. However, these tables can be shared between processes mapping the same library at the same address, further reducing per-process overhead.
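A minimal sketch of that sharing, using the assumed Process and PDE structures from the earlier listings: both directories simply point at the same physical inner table, so its PTEs exist only once in memory.

```c
// Two processes mapping the same library at the same virtual address
// can share one inner page table (illustrative sketch).
void share_library_tables(Process* a, Process* b, uint32_t lib_vaddr,
                          uint32_t shared_table_phys) {
    uint32_t p1 = (lib_vaddr >> 22) & 0x3FF;  // Same directory slot in both

    // Both PDEs reference the identical inner table; the read-only
    // library pages it maps are described exactly once.
    a->page_directory[p1].raw = shared_table_phys | PDE_PRESENT | PDE_USER;
    b->page_directory[p1].raw = shared_table_phys | PDE_PRESENT | PDE_USER;
}
```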
Two-level paging is the foundational hierarchical page table design that makes large virtual address spaces practical. Let's consolidate the key concepts:

- The 32-bit virtual address splits into 10 bits of outer index, 10 bits of inner index, and 12 bits of page offset.
- A 4KB page directory points to 4KB inner page tables; an unused 4MB region needs no inner table at all, just an invalid directory entry.
- Translation reads the PDE, then the PTE, then the data: three memory accesses unless the TLB short-circuits the walk.
- A sparse 16MB process needs roughly 24KB of page tables instead of a 4MB flat table, a saving of over 99%.
What's next:
Two-level paging suffices for 32-bit systems, but 64-bit address spaces are vastly larger. Even with two-level paging, a 64-bit system would need an enormous outer table. The solution? Add more levels. In the next page, we'll explore three-level paging and understand how modern 64-bit systems use four or even five levels to manage their immense address spaces.
You now understand two-level paging: the motivation for hierarchical page tables, the structure of the page directory and inner tables, the complete translation process, and the memory savings achieved. Next, we'll extend these concepts to three-level paging for larger address spaces.