Two-level paging elegantly solves the page table size problem for 32-bit systems. With a 10-10-12 address split, we achieve page-sized tables at both levels and dramatic memory savings for sparse address spaces.
But computing has evolved. Modern systems use 64-bit addresses, providing a theoretical address space of 2⁶⁴ bytes—16 exabytes. Even though current implementations typically use only 48 or 57 bits of this space, the scale is staggering: a 48-bit space (256TB) is 65,536 times larger than the entire 32-bit address space.
Two-level paging cannot handle these address spaces efficiently. The outer table alone would require billions of entries. We need deeper hierarchies—more levels in our page table tree to keep each level manageable.
This page explores three-level paging as the conceptual bridge between two-level and modern four/five-level systems. You'll understand when three levels become necessary, how the address is decomposed across three indices, the translation process through three tables, and why modern systems extended to four or more levels.
To understand why we need three levels (and beyond), let's analyze what happens when we apply two-level paging to larger address spaces.
The scaling problem:
In a two-level system with 4KB pages, we split the virtual address into:
For the inner table to be page-sized (4KB = 1024 entries × 4 bytes), it uses 10 bits of index. This leaves all remaining bits for the outer index.
| Address Bits | Outer Index | Inner Index | Offset | Outer Table Size | Entries |
|---|---|---|---|---|---|
| 32-bit | 10 bits | 10 bits | 12 bits | 4KB | 1,024 |
| 36-bit | 14 bits | 10 bits | 12 bits | 64KB | 16,384 |
| 40-bit | 18 bits | 10 bits | 12 bits | 1MB | 262,144 |
| 48-bit | 26 bits | 10 bits | 12 bits | 256MB | 67M |
| 57-bit | 35 bits | 10 bits | 12 bits | 128GB | 34B |
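The outer-table sizes in this table follow directly from the bit split; a short Python script (in the same style as the document's later analysis code) reproduces them:

```python
# Outer-table growth under two-level paging with a fixed 10-bit inner index.
# Assumes 4KB pages (12 offset bits) and 4-byte entries, as in the table above.

ENTRY_SIZE = 4    # bytes per outer-table entry
INNER_BITS = 10   # inner index fixed so inner tables stay page-sized
OFFSET_BITS = 12  # 4KB pages

def outer_table(address_bits):
    """Return (outer_index_bits, entries, size_in_bytes) for a two-level split."""
    outer_bits = address_bits - INNER_BITS - OFFSET_BITS
    entries = 2 ** outer_bits
    return outer_bits, entries, entries * ENTRY_SIZE

for bits in (32, 36, 40, 48, 57):
    outer_bits, entries, size = outer_table(bits)
    print(f"{bits}-bit: outer index {outer_bits} bits, "
          f"{entries:,} entries, {size / 1024:,.0f} KB")
# At 48 bits the outer table alone reaches 256MB -- already larger
# than an entire flat page table for a 32-bit system.
```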
The outer table explosion:
As the address space grows, the outer table grows exponentially. For a 48-bit address space, the outer table alone is 256MB per process.
This is worse than a flat page table for 32-bit systems! The outer table must be fully allocated even for sparse address spaces because it's a single contiguous array.
The fundamental problem:
Two-level paging only saves memory when the inner tables can be sparse. But if the outer table is huge, we've just moved the problem rather than solving it. We need the outer table itself to be sparse.
For multi-level paging to save memory, EVERY level except the innermost must be sparse. If any level requires a huge contiguous array, the memory savings are lost. This is why we need enough levels to keep each level's table page-sized.
The solution: Add another level
By adding a third level, we can split the outer table into smaller chunks:
Two-Level (48-bit):
Level 1 (26 bits) → Level 2 (10 bits) → Offset (12 bits)
Problem: Level 1 is 256MB
Three-Level (48-bit):
Level 1 (16 bits) → Level 2 (10 bits) → Level 3 (10 bits) → Offset (12 bits)
Improvement: Level 1 is now 256KB
Four-Level (48-bit):
L1 (9 bits) → L2 (9 bits) → L3 (9 bits) → L4 (9 bits) → Offset (12 bits)
Each level: 512 entries × 8 bytes = 4KB ✓
This is exactly what modern 64-bit systems do: use four levels (or five for 57-bit addresses) to keep each table page-sized.
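The required level count falls out of simple arithmetic. A small sketch (a hypothetical helper, assuming 4KB pages and 8-byte entries as on x86-64):

```python
import math

PAGE_SIZE = 4096
ENTRY_SIZE = 8
OFFSET_BITS = 12
# A page-sized table holds 4096 / 8 = 512 entries -> 9 index bits per level.
BITS_PER_LEVEL = int(math.log2(PAGE_SIZE // ENTRY_SIZE))  # 9

def levels_needed(va_bits):
    """Minimum number of levels so every table is at most one page."""
    return math.ceil((va_bits - OFFSET_BITS) / BITS_PER_LEVEL)

for va_bits in (32, 39, 48, 57):
    print(f"{va_bits}-bit VA -> {levels_needed(va_bits)} levels")
# 48-bit -> 4 levels (x86-64), 57-bit -> 5 levels (LA57).
# Note: with 8-byte entries even 32 bits needs 3 levels -- exactly x86 PAE.
```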
A three-level page table introduces an additional layer of indirection between the top-level directory and the page tables that directly map physical frames.
Hierarchical structure:
┌─────────────────────┐
│ Level 1 Directory │ ← CR3 register points here
│ (Root Table) │ 1 table always allocated
└─────────┬───────────┘
│ L1 index selects entry
▼
┌─────────────────────┐
│ Level 2 Directory │ ← Multiple possible tables
│ (Middle Tables) │ Only allocated if L1 entry is valid
└─────────┬───────────┘
│ L2 index selects entry
▼
┌─────────────────────┐
│ Level 3 Tables │ ← Multiple possible tables
│ (Page Tables) │ Only allocated if L2 entry is valid
└─────────┬───────────┘
│ L3 index selects entry
▼
┌─────────────────────┐
│ Physical Frame │ ← Frame number from L3 entry
│ + Offset │ Offset from virtual address
└─────────────────────┘
```c
// Three-Level Page Table Structure for Larger Address Spaces
// Example: 40-bit virtual address with 4KB pages

#include <stdint.h>

#define PAGE_SIZE 4096
#define PAGE_SHIFT 12
#define ENTRIES_PER_TABLE 1024  // 2^10 entries per table

// Address field sizes
#define L1_BITS 8       // Level 1 index bits
#define L2_BITS 10      // Level 2 index bits
#define L3_BITS 10      // Level 3 index bits
#define OFFSET_BITS 12  // Page offset bits
// Total: 8 + 10 + 10 + 12 = 40 bits

// Address field masks and shifts
#define OFFSET_MASK 0xFFF  // Bits 0-11
#define L3_MASK 0x3FF      // 10 bits
#define L2_MASK 0x3FF      // 10 bits
#define L1_MASK 0xFF       // 8 bits

#define L3_SHIFT 12
#define L2_SHIFT 22
#define L1_SHIFT 32

// Entry types (all 64 bits for alignment)
typedef uint64_t L1Entry;  // Points to L2 table
typedef uint64_t L2Entry;  // Points to L3 table
typedef uint64_t L3Entry;  // Contains frame number

// Entry flags (same as x86-64)
#define ENTRY_PRESENT  (1ULL << 0)
#define ENTRY_WRITE    (1ULL << 1)
#define ENTRY_USER     (1ULL << 2)
#define ENTRY_ACCESSED (1ULL << 5)
#define ENTRY_DIRTY    (1ULL << 6)
#define ENTRY_ADDR_MASK 0x000FFFFFFFFFF000ULL

// Table types
typedef L1Entry Level1Table[256];   // 2^8 = 256 entries
typedef L2Entry Level2Table[1024];  // 2^10 = 1024 entries
typedef L3Entry Level3Table[1024];  // 2^10 = 1024 entries

// Level 1: 256 entries × 8 bytes = 2KB (not a full page, suboptimal)
// Level 2: 1024 entries × 8 bytes = 8KB (2 pages, also suboptimal)
// Level 3: 1024 entries × 8 bytes = 8KB (2 pages)

// NOTE: Real systems adjust bit splits to make each level page-sized.
// x86-64 uses 9-9-9-9-12 for exactly 512 entries × 8 bytes = 4KB per table.
```
Naming conventions:
Different architectures and texts use various names for the levels:
| Level | Intel x86-64 | Generic Name | Role |
|---|---|---|---|
| 1 | PML4 (Page Map Level 4) | Page Global Directory | Root table, pointed to by CR3 |
| 2 | PDPT (Page Directory Pointer Table) | Page Upper Directory | Second-level directory |
| 3 | PD (Page Directory) | Page Middle Directory | Third-level directory |
| 4 | PT (Page Table) | Page Table | Contains frame numbers |
Note: Intel's naming is for four-level paging. For pure three-level, levels 1-3 would be PDPT, PD, and PT.
The design principle:
Each level serves as a directory for the level below it. Only the bottommost level (L3 in three-level, L4 in four-level) contains actual frame mappings. All upper levels contain pointers to tables at the next level down, with validity bits indicating whether those tables exist.
With 8-byte entries (64-bit pointers), a 4KB page holds exactly 512 entries. log₂(512) = 9 bits per level. Four levels × 9 bits = 36 bits for page table indexing, plus 12 bits for offset = 48-bit addresses. This elegant design makes every table exactly one page.
In three-level paging, the virtual address is decomposed into four fields:
┌────────────────────────────────────────────────────────────────────────────────┐
│ Virtual Address (e.g., 40 bits) │
├──────────────┬──────────────┬──────────────┬───────────────────────────────────┤
│ Bits 39-32 │ Bits 31-22 │ Bits 21-12 │ Bits 11-0 │
│ (8 bits) │ (10 bits) │ (10 bits) │ (12 bits) │
├──────────────┼──────────────┼──────────────┼───────────────────────────────────┤
│ L1 │ L2 │ L3 │ Offset │
│ (Root) │ (Middle) │ (Bottom) │ (Page offset) │
└──────────────┴──────────────┴──────────────┴───────────────────────────────────┘
Each field serves a specific purpose in navigating the three-level hierarchy:
```c
// Three-Level Address Decomposition and Extraction

#include <stdint.h>
#include <stdio.h>

// Example configuration for 40-bit virtual addresses
#define VA_BITS 40
#define OFFSET_BITS 12
#define L3_BITS 10
#define L2_BITS 10
#define L1_BITS (VA_BITS - OFFSET_BITS - L3_BITS - L2_BITS)  // 8 bits

// Extraction macros
#define GET_OFFSET(va) ((va) & ((1UL << OFFSET_BITS) - 1))
#define GET_L3(va) (((va) >> OFFSET_BITS) & ((1UL << L3_BITS) - 1))
#define GET_L2(va) (((va) >> (OFFSET_BITS + L3_BITS)) & ((1UL << L2_BITS) - 1))
#define GET_L1(va) (((va) >> (OFFSET_BITS + L3_BITS + L2_BITS)) & ((1UL << L1_BITS) - 1))

// Demonstration function
void decompose_address(uint64_t virtual_address) {
    uint64_t l1_index = GET_L1(virtual_address);
    uint64_t l2_index = GET_L2(virtual_address);
    uint64_t l3_index = GET_L3(virtual_address);
    uint64_t offset   = GET_OFFSET(virtual_address);

    printf("Virtual Address: 0x%010lX\n", virtual_address);
    printf("Decomposition (40-bit address, 8-10-10-12 split):\n");
    printf("  L1 Index (Root):   %3lu (0x%02lX)\n", l1_index, l1_index);
    printf("  L2 Index (Middle): %4lu (0x%03lX)\n", l2_index, l2_index);
    printf("  L3 Index (Bottom): %4lu (0x%03lX)\n", l3_index, l3_index);
    printf("  Page Offset:       %4lu (0x%03lX)\n", offset, offset);
}

/*
 * Example: Virtual address 0x02BCDEF112
 * Binary:  0000 0010 1011 1100 1101 1110 1111 0001 0001 0010
 *          └───────┘└──────────┘└──────────┘└────────────┘
 *           L1 (8b)   L2 (10b)    L3 (10b)   Offset (12b)
 *
 * L1 Index: 0x02  = 2
 * L2 Index: 0x2F3 = 755
 * L3 Index: 0x1EF = 495
 * Offset:   0x112 = 274
 */
```
The number of bits per level determines both table sizes and maximum coverage. More bits at a level means larger tables but fewer tables needed; fewer bits means smaller tables but deeper nesting. Modern systems optimize for page-sized tables at every level.
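The worked example's field values (L1=2, L2=755, L3=495, offset=274) can be cross-checked in a few lines of Python; under the 8-10-10-12 split those four fields pack to virtual address 0x02BCDEF112:

```python
# Pack/unpack a 40-bit virtual address under the 8-10-10-12 split.
OFFSET_BITS, L3_BITS, L2_BITS, L1_BITS = 12, 10, 10, 8

def pack(l1, l2, l3, offset):
    """Assemble a 40-bit virtual address from its four fields."""
    return ((l1 << (OFFSET_BITS + L3_BITS + L2_BITS))
            | (l2 << (OFFSET_BITS + L3_BITS))
            | (l3 << OFFSET_BITS)
            | offset)

def unpack(va):
    """Split a 40-bit virtual address back into (l1, l2, l3, offset)."""
    return (va >> 32) & 0xFF, (va >> 22) & 0x3FF, (va >> 12) & 0x3FF, va & 0xFFF

va = pack(2, 755, 495, 274)
print(hex(va))     # 0x2bcdef112
print(unpack(va))  # (2, 755, 495, 274)
```

Packing and unpacking are exact inverses, so any translation example can be verified in both directions.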
The translation process in three-level paging extends the two-level approach with an additional table lookup. Each level must be traversed sequentially, as each entry provides the address of the next table.
```c
// Complete Three-Level Address Translation

#include <stdint.h>

#define ENTRY_PRESENT   (1ULL << 0)
#define ENTRY_ADDR_MASK 0x0000FFFFFFFFF000ULL  // Bits 12-51 hold the table/frame address

// Fault reasons reported to the page fault handler (illustrative)
enum {
    FAULT_L2_NOT_PRESENT,    // L1 entry invalid: no L2 table
    FAULT_L3_NOT_PRESENT,    // L2 entry invalid: no L3 table
    FAULT_PAGE_NOT_PRESENT,  // L3 entry invalid: page not mapped
};

typedef uint64_t* PageTable;

// Hardware register holding the L1 (root) table's physical address
extern uint64_t cr3_register;

// Convert a physical address to a kernel virtual address
extern void* phys_to_virt(uint64_t phys_addr);

// Page fault handler (does not return on fault)
extern void page_fault(uint64_t vaddr, int reason);

uint64_t translate_three_level(uint64_t virtual_address) {
    // STEP 1: Decompose virtual address
    uint64_t l1_index = GET_L1(virtual_address);
    uint64_t l2_index = GET_L2(virtual_address);
    uint64_t l3_index = GET_L3(virtual_address);
    uint64_t offset   = GET_OFFSET(virtual_address);

    // STEP 2: Access L1 table (root) — first memory access
    PageTable l1_table = phys_to_virt(cr3_register);
    uint64_t l1_entry = l1_table[l1_index];

    // STEP 3: Validate L1 entry
    if (!(l1_entry & ENTRY_PRESENT)) {
        page_fault(virtual_address, FAULT_L2_NOT_PRESENT);
        return 0;  // Not reached
    }

    // STEP 4: Access L2 table — second memory access
    uint64_t l2_phys = l1_entry & ENTRY_ADDR_MASK;
    PageTable l2_table = phys_to_virt(l2_phys);
    uint64_t l2_entry = l2_table[l2_index];

    // STEP 5: Validate L2 entry
    if (!(l2_entry & ENTRY_PRESENT)) {
        page_fault(virtual_address, FAULT_L3_NOT_PRESENT);
        return 0;  // Not reached
    }

    // STEP 6: Access L3 table (page table) — third memory access
    uint64_t l3_phys = l2_entry & ENTRY_ADDR_MASK;
    PageTable l3_table = phys_to_virt(l3_phys);
    uint64_t l3_entry = l3_table[l3_index];

    // STEP 7: Validate L3 entry
    if (!(l3_entry & ENTRY_PRESENT)) {
        page_fault(virtual_address, FAULT_PAGE_NOT_PRESENT);
        return 0;  // Not reached
    }

    // STEP 8 & 9: Extract frame and form physical address
    uint64_t frame_address = l3_entry & ENTRY_ADDR_MASK;
    return frame_address | offset;
}

/*
 * Memory access count: 3 (L1, L2, L3) + 1 (actual data) = 4 total
 *
 * Without a TLB: every memory access requires 4 physical reads —
 * 4× slower than direct physical access.
 *
 * With a TLB: translations are cached, so most accesses need only
 * 1 read. The TLB is essential for multi-level paging viability.
 */
```
Visualizing the translation:
Virtual Address: 0x02BCDEF112 (40-bit example)
├───┤├─────┤├─────┤├───────┤
L1:2 L2:755 L3:495 Off:274
CR3: 0x00001000
│
▼
┌─────────────────────────────────────────┐
│ L1 Table @ 0x00001000 │
│ Entry[0]: 0x0 (not present) │
│ Entry[1]: 0x0 (not present) │
│ Entry[2]: 0x00072003 ─────────────────────┐
│ ... │ │
└─────────────────────────────────────────┘ │
│
┌──────────────────────────────┘
▼
┌─────────────────────────────────────────┐
│ L2 Table @ 0x00072000 │
│ Entry[755]: 0x00089003 ───────────────────┐
│ ... │ │
└─────────────────────────────────────────┘ │
│
┌──────────────────────────────┘
▼
┌─────────────────────────────────────────┐
│ L3 Table @ 0x00089000 │
│ Entry[495]: 0x00456003 ───────────────────┐
│ (Frame: 0x00456, Flags: Present, RW) │ │
│ ... │ │
└─────────────────────────────────────────┘ │
│
┌──────────────────────────────┘
▼
┌─────────────────────────────────────────┐
│ Physical Frame @ 0x00456000 │
│ Physical Address = 0x00456000 + 274 │
│ = 0x00456112 │
└─────────────────────────────────────────┘
Three-level paging requires three memory accesses before accessing the actual data—a 4× overhead compared to direct physical access. This makes the TLB (Translation Lookaside Buffer) absolutely critical. Without TLB caching, multi-level paging would be impractically slow.
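The TLB's impact can be quantified with the usual effective-access-time formula. A sketch with illustrative numbers (the hit ratios and 100ns memory latency are assumptions for demonstration, not measurements):

```python
def effective_access_time(hit_ratio, mem_ns=100, levels=3):
    """EAT: a TLB hit needs only the data access; a miss walks all levels first."""
    hit_cost = mem_ns                  # translation cached -> one data access
    miss_cost = (levels + 1) * mem_ns  # walk every level, then fetch the data
    return hit_ratio * hit_cost + (1 - hit_ratio) * miss_cost

for hit in (0.0, 0.90, 0.99):
    print(f"TLB hit ratio {hit:.0%}: {effective_access_time(hit):.0f} ns")
# 0% -> 400 ns (the 4x slowdown); 99% -> 103 ns (only 3% overhead)
```

Even a modest miss penalty of three extra reads per access collapses to a few percent once hit rates reach the high nineties, which is why TLBs make deep hierarchies viable.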
Three-level paging extends the memory savings of two-level paging to larger address spaces. Let's analyze the overhead for different usage patterns.
```python
# Three-Level Page Table Memory Analysis

# Configuration: 48-bit address space
# If using three levels: might use 16-10-10-12 split
#   L1: 16 bits = 65,536 entries × 8 bytes = 512KB (problematic!)
#   L2: 10 bits =  1,024 entries × 8 bytes = 8KB
#   L3: 10 bits =  1,024 entries × 8 bytes = 8KB

# Better: use more levels to keep tables page-sized
# x86-64 four-level: 9-9-9-9-12 split
#   Each table: 512 entries × 8 bytes = 4KB ✓

def analyze_three_level(address_bits, l1_bits, l2_bits, l3_bits):
    """Analyze memory overhead for three-level paging."""
    offset_bits = 12  # 4KB pages
    entry_size = 8    # 64-bit entries

    l1_entries = 2 ** l1_bits
    l2_entries = 2 ** l2_bits
    l3_entries = 2 ** l3_bits

    l1_table_size = l1_entries * entry_size
    l2_table_size = l2_entries * entry_size
    l3_table_size = l3_entries * entry_size

    # Coverage per table
    l3_coverage = l3_entries * (2 ** offset_bits)  # Bytes per L3 table
    l2_coverage = l2_entries * l3_coverage         # Bytes per L2 table
    l1_coverage = l1_entries * l2_coverage         # Total address space

    print(f"Three-Level Analysis: {address_bits}-bit addresses")
    print(f"Split: {l1_bits}-{l2_bits}-{l3_bits}-{offset_bits}")
    print()
    print(f"L1 (Root) Table:")
    print(f"  Entries: {l1_entries:,}")
    print(f"  Size: {l1_table_size:,} bytes ({l1_table_size/1024:.1f} KB)")
    print(f"  Each entry covers: {l2_coverage / (1024**2):.1f} MB")
    print()
    print(f"L2 (Middle) Tables:")
    print(f"  Entries per table: {l2_entries:,}")
    print(f"  Size per table: {l2_table_size:,} bytes ({l2_table_size/1024:.1f} KB)")
    print(f"  Each entry covers: {l3_coverage / (1024**2):.1f} MB")
    print(f"  Max L2 tables: {l1_entries:,}")
    print()
    print(f"L3 (Bottom) Tables:")
    print(f"  Entries per table: {l3_entries:,}")
    print(f"  Size per table: {l3_table_size:,} bytes ({l3_table_size/1024:.1f} KB)")
    print(f"  Each entry covers: {2**offset_bits / 1024:.1f} KB (one page)")
    print(f"  Max L3 tables: {l1_entries * l2_entries:,}")

    return l1_table_size, l2_table_size, l3_table_size

# Example: 40-bit address with 8-10-10-12 split
print("=" * 60)
analyze_three_level(40, 8, 10, 10)
print("=" * 60)

# Overhead for a sparse 20MB process (8-10-10-12 split):
#   Code:  8MB at 0x00400000  → L2 entries 1-2
#   Heap:  8MB at 0x01000000  → L2 entries 4-5
#   Stack: 4MB at 0xFF000000  → L2 entry 1020
#
# L1 table:  2KB (always allocated)
# L2 tables: 1 (all three regions fall under L1 entry 0 — the first 4GB)
# L3 tables: 5 (one per used 4MB region), 8KB each
#
# Total: 2KB + 1×8KB + 5×8KB = 50KB
#
# Flat table for 40-bit: 2^28 entries × 8 bytes = 2GB per process!
```
| Scenario | Used Memory | L1 Tables | L2 Tables | L3 Tables | Total Overhead |
|---|---|---|---|---|---|
| Minimal process (1MB) | 1MB | 1 (4KB)* | 1 (4KB) | 1 (4KB) | 12KB |
| Small process (20MB) | 20MB | 1 (4KB)* | 3 (12KB) | 5 (20KB) | 36KB |
| Medium process (100MB) | 100MB | 1 (4KB)* | 4 (16KB) | 25 (100KB) | 120KB |
| Large process (1GB) | 1GB | 1 (4KB)* | ~8 (32KB) | 256 (1MB) | ~1MB |
| Huge process (10GB) | 10GB | 1 (4KB)* | ~40 (160KB) | 2560 (10MB) | ~10MB |
*Illustrative counts assuming 4KB page-sized tables; exact table counts depend on the bit split, entry size, and how scattered the mappings are (x86-64 actually uses four levels).
Key observations:
Overhead scales with usage, not address space — A process using 20MB needs ~36KB of page tables, regardless of whether the address space is 40-bit or 48-bit.
Small processes benefit most — A 1MB process has only 12KB overhead—a 1.2% overhead. This is vastly better than a flat table.
Large processes have reasonable overhead — Even a 10GB process needs only 10MB of page tables—0.1% overhead.
Three levels may not be enough — With a 48-bit address space, keeping L1 page-sized requires four levels, not three.
Multi-level paging achieves O(used memory) overhead instead of O(address space) overhead. This fundamental shift makes huge virtual address spaces practical, enabling modern computing's memory abstractions.
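The O(used memory) claim can be checked numerically. A rough model (assuming a 9-9-9-9-12 split where each bottom-level table maps 2MB, each table costs 4KB, and regions are densely packed; real sparse layouts need a few extra directories):

```python
import math

PAGE = 4096
PT_COVERAGE = 512 * PAGE          # 2MB mapped per bottom-level table
PD_COVERAGE = 512 * PT_COVERAGE   # 1GB mapped per page directory

def table_overhead(used_bytes):
    """Approximate page-table bytes for a densely packed region of used_bytes."""
    pts = math.ceil(used_bytes / PT_COVERAGE)  # bottom-level tables
    pds = math.ceil(used_bytes / PD_COVERAGE)  # directories above them
    return (1 + 1 + pds + pts) * PAGE          # + root + one upper directory

for mb in (1, 20, 100, 1024):
    used = mb * 1024 * 1024
    oh = table_overhead(used)
    print(f"{mb:>5} MB used -> {oh/1024:>6.0f} KB of tables ({oh/used:.3%})")
# Overhead tracks usage, converging to ~0.2% for large processes --
# and is completely independent of the 256TB address-space size.
```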
Multi-level page tables enable a powerful optimization: large pages (also called huge pages or superpages). Instead of traversing all levels to map a 4KB page, an intermediate entry can directly map a larger region, skipping lower levels.
How large pages work:
In a three-level system with appropriate bit splits, an L2 entry can directly map a 2MB region (the same as what one L3 table would cover). Similarly, an L1 entry could map a 1GB region. When a large page is used, the walk stops at that entry, and the remaining index bits simply become part of the page offset.
```c
// Large Page Support in Three-Level Paging
//
// NOTE: This example assumes a 9-9-9-12 split (e.g. AArch64 with 39-bit
// virtual addresses), so l2_index and l3_index are 9-bit fields. With
// that split an L2 entry maps 2MB and an L1 entry maps 1GB.

#include <stdbool.h>
#include <stdint.h>

// Standard page sizes
#define PAGE_4KB (1UL << 12)  // 4KB - Standard page
#define PAGE_2MB (1UL << 21)  // 2MB - Large page (L3 skipped)
#define PAGE_1GB (1UL << 30)  // 1GB - Huge page (L2 & L3 skipped)

// Large page flag (PS bit in x86-64)
#define ENTRY_LARGE_PAGE (1UL << 7)

// Detecting large pages
bool is_large_page_l2(uint64_t l2_entry) {
    return (l2_entry & ENTRY_LARGE_PAGE) != 0;
}

bool is_large_page_l1(uint64_t l1_entry) {
    return (l1_entry & ENTRY_LARGE_PAGE) != 0;
}

// Translation with large page support
uint64_t translate_with_large_pages(uint64_t virtual_address) {
    uint64_t l1_index   = GET_L1(virtual_address);
    uint64_t l2_index   = GET_L2(virtual_address);
    uint64_t l3_index   = GET_L3(virtual_address);
    uint64_t offset_4kb = GET_OFFSET(virtual_address);

    // Access L1 table
    PageTable l1_table = phys_to_virt(cr3_register);
    uint64_t l1_entry = l1_table[l1_index];
    if (!(l1_entry & ENTRY_PRESENT)) {
        page_fault(virtual_address, FAULT_L1);
        return 0;
    }

    // Check for 1GB large page at L1
    if (is_large_page_l1(l1_entry)) {
        // L1 entry directly maps a 1GB page
        // Offset = L2 index bits + L3 index bits + page offset
        uint64_t offset_1gb = (l2_index << 21) | (l3_index << 12) | offset_4kb;
        uint64_t frame_1gb = l1_entry & 0xFFFFFFC0000000UL;  // 1GB aligned
        return frame_1gb | offset_1gb;
    }

    // Access L2 table
    uint64_t l2_phys = l1_entry & ENTRY_ADDR_MASK;
    PageTable l2_table = phys_to_virt(l2_phys);
    uint64_t l2_entry = l2_table[l2_index];
    if (!(l2_entry & ENTRY_PRESENT)) {
        page_fault(virtual_address, FAULT_L2);
        return 0;
    }

    // Check for 2MB large page at L2
    if (is_large_page_l2(l2_entry)) {
        // L2 entry directly maps a 2MB page
        // Offset = L3 index bits + page offset
        uint64_t offset_2mb = (l3_index << 12) | offset_4kb;
        uint64_t frame_2mb = l2_entry & 0xFFFFFFFFE00000UL;  // 2MB aligned
        return frame_2mb | offset_2mb;
    }

    // Standard 4KB page - continue to L3
    uint64_t l3_phys = l2_entry & ENTRY_ADDR_MASK;
    PageTable l3_table = phys_to_virt(l3_phys);
    uint64_t l3_entry = l3_table[l3_index];
    if (!(l3_entry & ENTRY_PRESENT)) {
        page_fault(virtual_address, FAULT_L3);
        return 0;
    }

    uint64_t frame_4kb = l3_entry & ENTRY_ADDR_MASK;
    return frame_4kb | offset_4kb;
}
```
Modern Linux supports Transparent Huge Pages (THP), which automatically uses 2MB pages when possible without application changes. The kernel's khugepaged daemon coalesces small pages into huge pages in the background, and can split them back under memory pressure.
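The large-page offset arithmetic can be sanity-checked on its own. A small sketch assuming the 9-9-9-12 split (2MB large pages at L2), with an illustrative frame base:

```python
OFFSET_BITS, L3_BITS = 12, 9

def large_page_2mb(frame_base, l3_index, offset):
    """Physical address within a 2MB large page: the 9 L3 index bits
    simply merge with the 12-bit page offset into a 21-bit offset."""
    assert frame_base % (1 << 21) == 0, "2MB pages must be 2MB-aligned"
    assert l3_index < (1 << L3_BITS) and offset < (1 << OFFSET_BITS)
    return frame_base | (l3_index << OFFSET_BITS) | offset

pa = large_page_2mb(0x40000000, l3_index=3, offset=0x123)
print(hex(pa))  # 0x40003123
```

Because the frame is 2MB-aligned, the OR is equivalent to addition: the hardware never needs the L3 table at all for this mapping.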
While three-level paging works for address spaces up to around 40-42 bits, modern 64-bit systems typically support 48-bit addresses (256TB), requiring four levels to keep all tables page-sized.
Why x86-64 uses four levels:
With 48-bit addresses, 4KB pages, and 8-byte entries:
Address bits to index: 48 - 12 = 36 bits
If 3 levels: 36 / 3 = 12 bits per level → 4096 entries × 8 bytes = 32KB per table
If 4 levels: 36 / 4 = 9 bits per level → 512 entries × 8 bytes = 4KB per table ✓
Four levels of 9 bits each give exactly page-sized tables:
| Level | Name | Index Bits | Entries | Entry Size | Table Size | Coverage |
|---|---|---|---|---|---|---|
| 1 | PML4 | 39-47 (9 bits) | 512 | 8 bytes | 4KB | 512GB per entry |
| 2 | PDPT | 30-38 (9 bits) | 512 | 8 bytes | 4KB | 1GB per entry |
| 3 | PD | 21-29 (9 bits) | 512 | 8 bytes | 4KB | 2MB per entry |
| 4 | PT | 12-20 (9 bits) | 512 | 8 bytes | 4KB | 4KB per entry |
| — | Offset | 0-11 (12 bits) | 4096 | 1 byte | — | Byte within page |
The extension to five levels:
Intel's newer processors support 5-level paging for 57-bit addresses (128PB), adding a PML5 level:
5-Level: PML5 → PML4 → PDPT → PD → PT → Frame + Offset
9 bits 9 bits 9 bits 9 bits 9 bits 12 bits = 57 bits
This provides headroom for future memory expansion while maintaining the elegant 4KB-per-table design.
Architectural comparison:
| Architecture | Levels | Address Bits | Page Size | Entry Size |
|---|---|---|---|---|
| x86 (32-bit) | 2 | 32 | 4KB | 4 bytes |
| x86 PAE | 3 | 32 (36 physical) | 4KB | 8 bytes |
| x86-64 | 4 | 48 | 4KB | 8 bytes |
| x86-64 (LA57) | 5 | 57 | 4KB | 8 bytes |
| ARM AArch32 | 2 | 32 | 4KB | 4 bytes |
| ARM AArch64 | 3-4 | 39-48 | 4KB | 8 bytes |
Each additional level adds one memory access to translation. With 5 levels, a TLB miss requires 5 memory accesses before reaching data. More levels would make cold TLB misses prohibitively expensive. The 4-5 level design balances table size, address space, and translation latency.
Three-level paging extends the hierarchical approach to accommodate larger address spaces while maintaining the memory efficiency benefits of multi-level structures.
What's next:
We've now seen two-level and three-level paging as specific instances of hierarchical page tables. In the next page, we'll generalize these concepts to understand hierarchical page tables as a universal pattern—examining the principles that govern any multi-level structure, regardless of the specific number of levels.
You now understand three-level paging: why additional levels are needed for larger address spaces, how the address is decomposed across three indices, the complete translation algorithm, the role of large pages, and how three-level concepts extend to modern four and five-level systems.