Of all the bits in a Page Table Entry, none is more fundamental than the Valid/Invalid bit (also called the Present bit on x86). This single bit answers the most basic question the MMU asks: Is this virtual page actually mapped to a physical frame right now?
When this bit is set (Valid=1, Present=1), the MMU proceeds with address translation. When it's clear (Valid=0, Present=0), the MMU stops, traps to the operating system, and lets software decide what to do. This simple mechanism—hardware checking one bit and trapping on failure—is the foundation of demand paging, swapping, copy-on-write, guard pages, and every other demand-driven technique covered on this page.
By the end of this page, you will understand the semantics of the valid/invalid bit, how it enables page faults, what the OS can do with the remaining bits when a page is invalid, and the design patterns that make demand-driven memory management possible.
The valid bit has simple but profound semantics:
Valid = 1 (Present): The page is mapped to a physical frame. The frame number and permission bits in the PTE are meaningful, and the MMU proceeds with translation.
Valid = 0 (Not Present): The page is not currently mapped to any frame. The hardware ignores every other bit in the PTE, and any access traps to the OS as a page fault.
MMU Translation Algorithm (Simplified):

function translate(virtual_address, access_type):
    vpn    = extract_page_number(virtual_address)
    offset = extract_offset(virtual_address)
    pte    = page_table[vpn]        // May involve multi-level walk

    // THE CRITICAL CHECK
    if pte.valid == 0:
        raise PageFaultException(virtual_address, access_type)
        // Control transfers to OS
        // No translation returned

    // Page is valid - check permissions
    if access_type == WRITE and pte.read_only:
        raise ProtectionFaultException(virtual_address)
    if access_type == EXECUTE and pte.no_execute:
        raise ProtectionFaultException(virtual_address)
    if cpu_mode == USER and pte.supervisor_only:
        raise ProtectionFaultException(virtual_address)

    // All checks passed - perform translation
    frame_number = pte.pfn
    physical_address = (frame_number << PAGE_SHIFT) | offset

    // Update status bits
    pte.accessed = 1
    if access_type == WRITE:
        pte.dirty = 1

    return physical_address

The Order of Checks Matters:
Notice that the valid bit is checked before any other bit. This is crucial: when Valid=0, the remaining bits have no hardware-defined meaning (the OS repurposes them for its own metadata, as we'll see below), so permission checks against them would be meaningless.
This ordering also simplifies hardware: a single bit check gates the entire translation logic.
Intel calls this the 'Present' bit (P); ARM calls it 'Valid'; academic literature often uses 'Valid/Invalid'. The semantics are identical: 1 = proceed with translation, 0 = trigger fault. We'll use these terms interchangeably.
When the MMU encounters an invalid PTE, it triggers a page fault—a synchronous exception that transfers control to the operating system. But what happens in that handler determines everything about virtual memory's power.
Page Fault Handler Responsibilities:
| Scenario | PTE State | VMA Status | OS Action |
|---|---|---|---|
| Demand page (first access) | Invalid, zeroed | Valid, anonymous | Allocate frame, zero it, map |
| Page swapped out | Invalid, swap entry | Valid | Read from swap, map |
| Memory-mapped file | Invalid, file offset | Valid, file-backed | Read from file, map |
| Copy-on-write | Valid, read-only | Valid, COW marked | Copy page, map writable |
| Stack growth | Invalid, no entry | Below stack limit | Extend stack, allocate |
| Invalid access | Invalid or N/A | Not in any VMA | Send SIGSEGV, terminate |
| Protection violation | Valid, read-only | Depends on VMA write permission | SIGSEGV if the VMA forbids writes; COW copy if it permits them |
The VMA (Virtual Memory Area) Layer:
The page table itself doesn't track why a page is invalid. That's the job of the Virtual Memory Area (VMA) data structures. Each process has a list/tree of VMAs describing valid regions:
struct vm_area_struct {
unsigned long vm_start; // Start virtual address
unsigned long vm_end; // End virtual address (exclusive)
unsigned long vm_flags; // Permissions (VM_READ, VM_WRITE, VM_EXEC)
struct file *vm_file; // Backing file (or NULL for anonymous)
pgoff_t vm_pgoff; // Offset within file
// ... more fields
};
When a page fault occurs, the handler looks up the VMA containing the faulting address. If no VMA covers that address, the access is illegal and the process receives SIGSEGV; if a VMA is found, its flags and backing (file or anonymous) tell the handler how to populate the page.
Virtual memory has two levels of validity: (1) Hardware level: Is the PTE valid? (2) Software level: Is this address in a valid VMA? A page can be in a valid VMA but not currently present (demand paging). A page outside any VMA is truly invalid and will cause a segmentation fault.
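To make the software level concrete, here is a minimal lookup sketch in the spirit of the struct above. It assumes each VMA carries a vm_next pointer forming a per-process list; real kernels (including Linux) use a balanced or maple tree instead, and this is not the kernel's actual find_vma.

/* Hypothetical VMA lookup: which region, if any, covers fault_addr?
 * Assumes a simple linked list via an added vm_next field; real kernels
 * use a tree so the lookup is O(log n) rather than O(n). */
struct vm_area_struct *find_vma_sketch(struct vm_area_struct *vma_list,
                                        unsigned long fault_addr)
{
    struct vm_area_struct *vma;

    for (vma = vma_list; vma != NULL; vma = vma->vm_next) {
        if (fault_addr >= vma->vm_start && fault_addr < vma->vm_end)
            return vma;   /* software level says: valid region */
    }
    return NULL;          /* no VMA at all: truly invalid, SIGSEGV */
}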
Here's a powerful insight: when the valid bit is 0, the hardware ignores all other bits. This means the OS can use the entire remaining 63 bits (on x86-64) for any purpose. This is exactly what operating systems do to track swapped-out pages, file offsets, and other metadata.
Common Invalid PTE Encodings:
/* Linux x86-64 Swap Entry Format (when Present=0) */

/*
 * Swap Entry Layout (64 bits):
 *
 *   Bit 0      = 0 (always - indicates not present)
 *   Bit 1      = Soft dirty flag (for memory tracking)
 *   Bit 2      = Exclusive flag (only reference to swap slot)
 *   Bits 3-7   = Swap type (identifies which swap device)
 *   Bits 8-57  = Swap offset (location within swap device)
 *   Bits 58-63 = Reserved
 *
 * This allows up to 32 swap devices, each up to 128TB
 */

#define SWP_TYPE_BITS    5
#define SWP_OFFSET_BITS  50
#define SWP_TYPE_SHIFT   3
#define SWP_OFFSET_SHIFT 8

static inline swp_entry_t pte_to_swp_entry(pte_t pte)
{
    swp_entry_t entry;
    entry.val = pte_val(pte) >> SWP_TYPE_SHIFT;
    return entry;
}

static inline int swp_type(swp_entry_t entry)
{
    return (entry.val) & ((1 << SWP_TYPE_BITS) - 1);
}

static inline pgoff_t swp_offset(swp_entry_t entry)
{
    return entry.val >> SWP_TYPE_BITS;
}

static inline pte_t swp_entry_to_pte(swp_entry_t entry)
{
    pte_t pte;
    /* Shift back and ensure Present bit is 0 */
    pte = __pte((entry.val << SWP_TYPE_SHIFT) & ~_PAGE_PRESENT);
    return pte;
}

/* Migration Entry (page being moved between NUMA nodes) */
static inline bool is_migration_entry(swp_entry_t entry)
{
    return swp_type(entry) == SWP_MIGRATION_READ ||
           swp_type(entry) == SWP_MIGRATION_WRITE;
}

/* Special "none" PTE - completely unmapped */
static inline bool pte_none(pte_t pte)
{
    return pte_val(pte) == 0;
}

The Power of Encoding:
By encoding metadata in invalid PTEs, the kernel avoids maintaining a separate data structure for every swapped page. With potentially millions of swapped pages, this is a significant memory savings.
Types of Invalid PTEs in Linux:
| Type | Purpose | Bits Used |
|---|---|---|
| None | Never accessed, no mapping | All zeros |
| Swap | Page is in swap space | Type + Offset |
| Migration | Page being migrated | Source/dest node |
| Device | Page on DAX/persistent memory | Device info |
| Poisoned | Hardware memory error | Error marker |
Each type has a unique encoding, allowing the fault handler to determine exactly what action to take.
The fault handler must distinguish between a 'none' PTE (never mapped), a swap entry (was resident, now swapped), and other special types. On x86, a zero value means 'none'. Non-zero with Present=0 requires checking other bits to determine the specific type. Getting this wrong causes data corruption or security holes.
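As a hedged sketch of that distinction, a fault handler might classify an invalid PTE roughly as follows. The predicates mirror the helpers shown above (pte_none, pte_to_swp_entry, is_migration_entry); is_hwpoison_entry and the enum names are assumptions for illustration, not verbatim kernel code.

/* Illustrative classification of an invalid (Present=0) PTE.
 * Assumes the helper predicates from the encodings above exist. */
enum invalid_pte_action {
    ACT_DEMAND_ZERO,     /* "none" PTE: never mapped, demand-allocate */
    ACT_SWAP_IN,         /* ordinary swap entry: read the page back */
    ACT_WAIT_MIGRATION,  /* page is being migrated: wait, then retry */
    ACT_POISONED         /* hardware memory error: isolate or kill */
};

static enum invalid_pte_action classify_invalid_pte(pte_t pte)
{
    swp_entry_t entry;

    if (pte_none(pte))
        return ACT_DEMAND_ZERO;

    entry = pte_to_swp_entry(pte);
    if (is_migration_entry(entry))
        return ACT_WAIT_MIGRATION;
    if (is_hwpoison_entry(entry))        /* assumed helper for poisoned pages */
        return ACT_POISONED;

    return ACT_SWAP_IN;
}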
Demand paging is the technique of loading pages into memory only when they're actually accessed. This is enabled entirely by the valid bit: pages start as invalid, and the first access triggers a fault that loads them.
Why Demand Paging Matters: programs start running without waiting for their entire image to load, physical memory is consumed only for pages that are actually touched, and over-allocation (reserving far more than is ever used) costs almost nothing.
The Demand Paging Flow:
# Demand Paging for a New Memory-Mapped File

def mmap_file(file, offset, length, protection):
    """Memory-map a file into the process address space."""

    # 1. Find free virtual address range
    vaddr = find_free_vm_range(length)

    # 2. Create VMA describing the mapping
    vma = VMA(
        start=vaddr,
        end=vaddr + length,
        prot=protection,
        file=file,
        offset=offset
    )
    add_vma_to_process(current_process, vma)

    # 3. DO NOT touch page tables yet!
    #    All PTEs remain "none" (invalid, zero)

    return vaddr  # Return immediately - no I/O performed


def page_fault_handler(fault_addr, access_type):
    """Called by hardware when accessing invalid page."""

    # 1. Find VMA for this address
    vma = find_vma(current_process, fault_addr)
    if not vma:
        send_signal(SIGSEGV)
        return

    # 2. Check permissions
    if not vma.permits(access_type):
        send_signal(SIGSEGV)  # Permission denied
        return

    # 3. Allocate physical frame
    frame = allocate_frame()

    # 4. Populate frame with correct content
    if vma.is_file_backed:
        # Read from file
        file_offset = vma.offset + (fault_addr - vma.start)
        read_from_file(vma.file, file_offset, frame, PAGE_SIZE)
    else:
        # Anonymous page - zero fill
        zero_frame(frame)

    # 5. Create valid PTE
    pte = make_pte(frame, vma.protection)
    install_pte(current_process.page_table, fault_addr, pte)

    # 6. Resume execution at faulting instruction
    #    This time, MMU finds valid PTE and succeeds

Lazy Zero-Fill:
When a program allocates memory (malloc → mmap), most systems don't immediately allocate or zero frames. Instead, the mmap call merely records a VMA and leaves every PTE invalid; the first access to each page faults, and only then does the kernel allocate a frame and zero it.
Many allocated pages are never accessed (over-allocation is common), so this saves both memory and CPU cycles.
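You can observe this laziness from user space with mincore(), which reports per-page residency for a mapping. The sketch below (Linux, assumes 4 KiB pages, error handling trimmed) should show that a fresh anonymous mapping has essentially no resident pages until it is written:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define PAGE_SZ 4096   /* assumes 4 KiB pages */

/* Count how many pages of [addr, addr+len) are currently resident. */
static size_t resident_pages(void *addr, size_t len)
{
    size_t pages = len / PAGE_SZ, n = 0;
    unsigned char vec[pages];

    if (mincore(addr, len, vec) == 0)
        for (size_t i = 0; i < pages; i++)
            n += vec[i] & 1;
    return n;
}

int main(void)
{
    size_t len = 64 * PAGE_SZ;
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 1;

    printf("resident after mmap:  %zu\n", resident_pages(p, len)); /* ~0 */
    memset(p, 0xAB, 4 * PAGE_SZ);          /* touch only the first 4 pages */
    printf("resident after touch: %zu\n", resident_pages(p, len)); /* ~4 */

    munmap(p, len);
    return 0;
}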
The Zero Page Optimization:
Some systems map all anonymous read accesses to a single shared zero page (read-only). Only on first write do they allocate a private frame and copy zeros. This saves memory when programs allocate-but-don't-use large regions.
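A kernel-style sketch of that idea, using the same illustrative helpers as the demand-paging pseudocode earlier (allocate_frame, zero_frame, make_pte, PTE_READONLY, and shared_zero_frame are assumed names, not a specific kernel's API): reads map one shared zeroed frame read-only, and only the first write pays for a private frame.

/* Illustrative zero-page handling for anonymous memory (not kernel API). */
extern struct frame *shared_zero_frame;   /* one pre-zeroed frame, shared */

void handle_anon_read_fault(struct vm_area_struct *vma,
                            pte_t *pte, unsigned long addr)
{
    /* Map the shared zero frame read-only: no allocation, no zeroing. */
    *pte = make_pte(shared_zero_frame, PTE_READONLY);
}

void handle_anon_write_fault(struct vm_area_struct *vma,
                             pte_t *pte, unsigned long addr)
{
    /* First write: now allocate a private frame.  It must start zeroed,
     * matching what the process has already observed through reads. */
    struct frame *frame = allocate_frame();
    zero_frame(frame);
    *pte = make_pte(frame, vma->vm_flags);   /* private, writable mapping */
    flush_tlb_page(vma, addr);               /* drop the old read-only entry */
}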
A program that loads a 100MB library doesn't wait for 100MB of disk I/O. Only the pages actually touched during execution are loaded. If the program uses 5MB of that library, only ~5MB is read from disk. This makes startup dramatically faster, especially on HDDs where random reads are slow.
The valid bit is essential to swapping—the ability to move pages between RAM and disk. When memory pressure is high, the OS evicts infrequently-used pages to swap space, marks their PTEs invalid, and reclaims the frames for other uses.
The Swap-Out Process:
/* Simplified swap-out operation */
void swap_out_page(struct vm_area_struct *vma, struct page *page, pte_t *pte)
{
    swp_entry_t swap_entry;

    /* 1. Flush TLB on all CPUs */
    flush_tlb_page(vma, page_address(page));

    /* 2. Check if page needs writing */
    if (pte_dirty(*pte)) {
        /* 3. Allocate swap slot and write */
        swap_entry = get_swap_slot();
        write_to_swap(swap_entry, page);
    } else {
        /* Already in swap or never modified - reuse existing slot */
        swap_entry = page->swap_entry;
    }

    /* 4. Update PTE: store swap entry, clear Present */
    *pte = swp_entry_to_pte(swap_entry);
    /* Present bit = 0, but contains swap location */

    /* 5. Free the frame */
    free_page(page);
}

/* Simplified swap-in operation (page fault handler) */
void swap_in_page(struct vm_area_struct *vma, pte_t *pte, unsigned long address)
{
    swp_entry_t entry;
    struct page *page;

    /* 1. Extract swap entry from invalid PTE */
    entry = pte_to_swp_entry(*pte);

    /* 2. Allocate fresh frame */
    page = alloc_page(GFP_HIGHUSER);

    /* 3. Read from swap */
    read_from_swap(entry, page);

    /* 4. Optional: free swap slot if exclusive */
    if (swp_entry_exclusive(entry)) {
        free_swap_slot(entry);
    }

    /* 5. Update PTE: set frame number, set Present */
    *pte = mk_pte(page, vma->vm_page_prot);

    /* 6. Execution resumes, instruction completes */
}

Swap-In on Access:
When a process accesses a swapped-out page, the MMU faults on the invalid PTE; the handler decodes the swap entry stored in that PTE, allocates a fresh frame, reads the page back from swap, installs a valid PTE, and restarts the faulting instruction.
From the process's perspective, this is invisible—memory just appears to work, though potentially with disk I/O latency.
File-backed pages (executables, mmap'd files) typically aren't written to swap—they can be re-read from the original file. Only 'anonymous' pages (heap, stack, non-file mmap) go to swap. This distinction affects which swap entry encodings are used in invalid PTEs.
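A hedged sketch of that reclaim decision, reusing swap_out_page from above; is_file_backed, writeback_to_file, and __pte(0) as the "none" encoding are simplifying assumptions for illustration:

/* Illustrative eviction: where does a reclaimed page's content go? */
void evict_page(struct vm_area_struct *vma, struct page *page, pte_t *pte)
{
    if (is_file_backed(page)) {
        if (pte_dirty(*pte))
            writeback_to_file(page);   /* flush changes back to the file */

        /* A clean copy now exists in the file, so no swap slot is needed:
         * clear the PTE to "none" and free the frame.  A later fault
         * re-reads the data using the VMA's file and offset. */
        *pte = __pte(0);
        flush_tlb_page(vma, page_address(page));
        free_page(page);
    } else {
        /* Anonymous page: RAM holds the only copy, so it must be written
         * to swap and the swap entry encoded into the invalid PTE. */
        swap_out_page(vma, page, pte);   /* as defined above */
    }
}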
Copy-on-Write (COW) is an elegant optimization that defers copying memory until absolutely necessary. It uses the protection bits rather than the valid bit, but the page fault mechanism is the same.
COW Fork Example:
When a process calls fork(), the kernel copies the page tables but not the pages: parent and child PTEs are both marked read-only and point at the same physical frames, and each shared page's reference count is incremented.
When either process writes to a shared page, the MMU raises a protection fault; the handler recognizes the COW marking, copies the page into a new frame, maps that frame writable in the faulting process, and lets the write proceed:
/* Copy-on-Write fault handler (simplified) */
int handle_cow_fault(struct vm_area_struct *vma, pte_t *pte, unsigned long address)
{
    struct page *old_page, *new_page;

    /* Get the current (shared) page */
    old_page = pte_page(*pte);

    /* Check if we're the only user */
    if (page_count(old_page) == 1) {
        /* We're alone - just make it writable */
        *pte = pte_mkwrite(*pte);
        flush_tlb_page(vma, address);
        return 0;
    }

    /* Multiple users - must copy */

    /* 1. Allocate new page */
    new_page = alloc_page(GFP_HIGHUSER);
    if (!new_page)
        return -ENOMEM;

    /* 2. Copy contents from old to new */
    copy_page(page_address(new_page), page_address(old_page));

    /* 3. Update PTE to point to new page, set writable */
    *pte = mk_pte(new_page, vma->vm_page_prot);
    *pte = pte_mkwrite(*pte);

    /* 4. Flush TLB */
    flush_tlb_page(vma, address);

    /* 5. Release reference to old page */
    put_page(old_page);
    /* If count reaches 0, page is freed */
    /* If count is 1, remaining user can now write directly */

    return 0;
}

Why COW is Brilliant:
Fork is nearly instant: No matter how much memory the parent uses, fork() is O(# of page tables), not O(memory size)
Many pages never copied: If child calls exec() immediately (common pattern: fork + exec), parent's pages are never duplicated
Read-only pages stay shared: Code pages, read-only data remain shared forever
Only modified pages are copied: The minimum necessary work is done
COW + Demand Paging Combined:
These optimizations stack. A child process after fork shares all of the parent's resident pages copy-on-write, inherits regions that were never paged in at all (their PTEs are still invalid), and only pays for a frame when it writes to a shared page or touches a never-loaded one.
COW isn't just for fork(). It's used in: memory snapshots for databases, efficient cloning of virtual machines, implementing undo/redo in memory-mapped files, and creating lightweight containers (namespaces with COW memory).
The valid bit plays a critical role in security by ensuring processes can only access memory explicitly granted to them. Several security vulnerabilities relate directly to how invalid PTEs are handled.
Guard Pages:
Many systems place deliberately-invalid pages at memory region boundaries:
/* Guard page implementation */

void setup_stack_guard(struct mm_struct *mm)
{
    /*
     * Stack layout:
     *
     *   [Guard Page - Invalid]  <- Access here = immediate fault
     *   [Stack grows down    ]
     *   [        |           ]
     *   [        v           ]
     *   [Stack Bottom        ]
     *   [Guard Page - Invalid]  <- Prevents underflow
     */

    unsigned long guard_start = STACK_TOP - PAGE_SIZE;

    /* Create VMA for guard page - note VM_NONE for no permissions */
    struct vm_area_struct *guard_vma = vm_area_alloc(mm);
    guard_vma->vm_start = guard_start;
    guard_vma->vm_end   = STACK_TOP;
    guard_vma->vm_flags = VM_NONE;   /* No access allowed */

    insert_vm_struct(mm, guard_vma);

    /* PTEs in this range will remain invalid */
    /* Any access immediately triggers SIGSEGV */
}

/* Null page protection */
void protect_null_page(struct mm_struct *mm)
{
    /*
     * Map page 0 (and possibly more) as inaccessible
     * Catches: NULL pointer dereference
     *          ptr + large_offset that wraps to low addresses
     */

    /* Linux default: first 64KB is unmapped */
    mm->mmap_min_addr = 65536;

    /* Access to addresses < mmap_min_addr with NULL VMA = SIGSEGV */
}

ASLR and the Valid Bit:
Address Space Layout Randomization (ASLR) relies on most of the address space being invalid. When attackers can't predict where code or data is located, guessed addresses overwhelmingly land in unmapped (invalid) regions, so exploit attempts fault and crash the process instead of redirecting control, and those crashes give defenders something to detect.
Spectre/Meltdown Considerations:
Modern speculative execution attacks (Meltdown, Spectre) can sometimes leak data from pages even when the valid bit would prevent architectural access. This led to mitigations such as kernel page table isolation (KPTI), which keeps most kernel mappings out of user-mode page tables entirely, alongside microcode updates and compiler-level defenses.
While architecturally an invalid page can't be accessed, speculative execution can access it before the fault is raised. Data can leak through cache timing side channels. Modern defenses assume the valid bit alone is insufficient for security-critical boundaries—additional measures like page table isolation are required.
Page faults are expensive operations—transitioning to kernel mode, handling the fault, potentially doing disk I/O, and returning to user mode takes thousands to millions of cycles. Understanding when faults occur helps optimize performance.
Page Fault Costs:
| Fault Type | I/O Required | Latency | Notes |
|---|---|---|---|
| Minor (anonymous, zero-fill) | None | 1-10 μs | Just allocate and map frame |
| Minor (COW) | None | 2-20 μs | Copy 4KB + allocate |
| Minor (file-backed, in page cache) | None | 5-50 μs | Map existing page cache page |
| Major (file-backed, not cached) | SSD | 50-200 μs | 4KB random read from SSD |
| Major (file-backed, not cached) | HDD | 5-15 ms | 4KB random read from HDD |
| Major (swap in) | SSD | 50-200 μs | Read from swap on SSD |
| Major (swap in) | HDD | 5-15 ms | Read from swap on HDD |
Optimizing Fault Patterns:
Prefaulting: For known access patterns, explicitly touch pages before critical operations:
void prefault_range(void *start, size_t len) {
volatile char *p = start;
while (p < (char*)start + len) {
*p; // Read to trigger demand paging
p += PAGE_SIZE;
}
}
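On Linux, a hedged alternative to touching each page by hand is to ask the kernel to prefault the mapping at creation time with MAP_POPULATE:

#include <sys/mman.h>

/* Prefault an entire file mapping up front instead of touching each page.
 * MAP_POPULATE (Linux-specific) reads ahead and populates the page tables
 * before mmap() returns, so later accesses don't fault. */
void *map_prefaulted(int fd, size_t len)
{
    return mmap(NULL, len, PROT_READ, MAP_PRIVATE | MAP_POPULATE, fd, 0);
}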
Huge Pages: Reducing the number of pages reduces faults proportionally. 2MB huge pages = 512× fewer faults than 4KB pages for the same memory range.
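One way to request this on Linux is madvise(MADV_HUGEPAGE), which asks for transparent huge pages on an anonymous region; a minimal sketch, noting that the kernel may decline depending on THP configuration:

#include <sys/mman.h>

/* Allocate an anonymous region and hint that it should be backed by
 * transparent huge pages (2MB on x86-64) if the system allows it. */
void *alloc_thp_region(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p != MAP_FAILED)
        madvise(p, len, MADV_HUGEPAGE);   /* hint only; may be ignored */
    return p;
}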
madvise() Hints:
madvise(addr, length, MADV_WILLNEED); // Prefetch pages
madvise(addr, length, MADV_DONTNEED); // Hint: can discard
madvise(addr, length, MADV_SEQUENTIAL); // Hint: read-ahead beneficial
mlockall(): Lock all pages into memory, preventing any future page faults:
mlockall(MCL_CURRENT | MCL_FUTURE);
Used in real-time systems where fault latency is unacceptable.
Use 'perf stat' to measure page faults in your application: perf stat -e page-faults,minor-faults,major-faults ./your_program. High major fault counts indicate disk I/O bottlenecks. High minor fault counts in hot paths may indicate inefficient memory access patterns or opportunities for prefaulting.
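To measure from inside the program instead of with perf, getrusage() exposes the same counters; a minimal sketch:

#include <stdio.h>
#include <sys/resource.h>

/* Print the calling process's fault counters.
 * ru_minflt: faults served without I/O; ru_majflt: faults that required I/O. */
void report_page_faults(const char *label)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) == 0)
        printf("%s: minor=%ld major=%ld\n", label, ru.ru_minflt, ru.ru_majflt);
}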
The valid bit is the simplest yet most powerful bit in the page table entry. Its binary nature—proceed or fault—enables the entire edifice of demand-driven virtual memory. To consolidate: Valid=1 lets the MMU translate and check permissions; Valid=0 makes the hardware trap and ignore every other bit, bits the OS reuses to encode swap entries and other metadata; and the fault handler, guided by the VMAs, turns those traps into demand paging, swapping, copy-on-write, guard pages, and stack growth.
What's Next:
With the valid bit controlling presence, we now turn to protection bits—the R/W, U/S, and NX bits that control what operations are permitted on valid pages. These bits form the access control matrix that enforces process isolation and data security.
You now understand how the valid/invalid bit enables demand paging, swap integration, and copy-on-write—the core mechanisms of modern virtual memory. This single bit, checked on every memory access, is the foundation upon which efficient, secure, and flexible memory management is built.