Of all the bits in a Page Table Entry, none is more fundamental than the Valid/Invalid bit (also called the Present bit on x86). This single bit answers the most basic question the MMU asks: Is this virtual page actually mapped to a physical frame right now?
When this bit is set (Valid=1, Present=1), the MMU proceeds with address translation. When it's clear (Valid=0, Present=0), the MMU stops, traps to the operating system, and lets software decide what to do. This simple mechanism—hardware checking one bit and trapping on failure—is the foundation of demand paging, swapping, copy-on-write, guard pages, and every other demand-driven technique covered on this page.
By the end of this page, you will understand the semantics of the valid/invalid bit, how it enables page faults, what the OS can do with the remaining bits when a page is invalid, and the design patterns that make demand-driven memory management possible.
The valid bit has simple but profound semantics:
Valid = 1 (Present): The page is mapped to a physical frame. The frame number and permission bits in the PTE are meaningful, and the MMU proceeds with translation.
Valid = 0 (Not Present): The page is not currently mapped to any frame. The hardware ignores every other bit in the PTE, and any access traps to the OS as a page fault.
MMU Translation Algorithm (Simplified):

function translate(virtual_address, access_type):
    vpn    = extract_page_number(virtual_address)
    offset = extract_offset(virtual_address)
    pte    = page_table[vpn]        // May involve multi-level walk

    // THE CRITICAL CHECK
    if pte.valid == 0:
        raise PageFaultException(virtual_address, access_type)
        // Control transfers to OS
        // No translation returned

    // Page is valid - check permissions
    if access_type == WRITE and pte.read_only:
        raise ProtectionFaultException(virtual_address)
    if access_type == EXECUTE and pte.no_execute:
        raise ProtectionFaultException(virtual_address)
    if cpu_mode == USER and pte.supervisor_only:
        raise ProtectionFaultException(virtual_address)

    // All checks passed - perform translation
    frame_number = pte.pfn
    physical_address = (frame_number << PAGE_SHIFT) | offset

    // Update status bits
    pte.accessed = 1
    if access_type == WRITE:
        pte.dirty = 1

    return physical_address

The Order of Checks Matters:
Notice that the valid bit is checked before any other bit. This is crucial: when Valid=0, the remaining bits have no hardware-defined meaning (the OS repurposes them for its own metadata, as we'll see below), so permission checks against them would be meaningless.
This ordering also simplifies hardware: a single bit check gates the entire translation logic.
Intel calls this the 'Present' bit (P); ARM calls it 'Valid'; academic literature often uses 'Valid/Invalid'. The semantics are identical: 1 = proceed with translation, 0 = trigger fault. We'll use these terms interchangeably.
When the MMU encounters an invalid PTE, it triggers a page fault—a synchronous exception that transfers control to the operating system. But what happens in that handler determines everything about virtual memory's power.
Page Fault Handler Responsibilities:
| Scenario | PTE State | VMA Status | OS Action |
|---|---|---|---|
| Demand page (first access) | Invalid, zeroed | Valid, anonymous | Allocate frame, zero it, map |
| Page swapped out | Invalid, swap entry | Valid | Read from swap, map |
| Memory-mapped file | Invalid, file offset | Valid, file-backed | Read from file, map |
| Copy-on-write | Valid, read-only | Valid, COW marked | Copy page, map writable |
| Stack growth | Invalid, no entry | Below stack limit | Extend stack, allocate |
| Invalid access | Invalid or N/A | Not in any VMA | Send SIGSEGV, terminate |
| Protection violation | Valid, read-only | Depends on VMA write permission | SIGSEGV if the VMA forbids writes; COW copy if it permits them |
The VMA (Virtual Memory Area) Layer:
The page table itself doesn't track why a page is invalid. That's the job of the Virtual Memory Area (VMA) data structures. Each process has a list/tree of VMAs describing valid regions:
struct vm_area_struct {
unsigned long vm_start; // Start virtual address
unsigned long vm_end; // End virtual address (exclusive)
unsigned long vm_flags; // Permissions (VM_READ, VM_WRITE, VM_EXEC)
struct file *vm_file; // Backing file (or NULL for anonymous)
pgoff_t vm_pgoff; // Offset within file
// ... more fields
};
When a page fault occurs, the handler looks up the VMA containing the faulting address. If no VMA covers that address, the access is illegal and the process receives SIGSEGV; if a VMA is found, its flags and backing (file or anonymous) tell the handler how to populate the page.
Virtual memory has two levels of validity: (1) Hardware level: Is the PTE valid? (2) Software level: Is this address in a valid VMA? A page can be in a valid VMA but not currently present (demand paging). A page outside any VMA is truly invalid and will cause a segmentation fault.
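To make the software level concrete, here is a minimal lookup sketch in the spirit of the struct above. It assumes each VMA carries a vm_next pointer forming a per-process list; real kernels (including Linux) use a balanced or maple tree instead, and this is not the kernel's actual find_vma.

/* Hypothetical VMA lookup: which region, if any, covers fault_addr?
 * Assumes a simple linked list via an added vm_next field; real kernels
 * use a tree so the lookup is O(log n) rather than O(n). */
struct vm_area_struct *find_vma_sketch(struct vm_area_struct *vma_list,
                                        unsigned long fault_addr)
{
    struct vm_area_struct *vma;

    for (vma = vma_list; vma != NULL; vma = vma->vm_next) {
        if (fault_addr >= vma->vm_start && fault_addr < vma->vm_end)
            return vma;   /* software level says: valid region */
    }
    return NULL;          /* no VMA at all: truly invalid, SIGSEGV */
}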
Here's a powerful insight: when the valid bit is 0, the hardware ignores all other bits. This means the OS can use the entire remaining 63 bits (on x86-64) for any purpose. This is exactly what operating systems do to track swapped-out pages, file offsets, and other metadata.
Common Invalid PTE Encodings:
/* Linux x86-64 Swap Entry Format (when Present=0) */

/*
 * Swap Entry Layout (64 bits):
 *
 *   Bit 0      = 0 (always - indicates not present)
 *   Bit 1      = Soft dirty flag (for memory tracking)
 *   Bit 2      = Exclusive flag (only reference to swap slot)
 *   Bits 3-7   = Swap type (identifies which swap device)
 *   Bits 8-57  = Swap offset (location within swap device)
 *   Bits 58-63 = Reserved
 *
 * This allows up to 32 swap devices, each up to 128TB
 */

#define SWP_TYPE_BITS    5
#define SWP_OFFSET_BITS  50
#define SWP_TYPE_SHIFT   3
#define SWP_OFFSET_SHIFT 8

static inline swp_entry_t pte_to_swp_entry(pte_t pte)
{
    swp_entry_t entry;
    entry.val = pte_val(pte) >> SWP_TYPE_SHIFT;
    return entry;
}

static inline int swp_type(swp_entry_t entry)
{
    return (entry.val) & ((1 << SWP_TYPE_BITS) - 1);
}

static inline pgoff_t swp_offset(swp_entry_t entry)
{
    return entry.val >> SWP_TYPE_BITS;
}

static inline pte_t swp_entry_to_pte(swp_entry_t entry)
{
    pte_t pte;
    /* Shift back and ensure Present bit is 0 */
    pte = __pte((entry.val << SWP_TYPE_SHIFT) & ~_PAGE_PRESENT);
    return pte;
}

/* Migration Entry (page being moved between NUMA nodes) */
static inline bool is_migration_entry(swp_entry_t entry)
{
    return swp_type(entry) == SWP_MIGRATION_READ ||
           swp_type(entry) == SWP_MIGRATION_WRITE;
}

/* Special "none" PTE - completely unmapped */
static inline bool pte_none(pte_t pte)
{
    return pte_val(pte) == 0;
}

The Power of Encoding:
By encoding metadata in invalid PTEs, the kernel avoids maintaining a separate data structure for every swapped page. With potentially millions of swapped pages, this is a significant memory savings.
Types of Invalid PTEs in Linux:
| Type | Purpose | Bits Used |
|---|---|---|
| None | Never accessed, no mapping | All zeros |
| Swap | Page is in swap space | Type + Offset |
| Migration | Page being migrated | Source/dest node |
| Device | Page on DAX/persistent memory | Device info |
| Poisoned | Hardware memory error | Error marker |
Each type has a unique encoding, allowing the fault handler to determine exactly what action to take.
The fault handler must distinguish between a 'none' PTE (never mapped), a swap entry (was resident, now swapped), and other special types. On x86, a zero value means 'none'. Non-zero with Present=0 requires checking other bits to determine the specific type. Getting this wrong causes data corruption or security holes.
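As a hedged sketch of that distinction, a fault handler might classify an invalid PTE roughly as follows. The predicates mirror the helpers shown above (pte_none, pte_to_swp_entry, is_migration_entry); is_hwpoison_entry and the enum names are assumptions for illustration, not verbatim kernel code.

/* Illustrative classification of an invalid (Present=0) PTE.
 * Assumes the helper predicates from the encodings above exist. */
enum invalid_pte_action {
    ACT_DEMAND_ZERO,     /* "none" PTE: never mapped, demand-allocate */
    ACT_SWAP_IN,         /* ordinary swap entry: read the page back */
    ACT_WAIT_MIGRATION,  /* page is being migrated: wait, then retry */
    ACT_POISONED         /* hardware memory error: isolate or kill */
};

static enum invalid_pte_action classify_invalid_pte(pte_t pte)
{
    swp_entry_t entry;

    if (pte_none(pte))
        return ACT_DEMAND_ZERO;

    entry = pte_to_swp_entry(pte);
    if (is_migration_entry(entry))
        return ACT_WAIT_MIGRATION;
    if (is_hwpoison_entry(entry))        /* assumed helper for poisoned pages */
        return ACT_POISONED;

    return ACT_SWAP_IN;
}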
Demand paging is the technique of loading pages into memory only when they're actually accessed. This is enabled entirely by the valid bit: pages start as invalid, and the first access triggers a fault that loads them.
Why Demand Paging Matters: programs start running without waiting for their entire image to load, physical memory is consumed only for pages that are actually touched, and over-allocation (reserving far more than is ever used) costs almost nothing.
The Demand Paging Flow:
# Demand Paging for a New Memory-Mapped File

def mmap_file(file, offset, length, protection):
    """Memory-map a file into the process address space."""

    # 1. Find free virtual address range
    vaddr = find_free_vm_range(length)

    # 2. Create VMA describing the mapping
    vma = VMA(
        start=vaddr,
        end=vaddr + length,
        prot=protection,
        file=file,
        offset=offset
    )
    add_vma_to_process(current_process, vma)

    # 3. DO NOT touch page tables yet!
    #    All PTEs remain "none" (invalid, zero)

    return vaddr  # Return immediately - no I/O performed


def page_fault_handler(fault_addr, access_type):
    """Called by hardware when accessing invalid page."""

    # 1. Find VMA for this address
    vma = find_vma(current_process, fault_addr)
    if not vma:
        send_signal(SIGSEGV)
        return

    # 2. Check permissions
    if not vma.permits(access_type):
        send_signal(SIGSEGV)  # Permission denied
        return

    # 3. Allocate physical frame
    frame = allocate_frame()

    # 4. Populate frame with correct content
    if vma.is_file_backed:
        # Read from file
        file_offset = vma.offset + (fault_addr - vma.start)
        read_from_file(vma.file, file_offset, frame, PAGE_SIZE)
    else:
        # Anonymous page - zero fill
        zero_frame(frame)

    # 5. Create valid PTE
    pte = make_pte(frame, vma.protection)
    install_pte(current_process.page_table, fault_addr, pte)

    # 6. Resume execution at faulting instruction
    #    This time, MMU finds valid PTE and succeeds

Lazy Zero-Fill:
When a program allocates memory (malloc → mmap), most systems don't immediately allocate or zero frames. Instead, the mmap call merely records a VMA and leaves every PTE invalid; the first access to each page faults, and only then does the kernel allocate a frame and zero it.
Many allocated pages are never accessed (over-allocation is common), so this saves both memory and CPU cycles.
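You can observe this laziness from user space with mincore(), which reports per-page residency for a mapping. The sketch below (Linux, assumes 4 KiB pages, error handling trimmed) should show that a fresh anonymous mapping has essentially no resident pages until it is written:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define PAGE_SZ 4096   /* assumes 4 KiB pages */

/* Count how many pages of [addr, addr+len) are currently resident. */
static size_t resident_pages(void *addr, size_t len)
{
    size_t pages = len / PAGE_SZ, n = 0;
    unsigned char vec[pages];

    if (mincore(addr, len, vec) == 0)
        for (size_t i = 0; i < pages; i++)
            n += vec[i] & 1;
    return n;
}

int main(void)
{
    size_t len = 64 * PAGE_SZ;
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 1;

    printf("resident after mmap:  %zu\n", resident_pages(p, len)); /* ~0 */
    memset(p, 0xAB, 4 * PAGE_SZ);          /* touch only the first 4 pages */
    printf("resident after touch: %zu\n", resident_pages(p, len)); /* ~4 */

    munmap(p, len);
    return 0;
}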
The Zero Page Optimization:
Some systems map all anonymous read accesses to a single shared zero page (read-only). Only on first write do they allocate a private frame and copy zeros. This saves memory when programs allocate-but-don't-use large regions.
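A kernel-style sketch of that idea, using the same illustrative helpers as the demand-paging pseudocode earlier (allocate_frame, zero_frame, make_pte, PTE_READONLY, and shared_zero_frame are assumed names, not a specific kernel's API): reads map one shared zeroed frame read-only, and only the first write pays for a private frame.

/* Illustrative zero-page handling for anonymous memory (not kernel API). */
extern struct frame *shared_zero_frame;   /* one pre-zeroed frame, shared */

void handle_anon_read_fault(struct vm_area_struct *vma,
                            pte_t *pte, unsigned long addr)
{
    /* Map the shared zero frame read-only: no allocation, no zeroing. */
    *pte = make_pte(shared_zero_frame, PTE_READONLY);
}

void handle_anon_write_fault(struct vm_area_struct *vma,
                             pte_t *pte, unsigned long addr)
{
    /* First write: now allocate a private frame.  It must start zeroed,
     * matching what the process has already observed through reads. */
    struct frame *frame = allocate_frame();
    zero_frame(frame);
    *pte = make_pte(frame, vma->vm_flags);   /* private, writable mapping */
    flush_tlb_page(vma, addr);               /* drop the old read-only entry */
}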
A program that loads a 100MB library doesn't wait for 100MB of disk I/O. Only the pages actually touched during execution are loaded. If the program uses 5MB of that library, only ~5MB is read from disk. This makes startup dramatically faster, especially on HDDs where random reads are slow.
The valid bit is essential to swapping—the ability to move pages between RAM and disk. When memory pressure is high, the OS evicts infrequently-used pages to swap space, marks their PTEs invalid, and reclaims the frames for other uses.
The Swap-Out Process:
/* Simplified swap-out operation */
void swap_out_page(struct vm_area_struct *vma, struct page *page, pte_t *pte)
{
    swp_entry_t swap_entry;

    /* 1. Flush TLB on all CPUs */
    flush_tlb_page(vma, page_address(page));

    /* 2. Check if page needs writing */
    if (pte_dirty(*pte)) {
        /* 3. Allocate swap slot and write */
        swap_entry = get_swap_slot();
        write_to_swap(swap_entry, page);
    } else {
        /* Already in swap or never modified - reuse existing slot */
        swap_entry = page->swap_entry;
    }

    /* 4. Update PTE: store swap entry, clear Present */
    *pte = swp_entry_to_pte(swap_entry);
    /* Present bit = 0, but contains swap location */

    /* 5. Free the frame */
    free_page(page);
}

/* Simplified swap-in operation (page fault handler) */
void swap_in_page(struct vm_area_struct *vma, pte_t *pte, unsigned long address)
{
    swp_entry_t entry;
    struct page *page;

    /* 1. Extract swap entry from invalid PTE */
    entry = pte_to_swp_entry(*pte);

    /* 2. Allocate fresh frame */
    page = alloc_page(GFP_HIGHUSER);

    /* 3. Read from swap */
    read_from_swap(entry, page);

    /* 4. Optional: free swap slot if exclusive */
    if (swp_entry_exclusive(entry)) {
        free_swap_slot(entry);
    }

    /* 5. Update PTE: set frame number, set Present */
    *pte = mk_pte(page, vma->vm_page_prot);

    /* 6. Execution resumes, instruction completes */
}

Swap-In on Access:
When a process accesses a swapped-out page, the MMU faults on the invalid PTE; the handler decodes the swap entry stored in that PTE, allocates a fresh frame, reads the page back from swap, installs a valid PTE, and restarts the faulting instruction.
From the process's perspective, this is invisible—memory just appears to work, though potentially with disk I/O latency.
File-backed pages (executables, mmap'd files) typically aren't written to swap—they can be re-read from the original file. Only 'anonymous' pages (heap, stack, non-file mmap) go to swap. This distinction affects which swap entry encodings are used in invalid PTEs.
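A hedged sketch of that reclaim decision, reusing swap_out_page from above; is_file_backed, writeback_to_file, and __pte(0) as the "none" encoding are simplifying assumptions for illustration:

/* Illustrative eviction: where does a reclaimed page's content go? */
void evict_page(struct vm_area_struct *vma, struct page *page, pte_t *pte)
{
    if (is_file_backed(page)) {
        if (pte_dirty(*pte))
            writeback_to_file(page);   /* flush changes back to the file */

        /* A clean copy now exists in the file, so no swap slot is needed:
         * clear the PTE to "none" and free the frame.  A later fault
         * re-reads the data using the VMA's file and offset. */
        *pte = __pte(0);
        flush_tlb_page(vma, page_address(page));
        free_page(page);
    } else {
        /* Anonymous page: RAM holds the only copy, so it must be written
         * to swap and the swap entry encoded into the invalid PTE. */
        swap_out_page(vma, page, pte);   /* as defined above */
    }
}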
Copy-on-Write (COW) is an elegant optimization that defers copying memory until absolutely necessary. It uses the protection bits rather than the valid bit, but the page fault mechanism is the same.
COW Fork Example:
When a process calls fork(), the kernel copies the page tables but not the pages: parent and child PTEs are both marked read-only and point at the same physical frames, and each shared page's reference count is incremented.
When either process writes to a shared page, the MMU raises a protection fault; the handler recognizes the COW marking, copies the page into a new frame, maps that frame writable in the faulting process, and lets the write proceed:
/* Copy-on-Write fault handler (simplified) */
int handle_cow_fault(struct vm_area_struct *vma, pte_t *pte, unsigned long address)
{
    struct page *old_page, *new_page;

    /* Get the current (shared) page */
    old_page = pte_page(*pte);

    /* Check if we're the only user */
    if (page_count(old_page) == 1) {
        /* We're alone - just make it writable */
        *pte = pte_mkwrite(*pte);
        flush_tlb_page(vma, address);
        return 0;
    }

    /* Multiple users - must copy */

    /* 1. Allocate new page */
    new_page = alloc_page(GFP_HIGHUSER);
    if (!new_page)
        return -ENOMEM;

    /* 2. Copy contents from old to new */
    copy_page(page_address(new_page), page_address(old_page));

    /* 3. Update PTE to point to new page, set writable */
    *pte = mk_pte(new_page, vma->vm_page_prot);
    *pte = pte_mkwrite(*pte);

    /* 4. Flush TLB */
    flush_tlb_page(vma, address);

    /* 5. Release reference to old page */
    put_page(old_page);
    /* If count reaches 0, page is freed */
    /* If count is 1, remaining user can now write directly */

    return 0;
}

Why COW is Brilliant:
Fork is nearly instant: No matter how much memory the parent uses, fork() is O(# of page tables), not O(memory size)
Many pages never copied: If child calls exec() immediately (common pattern: fork + exec), parent's pages are never duplicated
Read-only pages stay shared: Code pages, read-only data remain shared forever
Only modified pages are copied: The minimum necessary work is done
COW + Demand Paging Combined:
These optimizations stack. A child process after fork shares all of the parent's resident pages copy-on-write, inherits regions that were never paged in at all (their PTEs are still invalid), and only pays for a frame when it writes to a shared page or touches a never-loaded one.
COW isn't just for fork(). It's used in: memory snapshots for databases, efficient cloning of virtual machines, implementing undo/redo in memory-mapped files, and creating lightweight containers (namespaces with COW memory).
The valid bit plays a critical role in security by ensuring processes can only access memory explicitly granted to them. Several security vulnerabilities relate directly to how invalid PTEs are handled.
Guard Pages:
Many systems place deliberately-invalid pages at memory region boundaries:
/* Guard page implementation */

void setup_stack_guard(struct mm_struct *mm)
{
    /*
     * Stack layout:
     *
     *   [Guard Page - Invalid]  <- Access here = immediate fault
     *   [Stack grows down    ]
     *   [        |           ]
     *   [        v           ]
     *   [Stack Bottom        ]
     *   [Guard Page - Invalid]  <- Prevents underflow
     */

    unsigned long guard_start = STACK_TOP - PAGE_SIZE;

    /* Create VMA for guard page - note VM_NONE for no permissions */
    struct vm_area_struct *guard_vma = vm_area_alloc(mm);
    guard_vma->vm_start = guard_start;
    guard_vma->vm_end   = STACK_TOP;
    guard_vma->vm_flags = VM_NONE;   /* No access allowed */

    insert_vm_struct(mm, guard_vma);

    /* PTEs in this range will remain invalid */
    /* Any access immediately triggers SIGSEGV */
}

/* Null page protection */
void protect_null_page(struct mm_struct *mm)
{
    /*
     * Map page 0 (and possibly more) as inaccessible
     * Catches: NULL pointer dereference
     *          ptr + large_offset that wraps to low addresses
     */

    /* Linux default: first 64KB is unmapped */
    mm->mmap_min_addr = 65536;

    /* Access to addresses < mmap_min_addr with NULL VMA = SIGSEGV */
}

ASLR and the Valid Bit:
Address Space Layout Randomization (ASLR) relies on most of the address space being invalid. When attackers can't predict where code or data is located, guessed addresses overwhelmingly land in unmapped (invalid) regions, so exploit attempts fault and crash the process instead of redirecting control, and those crashes give defenders something to detect.
Spectre/Meltdown Considerations:
Modern speculative execution attacks (Meltdown, Spectre) can sometimes leak data from pages even when the valid bit would prevent architectural access. This led to mitigations such as kernel page table isolation (KPTI), which keeps most kernel mappings out of user-mode page tables entirely, alongside microcode updates and compiler-level defenses.
While architecturally an invalid page can't be accessed, speculative execution can access it before the fault is raised. Data can leak through cache timing side channels. Modern defenses assume the valid bit alone is insufficient for security-critical boundaries—additional measures like page table isolation are required.
Page faults are expensive operations—transitioning to kernel mode, handling the fault, potentially doing disk I/O, and returning to user mode takes thousands to millions of cycles. Understanding when faults occur helps optimize performance.
Page Fault Costs:
| Fault Type | I/O Required | Latency | Notes |
|---|---|---|---|
| Minor (anonymous, zero-fill) | None | 1-10 μs | Just allocate and map frame |
| Minor (COW) | None | 2-20 μs | Copy 4KB + allocate |
| Minor (file-backed, in page cache) | None | 5-50 μs | Map existing page cache page |
| Major (file-backed, not cached) | SSD | 50-200 μs | 4KB random read from SSD |
| Major (file-backed, not cached) | HDD | 5-15 ms | 4KB random read from HDD |
| Major (swap in) | SSD | 50-200 μs | Read from swap on SSD |
| Major (swap in) | HDD | 5-15 ms | Read from swap on HDD |
Optimizing Fault Patterns:
Prefaulting: For known access patterns, explicitly touch pages before critical operations:
void prefault_range(void *start, size_t len) {
volatile char *p = start;
while (p < (char*)start + len) {
*p; // Read to trigger demand paging
p += PAGE_SIZE;
}
}
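On Linux, a hedged alternative to touching each page by hand is to ask the kernel to prefault the mapping at creation time with MAP_POPULATE:

#include <sys/mman.h>

/* Prefault an entire file mapping up front instead of touching each page.
 * MAP_POPULATE (Linux-specific) reads ahead and populates the page tables
 * before mmap() returns, so later accesses don't fault. */
void *map_prefaulted(int fd, size_t len)
{
    return mmap(NULL, len, PROT_READ, MAP_PRIVATE | MAP_POPULATE, fd, 0);
}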
Huge Pages: Reducing the number of pages reduces faults proportionally. 2MB huge pages = 512× fewer faults than 4KB pages for the same memory range.
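One way to request this on Linux is madvise(MADV_HUGEPAGE), which asks for transparent huge pages on an anonymous region; a minimal sketch, noting that the kernel may decline depending on THP configuration:

#include <sys/mman.h>

/* Allocate an anonymous region and hint that it should be backed by
 * transparent huge pages (2MB on x86-64) if the system allows it. */
void *alloc_thp_region(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p != MAP_FAILED)
        madvise(p, len, MADV_HUGEPAGE);   /* hint only; may be ignored */
    return p;
}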
madvise() Hints:
madvise(addr, length, MADV_WILLNEED); // Prefetch pages
madvise(addr, length, MADV_DONTNEED); // Hint: can discard
madvise(addr, length, MADV_SEQUENTIAL); // Hint: read-ahead beneficial
mlockall(): Lock all pages into memory, preventing any future page faults:
mlockall(MCL_CURRENT | MCL_FUTURE);
Used in real-time systems where fault latency is unacceptable.
Use 'perf stat' to measure page faults in your application: perf stat -e page-faults,minor-faults,major-faults ./your_program. High major fault counts indicate disk I/O bottlenecks. High minor fault counts in hot paths may indicate inefficient memory access patterns or opportunities for prefaulting.
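To measure from inside the program instead of with perf, getrusage() exposes the same counters; a minimal sketch:

#include <stdio.h>
#include <sys/resource.h>

/* Print the calling process's fault counters.
 * ru_minflt: faults served without I/O; ru_majflt: faults that required I/O. */
void report_page_faults(const char *label)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) == 0)
        printf("%s: minor=%ld major=%ld\n", label, ru.ru_minflt, ru.ru_majflt);
}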
The valid bit is the simplest yet most powerful bit in the page table entry. Its binary nature—proceed or fault—enables the entire edifice of demand-driven virtual memory. To consolidate: Valid=1 lets the MMU translate and check permissions; Valid=0 makes the hardware trap and ignore every other bit, bits the OS reuses to encode swap entries and other metadata; and the fault handler, guided by the VMAs, turns those traps into demand paging, swapping, copy-on-write, guard pages, and stack growth.
What's Next:
With the valid bit controlling presence, we now turn to protection bits—the R/W, U/S, and NX bits that control what operations are permitted on valid pages. These bits form the access control matrix that enforces process isolation and data security.
You now understand how the valid/invalid bit enables demand paging, swap integration, and copy-on-write—the core mechanisms of modern virtual memory. This single bit, checked on every memory access, is the foundation upon which efficient, secure, and flexible memory management is built.