Every page table structure we've examined so far shares a common assumption: the table is organized around virtual addresses. Whether linear, multi-level, or hashed, these tables answer the question: Given a virtual page number, what physical frame contains it?
Inverted page tables challenge this fundamental assumption by literally inverting the question: Given a physical frame, which virtual page (and which process) is mapped to it?
This may seem like a semantic distinction, but it has profound implications. Instead of maintaining a page table per process—each potentially large—an inverted page table maintains a single system-wide table with one entry per physical frame. For systems with large virtual address spaces but limited physical memory, this inversion can reduce page table memory overhead from terabytes to megabytes.
Inverted page tables represent one of the most elegant examples of how changing the perspective on a problem can lead to dramatically different solutions with entirely different trade-off profiles.
By the end of this page, you will understand the inverted page table architecture in depth, how address translation works through search rather than direct indexing, the role of hashing in making lookups efficient, and the fundamental trade-offs between per-process and system-wide table organizations.
To understand why inverted page tables exist, we must reconsider what page tables fundamentally represent.
Conventional Page Tables: The Virtual View
A conventional page table is a function:
f: VirtualPageNumber → PhysicalFrameNumber
The table is indexed by VPN, and each entry stores the corresponding PFN (if mapped). The domain of this function is the virtual address space.
The Problem:
For a 64-bit system with 4KB pages, the virtual address space contains 2⁵² virtual pages. Even with multi-level tables that only allocate entries for mapped regions, the overhead grows with the number of processes and their address space spans:
Scenario: 1000 processes, 1GB average virtual footprint
Per-process page table overhead (multi-level, estimated):
~1MB per process for page table structures
Total system page table memory:
1000 × 1MB = 1GB just for page tables!
Worse, much of this structure represents potential mappings that don't exist. Only the portions corresponding to actually-mapped pages contribute useful translations.
Inverted Page Tables: The Physical View
An inverted page table inverts the mapping:
g: PhysicalFrameNumber → (ProcessID, VirtualPageNumber)
The table is indexed by PFN, and each entry identifies which process and virtual page occupy that frame. The domain is now the physical memory size.
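To make the contrast concrete, here is a minimal sketch of the two lookup directions (illustrative types and names only, not drawn from any real kernel):

```c
#include <stdint.h>

// Conventional: one table per process, indexed by virtual page number
uint64_t conventional_lookup(const uint64_t* page_table, uint64_t vpn) {
    return page_table[vpn];   // O(1): the VPN is a direct array index
}

// Inverted: one system-wide table, indexed by physical frame number.
// Finding a frame's owner is a direct index; translating a VPN
// requires a search (made efficient with hashing, as shown later).
typedef struct { uint16_t pid; uint64_t vpn; } FrameOwner;

FrameOwner inverted_owner_lookup(const FrameOwner* ipt, uint32_t frame) {
    return ipt[frame];        // O(1): the frame number is the index
}
```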
The Key Insight:
Physical memory is MUCH smaller than virtual address space.
Typical system:
Physical memory: 16GB
Virtual address space: 2⁶⁴ bytes = 16 exabytes
Ratio: 16GB / 16EB = 10⁻⁹ (one billionth)
With 4KB pages:
Physical frames: 16GB / 4KB = 4 million frames
Page table entries: 4 million × entry_size
With 32-byte entries:
Total table size: 4M × 32 = 128MB
For ALL processes combined!
This is the fundamental advantage: inverted page table size is proportional to physical memory, not virtual address space or number of processes.
| Configuration | Conventional (per-process) | Inverted (system-wide) |
|---|---|---|
| Single process, 100MB mapped | ~100KB | Entire table (128MB here) |
| 100 processes, 1GB each | ~100MB total | Entire table (128MB) |
| 1000 processes, 1GB each | ~1GB total | Entire table (128MB) |
| 10000 processes, 1GB each | ~10GB total | Entire table (128MB) |
Inverted page tables become increasingly advantageous as the number of processes grows and virtual address spaces become sparser. For systems with many processes sharing limited physical memory (servers, mainframes), the constant memory overhead regardless of process count is compelling.
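The arithmetic behind these figures is simple enough to check in a few lines. The sketch below assumes the running example's parameters (16GB of RAM, 4KB pages, 32-byte entries, and the text's ~1MB-per-process estimate for multi-level tables):

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    const uint64_t phys_mem   = 16ULL << 30;   // 16GB of physical memory
    const uint64_t page_size  = 4096;          // 4KB pages
    const uint64_t entry_size = 32;            // bytes per IPT entry

    uint64_t frames   = phys_mem / page_size;  // 4M frames
    uint64_t ipt_size = frames * entry_size;   // fixed, system-wide

    printf("Inverted table: %llu MB for ALL processes\n",
           (unsigned long long)(ipt_size >> 20));

    // Conventional: ~1MB of multi-level structures per process
    for (uint64_t procs = 100; procs <= 10000; procs *= 10) {
        printf("%6llu processes, conventional: ~%llu MB total\n",
               (unsigned long long)procs, (unsigned long long)procs);
    }
    return 0;
}
```

Running it reproduces the constant 128MB against the linearly growing conventional totals.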
An inverted page table is an array with exactly one entry per physical frame in the system. If physical memory contains N frames, the inverted page table has N entries.
Entry Structure:
Each entry must contain enough information to identify the owning process, the virtual page mapped into the frame, its permissions, and its position in the lookup structure:
┌─────────────────────────────────────────────────────────────────┐
│ Inverted Page Table Entry (32 bytes typical) │
├─────────────────────────────────────────────────────────────────┤
│ Process ID (PID) │ 16-32 bits │
│ - Identifies process owner │ │
├─────────────────────────────────────────────────────────────────┤
│ Virtual Page Number (VPN) │ 52 bits (for 64-bit VA) │
│ - Page within process's space │ │
├─────────────────────────────────────────────────────────────────┤
│ Control Bits │ 16-32 bits │
│ - Valid, Read, Write, Execute │ │
│ - Dirty, Accessed, Cached │ │
├─────────────────────────────────────────────────────────────────┤
│ Chain Pointer/Index │ 32-64 bits │
│ - For hash collision chains │ │
└─────────────────────────────────────────────────────────────────┘
The Table Array:
Inverted Page Table:
Frame 0: [PID: 42, VPN: 0x0000FF, Controls, Chain]
Frame 1: [PID: 17, VPN: 0x003A00, Controls, Chain]
Frame 2: [EMPTY - frame unused]
Frame 3: [PID: 42, VPN: 0x000100, Controls, Chain]
Frame 4: [PID: 99, VPN: 0x1F0000, Controls, Chain]
... ...
Frame N-1: [PID: 17, VPN: 0x003A01, Controls, Chain]
Each entry at index i describes what's stored in physical frame i.
Why PID is Required:
Unlike conventional page tables (one per process), the inverted table is shared. Multiple processes may use the same virtual page number for different purposes:
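For example (values hypothetical):

Process A: VPN 0x4000 → Frame 100 (A's code)
Process B: VPN 0x4000 → Frame 205 (B's code)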
Without the PID, we couldn't distinguish which mapping belongs to which process. The (PID, VPN) tuple uniquely identifies a mapping.
Address Space Identifier (ASID):
Some architectures use an Address Space Identifier (ASID) instead of (or in addition to) a PID. An ASID is a compact identifier the OS assigns specifically for translation hardware, keeping the tag field small; the pid field in the structure below can hold either:
```c
// Inverted page table data structures

#include <stdint.h>
#include <stdbool.h>

// Entry in the inverted page table
typedef struct {
    uint16_t pid;         // Process ID (or ASID)
    uint16_t padding;     // Alignment padding
    uint32_t chain_next;  // Index of next entry in hash chain
                          // (0xFFFFFFFF = end of chain)
    uint64_t vpn;         // Virtual page number
    uint32_t control;     // Protection and status bits
    uint32_t reserved;    // Reserved for future use
} InvertedPageTableEntry; // Total: 24 bytes

// The inverted page table
typedef struct {
    InvertedPageTableEntry* entries; // Array indexed by frame number
    uint32_t num_frames;             // Total physical frames
    uint32_t* hash_anchors;          // Hash table for fast lookup
    uint32_t num_hash_buckets;       // Size of hash table
    uint64_t hash_seed;              // Hash function seed
} InvertedPageTable;

// Control bit definitions
#define IPT_VALID    (1 << 0)  // Entry is valid
#define IPT_READ     (1 << 1)  // Page readable
#define IPT_WRITE    (1 << 2)  // Page writable
#define IPT_EXECUTE  (1 << 3)  // Page executable
#define IPT_USER     (1 << 4)  // User-mode accessible
#define IPT_DIRTY    (1 << 5)  // Page modified
#define IPT_ACCESSED (1 << 6)  // Page accessed
#define IPT_WIRED    (1 << 7)  // Page cannot be evicted

#define IPT_CHAIN_END 0xFFFFFFFF // End of hash chain marker
```

The inverted page table's defining characteristic is that array index equals frame number. Entry IPT[i] describes what's in physical frame i. This direct correspondence makes frame-to-mapping lookup O(1)—useful for operations like page replacement that need to identify which process owns a frame being evicted.
Address translation with inverted page tables fundamentally differs from conventional tables. We cannot simply index into the table with the VPN—we must search for the matching (PID, VPN) entry.
The Naive Approach (Linear Search):
Given virtual address VA from process P:
1. Extract VPN from VA: vpn = VA >> PAGE_OFFSET_BITS
2. For each frame f from 0 to NUM_FRAMES-1:
if IPT[f].valid AND IPT[f].pid == P AND IPT[f].vpn == vpn:
return (f << PAGE_OFFSET_BITS) | (VA & PAGE_OFFSET_MASK)
3. Return PAGE_FAULT
Problem: This is O(N) where N is the number of physical frames. For a 16GB system with 4 million frames, checking each entry is catastrophically slow—milliseconds per translation compared to nanoseconds needed.
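For reference, the naive search looks like this in C, using the types defined earlier (purely illustrative; no real system translates this way):

```c
// O(N) scan over every physical frame - shown only to make the cost vivid
int64_t ipt_linear_search(const InvertedPageTable* ipt,
                          uint16_t pid, uint64_t vpn) {
    for (uint32_t f = 0; f < ipt->num_frames; f++) {
        const InvertedPageTableEntry* e = &ipt->entries[f];
        if ((e->control & IPT_VALID) && e->pid == pid && e->vpn == vpn) {
            return (int64_t)f;   // found: return the frame number
        }
    }
    return -1;                   // not mapped: page fault
}
```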
Solution: Hash-Assisted Lookup
To achieve acceptable performance, inverted page tables use a hash table that maps (PID, VPN) to frame number:
1. Compute hash: bucket = hash(pid, vpn) mod num_buckets
2. Follow chain from hash_anchors[bucket]
3. For each frame f in chain:
if IPT[f].pid == pid AND IPT[f].vpn == vpn:
return f
4. Return PAGE_FAULT (not mapped)
This reduces average lookup to O(1 + α) where α is the hash table load factor.
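The load factor is under the OS's control, since it chooses the bucket count when sizing the table. A common heuristic (an assumption here, not a universal rule) is to provision roughly twice as many buckets as frames, targeting α ≈ 0.5:

```c
#include <stdint.h>

// Hypothetical sizing helper: ~2x buckets per frame keeps expected
// chains short; rounding to a power of two lets "mod" become a mask.
uint32_t choose_num_buckets(uint32_t num_frames) {
    uint32_t n = 2 * num_frames;
    n--;                          // round up to the next power of two
    n |= n >> 1;  n |= n >> 2;  n |= n >> 4;
    n |= n >> 8;  n |= n >> 16;
    return n + 1;
}
```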
The Hash Table Structure:
The inverted page table uses a separate hash table (called the hash anchor table) that maps (PID, VPN) pairs to frame numbers. This hash table uses chaining, but critically, the chains are embedded in the inverted page table entries themselves.
Hash Anchor Table: Inverted Page Table:
┌─────────┐
│Bucket 0 │────┐ ┌─────────────────────────────────┐
├─────────┤ │ │ Frame 0: PID:42, VPN:0xFF │
│Bucket 1 │──┐ │ │ chain_next: 7 │
├─────────┤ │ │ ├─────────────────────────────────┤
│Bucket 2 │ │ └─────────────→│ Frame 1: PID:17, VPN:0x3A00 │
├─────────┤ │ │ chain_next: END │
│ ... │ │ ├─────────────────────────────────┤
│Bucket N │ │ │ ... │
└─────────┘ │ ├─────────────────────────────────┤
└───────────────→│ Frame 7: PID:89, VPN:0x1000 │
│ chain_next: 42 │
├─────────────────────────────────┤
│ ... │
├─────────────────────────────────┤
│ Frame 42: PID:12, VPN:0x8080 │
│ chain_next: END │
└─────────────────────────────────┘
Dual Indexing:
The inverted page table supports two access patterns:

- Forward (translation): hash (PID, VPN) and follow the chain to find the frame holding that page.
- Reverse (ownership): index directly by frame number to learn which process and virtual page occupy it.

This dual capability is powerful but requires maintaining the hash chains.
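The reverse direction needs no hashing at all. A sketch, reusing the structures defined above:

```c
#include <stdbool.h>

// Given a frame chosen for eviction, identify its owner in O(1)
static inline bool ipt_frame_owner(const InvertedPageTable* ipt,
                                   uint32_t frame,
                                   uint16_t* pid, uint64_t* vpn) {
    const InvertedPageTableEntry* e = &ipt->entries[frame];
    if (!(e->control & IPT_VALID)) {
        return false;             // frame currently unused
    }
    *pid = e->pid;
    *vpn = e->vpn;
    return true;
}
```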
```c
// Complete address translation using inverted page table

#include "inverted_page_table.h"

// Hash function for (PID, VPN) pairs
static inline uint32_t ipt_hash(uint16_t pid, uint64_t vpn,
                                uint64_t seed, uint32_t num_buckets) {
    // Combine PID and VPN
    uint64_t combined = ((uint64_t)pid << 48) | (vpn & 0xFFFFFFFFFFFFULL);

    // Apply mixing function
    combined ^= seed;
    combined ^= combined >> 33;
    combined *= 0xFF51AFD7ED558CCDULL;
    combined ^= combined >> 33;
    combined *= 0xC4CEB9FE1A85EC53ULL;
    combined ^= combined >> 33;

    return combined % num_buckets;
}

// Translate virtual address to physical address
typedef struct {
    bool success;
    uint64_t physical_address;
    uint32_t control;
    uint32_t frame_number;
    int chain_depth; // For performance monitoring
} IPTTranslationResult;

IPTTranslationResult ipt_translate(InvertedPageTable* ipt,
                                   uint16_t pid,
                                   uint64_t virtual_address) {
    IPTTranslationResult result = {0};

    // Extract VPN and offset
    const uint64_t PAGE_OFFSET_BITS = 12;
    const uint64_t PAGE_OFFSET_MASK = (1ULL << PAGE_OFFSET_BITS) - 1;

    uint64_t vpn = virtual_address >> PAGE_OFFSET_BITS;
    uint64_t offset = virtual_address & PAGE_OFFSET_MASK;

    // Compute hash bucket
    uint32_t bucket = ipt_hash(pid, vpn, ipt->hash_seed,
                               ipt->num_hash_buckets);

    // Get chain head from hash anchor table
    uint32_t frame = ipt->hash_anchors[bucket];

    // Traverse chain
    while (frame != IPT_CHAIN_END) {
        result.chain_depth++;

        InvertedPageTableEntry* entry = &ipt->entries[frame];

        if ((entry->control & IPT_VALID) &&
            entry->pid == pid &&
            entry->vpn == vpn) {
            // Found matching entry!
            result.success = true;
            result.physical_address =
                ((uint64_t)frame << PAGE_OFFSET_BITS) | offset;
            result.control = entry->control;
            result.frame_number = frame;

            // Update accessed bit (atomically in real implementation)
            entry->control |= IPT_ACCESSED;

            return result;
        }

        // Follow chain
        frame = entry->chain_next;
    }

    // Not found - page fault
    result.success = false;
    return result;
}

// Insert mapping (when handling page fault)
bool ipt_insert(InvertedPageTable* ipt, uint16_t pid, uint64_t vpn,
                uint32_t frame, uint32_t control) {
    // Validate frame number
    if (frame >= ipt->num_frames) {
        return false;
    }

    // Verify frame is not already in use
    if (ipt->entries[frame].control & IPT_VALID) {
        return false; // Frame occupied
    }

    // Fill entry
    InvertedPageTableEntry* entry = &ipt->entries[frame];
    entry->pid = pid;
    entry->vpn = vpn;
    entry->control = control | IPT_VALID;

    // Add to hash chain (insert at head)
    uint32_t bucket = ipt_hash(pid, vpn, ipt->hash_seed,
                               ipt->num_hash_buckets);
    entry->chain_next = ipt->hash_anchors[bucket];
    ipt->hash_anchors[bucket] = frame;

    return true;
}
```

Even with hash-assisted lookup, inverted page table translation is fundamentally a SEARCH operation, not an array INDEX operation. This means translation latency varies with hash chain length and cache behavior. Hardware TLB caching is even more critical for inverted tables because the software translation path is more complex.
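A minimal usage sketch of these routines (ipt_create is a hypothetical allocation helper, not defined above):

```c
// Build a table for 4M frames with 8M hash buckets (α ≈ 0.5)
InvertedPageTable* ipt = ipt_create(4 * 1024 * 1024, 8 * 1024 * 1024);

// Map process 42's virtual page 0x1000 to physical frame 7
ipt_insert(ipt, 42, 0x1000, 7, IPT_READ | IPT_WRITE | IPT_USER);

// Translate an address inside that page
IPTTranslationResult r =
    ipt_translate(ipt, 42, (0x1000ULL << 12) | 0x234);
// r.success == true, r.physical_address == (7ULL << 12) | 0x234
```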
The IBM PowerPC architecture provides the most well-known production implementation of inverted page tables. Understanding its design reveals practical solutions to the theoretical challenges.
PowerPC HTAB (Hashed Page Table):
PowerPC uses a structure called the HTAB (Hashed Page Table) that combines inverted page table concepts with hashing.

Structure:

- A single, physically contiguous table whose size is a power of two
- Organized as groups of 8 PTEs rather than strictly one entry per frame
- Searched directly by hardware, with no per-process tables to switch
The HTEG Concept:
Instead of simple chaining, PowerPC groups 8 entries into an HTEG (Hashed Page Table Entry Group). Each translation can be stored in one of two HTEGs:
Primary HTEG: hash₁(VSID, VPN)
Secondary HTEG: hash₂(VSID, VPN) = NOT(hash₁)
VSID: Virtual Segment ID (identifies address space)
VPN: Virtual Page Number within segment
This dual-hashing approach provides two chances to find space, reducing overflow handling.
PowerPC Page Table Entry (PTE):
64-bit PowerPC PTE (16 bytes total):
┌─────────────────────────────────────────────────────────────────┐
│ First Doubleword (8 bytes) │
├─────────────────────────────────────────────────────────────────┤
│ V │ AVPN (Abbreviated VPN) │ SW │ L │
│ 1 │ 57 bits │ 1 │ 1 │
├─────────────────────────────────────────────────────────────────┤
│ Second Doubleword (8 bytes) │
├─────────────────────────────────────────────────────────────────┤
│ AC │ R │ C │ WIMG │ N │ PP │ RPN │
│ 1 │ 1 │ 1 │ 4 │ 1 │ 2 │ 52 bits │
└─────────────────────────────────────────────────────────────────┘
V: Valid bit
AVPN: Abbreviated Virtual Page Number (for matching)
SW: Software-defined bit
L: Large page
AC: Address compare (for watchpoints)
R: Referenced (accessed)
C: Changed (dirty)
WIMG: Memory/cache attributes
N: No-execute
PP: Protection bits
RPN: Real Page Number (physical frame)
Hardware Translation Process:
```c
// PowerPC-style hashed page table concepts

#include <stdint.h>
#include <stddef.h>

#define HTEG_SIZE 8   // 8 PTEs per group
#define PTE_SIZE  16  // 16 bytes per PTE

// Sentinel for a failed translation (added here for completeness)
#define TRANSLATION_FAULT (~0ULL)

typedef struct {
    uint64_t word0; // V, AVPN, SW, L bits
    uint64_t word1; // RPN, protection, WIMG, R, C bits
} PowerPCPTE;

typedef struct {
    PowerPCPTE entries[HTEG_SIZE]; // 8 entries per group
} HTEG; // 128 bytes total - one cache line in some systems

// Extract fields from PTE word 0
#define PTE_VALID(pte) (((pte)->word0 >> 63) & 1)
#define PTE_AVPN(pte)  (((pte)->word0 >> 7) & 0x1FFFFFFFFFFFFFFULL)
#define PTE_LARGE(pte) ((pte)->word0 & 1)

// Extract fields from PTE word 1
#define PTE_RPN(pte)   (((pte)->word1 >> 12))
#define PTE_R_BIT(pte) (((pte)->word1 >> 8) & 1) // Referenced
#define PTE_C_BIT(pte) (((pte)->word1 >> 7) & 1) // Changed
#define PTE_PP(pte)    ((pte)->word1 & 0x3)      // Protection

// Compute HTEG addresses
typedef struct {
    uint64_t primary;
    uint64_t secondary;
} HTEGAddresses;

HTEGAddresses compute_hteg_addresses(uint64_t vsid, uint64_t vpn,
                                     uint64_t htab_base,
                                     uint64_t htab_size_mask) {
    HTEGAddresses addrs;

    // Primary hash
    uint64_t hash1 = (vsid ^ vpn) & htab_size_mask;
    addrs.primary = htab_base + (hash1 * sizeof(HTEG));

    // Secondary hash (ones' complement of primary)
    uint64_t hash2 = (~hash1) & htab_size_mask;
    addrs.secondary = htab_base + (hash2 * sizeof(HTEG));

    return addrs;
}

// Search HTEG for matching PTE
PowerPCPTE* search_hteg(HTEG* hteg, uint64_t avpn_to_match) {
    for (int i = 0; i < HTEG_SIZE; i++) {
        PowerPCPTE* pte = &hteg->entries[i];
        if (PTE_VALID(pte) && PTE_AVPN(pte) == avpn_to_match) {
            return pte; // Match found
        }
    }
    return NULL; // Not in this HTEG
}

// Full translation (simplified)
uint64_t powerpc_translate(uint64_t vsid, uint64_t effective_addr,
                           void* htab_base, uint64_t htab_size_mask) {
    uint64_t vpn = effective_addr >> 12;         // 4KB pages
    uint64_t avpn = (vsid << 23) | (vpn >> 11);  // Abbreviated VPN

    HTEGAddresses addrs = compute_hteg_addresses(
        vsid, vpn, (uint64_t)htab_base, htab_size_mask);

    // Check primary HTEG
    PowerPCPTE* pte = search_hteg((HTEG*)addrs.primary, avpn);
    if (pte != NULL) {
        return (PTE_RPN(pte) << 12) | (effective_addr & 0xFFF);
    }

    // Check secondary HTEG
    pte = search_hteg((HTEG*)addrs.secondary, avpn);
    if (pte != NULL) {
        return (PTE_RPN(pte) << 12) | (effective_addr & 0xFFF);
    }

    // Not found - trigger page fault
    return TRANSLATION_FAULT;
}
```

Grouping 8 entries into an HTEG allows the hardware to fetch all candidates with a single memory access (if aligned to cache line). This parallelism compensates for the search-based lookup. Additionally, the two-hash scheme with 8 entries each means 16 slots are available before any overflow handling is needed—sufficient for typical collision rates.
Inverted page tables offer unique advantages that make them particularly suitable for certain system configurations. Understanding these advantages helps identify when inverted tables are the right choice.
Primary Advantages:

- Table size is proportional to physical memory, not virtual address space
- Overhead stays constant regardless of process count
- O(1) reverse lookup: a frame number directly indexes its owning (PID, VPN)
- No per-process tables to allocate, grow, or tear down
Ideal Application Scenarios:
1. Server Systems with Many Processes:
Web servers, database servers, and application servers commonly run thousands of processes, each consuming at least some page table memory. With an inverted table, the server can support more processes without any growth in page table overhead.
2. Large Physical Memory Systems:
Mainframes and high-end servers with terabytes of RAM benefit from predictable sizing: the inverted table scales with installed RAM, while conventional tables grow with process count and address space usage.
3. Real-Time Systems:
The bounded, predictable page table size aids real-time memory budgeting. No surprises from process page table growth.
4. Virtualization Hosts:
Hypervisors manage many VMs, each with its own address space. Inverted tables help contain the page table overhead explosion.
5. Systems with 64-bit Addresses but Limited Physical Memory:
IoT and embedded 64-bit systems may have gigabytes of physical RAM but full 64-bit virtual addresses. Inverted tables scale with the small physical memory, not the huge VA space.
| Workload Characteristic | Inverted Advantage | Conventional Advantage |
|---|---|---|
| Many processes | ✓ Fixed overhead | |
| Large VA spaces | ✓ No VA scaling | |
| Big physical memory | | ✓ Simpler than large IPT |
| Shared pages | | ✓ Easier sharing (see next page) |
| Fast context switch | ✓ No table switch | ✓ Per-process ASID |
| Page replacement | ✓ O(1) frame lookup | Needs reverse map |
| Dense VA usage | | ✓ Direct indexing faster |
Inverted page tables were pioneered by IBM for System/38 (1978) and later used in AS/400 and PowerPC architectures. HP PA-RISC also used inverted tables. While x86-64 uses multi-level conventional tables, understanding inverted tables remains valuable for kernel developers and system architects evaluating trade-offs.
Inverted page tables are not universally superior. Several significant disadvantages limit their applicability and explain why conventional tables remain dominant in many systems.
Core Disadvantages:

- Translation is a search rather than a direct index, with variable latency
- Sharing a frame among processes breaks the one-entry-per-frame model
- Supplementary data structures are needed to support sharing
- Dominant modern hardware (x86-64, ARM) provides no support for it
The Sharing Problem (Preview):
This is the most significant limitation. Consider shared libraries used by 1000 processes:
Conventional Tables:
Process 1 page table: VPN 0x4000 → Frame 100
Process 2 page table: VPN 0x4000 → Frame 100
... ... ... ...
Process 1000 page table: VPN 0x4000 → Frame 100
(Easy: each process has independent mapping to same frame)
Inverted Table:
IPT[100]: ??? The single PID and VPN fields can record only ONE mapping!
- Can't store 1000 different (PID, VPN) pairs in one entry
- Must use additional data structures
- Breaks the fundamental single-entry-per-frame property
Sharing in inverted tables requires supplementary structures like per-process shared mapping tables or reference-counted sharing lists, adding complexity.
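One possible shape for such a supplementary structure is a per-frame, reference-counted list of additional mappers. This is a hypothetical sketch to illustrate the bookkeeping, not any particular OS's design:

```c
#include <stdint.h>

// Sharers beyond the one (PID, VPN) recorded in the IPT entry itself
typedef struct SharedMapper {
    uint16_t pid;                  // another process mapping this frame
    uint64_t vpn;                  // at this virtual page number
    struct SharedMapper* next;     // next sharer in the chain
} SharedMapper;

// Side table indexed by frame number, parallel to the IPT
typedef struct {
    uint32_t ref_count;            // how many processes map the frame
    SharedMapper* extra_mappers;   // NULL when the frame is unshared
} SharedFrameInfo;
```

Every lookup for a shared page that misses the primary IPT entry must now consult this side structure, which is exactly the complexity cost described above.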
Performance Considerations:
| Step | 4-Level Conventional | Inverted + Hash |
|---|---|---|
| Address parsing | ~2 cycles | ~2 cycles |
| First table access | ~100 cycles | Hash: ~10 cycles |
| Second table access | ~100 cycles | Bucket: ~100 cycles |
| Third table access | ~100 cycles | Chain[0]: ~100 cycles |
| Fourth table access | ~100 cycles | Chain[1]: ~100 cycles (if collision) |
| Total (best) | ~400 cycles | ~210 cycles |
| Total (typical) | ~400 cycles | ~310 cycles |
| Total (worst) | ~400 cycles | ~1000+ cycles (long chain) |
Why Conventional Tables Dominate Today:
Most modern systems (x86-64, ARM) use conventional multi-level page tables despite inverted tables' advantages because:
Hardware Investment: Decades of x86/ARM hardware optimization for multi-level walking has made conventional translation very fast (hardware page table walkers, large TLBs with nested entries).
Sharing Importance: Modern systems share extensively (shared libraries, CoW after fork, shared memory IPC). The sharing complexity of inverted tables is a significant burden.
Predictable Worst Case: Multi-level tables have bounded translation time (exactly N accesses for N-level tables). Inverted tables' hash chain length is probabilistic.
Ecosystem Inertia: Operating systems, compilers, and debugging tools assume per-process address spaces with conventional page tables.
Inverted page tables save memory but complicate the common-case translation path and make sharing difficult. This trade-off is acceptable for memory-constrained systems with simple sharing requirements, but problematic for general-purpose systems where translation speed and extensive sharing are priorities.
Inverted page tables represent a fundamental paradigm shift in address translation, organizing mappings around physical frames rather than virtual pages. This inversion trades translation complexity for memory efficiency in specific workloads.
What's Next:
The next page explores the single-table-for-all-processes design in depth—examining how the system-wide nature of inverted page tables affects context switching, TLB management, and overall system organization compared to per-process conventional tables.
You now understand inverted page tables comprehensively—from the motivation of physical-memory-proportional sizing through the structure and translation mechanism to real-world implementations and trade-offs. This knowledge reveals an important alternative design point in the page table design space.