In conventional memory management, each process maintains its own page table—a private directory mapping that process's virtual addresses to physical frames. When the OS switches between processes, it must also switch page tables, loading a new base register and potentially flushing cached translations.
Inverted page tables break this per-process paradigm. A single system-wide table serves all processes simultaneously. Every translation—regardless of which process initiated it—consults the same data structure. The table doesn't change when processes switch; only the process identifier used for lookup changes.
This architectural choice has profound implications for system design. It affects context switch performance, TLB management strategy, concurrent access patterns, memory allocation policies, and even security isolation. Understanding these implications is essential for system architects evaluating memory management approaches.
By the end of this page, you will understand how a single table supports multiple processes, the impact on context switching and TLB behavior, concurrency control requirements, process identifier management, and the system-wide coordination challenges that emerge from shared table organization.
Understanding the fundamental difference between per-process and system-wide table organizations clarifies why each approach exists and when each is appropriate.
Per-Process Tables (Conventional):
Each process has its own page table hierarchy:
System State:
┌─────────────────┐ ┌──────────────────┐
│ Process A (PID 1) │ │ Process A Tables │
│ CR3 → 0x1000 │────→│ L4 → L3 → L2 → L1│
└─────────────────┘ └──────────────────┘
┌─────────────────┐ ┌──────────────────┐
│ Process B (PID 2) │ │ Process B Tables │
│ CR3 → 0x5000 │────→│ L4 → L3 → L2 → L1│
└─────────────────┘ └──────────────────┘
┌─────────────────┐ ┌──────────────────┐
│ Process C (PID 3) │ │ Process C Tables │
│ CR3 → 0x9000 │────→│ L4 → L3 → L2 → L1│
└─────────────────┘ └──────────────────┘
System-Wide Table (Inverted):
One table serves all processes:
System State:
┌─────────────────┐
│ Process A (PID 1) │─────┐
│ ASID: 0x01 │ │
└─────────────────┘ │
│ ┌──────────────────────┐
┌─────────────────┐ ├───→│ Single Inverted │
│ Process B (PID 2) │─────┤ │ Page Table │
│ ASID: 0x02 │ │ │ │
└─────────────────┘ │ │ Frame 0: (A, 0x100) │
│ │ Frame 1: (B, 0x200) │
┌─────────────────┐ │ │ Frame 2: (C, 0x050) │
│ Process C (PID 3) │─────┘ │ Frame 3: (A, 0x101) │
│ ASID: 0x03 │ │ ... │
└─────────────────┘ └──────────────────────┘
The key properties of the two organizations are compared below:
| Aspect | Per-Process Tables | System-Wide Table |
|---|---|---|
| Isolation mechanism | Separate tables | PID in entries |
| Context switch | Change CR3/table base | Change current PID |
| Table switch overhead | May flush TLB | No table switch needed |
| Memory overhead | Per-process growth | Fixed by physical size |
| Entry content | No PID (implicit) | PID required (explicit) |
| Concurrent access | Per-process locks viable | Global coordination needed |
| TLB management | Per-process flush options | ASID-based invalidation |
With per-process tables, a translation inherently 'knows' which process it belongs to by which table it's in. With system-wide tables, each translation is an orphan that must be explicitly tagged with its owner. This difference ripples through the entire system design.
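That tagging requirement can be made concrete with a minimal sketch of the two entry formats. The structures and field names below are illustrative only, not taken from any particular architecture:

```c
#include <stdint.h>

/* Per-process table entry (sketch): the owning process is implied by
 * which table the entry lives in, so no identifier field is needed. */
typedef struct {
    uint64_t pfn;      // Physical frame number
    uint32_t flags;    // Present, writable, user, accessed, dirty, ...
} PerProcessPTE;

/* System-wide (inverted) table entry (sketch): the table is shared by
 * all processes, so each entry must record its owning address space. */
typedef struct {
    uint16_t asid;     // Owning address space - explicit and required
    uint64_t vpn;      // Which virtual page this physical frame holds
    uint32_t flags;    // Present, writable, accessed, dirty, ...
    int32_t  next;     // Hash chain link (frame index, or -1 for end)
} InvertedPTE;
```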
Context switching—transitioning CPU execution from one process to another—interacts differently with per-process versus system-wide page tables. This difference significantly affects switch performance and TLB behavior.
Per-Process Table Context Switch:
1. Save current process state (registers, etc.)
2. Load new process's page table base into CR3
├── Hardware automatically invalidates TLB entries
│ (all cached translations now invalid)
└── Or: Use ASID to tag TLB entries, selective invalidation
3. Restore new process state
4. Resume execution
└── First memory accesses cause TLB misses
(must rewalk page tables to reload TLB)
The TLB Flush Problem:
When CR3 changes, the TLB contains translations from the old process. If not invalidated, these could cause the new process to access wrong physical frames. Traditional x86 invalidated the entire TLB on CR3 change—very expensive.
Modern Mitigation: PCID/ASID
Modern CPUs (Intel with PCID, ARM with ASID) tag TLB entries with a process/address space identifier. This allows TLB entries from multiple processes to coexist:
TLB Entry: (ASID, VPN) → (PFN, permissions)
Switch from Process A (ASID=1) to Process B (ASID=2):
- Change current ASID register from 1 to 2
- No TLB flush needed!
- Process B's entries (ASID=2) are already cached
- Process A's entries (ASID=1) remain but won't match
System-Wide Table Context Switch:
With an inverted page table, there's no table base register to change:
1. Save current process state
2. Update current PID/ASID register
└── No page table base change!
(same table serves all processes)
3. Restore new process state
4. Resume execution
└── TLB entries from previous process may remain
but won't match (different PID in lookup)
Inherent ASID-like Behavior:
Inverted page tables effectively require ASID-style TLB organization: because every entry in the shared table is tagged with its owning process, cached translations must carry the same tag so that one process's lookups never match another's pages.
This means inverted page table systems 'get ASID for free'—the per-lookup PID serves the same purpose.
// Context switch comparison

// Per-process table context switch (simplified x86-64)
void context_switch_per_process(Process* old, Process* new) {
    // Save old process state
    save_registers(&old->registers);
    old->kernel_stack = get_current_stack();

    // Change page table base
    uint64_t new_cr3 = new->page_table_base;

    // With PCID support: preserve TLB entries
    if (pcid_supported()) {
        new_cr3 |= (uint64_t)new->pcid;
        new_cr3 |= (1ULL << 63);   // Bit 63: don't flush on write
    }
    // else: TLB fully flushed on CR3 write

    write_cr3(new_cr3);            // Switch page tables

    // Restore new process state
    set_current_stack(new->kernel_stack);
    restore_registers(&new->registers);
}

// System-wide table context switch
void context_switch_inverted(Process* old, Process* new) {
    // Save old process state
    save_registers(&old->registers);
    old->kernel_stack = get_current_stack();

    // No page table base change needed!
    // Just update the current process identifier
    set_current_asid(new->asid);

    // Restore new process state
    set_current_stack(new->kernel_stack);
    restore_registers(&new->registers);

    // TLB entries from 'old' remain but have a different ASID,
    // so they won't match lookups for 'new'
}

System-wide tables can offer faster context switches because there is no table base change and therefore no TLB flush triggered by one. However, modern per-process systems with PCID achieve similar benefits by preserving TLB entries across switches, so the performance difference is now minimal in practice.
The Translation Lookaside Buffer (TLB) is a critical performance component that caches recent address translations. With system-wide page tables, TLB management has unique characteristics.
TLB Entry Structure for Inverted Tables:
TLB entries must include the process identifier:
Standard TLB Entry:
Key: VPN
Value: PFN, permissions
Inverted Table TLB Entry:
Key: (PID/ASID, VPN)
Value: PFN, permissions
Matching Logic:
if TLB[i].asid == current_asid AND TLB[i].vpn == requested_vpn:
return TLB[i].pfn
TLB Lookup with ASID:
Every TLB lookup must check the process identifier:
Lookup(virtual_address):
vpn = extract_vpn(virtual_address)
current_asid = get_current_asid()
for each TLB_entry:
if TLB_entry.valid AND
TLB_entry.asid == current_asid AND
TLB_entry.vpn == vpn:
return HIT(TLB_entry.pfn)
return MISS // Must consult inverted page table
TLB Invalidation Strategies:
Invalidation must account for the global nature of the table:
1. Full TLB Flush:
flush_tlb_all():
// Invalidate every entry in TLB
// Expensive but simple
// Used when: ASID wrap-around, major table restructuring
2. ASID-Specific Flush:
flush_tlb_asid(asid):
// Invalidate only entries with this ASID
for each TLB_entry:
if TLB_entry.asid == asid:
TLB_entry.valid = false
// Used when: Process exits, ASID recycled
3. Single Entry Invalidation:
flush_tlb_entry(asid, vpn):
// Invalidate specific (ASID, VPN) pair
for each TLB_entry:
if TLB_entry.asid == asid AND TLB_entry.vpn == vpn:
TLB_entry.valid = false
// Used when: Single page unmapped, protection change
4. Range Invalidation:
flush_tlb_range(asid, vpn_start, vpn_end):
// Invalidate range of pages
for each TLB_entry:
if TLB_entry.asid == asid AND
TLB_entry.vpn >= vpn_start AND
TLB_entry.vpn <= vpn_end:
TLB_entry.valid = false
// Used when: munmap of multi-page region
| Event | Required Invalidation | Scope |
|---|---|---|
| Process exits | Flush ASID | Single process |
| munmap() single page | Invalidate (ASID, VPN) | Single entry |
| mprotect() region | Flush range (ASID, VPN range) | Range |
| ASID wraparound | Full flush | Entire TLB |
| Fork (CoW setup) | Invalidate parent entries for CoW pages | Specific pages |
| Shared page unmap | All ASIDs for that VPN* | Multi-process |
// TLB management for inverted page table systems

#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint16_t asid;
    bool     valid;
    bool     global;        // Kernel pages: match all ASIDs
    uint64_t vpn;
    uint64_t pfn;
    uint32_t permissions;
} TLBEntry;

// Software TLB miss handler (inverted page table system)
void tlb_miss_handler(uint16_t asid, uint64_t faulting_address) {
    uint64_t vpn = faulting_address >> PAGE_SHIFT;

    // Search inverted page table
    IPTTranslationResult result = ipt_translate(global_ipt, asid, faulting_address);

    if (!result.success) {
        // Not a TLB miss - it's a genuine page fault
        handle_page_fault(asid, faulting_address);
        return;
    }

    // Install translation in TLB
    TLBEntry new_entry = {
        .asid = asid,
        .valid = true,
        .global = false,
        .vpn = vpn,
        .pfn = result.frame_number,
        .permissions = result.control
    };

    // Find slot (implementation-dependent replacement policy)
    int slot = tlb_find_replacement_slot();
    hardware_write_tlb(slot, &new_entry);

    // Return to faulting instruction
}

// ASID management
#define MAX_ASID 256                    // 8-bit ASID

typedef struct {
    uint32_t asid_to_pid[MAX_ASID];     // What PID uses each ASID
    uint8_t  next_asid;                 // Next ASID to allocate (starts at 1)
    uint64_t generation;                // Incremented on wraparound
} ASIDManager;

uint16_t allocate_asid(ASIDManager* mgr, uint32_t pid) {
    uint16_t asid = mgr->next_asid++;

    if (mgr->next_asid == 0) {
        // Wraparound! Must flush all TLB entries
        mgr->generation++;
        tlb_flush_all();
        mgr->next_asid = 1;             // 0 often reserved for kernel
    }

    mgr->asid_to_pid[asid] = pid;
    return asid;
}

ASID space is limited (typically 8-16 bits, i.e., 256-65536 values). When ASIDs are exhausted and must be recycled, a full TLB flush is typically required to ensure old translations don't match new processes. Systems with many active processes may hit this limit, causing periodic TLB performance drops.
With a single table accessed by all processes (and potentially all CPUs simultaneously), concurrency control becomes significantly more complex than with per-process tables.
The Concurrency Challenge:
Per-Process Tables:
- CPU₀ running Process A → Accesses Table_A
- CPU₁ running Process B → Accesses Table_B
- Minimal contention (different tables)
System-Wide Table:
- CPU₀ running Process A → Accesses Global_IPT
- CPU₁ running Process B → Accesses Global_IPT
- Potential contention (same table)
Contention Sources:
Lookup Contention: Multiple CPUs reading hash chains simultaneously (typically safe with proper memory ordering)
Insert Contention: Adding new entries requires modifying hash chain heads (requires synchronization)
Delete Contention: Removing entries requires unlinking from chains (requires careful synchronization)
Update Contention: Modifying entry control bits (e.g., dirty bit) requires atomic operations
Synchronization Strategies:
1. Global Lock (Simple but Slow):
spinlock_t global_ipt_lock;
void ipt_operation(Operation op) {
spin_lock(&global_ipt_lock);
perform_operation(op);
spin_unlock(&global_ipt_lock);
}
Problem: Serializes all IPT accesses across all CPUs. Severe bottleneck.
2. Per-Bucket Locks (Better):
spinlock_t bucket_locks[NUM_BUCKETS];
void ipt_operation(uint64_t vpn, Operation op) {
uint32_t bucket = hash(vpn) % NUM_BUCKETS;
spin_lock(&bucket_locks[bucket]);
perform_operation(op);
spin_unlock(&bucket_locks[bucket]);
}
Better: Concurrent access to different buckets. Contention only on same bucket.
3. Read-Copy-Update (RCU) for Lookups:
uint64_t ipt_lookup(uint64_t vpn) {
    rcu_read_lock();   // No actual lock: marks a read-side critical section
    HPTEntry* entry = find_entry(vpn);   // Read-only traversal
    uint64_t result = entry ? entry->pfn : NOT_FOUND;
    rcu_read_unlock();
    return result;
}

void ipt_insert(uint64_t vpn, uint64_t pfn) {
    HPTEntry* new_entry = alloc_entry(vpn, pfn);
    uint32_t bucket = hash(vpn) % NUM_BUCKETS;
    spin_lock(&bucket_locks[bucket]);
    // Atomically publish at the head of the chain
    new_entry->next = bucket_heads[bucket];
    atomic_store(&bucket_heads[bucket], new_entry);
    spin_unlock(&bucket_locks[bucket]);
}
Excellent for read-heavy workloads (most memory accesses are lookups, not modifications).
// Concurrent inverted page table with RCU

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <urcu.h>          // Userspace RCU library

typedef struct IPTEntry {
    uint16_t pid;
    uint64_t vpn;
    uint64_t pfn;
    _Atomic uint32_t control;      // Atomic for concurrent updates
    struct IPTEntry* next;
    struct rcu_head rcu_head;      // Needed for call_rcu() deferred free
} IPTEntry;

typedef struct {
    _Atomic(IPTEntry*) hash_heads[NUM_BUCKETS];   // Atomic pointers
    spinlock_t mod_locks[NUM_BUCKETS];            // For modifications
} ConcurrentIPT;

// Read-side: No locking, just RCU
IPTEntry* ipt_lookup_rcu(ConcurrentIPT* ipt, uint16_t pid, uint64_t vpn) {
    rcu_read_lock();

    uint32_t bucket = hash(pid, vpn);
    IPTEntry* entry = atomic_load(&ipt->hash_heads[bucket]);

    while (entry != NULL) {
        if (entry->pid == pid && entry->vpn == vpn) {
            // Found - update accessed bit atomically
            atomic_fetch_or(&entry->control, IPT_ACCESSED);
            rcu_read_unlock();
            return entry;
        }
        entry = entry->next;
    }

    rcu_read_unlock();
    return NULL;
}

// Write-side: Lock bucket, use RCU-safe updates
bool ipt_insert_rcu(ConcurrentIPT* ipt, uint16_t pid, uint64_t vpn,
                    uint64_t pfn, uint32_t control) {
    uint32_t bucket = hash(pid, vpn);

    // Allocate before locking
    IPTEntry* new_entry = malloc(sizeof(IPTEntry));
    new_entry->pid = pid;
    new_entry->vpn = vpn;
    new_entry->pfn = pfn;
    atomic_store(&new_entry->control, control);

    spin_lock(&ipt->mod_locks[bucket]);

    // Link at head (atomic pointer update)
    new_entry->next = atomic_load(&ipt->hash_heads[bucket]);
    atomic_store(&ipt->hash_heads[bucket], new_entry);

    spin_unlock(&ipt->mod_locks[bucket]);
    return true;
}

// Delete: RCU deferred free
bool ipt_delete_rcu(ConcurrentIPT* ipt, uint16_t pid, uint64_t vpn) {
    uint32_t bucket = hash(pid, vpn);

    spin_lock(&ipt->mod_locks[bucket]);

    IPTEntry* prev = NULL;
    IPTEntry* entry = atomic_load(&ipt->hash_heads[bucket]);

    while (entry != NULL) {
        if (entry->pid == pid && entry->vpn == vpn) {
            // Unlink from chain
            if (prev == NULL) {
                atomic_store(&ipt->hash_heads[bucket], entry->next);
            } else {
                prev->next = entry->next;
            }
            spin_unlock(&ipt->mod_locks[bucket]);

            // Wait for all readers to finish, then free
            // (rcu_free_entry is the callback that frees the containing entry)
            call_rcu(&entry->rcu_head, rcu_free_entry);
            return true;
        }
        prev = entry;
        entry = entry->next;
    }

    spin_unlock(&ipt->mod_locks[bucket]);
    return false;
}

On architectures with hardware page table walkers (like PowerPC with its HTAB), the hardware handles much of the synchronization: software only locks during modifications, and hardware reads are inherently safe. Software-walked TLB systems (some MIPS, older ARM) require more careful software synchronization.
Managing process identifiers in a system-wide table environment requires careful coordination between several system components. The identifier serves as a critical discriminator in all address translations.
Identifier Types:
1. Process ID (PID): The operating system's own identifier for the process. It uniquely names the owner but is often wider (32 bits or more) than hardware wants to store in every entry.
2. Address Space Identifier (ASID): A short, hardware-visible tag (typically 8-16 bits) assigned to an address space and used to qualify TLB entries and table lookups.
3. Virtual Segment ID (VSID): A per-segment identifier used in PowerPC-style hashed page tables, where each process's segments map to system-wide VSIDs that feed the hash function.
ASID Lifecycle Management:
Process Creation:
1. Allocate new process (get PID from OS)
2. Request ASID from ASID allocator
- If free ASID available: assign it
- If no free ASIDs: reclaim one (see below)
3. Record PID ↔ ASID mapping
4. Initialize page table entries with this ASID
Process Execution:
1. On context switch, load ASID into hardware register
2. All TLB lookups automatically filtered by ASID
3. Page table lookups include ASID in search key
Process Exit:
1. Invalidate all TLB entries for this ASID
2. Remove all page table entries with this PID
3. Release ASID back to free pool
4. Clear PID ↔ ASID mapping
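To tie the exit steps together, here is a minimal sketch of the teardown path. The helper names (tlb_flush_asid, ipt_remove_all_for_asid, asid_release, clear_pid_asid_mapping) are placeholders for illustration, not a real kernel API:

```c
#include <stdint.h>

// Assumed helpers - names are illustrative placeholders
void tlb_flush_asid(uint16_t asid);              // Step 1: drop cached translations
void ipt_remove_all_for_asid(uint16_t asid);     // Step 2: unlink this ASID's table entries
void asid_release(uint16_t asid);                // Step 3: return ASID to the free pool
void clear_pid_asid_mapping(uint32_t pid);       // Step 4: forget the PID <-> ASID link

// Teardown when a process exits, following the lifecycle steps above
void on_process_exit(uint32_t pid, uint16_t asid) {
    tlb_flush_asid(asid);            // No stale translation may outlive the ASID
    ipt_remove_all_for_asid(asid);   // Freed frames go back to the allocator
    asid_release(asid);              // Safe to recycle only after the flush above
    clear_pid_asid_mapping(pid);
}
```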
ASID Exhaustion and Recycling:
With only 256 ASIDs (8-bit) and potentially thousands of processes:
Scenario: 300 processes active, only 256 ASIDs
Solution 1: ASID Versioning (Generational)
- Maintain generation counter G
- Full ASID is (G * 256 + raw_asid)
- TLB entries tagged with full ASID
- When raw_asid wraps, increment G and flush TLB
- Processes with old-generation ASIDs get new ones on next run
Solution 2: ASID Stealing
- When out of ASIDs, pick a victim process
- Flush victim's TLB entries
- Reassign victim's ASID to new process
- Victim gets new ASID when it runs again
Solution 3: Lazy ASID Allocation
- Only assign ASID when process is about to run
- Release ASID if process is idle too long
- Active process set << total process set
// ASID Allocator with generation tracking

#include <stdint.h>
#include <stdbool.h>

#define ASID_BITS 8
#define NUM_ASIDS (1 << ASID_BITS)

typedef struct {
    uint64_t generation;                    // Current generation
    uint8_t  next_raw_asid;                 // Next raw ASID to allocate
    uint32_t asid_to_pid[NUM_ASIDS];        // Who owns each ASID
    uint64_t asid_generation[NUM_ASIDS];    // Generation when assigned
    spinlock_t lock;
} ASIDAllocator;

typedef struct {
    uint64_t full_asid;      // (generation << ASID_BITS) | raw_asid
    bool valid;              // Was allocation successful
} ASIDAllocation;

// Allocate ASID for process
ASIDAllocation allocate_asid(ASIDAllocator* allocator, uint32_t pid) {
    ASIDAllocation result = {0};

    spin_lock(&allocator->lock);

    uint8_t raw_asid = allocator->next_raw_asid++;

    // Check for wraparound
    if (allocator->next_raw_asid == 0) {
        // Generation rollover - must flush all TLBs
        allocator->generation++;
        tlb_flush_all();
        raw_asid = 1;                       // 0 often reserved
        allocator->next_raw_asid = 2;       // Next caller gets ASID 2
    }

    // Record allocation
    allocator->asid_to_pid[raw_asid] = pid;
    allocator->asid_generation[raw_asid] = allocator->generation;

    // Compute full ASID
    result.full_asid = (allocator->generation << ASID_BITS) | raw_asid;
    result.valid = true;

    spin_unlock(&allocator->lock);
    return result;
}

// Check if process's ASID is still valid
bool asid_valid(ASIDAllocator* allocator, uint32_t pid, uint64_t process_asid) {
    uint8_t raw_asid = process_asid & (NUM_ASIDS - 1);
    uint64_t asid_gen = process_asid >> ASID_BITS;

    spin_lock(&allocator->lock);
    bool valid = (allocator->asid_to_pid[raw_asid] == pid &&
                  allocator->asid_generation[raw_asid] == asid_gen);
    spin_unlock(&allocator->lock);

    return valid;
}

// Free ASID when process exits
void free_asid(ASIDAllocator* allocator, uint32_t pid, uint64_t asid) {
    uint8_t raw_asid = asid & (NUM_ASIDS - 1);

    spin_lock(&allocator->lock);

    if (allocator->asid_to_pid[raw_asid] == pid) {
        allocator->asid_to_pid[raw_asid] = 0;   // Mark as free
        // Flush TLB entries for this ASID
        tlb_flush_asid(raw_asid);
    }

    spin_unlock(&allocator->lock);
}

ARM64 supports 8-bit or 16-bit ASIDs (configurable). Intel PCID is 12 bits (4096 values). Larger ASID spaces reduce recycling frequency and TLB flush overhead, making system-wide table designs more practical for systems with many processes.
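As a usage sketch building on the allocator above, a scheduler could combine generation checking with lazy assignment (Solutions 1 and 3): on each context switch it revalidates the process's cached ASID and allocates a fresh one only if a rollover invalidated it. The Process fields and set_current_asid are assumptions for illustration:

```c
typedef struct {
    uint32_t pid;
    uint64_t cached_asid;   // Full ASID (generation | raw); 0 = never assigned
} Process;

// Called on every context switch into 'next'
void install_asid(ASIDAllocator* allocator, Process* next) {
    if (next->cached_asid == 0 ||
        !asid_valid(allocator, next->pid, next->cached_asid)) {
        // First run, or a generation rollover invalidated the old ASID
        ASIDAllocation a = allocate_asid(allocator, next->pid);
        next->cached_asid = a.full_asid;
    }
    // The hardware register holds only the raw bits
    set_current_asid((uint16_t)(next->cached_asid & (NUM_ASIDS - 1)));
}
```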
The choice of system-wide versus per-process page tables ripples through many aspects of operating system design. Understanding these implications helps evaluate when each approach is appropriate.
Memory Allocator Integration:
System-wide tables tightly couple with physical memory allocation:
Per-Process Tables:
- Process can have arbitrary VPN → PFN mappings
- No constraint between frame number and table location
- Allocator returns any free frame
Inverted Table:
- Entry for frame F is at IPT[F]
- When allocating frame F, must update IPT[F] in place
- Allocator must coordinate with table update
Implications: frame allocation and the corresponding IPT entry update must happen together, typically under the same lock or as a single atomic step, so the allocator and the page table code cannot be designed in isolation (a sketch follows below).
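A rough sketch of that coupling, assuming a simple bitmap frame allocator and an array-style inverted table where entry i describes frame i (all names and the locking scheme are hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_FRAMES 65536

typedef struct {
    uint16_t asid;
    uint64_t vpn;
    uint32_t control;   // Valid bit, permissions, etc.
} IPTSlot;

extern IPTSlot    ipt[NUM_FRAMES];                 // Entry i describes frame i
extern uint64_t   frame_bitmap[NUM_FRAMES / 64];   // Bit set = frame in use
extern spinlock_t alloc_lock;

// Allocate a frame and fill in its fixed-position IPT entry in one step
int64_t allocate_frame_for(uint16_t asid, uint64_t vpn, uint32_t control) {
    spin_lock(&alloc_lock);
    for (uint64_t f = 0; f < NUM_FRAMES; f++) {
        if (!(frame_bitmap[f / 64] & (1ULL << (f % 64)))) {
            frame_bitmap[f / 64] |= (1ULL << (f % 64));   // Claim frame f
            ipt[f] = (IPTSlot){ .asid = asid, .vpn = vpn, .control = control };
            spin_unlock(&alloc_lock);
            return (int64_t)f;   // The entry's location was dictated by the frame number
        }
    }
    spin_unlock(&alloc_lock);
    return -1;  // Out of physical memory
}
```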
Kernel Mapping Strategies:
Kernel virtual addresses must work for all processes:
Per-Process (Linux/x86-64):
0x0000... - 0x7FFF...: User space
0x8000... - 0xFFFF...: Kernel space
Every process table has
identical kernel entries
System-Wide (Inverted):
IPT entries:
[Frame 0]: global, VPN: 0x8000
[Frame 1]: global, VPN: 0x8001
...
[Frame K]: ASID:5, VPN: 0x100
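A common way to make kernel mappings work under ASID tagging is a 'global' flag that matches every address space, as hinted at by the TLBEntry structure earlier. A minimal sketch of the matching rule (names are illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     valid;
    bool     global;     // Kernel page: visible to every address space
    uint16_t asid;
    uint64_t vpn;
    uint64_t pfn;
} CachedTranslation;

// An entry matches if the VPN matches and either the entry is global
// (kernel mapping) or it belongs to the currently running address space.
bool translation_matches(const CachedTranslation* e,
                         uint16_t current_asid, uint64_t vpn) {
    return e->valid &&
           e->vpn == vpn &&
           (e->global || e->asid == current_asid);
}
```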
Debugging and Introspection:
System-wide tables affect how debuggers and monitoring tools work:
Per-Process Tables:
- Tools such as /proc/PID/maps can show a process's virtual layout by walking that process's own table

Inverted Tables:
- Reconstructing a single process's virtual layout requires scanning the entire shared table and filtering entries by PID/ASID, as sketched below
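The scan below sketches what producing a /proc/PID/maps-style listing costs with an inverted table: every physical frame's entry must be examined, however small the process is. The array layout and names are assumptions for illustration:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12

typedef struct {
    bool     valid;
    uint16_t asid;
    uint64_t vpn;
} IPTViewEntry;

// Print every virtual page owned by 'asid'. Cost is O(physical frames),
// regardless of how few pages the process actually maps.
void dump_process_mappings(const IPTViewEntry* ipt, uint64_t num_frames,
                           uint16_t asid) {
    for (uint64_t frame = 0; frame < num_frames; frame++) {
        if (ipt[frame].valid && ipt[frame].asid == asid) {
            printf("vaddr 0x%llx -> frame %llu\n",
                   (unsigned long long)(ipt[frame].vpn << PAGE_SHIFT),
                   (unsigned long long)frame);
        }
    }
}
```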
Security Isolation Considerations:
Per-Process Tables:
- Isolation by construction: Different tables, no overlap possible
- Compromise of one table doesn't affect others
- KASLR: Each process can have different kernel randomization*
Inverted Tables:
- Isolation by PID check: Same table, access filtered by PID
- Corruption of IPT affects all processes
- Must ensure PID checks are unforgeable
*Note: In practice, KASLR typically uses same kernel layout for all processes for simplicity.
Modern commodity systems (x86-64, ARM64) predominantly use per-process multi-level tables with PCID/ASID support. Inverted tables remain important in specialized domains (PowerPC servers, some embedded systems) and as a conceptual alternative that illuminates different trade-offs in memory management design.
System-wide page tables represent a fundamentally different organization than per-process tables, with implications spanning context switching, TLB management, concurrency control, and overall system architecture.
What's Next:
The final page explores sharing challenges—the most significant limitation of inverted page tables. When multiple processes share physical frames (shared libraries, shared memory, copy-on-write after fork), the single-entry-per-frame constraint creates complexity that doesn't exist with per-process tables.
You now understand system-wide page table organization comprehensively—from the fundamental contrast with per-process tables through context switching, TLB management, concurrency control, and ASID management. This knowledge illuminates how table organization choices affect every aspect of memory management system design.