Consider a typical Linux server running 50 instances of a web application. Each process requires 100 MB of code and libraries. Without memory sharing, you'd need 5 GB of physical memory just for the redundant code. With page sharing, that same code occupies a mere 100 MB — a 50x reduction.
This isn't theoretical optimization; it's the difference between a $200/month cloud instance and a $4,000/month one. Page sharing is the fundamental mechanism that makes modern multi-process computing economically viable.
In this page, we'll dissect exactly how virtual memory enables this remarkable efficiency, exploring the hardware and software mechanisms that allow multiple processes to share physical memory pages while maintaining complete isolation and security.
By the end of this page, you will understand the fundamental mechanisms of page sharing: how page tables enable multiple virtual addresses to map to the same physical frame, the different types of sharing (read-only vs copy-on-write), and the architectural principles that make sharing both efficient and safe. You'll gain the vocabulary and conceptual framework to understand how operating systems optimize memory usage through sharing.
Page sharing is made possible by the fundamental architecture of virtual memory systems. To understand it fully, we must first revisit the core abstraction that enables all modern memory management.
The Key Insight: Indirection
Virtual memory introduces an indirection layer between the addresses a process uses (virtual addresses) and the actual physical memory locations (physical addresses). This indirection is implemented through page tables, which the Memory Management Unit (MMU) consults on every memory access.
Here's the crucial observation: nothing in the virtual memory architecture requires that a physical frame be referenced by only one virtual page. The page table is simply a mapping structure, and multiple entries — whether in the same page table or in different processes' page tables — can point to the same physical frame.
This seemingly simple insight has profound implications. It means that two processes can have page table entries that contain the same physical frame number, allowing them to access the same physical memory through their own distinct virtual addresses.
| Field | Size (typical) | Role in Sharing |
|---|---|---|
| Physical Frame Number (PFN) | 20-40 bits | The actual physical address; identical in shared PTEs |
| Present/Valid bit | 1 bit | Must be set for accessible shared pages |
| Read/Write bit | 1 bit | Often read-only for shared pages to enable COW |
| User/Supervisor bit | 1 bit | Determines if user-space can access the shared page |
| Global bit | 1 bit | Prevents TLB flush; useful for widely-shared pages (e.g., kernel) |
| Accessed/Dirty bits | 2 bits | Track usage; shared pages complicate dirty tracking |
| Cache control bits | 2-3 bits | Must be consistent across all sharing PTEs |
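To make the roles of these fields concrete, here is a minimal, compilable C sketch. The bit positions, flag names, and frame number are illustrative only, not any particular CPU's exact PTE format.

// Illustrative PTE construction: two processes sharing one physical frame.
// Bit positions and the frame number are made up for illustration.
#include <stdint.h>
#include <stdio.h>

#define PTE_PRESENT (1ULL << 0)   // Present/Valid bit
#define PTE_WRITE   (1ULL << 1)   // Read/Write bit (left clear below: read-only)
#define PTE_USER    (1ULL << 2)   // User/Supervisor bit
#define PFN_SHIFT   12            // flags live in the low bits; PFN starts here

static uint64_t make_pte(uint64_t pfn, uint64_t flags) {
    return (pfn << PFN_SHIFT) | flags;
}

int main(void) {
    uint64_t shared_pfn = 0x1234;                                       // hypothetical frame number
    uint64_t pte_in_A = make_pte(shared_pfn, PTE_PRESENT | PTE_USER);   // A's read-only mapping
    uint64_t pte_in_B = make_pte(shared_pfn, PTE_PRESENT | PTE_USER);   // B's read-only mapping
    // Both PTEs carry the same PFN, so A and B reach the same physical memory.
    printf("PFN in A: 0x%llx, PFN in B: 0x%llx\n",
           (unsigned long long)(pte_in_A >> PFN_SHIFT),
           (unsigned long long)(pte_in_B >> PFN_SHIFT));
    return 0;
}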
When multiple page tables reference the same physical frame, the operating system must maintain a reference count for each frame. The frame can only be freed when the reference count drops to zero, meaning no page table entries point to it anymore. This reference counting is managed by the OS's frame allocator and is critical for correctness.
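A minimal sketch of that lifecycle, using hypothetical helper names rather than any real allocator's API:

// Hypothetical frame reference counting (not a real kernel API)
#include <stdatomic.h>

struct frame {
    atomic_int refcount;                    // number of PTEs referencing this frame
    /* ... other frame metadata ... */
};

void frame_map(struct frame *f) {
    atomic_fetch_add(&f->refcount, 1);      // a new PTE now points at this frame
}

void frame_unmap(struct frame *f) {
    if (atomic_fetch_sub(&f->refcount, 1) == 1) {
        /* last mapping gone: the allocator may now free the frame */
    }
}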
Let's trace exactly how page sharing is established and maintained at the hardware and software levels.
Establishing a Shared Mapping
When the operating system decides to share a page between two processes, it performs the following operations:
1. Locate (or allocate) the physical frame that holds the shared data.
2. Install a page table entry in each process that points to that frame, at whatever virtual address each process chooses, with the appropriate permission bits.
3. Increment the frame's reference count once per new mapping so the frame is not freed while any process still uses it.
Let's visualize this with a concrete example:
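A sketch of the resulting mappings (the frame number 0x5 is arbitrary; only the structure matters):

Process A page table            Process B page table
  VA 0x1000 -> PFN 0x5            VA 0x2000 -> PFN 0x5
          \                          /
           v                        v
        Physical frame 0x5 (one copy in RAM, refcount = 2)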
Virtual Address Independence
Notice that Process A maps the shared page at virtual address 0x1000, while Process B maps the same physical frame at virtual address 0x2000. The virtual addresses don't need to match — each process has complete autonomy over its virtual address space layout.
However, some types of sharing do require consistent virtual addresses across processes. Code that contains absolute addresses (rather than position-independent code) will only work correctly if mapped at the expected virtual address. This is why position-independent code matters:
Position-Independent Code uses relative addressing for all internal references, allowing the same code to execute correctly regardless of where it's loaded in virtual memory. This is achieved using instruction-pointer-relative addressing (for code) and the Global Offset Table (GOT) with Procedure Linkage Table (PLT) (for data and function calls). Modern shared libraries are always compiled as PIC.
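As a minimal C-level illustration of why this keeps code pages shareable (the comments describe typical code generation; details vary by compiler and architecture):

// Why PIC keeps code pages shareable (illustrative)
static long hits;            // writable data: each process gets its own (COW) copy

long count_hit(void) {
    // Without PIC, the absolute address of 'hits' is baked into the instruction,
    // so the code page only works at one load address. With PIC, the compiler
    // emits an instruction-pointer-relative reference (or an indirect load through
    // the GOT), so the code page is byte-for-byte identical wherever the library
    // is loaded and can be shared read-only by every process that maps it.
    return ++hits;
}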
Page sharing in operating systems manifests in several distinct forms, each with specific use cases, implementation requirements, and performance characteristics. Understanding these categories is essential for systems programming and OS development.
Pure Read-Only Sharing
This is the simplest and most common form of sharing. The shared pages are marked read-only in all processes' page tables, and no process ever modifies them.
Use Cases: program text (executable code), shared library code and read-only data, and files mapped read-only into multiple processes.
Implementation:
// Pseudo-code for establishing read-only sharing of a file-backed page
frame_t *frame = find_or_create_frame(file, offset);        // reuse the cached frame if the page is already in memory
add_pte(process_A, vaddr_A, frame, PROT_READ | PROT_EXEC);   // map into A's page table at A's chosen address
frame->refcount++;
add_pte(process_B, vaddr_B, frame, PROT_READ | PROT_EXEC);   // map into B at a possibly different virtual address
frame->refcount++;                                           // frame is freed only when refcount drops to 0
Properties: the pages are never modified, so no copy-on-write machinery is required; content never diverges between processes; and because the pages stay clean, the OS can simply drop them under memory pressure and re-read them from the backing file later.
Example: When 100 processes run /usr/bin/python3, they all share the same physical pages containing the Python interpreter's executable code. The 15 MB Python binary exists once in memory, not 100 times.
For page sharing to work correctly and efficiently, the operating system must maintain sophisticated data structures that track which frames are shared, by whom, and with what properties. Let's examine this infrastructure in detail.
Linux describes every physical frame with a struct page; its _mapcount field tracks how many page table entries reference this frame.
// Simplified representation of Linux's page frame metadata
struct page {
    unsigned long flags;              // Page state flags (Locked, Dirty, Active, etc.)
    union {
        atomic_t _mapcount;           // Count of page table mappings (-1 = not mapped)
        unsigned int page_type;       // For special pages
    };
    atomic_t _refcount;               // Usage count (must be > 0 to use page)
    struct address_space *mapping;    // File this page belongs to (if file-backed)
    pgoff_t index;                    // Offset within file (in pages)
    struct list_head lru;             // Position on LRU list for reclamation

    // For anonymous pages (COW, heap, stack)
    struct anon_vma *anon_vma;        // Reverse mapping for anonymous pages
};

// VMA describes a region of virtual address space
struct vm_area_struct {
    unsigned long vm_start;           // Start virtual address
    unsigned long vm_end;             // End address (exclusive)
    pgprot_t vm_page_prot;            // Access permissions
    unsigned long vm_flags;           // Flags: VM_READ, VM_WRITE, VM_SHARED, etc.
    struct file *vm_file;             // Backing file (NULL for anonymous)
    unsigned long vm_pgoff;           // Offset in file (in pages)
    struct mm_struct *vm_mm;          // Owning process's address space

    // For shared file mappings, all processes share same address_space
    // For anonymous sharing, anon_vma links all COW-related VMAs
};

The Reference Counting Challenge
Reference counting for shared pages must handle several edge cases:
Transient references: The kernel often takes temporary references (e.g., during I/O). The _refcount must account for these beyond just page table mappings.
Split references: _mapcount counts page table entries, while _refcount counts all references. A page with _mapcount = 5 (five PTEs) might have _refcount = 7 (the five PTEs plus 2 ongoing I/O operations); this split is sketched in the example after this list.
Large pages: When using huge pages (2 MB), all component 4 KB pages must maintain consistent counts when the huge page is split.
NUMA proximity: Reference tracking must preserve NUMA node information to avoid migrating pages away from their optimal memory node.
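A toy model of the split-reference case, with hypothetical helpers rather than the kernel's real functions:

// Toy model of split reference counts (hypothetical helpers, not kernel functions)
#include <stdatomic.h>

struct page_meta {
    atomic_int mapcount;     // page table entries only
    atomic_int refcount;     // PTEs plus transient kernel references (I/O, pinning, ...)
};

void install_pte(struct page_meta *p) {        // a process maps the page
    atomic_fetch_add(&p->mapcount, 1);
    atomic_fetch_add(&p->refcount, 1);
}

void start_io(struct page_meta *p)  { atomic_fetch_add(&p->refcount, 1); }  // pin during I/O
void finish_io(struct page_meta *p) { atomic_fetch_sub(&p->refcount, 1); }

// After five install_pte() calls and two start_io() calls:
// mapcount = 5, refcount = 7 -- exactly the split described above.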
The hardware plays a crucial role in enabling efficient page sharing. Modern CPUs provide several features specifically designed to support sharing semantics.
| Scenario | TLB Impact | Optimization |
|---|---|---|
| Context switch between sharing processes | TLB entries with different ASIDs can coexist | No flush needed if ASID differs |
| COW page becomes private | Must invalidate other CPUs' TLB entries (TLB shootdown) | IPI to affected CPUs only |
| Global page (kernel, vDSO) | Shared TLB entry, never flushed | One TLB entry serves all processes |
| File page evicted from memory | All PTEs must be cleared, TLB shootdown | Batch invalidations for efficiency |
| Huge page shared | Single TLB entry covers 2MB/1GB | Massive TLB coverage improvement |
When a shared page's mapping changes (e.g., COW copy, unmapping), the OS must ensure all CPUs invalidate their TLB entries for that page. This 'TLB shootdown' requires inter-processor interrupts (IPIs), which are expensive. On a 128-core system, a shootdown can take thousands of cycles. This is a hidden cost of sharing that can become a bottleneck for write-heavy workloads.
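Sketched as pseudo-code in the same style as the earlier examples (the function names are placeholders, not a real kernel interface):

// Pseudo-code: TLB shootdown when a shared page's mapping changes
void update_shared_mapping(pte, vaddr, cpus_that_may_cache_it) {
    modify_pte(pte);                                  // e.g., make it read-only or point elsewhere
    flush_local_tlb_entry(vaddr);                     // drop this CPU's stale translation
    send_ipi(cpus_that_may_cache_it, FLUSH, vaddr);   // interrupt only the CPUs that might cache it
    wait_for_acknowledgements();                      // the old mapping is gone only after all ACKs
}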
On Non-Uniform Memory Access (NUMA) systems, memory access time depends on which CPU socket is accessing which memory bank. Shared pages introduce interesting NUMA considerations.
The NUMA Sharing Dilemma
When a page is shared across processes running on different NUMA nodes:
| Scenario | Memory Latency | Practical Impact |
|---|---|---|
| Local access | 80-100 ns | Optimal performance |
| Remote access (1 hop) | 120-150 ns | 30-50% slower |
| Remote access (2 hops) | 200+ ns | 2x+ slower |
For a shared library page accessed by processes on different nodes, someone will have remote access. The OS must decide where to place the page.
Linux AutoNUMA
Linux implements automatic NUMA balancing (AutoNUMA) that monitors access patterns and migrates pages to the accessing node. For shared pages, this creates interesting dynamics:
Page initially on Node 0
|
v
Process on Node 1 accesses frequently
|
v
AutoNUMA detects remote access pattern
|
v
Migrates page to Node 1
|
v
But now Process on Node 0 has remote access!
This back-and-forth ('NUMA ping-pong') can occur with shared pages accessed equally from multiple nodes. The solution is often to either pin the page on one node with an explicit placement policy or interleave the shared region across the contending nodes so the remote-access cost is split evenly.
For shared memory IPC on NUMA systems:
1. Keep communicating processes on the same node when possible.
2. Use interleaving for truly shared structures that are accessed equally from all nodes.
3. For producer-consumer patterns, place the shared region on the consumer's node (reads are typically more latency-sensitive than writes due to CPU stalls).
4. Use numactl and mbind() for explicit control in performance-critical applications.
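A minimal sketch of point (4), assuming libnuma's development headers are installed (link with -lnuma); the node mask and region size are illustrative:

// Interleave a shared region across two NUMA nodes (illustrative)
#define _GNU_SOURCE
#include <numaif.h>        // mbind(), MPOL_INTERLEAVE
#include <sys/mman.h>
#include <stdio.h>

int main(void) {
    size_t len = 16 * 4096;
    // A shared anonymous region, e.g., for producer/consumer IPC
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    unsigned long nodemask = 0x3;    // nodes 0 and 1 (illustrative)
    // Spread the region's pages across both nodes so neither side
    // always pays the remote-access penalty.
    if (mbind(buf, len, MPOL_INTERLEAVE, &nodemask, sizeof(nodemask) * 8, 0) != 0)
        perror("mbind");             // e.g., fails on single-node systems
    return 0;
}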
Understanding sharing efficiency requires metrics to measure actual sharing in a running system. Operating systems provide various tools and interfaces to observe sharing behavior.
# Check memory sharing statistics on Linux

# 1. System-wide sharing stats
cat /proc/meminfo | grep -E 'Shmem|Mapped|AnonPages'
# Shmem: explicitly shared memory (shm_open, etc.)
# Mapped: file-backed pages mapped into processes
# AnonPages: private process memory (heap, stack, COW)

# 2. Per-process sharing analysis
# PSS (Proportional Set Size) divides shared pages by sharers
# Example: 100 processes share 100 MB libc → each shows ~1 MB PSS
cat /proc/<pid>/smaps | grep -E 'Pss|Shared|Private'

# 3. Shared memory segments
ipcs -m        # System V shared memory
ls /dev/shm    # POSIX shared memory

# 4. Analyze a specific library's sharing
# Count how many processes map libc
lsof /lib/x86_64-linux-gnu/libc.so.* | wc -l

# 5. Detailed page flags analysis
cat /proc/<pid>/pagemap    # Raw page table data
# Requires specialized tools like /proc/kpageflags

| Metric | Definition | Interpretation |
|---|---|---|
| RSS (Resident Set Size) | All pages in physical memory | Includes shared pages (counted fully) |
| PSS (Proportional Set Size) | Private pages + shared/N | Fair share accounting for shared pages |
| USS (Unique Set Size) | Only private pages | Memory freed if process terminates |
| Shared_Clean | Shared pages not modified | True read-only sharing |
| Shared_Dirty | Shared pages with pending writes | File-backed with unsaved changes |
| Private_Clean | Private pages not modified | COW pages not yet written |
| Private_Dirty | Private pages that were modified | Memory this process alone consumes; cannot be shared or dropped without swap |
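As a worked example of the PSS column: if a 40 MB library is mapped by 8 processes, each process's RSS counts the full 40 MB, but each PSS charge is 40 MB / 8 = 5 MB, and summing PSS across all 8 processes recovers the single 40 MB that is actually resident.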
Limits on Sharing
Several factors limit how much memory can be shared:
Alignment requirements: Pages must be naturally aligned. Partially-overlapping data cannot share a page.
Write frequency: High-write pages break COW sharing immediately. Data structures with frequent writes are poor sharing candidates.
ASLR entropy: Address Space Layout Randomization places libraries at random addresses. Position-independent code handles this, but offset randomization within libraries can affect page alignment.
Execution state: Writable data (globals, static variables) in shared libraries is COW'd per-process.
Huge page granularity: 2 MB huge pages can only share if the entire 2 MB section is shareable.
Maximum mapcount: Linux limits how many PTEs can reference one page (128*1024 by default). Systems with extreme sharing (containers) can hit this limit.
We've established the foundational concepts of page sharing in virtual memory systems. Let's consolidate the key takeaways:
- The indirection of page tables is what makes sharing possible: nothing prevents multiple PTEs, in one or many processes, from holding the same physical frame number.
- The OS must reference-count shared frames and can free a frame only when no page table entry references it.
- Sharing comes in distinct forms, from pure read-only sharing of code and file pages to copy-on-write for writable data.
- Kernel structures (struct page, VMAs) track who shares what, while hardware features (TLB ASIDs, global pages, huge pages) and NUMA placement determine how efficient sharing is in practice.
- Metrics such as RSS, PSS, and USS let you measure how much sharing a running system actually achieves.
What's Next:
Now that we understand the fundamental mechanisms of page sharing, we'll explore one of its most important applications: shared libraries. We'll see how operating systems enable hundreds of processes to share a single copy of libc, how symbol resolution works with shared code, and the performance implications of library loading strategies.
You now understand the fundamental mechanisms of page sharing in virtual memory systems. You've learned how the indirection provided by page tables enables sharing, the different types of sharing and their use cases, the kernel data structures that track sharing, and the hardware features that make it efficient. This knowledge forms the foundation for understanding shared libraries, IPC, and memory optimization in real systems.