We've established that memory protection isolates processes from each other. But consider this scenario: you have 100 processes all running the same library—say, the C standard library (libc). Without sharing, each process would need its own copy of the library's code in physical memory. With libc being roughly 2MB, that's 200MB of RAM consumed by 100 identical copies of the same instructions.
This is absurd waste. The library code is read-only—it's identical in every process. Why not have all 100 processes share a single physical copy?
Memory sharing is the third fundamental goal of memory management: enabling controlled, safe access to common memory regions. Sharing seems to contradict protection, but the operating system achieves both simultaneously. This page explores how.
By the end of this page, you will understand:

- Why memory sharing is essential for efficiency
- The different forms of sharing (code, data, IPC)
- How sharing works with virtual memory mechanisms
- Copy-on-write as a sharing optimization
- Shared memory for inter-process communication
- How sharing and protection coexist
Memory sharing serves three primary purposes in operating systems: efficiency, communication, and functionality. Each represents a different use case with different requirements.
| Component | Size | Without Sharing (100 processes) | With Sharing |
|---|---|---|---|
| C Library (libc) | 2 MB | 200 MB | 2 MB |
| GUI Toolkit (Qt/GTK) | 20 MB | 2 GB | 20 MB |
| Language Runtime (Java/Python) | 50 MB | 5 GB | 50 MB |
| Kernel Code (mapped read-only) | 10 MB | 1 GB | 10 MB |
| Total | — | 8.2 GB | 82 MB |
The table above illustrates the dramatic impact of sharing. A server running 100 identical processes could consume 8+ GB of RAM for libraries alone—or just 82 MB with sharing enabled. This isn't optimization; it's the difference between a system that works and one that doesn't.
The Sharing-Protection Balance:

Sharing and protection might seem contradictory: protection exists precisely to keep processes out of each other's memory, while sharing deliberately gives multiple processes access to the same physical frames. The resolution lies in controlled sharing:
By default, processes are fully isolated. Sharing only occurs when explicitly configured: shared libraries loaded by the dynamic linker, memory regions explicitly shared via shmat() or mmap(), or pages subject to copy-on-write after fork(). This default-isolated approach maintains security while enabling efficiency.
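To make "explicitly configured" concrete, the sketch below shows the mmap() route. It is a minimal example with assumed Linux-style flags (MAP_ANONYMOUS) and error handling omitted: the parent creates a shared anonymous region, and the forked child's write is visible to the parent.

```c
// Minimal sketch: an explicitly shared anonymous mapping surviving fork()
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    // MAP_SHARED: writes are visible to every process mapping this region
    int *shared = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    *shared = 0;

    if (fork() == 0) {      // child inherits the shared mapping
        *shared = 123;      // write goes to the shared physical frame
        return 0;
    }
    wait(NULL);
    printf("parent sees %d\n", *shared);   // prints 123: genuinely shared

    munmap(shared, sizeof(int));
    return 0;
}
```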
Memory sharing is implemented through the same virtual memory mechanisms used for protection. The key insight: multiple page table entries can point to the same physical frame.
Process A Page Table           Physical Memory           Process B Page Table
┌──────────────────┐           ┌─────────────┐           ┌──────────────────┐
│ VA 0x1000        │           │             │           │ VA 0x5000        │
│ → Frame 500 ─────┼──────────►│  Frame 500  │◄──────────┼───── Frame 500   │
│ (R-X, User)      │           │ (libc code) │           │ (R-X, User)      │
└──────────────────┘           └─────────────┘           └──────────────────┘
In this example, Process A maps virtual address 0x1000 and Process B maps virtual address 0x5000, yet both page table entries resolve to the same physical frame 500, which holds the libc code. Each mapping carries read-execute (R-X) user permissions.

Important Observations:

- Different virtual addresses, same physical frame: each process chooses its own virtual address for the mapping; only the physical destination is shared.
- Reference counting: the OS tracks how many page tables point to each frame and frees a shared frame only when the last mapping is removed.
- Consistent permissions: shared code is mapped read-execute in every process, so sharing never weakens protection; no process can modify the frame that others depend on.
```
// Simplified: Creating a shared mapping
function create_shared_mapping(process, key, virtual_addr, size, permissions):
    // Find or create the shared memory object
    shmem = find_shared_memory_object(key)
    if shmem is NULL:
        shmem = create_shared_memory_object(size)
        allocate_physical_frames(shmem, size)

    // Map into this process's address space
    for page in range(0, size, PAGE_SIZE):
        vpage = virtual_addr + page
        pframe = shmem.frames[page / PAGE_SIZE]

        // Create page table entry pointing to shared frame
        create_pte(process.page_table, vpage, pframe, permissions)

        // Increment reference count on physical frame
        pframe.ref_count += 1

    return virtual_addr

// When unmapping shared memory
function unmap_shared_memory(process, virtual_addr, size):
    for page in range(0, size, PAGE_SIZE):
        pte = get_pte(process.page_table, virtual_addr + page)
        pframe = pte.frame

        // Remove mapping
        invalidate_pte(pte)

        // Decrement reference count
        pframe.ref_count -= 1

        // Only free frame if no one else is using it
        if pframe.ref_count == 0:
            free_frame(pframe)
```

The TLB (Translation Lookaside Buffer) caches virtual-to-physical translations. Different processes have different page tables, so their TLB entries are tagged with an Address Space ID (ASID). When sharing, each process still looks up its own virtual address—it just happens to resolve to the same physical frame. The TLB entries are separate but point to the same destination.
The most impactful application of memory sharing is shared libraries (also called dynamic libraries or DLLs on Windows). Rather than including library code in every executable, programs link against shared libraries that are loaded once and shared among all processes that use them.
Static vs. Dynamic Linking:

With static linking, the library's code is copied into each executable at build time, so every process carries its own private copy and nothing can be shared. With dynamic linking, the executable stores only a reference to the library; the dynamic linker resolves that reference at load time, allowing every process to map the same in-memory copy.
How Shared Library Loading Works:

When the first process that uses a library starts, the dynamic linker maps the library file into the process's address space, and its pages are read from disk into the page cache on demand. When subsequent processes load the same library, the linker maps the already-resident frames into their page tables, consuming no additional physical memory for the code.
Position-Independent Code (PIC):
For sharing to work, library code cannot contain absolute addresses (which would only work at one specific virtual address). Instead, shared libraries are compiled as Position-Independent Code:
```
// Non-PIC (problematic for sharing):
mov eax, [0x12345678]     // Absolute address - only works at one location

// PIC (works anywhere):
lea rbx, [rip + got]      // Get address of GOT relative to current instruction
mov eax, [rbx + offset]   // Access through GOT
```
Position-independent code has slight overhead due to GOT/PLT indirection. On x86, this was significant (~5%); on x86-64 with PC-relative addressing, it's minimal (~1%). The memory savings from sharing far outweigh this penalty in virtually all scenarios.
Copy-on-Write (COW) is one of the most elegant optimizations in operating systems. It allows memory to be shared initially, with copying deferred until actually necessary—which may be never.
The Fork Problem:
The fork() system call creates a new process as an exact copy of the parent. Naively, this would require allocating new physical frames for the child's entire address space and copying every page of the parent's memory into them.
For a 1GB process, this means allocating and copying 1GB of memory—even if the child immediately exec()s a different program, discarding all that copied data.
The COW Solution:
Instead of copying, share everything:
```
// Fork with copy-on-write
function fork():
    child = create_process()

    // Copy page table structure (but not frame data)
    child.page_table = clone_page_table(parent.page_table)

    for each (pte, child_pte) in zip(parent.page_table, child.page_table):
        if pte.present and pte.writable:
            // Mark both mappings read-only and copy-on-write
            pte.writable = false
            pte.cow_flag = true        // Custom flag to track COW pages
            child_pte.writable = false
            child_pte.cow_flag = true

            // Increment reference count on the shared frame
            pte.frame.ref_count += 1

    // Flush TLB (protection bits changed)
    flush_tlb()

    return child

// Handle write fault on COW page
function handle_cow_fault(process, virtual_addr):
    pte = get_pte(process.page_table, virtual_addr)

    if not pte.cow_flag:
        // Not a COW page - genuine protection violation
        send_signal(process, SIGSEGV)
        return

    old_frame = pte.frame
    if old_frame.ref_count == 1:
        // We're the only user - just make it writable again
        pte.writable = true
        pte.cow_flag = false
    else:
        // Others are sharing - need to actually copy
        new_frame = allocate_frame()
        copy_frame_contents(old_frame, new_frame)
        pte.frame = new_frame
        pte.writable = true
        pte.cow_flag = false
        old_frame.ref_count -= 1

    // Retry the write instruction
```

COW in Action: The Timeline
Time T0: Before Fork
┌─────────────────────────────────────────┐
│ Parent Process │
│ Page 1 [RW] → Frame 100 (ref=1) │
│ Page 2 [RW] → Frame 101 (ref=1) │
└─────────────────────────────────────────┘
Time T1: After Fork (COW setup)
┌─────────────────────────────────────────┐
│ Parent Process │
│ Page 1 [RO,COW] → Frame 100 (ref=2) │
│ Page 2 [RO,COW] → Frame 101 (ref=2) │
└─────────────────────────────────────────┘
┌─────────────────────────────────────────┐
│ Child Process │
│ Page 1 [RO,COW] → Frame 100 (ref=2) │
│ Page 2 [RO,COW] → Frame 101 (ref=2) │
└─────────────────────────────────────────┘
Time T2: Child writes to Page 2 (COW triggered)
┌─────────────────────────────────────────┐
│ Parent Process │
│ Page 1 [RO,COW] → Frame 100 (ref=2) │
│ Page 2 [RO,COW] → Frame 101 (ref=1) │
└─────────────────────────────────────────┘
┌─────────────────────────────────────────┐
│ Child Process │
│ Page 1 [RO,COW] → Frame 100 (ref=2) │
│ Page 2 [RW] → Frame 102 (ref=1) │ ← New frame!
└─────────────────────────────────────────┘
Note: Only the page that was written gets copied. Pages that are never written remain shared forever.
Without COW, fork() would be prohibitively expensive for large processes. A 4GB browser forking would require allocating and copying 4GB of memory. With COW, fork() completes in microseconds regardless of process size. Pages are copied only when actually modified, and read-only pages (like code) are never copied at all.
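COW semantics are easy to observe from user space. In this minimal sketch (POSIX assumed, error handling omitted), the child's write triggers a private copy, so the parent's view stays untouched; contrast this with the MAP_SHARED example earlier, where the write was visible to both processes.

```c
// Minimal sketch: fork() shares pages COW; the first write splits them
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int value = 42;          // lives in a page shared COW after fork

    if (fork() == 0) {       // fork completes without copying pages
        value = 99;          // first write faults; kernel copies the page
        printf("child sees  %d\n", value);   // 99, the child's private copy
        return 0;
    }
    wait(NULL);
    printf("parent sees %d\n", value);       // still 42, original untouched
    return 0;
}
```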
While pipes, sockets, and message queues are common IPC mechanisms, they all involve copying data—from the sender's address space into kernel buffers, then from kernel buffers into the receiver's address space. For high-performance communication, this copying is unacceptable.
Shared memory IPC eliminates all copying. Both processes map the same physical frames, so data written by one is immediately visible to the other.
Performance Comparison:
| IPC Method | Copies per Message | Syscalls per Message | Latency | Throughput |
|---|---|---|---|---|
| Pipe/Socket | 2 (sender→kernel→receiver) | 2 (write/read) | Medium | Medium |
| Message Queue | 2 (sender→kernel→receiver) | 2 (msgsnd/msgrcv) | Medium | Medium |
| Shared Memory | 0 (direct access) | 0 (after setup) | Lowest | Highest |
```c
// POSIX Shared Memory API (Linux/macOS/BSD)
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

// ===== CREATE/OPEN SHARED MEMORY =====
// Open or create named shared memory object
int shm_fd = shm_open("/my_shared_mem",
                      O_CREAT | O_RDWR,  // Create if not exists
                      0666);             // Permissions

// Set the size
ftruncate(shm_fd, 4096);  // 4KB region

// Map into address space
void *ptr = mmap(NULL,        // Let OS choose address
                 4096,        // Size
                 PROT_READ | PROT_WRITE,
                 MAP_SHARED,  // Changes visible to others
                 shm_fd,
                 0);          // Offset

// Now both processes can use ptr to read/write shared data!

// ===== IMPORTANT: Synchronization required! =====
// Shared memory provides no synchronization
// You MUST use semaphores, mutexes, or atomics to prevent races

// ===== CLEANUP =====
munmap(ptr, 4096);             // Unmap from this process
close(shm_fd);                 // Close file descriptor
shm_unlink("/my_shared_mem");  // Remove shared memory object
```

Shared memory provides no synchronization whatsoever. Without external synchronization (semaphores, mutexes, or atomic operations), simultaneous reads and writes will cause race conditions, data corruption, and subtle bugs. Always pair shared memory with appropriate synchronization primitives.
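A common pattern is to keep the synchronization primitive inside the shared region itself. The following sketch is a minimal, Linux-oriented illustration; the name /demo_region, the struct layout, and the absence of error handling are all assumptions for brevity. It stores a process-shared POSIX semaphore next to the data it guards:

```c
// Minimal sketch: a process-shared semaphore living inside the region
// it protects. Names and layout are illustrative; error handling omitted.
#include <fcntl.h>
#include <semaphore.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

typedef struct {
    sem_t lock;          // must itself live in shared memory
    char  message[256];
} shared_region;

int main(void) {
    int fd = shm_open("/demo_region", O_CREAT | O_RDWR, 0666);
    ftruncate(fd, sizeof(shared_region));

    shared_region *r = mmap(NULL, sizeof(shared_region),
                            PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    // Only the creating process should do this, exactly once:
    sem_init(&r->lock, 1, 1);   // pshared=1: usable across processes

    sem_wait(&r->lock);                           // enter critical section
    strcpy(r->message, "hello from process A");   // safe while lock is held
    sem_post(&r->lock);                           // leave critical section

    munmap(r, sizeof(shared_region));
    close(fd);
    return 0;
}
```

The crucial detail is the pshared argument to sem_init(): passing 1 makes the semaphore usable across processes, but only if the sem_t itself lives in memory that every participant has mapped.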
Memory-mapped files extend the sharing concept to include disk files. Instead of using read() and write() system calls, a file is mapped into the process's address space. The file's contents appear as memory, and modifications are (eventually) written back to disk.
How It Works:

When a file is mapped, the OS creates page table entries that point into the page cache rather than allocating anonymous memory. Pages are loaded from disk on first access via page faults, and dirty pages are eventually written back to disk by the kernel (or immediately, with msync()).
Sharing Memory-Mapped Files:
When multiple processes map the same file with MAP_SHARED, they share the same physical frames:
   Process A             Physical Memory              Process B
┌──────────────┐         ┌────────────────┐        ┌──────────────┐
│ VA: 0x10000  │────────►│   Page Cache   │◄───────│ VA: 0x20000  │
│ file offset 0│         │   Frame 500    │        │ file offset 0│
└──────────────┘         │ (file page 0)  │        └──────────────┘
                         └────────────────┘
                                 │
                                 ▼
                          ┌──────────────┐
                          │  Disk File   │
                          │   data.bin   │
                          └──────────────┘
This enables:

- A single cached copy of the file, no matter how many processes map it
- Zero-copy sharing of file data between processes
- Coherent views: a write by one process is immediately visible to every other process mapping the file
```c
// Memory-mapped file example
#include <stdio.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

int main() {
    // Open the file
    int fd = open("data.bin", O_RDWR);

    // Get file size
    off_t size = lseek(fd, 0, SEEK_END);

    // Map the file into memory
    void *mapped = mmap(NULL,        // Let OS choose address
                        size,        // Map entire file
                        PROT_READ | PROT_WRITE,
                        MAP_SHARED,  // Changes visible to others
                        fd,          // File descriptor
                        0);          // Start at beginning of file

    if (mapped == MAP_FAILED) {
        perror("mmap failed");
        return 1;
    }

    // Now we can access file contents as memory!
    int *data = (int *)mapped;
    data[0] = 42;   // This writes to the file!

    // Changes written to disk automatically, but we can force it:
    msync(mapped, size, MS_SYNC);

    // Cleanup
    munmap(mapped, size);
    close(fd);
    return 0;
}
```

Memory-mapped files excel for random access to large files, when multiple processes need to share file data, for read-only access to configuration/data files, and when simplifying file I/O code. They're less suitable for sequential-only access (read() is fine), small files (setup overhead not worthwhile), or when you need precise control over when writes occur.
Memory sharing introduces complexities that don't exist with isolated processes. Understanding these challenges is essential for correctly using shared memory.
False Sharing in Detail:
False sharing is a particularly subtle performance problem. Consider this code:
```c
struct { int counter_a; int counter_b; } shared;

// Thread A                    // Thread B
while (1) {                    while (1) {
    shared.counter_a++;            shared.counter_b++;
}                              }
```
Logically, threads A and B access different variables—no data race exists. But if counter_a and counter_b are on the same cache line (64 bytes on most systems), every write by thread A invalidates thread B's cache line and vice versa. Performance may be 10-100x worse than expected.
Solution: Pad structures to ensure separate cache lines:
```c
struct {
    int counter_a;
    char padding[60];   // Ensure counter_b is on a different cache line
    int counter_b;
} shared;
```
Modern languages offer explicit alignment controls, such as the alignas specifier (C11/C++11) and C++17's std::hardware_destructive_interference_size constant, to handle this correctly.
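As a minimal C11 sketch (the struct name is illustrative), alignas expresses the same padding intent directly, without counting bytes by hand:

```c
#include <stdalign.h>   // C11 alignas

struct padded_counters {
    alignas(64) int counter_a;   // starts on its own 64-byte cache line
    alignas(64) int counter_b;   // forced onto the next cache line
};
```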
Shared memory is the fastest IPC mechanism, but also the most dangerous. Without rigorous synchronization and careful design, bugs are almost guaranteed. For most applications, higher-level mechanisms (message passing, RPC, channels) are safer. Reserve shared memory for performance-critical paths where the complexity is justified.
This page explored memory sharing as the third fundamental goal of memory management. Let's consolidate the key concepts:

- Sharing is implemented by pointing multiple page table entries at the same physical frame, with reference counting to track usage.
- Shared libraries, compiled as position-independent code, let all processes use a single copy of common code.
- Copy-on-write defers copying until a write actually occurs, making fork() fast regardless of process size.
- Shared memory and memory-mapped files provide zero-copy data sharing between processes.
- Writable sharing always requires explicit synchronization; sharing and protection coexist through default isolation plus controlled, explicit exceptions.
Beyond Physical Organization:
We've now covered three of the five memory management goals: allocation, protection, and sharing. The next page explores Logical Organization—how the OS organizes memory to match programmer expectations and enable modular program design.
You now understand how memory sharing works, why it's essential, and the challenges it introduces. Sharing complements protection—together they enable efficient, safe multiprogramming. Next, we'll examine how memory is organized from the programmer's logical perspective.