When two people need to collaborate, one approach is to give them a shared whiteboard. Both can write on it, both can read from it, changes appear instantly, and there's no middleman passing notes. This is the essence of shared memory IPC.
In the shared memory model, multiple processes map the same physical memory region into their respective virtual address spaces. Once established, processes can read and write this shared region as if it were regular memory—at processor speeds, without kernel involvement for each access.
This approach offers the highest possible performance for IPC. But like a shared whiteboard with no rules about who writes where and when, shared memory without proper synchronization leads to chaos. Understanding both the power and the pitfalls of shared memory is essential for systems programming.
By the end of this page, you will understand how shared memory works at the hardware and OS level, the System V and POSIX APIs for creating and mapping shared memory, why synchronization is mandatory (not optional), common patterns for safe shared memory usage, and when shared memory is the right choice.
Shared memory is deceptively simple in concept: take a region of physical memory and make it accessible from multiple process address spaces. But understanding the implementation requires grasping how virtual memory works.
Virtual to Physical Mapping
Recall that each process has its own virtual address space. The Memory Management Unit (MMU) translates virtual addresses to physical addresses using page tables. Normally, each process's page tables point to distinct physical pages—this is what provides isolation.
Shared memory works by having multiple processes' page tables point to the same physical pages:
```
Process A virtual address space             Physical RAM
┌──────────────────────────┐
│ ...                      │
├──────────────────────────┤            ┌──────────────────────┐
│ 0x7fff00000000           │───────────►│ Physical page 0x1A3  │
│ [Shared region, 4096 B]  │            │ [Shared data here]   │
├──────────────────────────┤            └──────────────────────┘
│ ...                      │                       ▲
└──────────────────────────┘                       │
  Page table A: VPN 0x7fff0000 → PFN 0x1A3         │
                                                   │
Process B virtual address space                    │
┌──────────────────────────┐                       │
│ ...                      │                       │
├──────────────────────────┤                       │
│ 0x40000000               │───────────────────────┘
│ [Shared region, 4096 B]  │
├──────────────────────────┤   Same physical page (0x1A3) is mapped at
│ ...                      │   different virtual addresses in each process!
└──────────────────────────┘
  Page table B: VPN 0x4000000 → PFN 0x1A3
```

Key Observations:
Same physical memory, different virtual addresses: Process A sees the shared region at 0x7fff00000000; Process B sees it at 0x40000000. The virtual addresses differ, but the physical page is identical.
No kernel involvement for access: Once mapped, reading and writing the shared region is just normal memory access. The CPU's memory bus handles it directly—no system call overhead.
Cache coherency is automatic: Modern multi-core CPUs maintain cache coherency through hardware protocols (like MESI). When one core writes to shared memory, other cores see the update (eventually—we'll discuss memory ordering later).
Explicit setup required: Unlike normal process memory, shared memory must be explicitly created and attached. This is where the IPC system calls come in.
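To see "same physical page, plain memory access" in action, here is a minimal sketch using an anonymous shared mapping inherited across fork(), one of the mmap-based setups covered below:

```c
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>

int main() {
    // One shared page: after fork(), both parent and child page tables
    // point to the same physical page
    int *shared = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED) { perror("mmap"); return 1; }

    *shared = 0;
    if (fork() == 0) {
        *shared = 42;   // plain store, no system call involved
        _exit(0);
    }
    wait(NULL);
    printf("parent sees: %d\n", *shared);   // prints 42
    return 0;
}
```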
Shared memory segments exist independently of any process. A process creates a segment, and it persists until explicitly destroyed—even if the creating process terminates. This is both a feature (allows processes to communicate after restarts) and a hazard (leaked segments consume memory indefinitely).
There are three primary mechanisms for creating shared memory in Unix-like systems:
1. System V shared memory (shmget, shmat)
2. POSIX shared memory (shm_open, mmap)
3. Memory-mapped files (open, mmap with MAP_SHARED)

Each has distinct characteristics. Let's examine them in detail.
System V shared memory is the oldest IPC shared memory mechanism, dating to Unix System V in 1983. It remains widely supported and is still used in many production systems.
API Overview:
```c
#include <sys/ipc.h>
#include <sys/shm.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>

int main() {
    // Step 1: Generate a key
    // ftok() creates a key from a file path and project ID
    // The file must exist; the path and ID together create a unique key
    key_t key = ftok("/tmp/myapp", 'A');
    if (key == -1) {
        perror("ftok failed");
        exit(1);
    }

    // Step 2: Create or get the shared memory segment
    // Arguments: key, size in bytes, flags
    //   IPC_CREAT: create if doesn't exist
    //   IPC_EXCL: fail if already exists (use with IPC_CREAT for exclusive create)
    //   0666: permissions (read/write for owner, group, others)
    int shmid = shmget(key, 4096, IPC_CREAT | 0666);
    if (shmid == -1) {
        perror("shmget failed");
        exit(1);
    }
    printf("Created/accessed shared memory segment ID: %d\n", shmid);

    // Step 3: Attach the segment to our address space
    // shmat() maps the segment into our virtual address space
    // Arguments: segment ID, preferred address (NULL = let kernel choose), flags
    // Returns the address where the segment is attached
    void *addr = shmat(shmid, NULL, 0);
    if (addr == (void*)-1) {
        perror("shmat failed");
        exit(1);
    }
    printf("Attached at address: %p\n", addr);

    // Step 4: Use the shared memory like regular memory
    strcpy((char*)addr, "Hello from Process A!");

    // Step 5: Detach when done (optional before exit, but good practice)
    // This only removes the mapping from this process; the segment still exists
    if (shmdt(addr) == -1) {
        perror("shmdt failed");
        exit(1);
    }

    // Step 6: Optionally remove the segment entirely
    // Only do this when no processes need it anymore
    // if (shmctl(shmid, IPC_RMID, NULL) == -1) {
    //     perror("shmctl IPC_RMID failed");
    // }

    return 0;
}
```

System V shared memory essentials:

- Naming: key_t values generated via ftok() or manually
- Cleanup: shmctl(shmid, IPC_RMID, NULL) to remove a segment
- Limits: /proc/sys/kernel/shmmax, /proc/sys/kernel/shmmni control max sizes
- ipcs/ipcrm commands: list (ipcs -m) and remove (ipcrm -m shmid) segments from the shell

Use POSIX shared memory (shm_open) for in-memory IPC with modern, clean APIs. Use memory-mapped files when you need persistence to disk or want file-based access control. Use System V shared memory only for legacy compatibility or on systems lacking POSIX shared memory support.
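Since POSIX shared memory is the recommended path, here is a minimal sketch of the equivalent workflow; the segment name /myapp_shm is a placeholder. On older glibc versions this needs -lrt at link time.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>

int main() {
    // Create (or open) a named POSIX shared memory object
    int fd = shm_open("/myapp_shm", O_CREAT | O_RDWR, 0666);
    if (fd == -1) { perror("shm_open"); return 1; }

    // Size it: POSIX segments start at length 0
    if (ftruncate(fd, 4096) == -1) { perror("ftruncate"); return 1; }

    // Map it into our address space
    char *addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) { perror("mmap"); return 1; }
    close(fd);   // the mapping stays valid after closing the fd

    strcpy(addr, "Hello from POSIX shared memory!");

    munmap(addr, 4096);          // detach (like shmdt)
    shm_unlink("/myapp_shm");    // remove the name (like IPC_RMID)
    return 0;
}
```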
Shared memory provides the fastest possible IPC—but this speed comes with responsibility. Shared memory provides no automatic synchronization whatsoever. The operating system maps the memory and then steps back. What happens next is entirely up to your code.
This section is critical. More bugs in shared memory systems come from synchronization failures than from the actual mapping code.
Without proper synchronization, two processes writing to overlapping memory regions will produce unpredictable results. This isn't a 'sometimes fails' situation—it's a 'silently corrupts data in ways you won't notice until production' situation.
Why Is Synchronization Necessary?
Consider a simple counter shared between two processes:
```c
// Shared memory contains:
struct shared_data {
    int counter;   // Both processes increment this
};

// Process A and Process B both run:
void increment_counter(struct shared_data *shared) {
    shared->counter = shared->counter + 1;   // Looks atomic, but ISN'T!
}

// This simple line compiles to multiple CPU instructions:
//   1. LOAD  shared->counter from memory into a register
//   2. ADD   1 to the register
//   3. STORE the register value back to shared->counter

// Race condition scenario:
// Time   Process A            Process B            Memory
// ─────────────────────────────────────────────────────────────
// t0                                               counter = 0
// t1     LOAD counter (=0)                         0
// t2                          LOAD counter (=0)    0
// t3     ADD 1 (register=1)                        0
// t4                          ADD 1 (register=1)   0
// t5     STORE counter (=1)                        1
// t6                          STORE counter (=1)   1  ← BUG!
//
// Expected: counter = 2 (incremented twice)
// Actual:   counter = 1 (one increment lost!)
```
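For this particular bug, the smallest fix is a C11 atomic increment; a sketch, assuming the counter is declared with an atomic type:

```c
#include <stdatomic.h>

struct shared_data {
    atomic_int counter;   // atomic type makes the increment indivisible
};

void increment_counter(struct shared_data *shared) {
    // LOAD/ADD/STORE happen as one indivisible operation,
    // so concurrent increments can no longer be lost
    atomic_fetch_add(&shared->counter, 1);
}
```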
Memory Ordering and Visibility

The problem goes deeper than race conditions. Modern CPUs and compilers reorder memory operations for performance. Without proper memory barriers:
Writes may not be immediately visible: Process A writes to location X, then Y. Process B might see the new Y but the old X.
Reads may return stale values: Process B's CPU cache might hold an old copy of memory that hasn't been invalidated yet.
Compilers reorder code: The compiler might reorder statements for optimization, breaking assumptions about execution order.
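All three problems are addressed by release/acquire ordering. Here is a minimal publication sketch using C11 atomics; the mailbox_t type is illustrative:

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    int payload;        // written before the flag is set
    atomic_bool ready;
} mailbox_t;

// Writer: the release store guarantees the payload write
// is visible before ready becomes true
void publish(mailbox_t *m, int value) {
    m->payload = value;
    atomic_store_explicit(&m->ready, true, memory_order_release);
}

// Reader: the acquire load guarantees that if ready is true,
// the payload written before it is visible too
int try_consume(mailbox_t *m, int *out) {
    if (!atomic_load_explicit(&m->ready, memory_order_acquire))
        return -1;   // not published yet
    *out = m->payload;
    return 0;
}
```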
Synchronization Primitives for Shared Memory:
To use shared memory safely, you need synchronization primitives:
| Primitive | Mechanism | Use Case | API |
|---|---|---|---|
| Semaphore | Integer counter with atomic decrement/increment | Counting resources, binary mutex behavior | POSIX: sem_open(), sem_wait(), sem_post() |
| Mutexes | Binary lock for mutual exclusion | Protecting critical sections in shared memory | pthread_mutex_t with PTHREAD_PROCESS_SHARED |
| Condition Variables | Wait for a condition with mutex protection | Producer-consumer, event notification | pthread_cond_t with PTHREAD_PROCESS_SHARED |
| Read-Write Locks | Multiple readers or one writer | Read-heavy workloads with occasional writes | pthread_rwlock_t with PTHREAD_PROCESS_SHARED |
| Spin Locks | Busy-wait lock for short critical sections | Very short operations, low contention | Atomic operations or pthread_spin_lock |
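From the first row of the table, a named POSIX semaphore can serve as a cross-process binary lock; a minimal sketch, where the name /myapp_sem is a placeholder:

```c
#include <semaphore.h>
#include <fcntl.h>
#include <stdio.h>

int main() {
    // Named semaphore, initial value 1 => binary mutex behavior
    sem_t *sem = sem_open("/myapp_sem", O_CREAT, 0666, 1);
    if (sem == SEM_FAILED) { perror("sem_open"); return 1; }

    sem_wait(sem);   // enter critical section (decrement, may block)
    // ... touch shared memory here ...
    sem_post(sem);   // leave critical section (increment)

    sem_close(sem);
    // sem_unlink("/myapp_sem");   // remove the name when no longer needed
    return 0;
}
```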
For mutex-based protection, the mutex itself must live inside the shared region and be initialized as process-shared:

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <pthread.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

// Shared memory structure with embedded synchronization
typedef struct {
    pthread_mutex_t mutex;   // MUST be in shared memory too!
    int counter;
} shared_data_t;

int main() {
    // Create shared memory
    int fd = shm_open("/safe_shared", O_CREAT | O_RDWR, 0666);
    if (fd == -1) { perror("shm_open"); exit(1); }
    ftruncate(fd, sizeof(shared_data_t));
    shared_data_t *shared = mmap(NULL, sizeof(shared_data_t),
                                 PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (shared == MAP_FAILED) { perror("mmap"); exit(1); }

    // Initialize mutex with PTHREAD_PROCESS_SHARED attribute
    // This is CRITICAL - default mutexes only work within one process!
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(&shared->mutex, &attr);
    pthread_mutexattr_destroy(&attr);

    shared->counter = 0;

    // Safe increment - works correctly across processes
    pthread_mutex_lock(&shared->mutex);
    shared->counter++;   // Protected by mutex
    pthread_mutex_unlock(&shared->mutex);

    printf("Counter: %d\n", shared->counter);

    // Cleanup...
    // Note: Must coordinate mutex destruction across all processes!
    return 0;
}
```

Default pthread mutexes and condition variables only work between threads of the same process. For inter-process synchronization, you MUST set the PTHREAD_PROCESS_SHARED attribute. Forgetting this is a common bug that may appear to work during testing but fails under load.
Certain patterns appear repeatedly in shared memory systems. Understanding these patterns helps you design robust shared memory IPC.
Pattern 1: Producer-Consumer with Circular Buffer
A classic pattern where one process writes data and another reads it, using a fixed-size buffer:
```c
#include <stdatomic.h>
#include <stdint.h>

#define BUFFER_SIZE 1024   // Must be power of 2 for efficient modulo
#define BUFFER_MASK (BUFFER_SIZE - 1)

typedef struct {
    // Use atomics for lock-free single-producer single-consumer.
    // Padding keeps each index on its own cache line to avoid false
    // sharing (a cache line is typically 64 bytes)
    atomic_uint write_idx;                 // Only producer modifies
    char pad1[64 - sizeof(atomic_uint)];
    atomic_uint read_idx;                  // Only consumer modifies
    char pad2[64 - sizeof(atomic_uint)];

    // The actual buffer
    uint8_t data[BUFFER_SIZE];
} spsc_queue_t;

// Producer side (only one producer!)
int push(spsc_queue_t *q, uint8_t value) {
    unsigned int write = atomic_load_explicit(&q->write_idx, memory_order_relaxed);
    unsigned int read  = atomic_load_explicit(&q->read_idx, memory_order_acquire);

    // Check if buffer is full
    if (((write + 1) & BUFFER_MASK) == (read & BUFFER_MASK)) {
        return -1;   // Buffer full
    }

    // Write data
    q->data[write & BUFFER_MASK] = value;

    // Publish write (release semantics ensure data is visible before index update)
    atomic_store_explicit(&q->write_idx, write + 1, memory_order_release);
    return 0;
}

// Consumer side (only one consumer!)
int pop(spsc_queue_t *q, uint8_t *value) {
    unsigned int read  = atomic_load_explicit(&q->read_idx, memory_order_relaxed);
    unsigned int write = atomic_load_explicit(&q->write_idx, memory_order_acquire);

    // Check if buffer is empty
    if ((read & BUFFER_MASK) == (write & BUFFER_MASK)) {
        return -1;   // Buffer empty
    }

    // Read data
    *value = q->data[read & BUFFER_MASK];

    // Publish read (release semantics)
    atomic_store_explicit(&q->read_idx, read + 1, memory_order_release);
    return 0;
}
```

Pattern 2: Read-Heavy Shared State with SeqLock
When readers vastly outnumber writers, a sequence lock provides excellent performance:
```c
#include <stdatomic.h>

typedef struct {
    atomic_uint sequence;   // Even when unlocked, odd during write
    // Protected data - can be any structure
    int x;
    int y;
    int z;
} seqlock_data_t;

// Writer (must have external mutual exclusion between writers)
void write_data(seqlock_data_t *data, int x, int y, int z) {
    unsigned int seq = atomic_load(&data->sequence);
    atomic_store(&data->sequence, seq + 1);   // Mark as being written (odd)
    atomic_thread_fence(memory_order_release);

    // Write the data
    data->x = x;
    data->y = y;
    data->z = z;

    atomic_thread_fence(memory_order_release);
    atomic_store(&data->sequence, seq + 2);   // Mark complete (even again)
}

// Reader (multiple readers can proceed simultaneously)
int read_data(seqlock_data_t *data, int *x, int *y, int *z) {
    unsigned int seq1, seq2;
    do {
        // Wait until no writer is in progress (sequence is even)
        do {
            seq1 = atomic_load_explicit(&data->sequence, memory_order_acquire);
        } while (seq1 & 1);

        // Read the data
        *x = data->x;
        *y = data->y;
        *z = data->z;

        atomic_thread_fence(memory_order_acquire);
        seq2 = atomic_load_explicit(&data->sequence, memory_order_acquire);
    } while (seq1 != seq2);   // Retry if a writer modified during the read
    return 0;
}

// Readers never block writers, and writers never block readers!
// Readers may retry if a write occurs during their read.
```

Never store raw pointers in shared memory! Each process maps the shared region at a different virtual address, so pointers from one process are meaningless to another. Use offsets from the start of the shared region instead, or embed all data directly in the shared structure.
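A sketch of the offset technique that the warning suggests; all names here are illustrative:

```c
#include <stdint.h>
#include <stddef.h>

// Instead of a pointer, store an offset from the region's base address
typedef struct {
    uint32_t head_offset;   // offset of the first node, 0 = "null"
    char     storage[4096]; // nodes are allocated inside the region
} shared_region_t;

typedef struct {
    int      value;
    uint32_t next_offset;   // offset of the next node, 0 = "null"
} node_t;

// Each process converts offsets to pointers using ITS OWN base address
static inline node_t *node_at(shared_region_t *base, uint32_t offset) {
    return offset ? (node_t *)((char *)base + offset) : NULL;
}

static inline uint32_t offset_of(shared_region_t *base, node_t *node) {
    return node ? (uint32_t)((char *)node - (char *)base) : 0;
}
```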
Shared memory is the fastest IPC mechanism, but understanding why requires analyzing its performance characteristics.
| Metric | Shared Memory | Pipes/Message Queues | Reason |
|---|---|---|---|
| Setup cost | Higher (mmap, shm_open) | Lower (pipe, socketpair) | Memory mapping has more kernel work |
| Per-operation cost | ~0 (memory access) | System call overhead (1-10μs) | No kernel involvement for shm access |
| Data copy overhead | Zero copy | 2 copies (user→kernel→user) | Data written directly to shared region |
| Synchronization | Explicit (your responsibility) | Implicit (kernel handles) | Shared memory requires manual sync |
| Cache behavior | Best (data in CPU cache) | Worse (kernel buffers involved) | Shared memory leverages CPU cache |
| Max throughput | Memory bandwidth (~50-100 GB/s) | Limited by copy bandwidth (~1-5 GB/s) | Direct memory access vs. copying |
When Shared Memory Excels:
Large Data Transfers: Sending a 100MB data structure? With pipes, you copy 100MB into the kernel, then 100MB out—200MB of memory bandwidth consumed. With shared memory, the receiver accesses the same physical memory—zero copies.
High-Frequency Communication: If processes exchange data 100,000 times per second, the ~2μs system call overhead per pipe operation adds up to 200ms of pure overhead per second. Shared memory operations are memory accesses taking nanoseconds.
Random Access Patterns: Need to read just one field from a large shared structure? Shared memory accesses exactly those bytes. Pipes must send entire messages.
Multiple Readers: Multiple processes can simultaneously read shared memory without interference (with appropriate locking). Pipe data can only be read once.
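For that multiple-readers case, the process-shared read-write lock from the primitives table is the natural fit; a minimal initialization sketch:

```c
#include <pthread.h>

// The lock must live inside the shared memory region, like the mutex earlier
pthread_rwlock_t *init_shared_rwlock(pthread_rwlock_t *lock) {
    pthread_rwlockattr_t attr;
    pthread_rwlockattr_init(&attr);
    // Without PTHREAD_PROCESS_SHARED the lock only works within one process
    pthread_rwlockattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_rwlock_init(lock, &attr);
    pthread_rwlockattr_destroy(&attr);
    return lock;
}

// Readers proceed in parallel:
//   pthread_rwlock_rdlock(lock); ... read ...; pthread_rwlock_unlock(lock);
// Writers get exclusive access:
//   pthread_rwlock_wrlock(lock); ... write ...; pthread_rwlock_unlock(lock);
```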
Benchmark Example:
```
# Benchmark: Transfer 1 million 4KB messages between two processes
# (Single-threaded producer and consumer, Linux x86_64, pinned to same NUMA node)

Mechanism                  Throughput    Latency (avg)   CPU Utilization
═══════════════════════════════════════════════════════════════════════
Shared Memory (SPSC)       18.2 GB/s     ~50 ns          12% (spin wait)
Unix Domain Socket          2.3 GB/s     ~1.7 μs         48%
TCP Socket (loopback)       1.8 GB/s     ~2.2 μs         62%
Pipe                        1.9 GB/s     ~2.0 μs         55%
POSIX Message Queue         0.8 GB/s     ~4.8 μs         78%

# Notes:
# - Shared memory is 8-20x faster in throughput
# - Shared memory latency is ~40x lower
# - Socket/pipe CPU usage is higher due to system call overhead
# - Shared memory spins waiting for data (configurable)

# For mixed or bursty workloads, differences may be less dramatic due to:
# - CPU cache effects
# - Synchronization contention
# - Application processing time dominating IPC time
```

The Hidden Costs:
Shared memory's raw performance can be misleading. Consider these hidden costs:
Synchronization overhead: Mutex locks, especially contended ones, can add microseconds of latency
False sharing: If two processes access different variables that happen to be on the same cache line, cache invalidation traffic kills performance
Memory ordering complexity: Incorrect use of atomics or missing memory barriers causes subtle bugs that can take weeks to diagnose
Debugging difficulty: Shared memory bugs don't produce nice error messages. Data corruption manifests as seemingly random failures far from the actual bug.
Shared memory is faster, but it's also harder to get right. A pipe that's 'slow enough' but correct will serve you better than shared memory that's fast but subtly corrupts data once a month. Choose shared memory when you've measured that other IPC is a bottleneck, not as a premature optimization.
Shared memory is used extensively in performance-critical systems. Understanding these real-world applications illustrates when shared memory is the right choice.
Case Study: PostgreSQL Shared Memory Architecture
PostgreSQL's use of shared memory is a master class in practical shared memory design:
```
Postmaster process (main)
  │  creates and initializes shared memory
  ▼
┌─────────────────────────────────────────────────────────────┐
│ Shared Memory Segment                                       │
│                                                             │
│  Shared Buffers (shared_buffers config parameter)           │
│   - Cached database pages (typically 25% of RAM)            │
│   - Buffer descriptors with pin counts, dirty flags         │
│   - Protected by buffer manager locks                       │
│                                                             │
│  Lock Tables                                                │
│   - Row locks, table locks, advisory locks                  │
│   - Fast path for common cases                              │
│                                                             │
│  WAL Buffers (Write-Ahead Log)                              │
│   - Transaction log before disk write                       │
│   - Protected by WAL insert locks                           │
│                                                             │
│  Various tables: CLOG, Subtrans, Proc Array, PGXACT...      │
└─────────────────────────────────────────────────────────────┘
        ▲                 ▲                  ▲
  ┌─────┴──────┐    ┌─────┴──────┐     ┌─────┴──────┐
  │ Backend 1  │    │ Backend 2  │ ... │ Backend N  │
  │ (Client 1) │    │ (Client 2) │     │ (Client N) │
  └────────────┘    └────────────┘     └────────────┘
  Each backend attaches to the same shared memory segment
```

Why PostgreSQL uses shared memory:

1. Buffer cache shared across all connections (no duplication)
2. Lock visibility: all processes see the same lock state
3. Transaction status visible to all (MVCC coordination)
4. Extremely high-frequency access (every query touches shared_buffers)

PostgreSQL uses lightweight locks (LWLocks), spinlocks, and buffer pins for synchronization. The lock manager itself is in shared memory. This design evolved over 25+ years and handles thousands of concurrent connections efficiently—but it required immense expertise to get right.
Shared memory's power comes with serious pitfalls. Here are the most common mistakes and how to avoid them.
The most common pitfall is leaked segments: shared memory persists after the processes using it exit, so clean up explicitly with shm_unlink, shmctl(IPC_RMID), or signal handlers that release segments on abnormal termination.
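A defensive sketch, assuming a hypothetical segment name and using atexit() plus simple signal handlers (note that POSIX does not guarantee shm_unlink is async-signal-safe; treat this as illustrative):

```c
#include <sys/mman.h>
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

static const char *g_shm_name = "/myapp_shm";   // hypothetical segment name

static void cleanup_shm(void) {
    shm_unlink(g_shm_name);   // remove the name; memory freed once unmapped
}

static void on_signal(int sig) {
    cleanup_shm();
    _exit(128 + sig);
}

void install_cleanup(void) {
    atexit(cleanup_shm);         // normal exit
    signal(SIGINT, on_signal);   // Ctrl-C
    signal(SIGTERM, on_signal);  // kill
}
```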
More broadly, a versioned, self-describing layout catches many of these mistakes early:

```c
// Best Practice: Versioned, self-describing shared memory structure

#define SHM_VERSION 1
#define CACHE_LINE_SIZE 64

typedef struct {
    // Header - always at the start
    uint32_t version;   // Detect structure mismatches
    uint32_t size;      // Total size for validation
    uint32_t magic;     // 0xDEADBEEF to detect corruption

    // Synchronization primitives
    pthread_mutex_t mutex;
    pthread_cond_t cond;

    // Padding to align data to cache line
    char _pad[CACHE_LINE_SIZE - (sizeof(uint32_t)*3 +
              sizeof(pthread_mutex_t) + sizeof(pthread_cond_t)) % CACHE_LINE_SIZE];

    // Application data - aligned to cache line
    struct {
        // Hot data that one process writes, others read
        // Separate cache lines for different writers!
        _Alignas(CACHE_LINE_SIZE) int producer_counter;
        _Alignas(CACHE_LINE_SIZE) int consumer_counter;

        // Bulk data
        _Alignas(CACHE_LINE_SIZE) char data[4096];
    } payload;
} shared_region_t;

_Static_assert(sizeof(shared_region_t) % CACHE_LINE_SIZE == 0,
               "shared_region_t must be cache-line aligned");

// Initialization with validation
shared_region_t* init_shared(const char *name, int create) {
    int flags = O_RDWR | (create ? O_CREAT | O_EXCL : 0);
    int fd = shm_open(name, flags, 0666);
    if (fd < 0) return NULL;

    if (create) {
        ftruncate(fd, sizeof(shared_region_t));
    }

    shared_region_t *shm = mmap(NULL, sizeof(shared_region_t),
                                PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (shm == MAP_FAILED) return NULL;

    if (create) {
        // Initialize the structure
        memset(shm, 0, sizeof(*shm));
        shm->version = SHM_VERSION;
        shm->size = sizeof(shared_region_t);
        shm->magic = 0xDEADBEEF;

        // Initialize mutex for multi-process
        pthread_mutexattr_t mattr;
        pthread_mutexattr_init(&mattr);
        pthread_mutexattr_setpshared(&mattr, PTHREAD_PROCESS_SHARED);
        pthread_mutex_init(&shm->mutex, &mattr);
        pthread_mutexattr_destroy(&mattr);

        // Initialize condvar similarly...
    } else {
        // Validate existing structure
        if (shm->magic != 0xDEADBEEF) {
            fprintf(stderr, "Shared memory corrupted!\n");
            munmap(shm, sizeof(*shm));
            return NULL;
        }
        if (shm->version != SHM_VERSION) {
            fprintf(stderr, "Version mismatch: expected %d, got %d\n",
                    SHM_VERSION, shm->version);
            munmap(shm, sizeof(*shm));
            return NULL;
        }
    }

    return shm;
}
```

Writing correct shared memory code is hard. Consider using established libraries like Boost.Interprocess (C++), shared_memory (Rust), or similar. These handle the tricky details and provide tested, cross-platform implementations of common patterns.
We've explored the shared memory model for IPC in depth. Let's consolidate the key insights:
Mechanism: multiple processes map the same physical pages into different virtual address spaces. Once mapped, access is plain memory access, with zero copies and no per-access kernel involvement.

APIs: three exist, System V (shmget/shmat), POSIX (shm_open/mmap), and memory-mapped files. POSIX is recommended for new development.

Synchronization is mandatory: the OS provides none. Use process-shared mutexes (PTHREAD_PROCESS_SHARED), semaphores, or atomics with correct memory ordering.

Patterns: SPSC ring buffers and seqlocks cover many producer-consumer and read-heavy workloads; never store raw pointers in shared regions.

Trade-off: the fastest IPC available, but the hardest to get right. Adopt it only when measurement shows other IPC is the bottleneck.

What's Next: Message Passing Model
Shared memory represents one fundamental IPC paradigm—processes communicate by reading and writing common memory. The next page explores the contrasting approach: message passing, where processes exchange discrete messages through kernel-managed channels. Message passing provides automatic synchronization and cleaner semantics, trading some performance for safety and simplicity.
You now understand the shared memory model for IPC—its mechanisms, APIs, synchronization requirements, performance characteristics, and common patterns. This is the fastest IPC mechanism available, but it demands careful use. Next, we'll explore how message passing offers a different trade-off between performance and safety.