Shared memory is seductively simple: map the same physical memory into multiple processes and let them communicate directly. No system calls, no kernel involvement, no copying—just raw memory access at full speed. But this simplicity conceals a fundamental problem: the kernel provides no synchronization.
When two processes (or threads) access the same memory location, and at least one is writing, you have a data race. The outcome depends on the precise timing of operations—timing that varies with CPU load, scheduling decisions, and even temperature-induced clock variations. Your code might work perfectly in testing and fail catastrophically in production, or vice versa.
This page explores why synchronization is essential for shared memory, what can go wrong without it, and the mechanisms available to coordinate concurrent access safely.
By the end of this page, you will understand: (1) Why unsynchronized shared memory access is dangerous, (2) Memory ordering and visibility guarantees (or lack thereof), (3) Atomic operations and their limitations, (4) POSIX semaphores for shared memory, (5) Process-shared mutexes and condition variables, and (6) Patterns for reader-writer access.
A race condition occurs when the correctness of a program depends on the relative timing of operations. With shared memory, race conditions manifest in several insidious ways:
1. Lost Updates:
Two processes read a counter, increment it, and write back. Without atomicity, both might read the same value and write the same incremented value—one update is lost.
| Time | Process A | Process B | Counter Value |
|---|---|---|---|
| 1 | Read counter (0) | | 0 |
| 2 | | Read counter (0) | 0 |
| 3 | Increment (→1) | | 0 |
| 4 | | Increment (→1) | 0 |
| 5 | Write counter (1) | | 1 |
| 6 | | Write counter (1) | 1 |

Expected: 2, Actual: 1 — One update lost!
2. Torn Reads/Writes:
When reading or writing data larger than the CPU's natural word size, the operation may not be atomic. A 64-bit value on a 32-bit system requires two memory operations. If a reader interleaves with a writer, it might see half of the old value and half of the new.
// Writer stores 0x0000000100000001
uint64_t value = 0x0000000100000001;
// Process A writes high word first: [NEW_HIGH][OLD_LOW]
// Process B reads in between: sees 0x0000000100000000 (corrupted!)
// Process A writes low word: [NEW_HIGH][NEW_LOW]
3. Inconsistent State:
Complex data structures often have invariants (e.g., a linked list's prev/next pointers must be consistent). A reader observing a partially-completed update sees an inconsistent state, leading to crashes or incorrect behavior.
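To see how quickly an invariant breaks, consider the hypothetical doubly-linked list insertion sketched below (illustrative only, not part of this page's demos; it assumes a->next is non-NULL). Between the last two pointer updates there is a window where forward and backward traversal disagree:

// Sketch: doubly-linked list insertion with a window where the invariant
// "a->next == b implies b->prev == a" does not hold.
typedef struct Node { struct Node *prev, *next; int value; } Node;

void insert_after(Node *a, Node *n) {
    Node *b = a->next;
    n->prev = a;
    n->next = b;
    a->next = n;   // A concurrent reader here sees a->next == n
                   // while b->prev is still a: inconsistent state!
    b->prev = n;
}

The complete demo below shows the simplest of these failures, the lost update, in action: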
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define SHM_NAME "/race_demo"
#define SHM_SIZE sizeof(SharedCounter)
#define ITERATIONS 100000

/**
 * Demonstrates a race condition in shared memory.
 * Two processes increment a counter without synchronization.
 * Expected result: 2 * ITERATIONS = 200000
 * Actual result: Less than expected due to lost updates.
 */

typedef struct {
    volatile int counter;   // volatile doesn't prevent races!
} SharedCounter;

void increment_counter(SharedCounter *shared, int iterations) {
    for (int i = 0; i < iterations; i++) {
        // This is NOT atomic! It's: load → add → store
        int temp = shared->counter;  // Read
        temp = temp + 1;             // Modify
        shared->counter = temp;      // Write

        // Even this "simpler" version is not atomic:
        // shared->counter++;
        // The compiler generates: load, add, store
    }
}

int main() {
    int shm_fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0666);
    ftruncate(shm_fd, SHM_SIZE);

    SharedCounter *shared = mmap(NULL, SHM_SIZE, PROT_READ | PROT_WRITE,
                                 MAP_SHARED, shm_fd, 0);
    close(shm_fd);

    // Initialize
    shared->counter = 0;

    printf("Expected final counter: %d\n", 2 * ITERATIONS);
    printf("Starting race condition test...\n\n");

    pid_t pid = fork();

    if (pid == 0) {
        // Child process
        increment_counter(shared, ITERATIONS);
        munmap(shared, SHM_SIZE);
        exit(0);
    }

    // Parent process
    increment_counter(shared, ITERATIONS);

    // Wait for child
    waitpid(pid, NULL, 0);

    printf("Actual final counter: %d\n", shared->counter);
    printf("Lost updates: %d\n", (2 * ITERATIONS) - shared->counter);

    if (shared->counter < 2 * ITERATIONS) {
        printf("\n⚠️ Race condition demonstrated!\n");
        printf("This is why synchronization is REQUIRED.\n");
    }

    munmap(shared, SHM_SIZE);
    shm_unlink(SHM_NAME);
    return 0;
}

The 'volatile' keyword tells the compiler not to optimize away memory accesses—useful for memory-mapped hardware registers. But it provides NO atomicity, NO memory ordering between different processes or CPUs, and NO protection against race conditions. It is NOT a synchronization primitive.
Beyond race conditions on individual values, modern processors introduce another layer of complexity: memory reordering. Both compilers and CPUs reorder memory operations for performance, and these reorderings can break assumptions in multi-process code.
Compiler Reordering:
Compilers are free to reorder memory accesses if the single-threaded behavior is unchanged. Consider:
data = 42; // Write data
ready = 1; // Signal that data is ready
The compiler might reorder these to:
ready = 1; // Oops! Signaled before data was written
data = 42;
A reader checking ready would see stale data.
CPU Reordering:
Even if the compiler preserves program order, the CPU may still reorder stores and loads as they pass through store buffers and caches.
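The classic store-buffering litmus test sketched below makes this concrete (x, y, r1, and r2 are illustrative shared variables, all initially 0). Intuitively, at least one of r1 and r2 must end up as 1, yet on real hardware both loads can return 0:

// Initially: x = 0, y = 0 (both shared)

// CPU 1:              // CPU 2:
x = 1;                 y = 1;
r1 = y;                r2 = x;

// Permitted outcome on real hardware: r1 == 0 && r2 == 0,
// because each store can still be in its CPU's store buffer
// when the other CPU's load executes.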
Memory Barriers:
To enforce ordering, you need memory barriers (also called fences). These instructions ensure that memory operations issued before the barrier become visible before any issued after it:
// GCC/Clang compiler barrier
asm volatile("" ::: "memory");
// Full hardware memory barrier (x86)
asm volatile("mfence" ::: "memory");
// C11/C++11 atomic with release/acquire semantics
atomic_store_explicit(&ready, 1, memory_order_release);
// ...other CPU...
if (atomic_load_explicit(&ready, memory_order_acquire)) {
// Data written before the release is now visible
}
Memory barriers and ordering are complex, subtle, and error-prone. Instead of using them directly, prefer high-level synchronization primitives (mutexes, semaphores) that encapsulate the correct barriers. Only use explicit atomics and barriers when absolutely necessary for performance, and document the reasoning extensively.
Atomic operations are indivisible—they complete entirely or not at all, with no observable intermediate state. Modern CPUs provide hardware support for atomic read-modify-write operations on aligned words.
C11 Atomics:
C11 introduced the <stdatomic.h> header with portable atomic types and operations:
#include <stdatomic.h>
atomic_int counter; // Atomic integer
// Atomic operations
atomic_fetch_add(&counter, 1); // Atomically: counter += 1, return old value
atomic_compare_exchange_strong(&x, &expected, desired); // CAS
atomic_load(&counter); // Atomic read
atomic_store(&counter, 42); // Atomic write
| Operation | Function | Semantics |
|---|---|---|
| Load | atomic_load(&x) | Read value atomically |
| Store | atomic_store(&x, val) | Write value atomically |
| Exchange | atomic_exchange(&x, val) | Write and return old value |
| Fetch-Add | atomic_fetch_add(&x, val) | Add and return old value |
| Fetch-Sub | atomic_fetch_sub(&x, val) | Subtract and return old value |
| Fetch-Or | atomic_fetch_or(&x, val) | Bitwise OR and return old |
| Fetch-And | atomic_fetch_and(&x, val) | Bitwise AND and return old |
| CAS | atomic_compare_exchange_strong(&x, &expected, desired) | If x==expected, x=desired; else expected=x |
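As a sketch of how CAS is used in practice, the hypothetical bounded_increment below performs a conditional update that no single fetch-and-add can express: it increments a counter only while it is below a cap, retrying whenever another process changes the value first.

#include <stdatomic.h>
#include <stdbool.h>

// Atomically increment *x, but never past 'cap' (sketch).
bool bounded_increment(atomic_int *x, int cap) {
    int expected = atomic_load(x);
    while (expected < cap) {
        // If *x still equals 'expected', store expected+1 and succeed.
        // On failure, 'expected' is reloaded with the current value; retry.
        if (atomic_compare_exchange_weak(x, &expected, expected + 1))
            return true;
    }
    return false;  // Counter already at cap
}

The full demo below applies atomic_fetch_add to the earlier counter example: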
#include <stdio.h>
#include <stdlib.h>
#include <stdatomic.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define SHM_NAME "/atomic_demo"
#define ITERATIONS 100000

/**
 * Demonstrates correct atomic counter increment in shared memory.
 * Compare to the race condition demo—this version produces correct results.
 */

typedef struct {
    atomic_int counter;
} SharedAtomicCounter;

void atomic_increment(SharedAtomicCounter *shared, int iterations) {
    for (int i = 0; i < iterations; i++) {
        // Correct way: atomic fetch-and-add
        atomic_fetch_add(&shared->counter, 1);

        // Alternative with explicit memory ordering:
        // atomic_fetch_add_explicit(&shared->counter, 1, memory_order_relaxed);
    }
}

int main() {
    int shm_fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0666);
    ftruncate(shm_fd, sizeof(SharedAtomicCounter));

    SharedAtomicCounter *shared = mmap(NULL, sizeof(SharedAtomicCounter),
                                       PROT_READ | PROT_WRITE,
                                       MAP_SHARED, shm_fd, 0);
    close(shm_fd);

    // Initialize (before forking!)
    atomic_init(&shared->counter, 0);

    printf("Expected final counter: %d\n", 2 * ITERATIONS);
    printf("Starting atomic counter test...\n\n");

    pid_t pid = fork();

    if (pid == 0) {
        atomic_increment(shared, ITERATIONS);
        munmap(shared, sizeof(SharedAtomicCounter));
        exit(0);
    }

    atomic_increment(shared, ITERATIONS);
    waitpid(pid, NULL, 0);

    int final = atomic_load(&shared->counter);
    printf("Actual final counter: %d\n", final);

    if (final == 2 * ITERATIONS) {
        printf("\n✓ Perfect! Atomics prevent race conditions.\n");
    } else {
        printf("\n⚠️ Unexpected: check if atomic is lock-free on this system.\n");
    }

    munmap(shared, sizeof(SharedAtomicCounter));
    shm_unlink(SHM_NAME);
    return 0;
}

C11 atomics may use locks internally if the type isn't natively atomic on your platform. Check with atomic_is_lock_free(&x). Lock-based 'atomics' are NOT safe for shared memory between unrelated processes—the lock is process-local! For shared memory, verify lock-free status or use POSIX semaphores instead.
Limitations of Atomics:
Single-word operations only: Atomics work for individual variables. You cannot atomically update two related fields (e.g., a pointer and a size).
No waiting: Atomics provide no mechanism to wait for a condition. Spinning (busy-waiting) wastes CPU cycles, as the sketch after this list shows.
Complex algorithms needed: Lock-free algorithms using only atomics are notoriously difficult to design and verify.
For most shared memory use cases, higher-level primitives like semaphores or mutexes are more appropriate.
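To make the "no waiting" limitation concrete, here is roughly what waiting looks like with atomics alone (a sketch; spin_until_ready is a hypothetical helper). It is correct, but it polls: the waiter keeps consuming CPU, whereas sem_wait() would let the kernel put it to sleep until signaled.

#include <stdatomic.h>
#include <sched.h>

// Busy-wait until another process sets 'ready' (sketch).
// Correct, but burns CPU for the entire wait.
void spin_until_ready(atomic_int *ready) {
    while (atomic_load_explicit(ready, memory_order_acquire) == 0) {
        sched_yield();  // Slightly politer than a tight loop; still polling
    }
}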
POSIX semaphores provide a robust, well-tested synchronization mechanism suitable for shared memory IPC. Two varieties exist:
Named Semaphores: Created with a name in the filesystem namespace (like POSIX shared memory). Any process knowing the name can access them.
#include <semaphore.h>
sem_t *sem_open(const char *name, int oflag, mode_t mode, unsigned int value);
int sem_close(sem_t *sem);
int sem_unlink(const char *name);
Unnamed Semaphores: Allocated in shared memory itself. Initialized with pshared=1 for inter-process use.
int sem_init(sem_t *sem, int pshared, unsigned int value);
int sem_destroy(sem_t *sem);
| Operation | Function | Behavior |
|---|---|---|
| Wait (P) | sem_wait(sem) | Decrement; block if value is 0 |
| Try Wait | sem_trywait(sem) | Decrement if >0; else fail with errno EAGAIN |
| Timed Wait | sem_timedwait(sem, &ts) | Wait with timeout |
| Post (V) | sem_post(sem) | Increment; wake one waiter |
| Get Value | sem_getvalue(sem, &val) | Read current counter value |
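As an example, a bounded wait might look like the sketch below (wait_with_timeout is a hypothetical helper). Note that sem_timedwait() takes an absolute CLOCK_REALTIME deadline, not a relative interval:

#include <semaphore.h>
#include <time.h>
#include <errno.h>
#include <stdio.h>

// Wait up to 5 seconds for the semaphore (sketch).
int wait_with_timeout(sem_t *sem) {
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);  // Absolute deadline base
    ts.tv_sec += 5;

    while (sem_timedwait(sem, &ts) == -1) {
        if (errno == EINTR) continue;    // Interrupted by a signal: retry
        if (errno == ETIMEDOUT) {
            fprintf(stderr, "Timed out waiting for semaphore\n");
            return -1;
        }
        perror("sem_timedwait");
        return -1;
    }
    return 0;  // Semaphore acquired
}

The full producer-consumer demo below uses a named semaphore to coordinate two processes: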
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <semaphore.h>
#include <unistd.h>

#define SHM_NAME "/sem_shm_demo"
#define SEM_NAME "/sem_demo"

/**
 * Demonstrates producer-consumer pattern using shared memory
 * and a named semaphore for synchronization.
 */

typedef struct {
    int data;
    int ready;
} SharedBuffer;

int main() {
    // Create shared memory
    int shm_fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0666);
    ftruncate(shm_fd, sizeof(SharedBuffer));

    SharedBuffer *buf = mmap(NULL, sizeof(SharedBuffer),
                             PROT_READ | PROT_WRITE,
                             MAP_SHARED, shm_fd, 0);
    close(shm_fd);

    // Create named semaphore
    // O_CREAT | O_EXCL: Create exclusively
    // Initial value 0: Consumer will wait
    sem_unlink(SEM_NAME);  // Remove any stale semaphore
    sem_t *sem = sem_open(SEM_NAME, O_CREAT | O_EXCL, 0666, 0);
    if (sem == SEM_FAILED) {
        perror("sem_open");
        exit(EXIT_FAILURE);
    }

    // Initialize buffer
    buf->data = 0;
    buf->ready = 0;

    pid_t pid = fork();

    if (pid == 0) {
        // Child: Consumer
        printf("Consumer: waiting for data...\n");

        // Wait on semaphore (blocks until producer posts)
        sem_wait(sem);

        printf("Consumer: received data = %d\n", buf->data);
        printf("Consumer: ready flag = %d\n", buf->ready);

        sem_close(sem);
        munmap(buf, sizeof(SharedBuffer));
        exit(0);
    }

    // Parent: Producer
    sleep(1);  // Simulate work before producing

    printf("Producer: writing data...\n");
    buf->data = 42;
    buf->ready = 1;

    // Post semaphore (wakes consumer)
    sem_post(sem);
    printf("Producer: signaled consumer\n");

    // Wait for child
    waitpid(pid, NULL, 0);

    // Cleanup (only parent)
    sem_close(sem);
    sem_unlink(SEM_NAME);
    munmap(buf, sizeof(SharedBuffer));
    shm_unlink(SHM_NAME);

    printf("Clean exit\n");
    return 0;
}

Unnamed Semaphores in Shared Memory:
For tighter coupling, you can embed the semaphore directly in the shared memory region:
typedef struct {
sem_t sem; // Semaphore embedded in shared memory
int data;
} SharedRegion;
// In creator process:
SharedRegion *region = /* mmap shared memory */;
sem_init(&region->sem, 1, 0);  // pshared=1 for inter-process

// Any process can now use:
sem_wait(&region->sem);
sem_post(&region->sem);
Advantages of embedded semaphores: there is no separate name in the filesystem namespace to manage or sem_unlink(), the semaphore's lifetime is tied to the shared region itself, and one region can hold as many semaphores as the protocol requires.
When using sem_init() for inter-process synchronization, the pshared parameter MUST be non-zero (typically 1). If pshared is 0, the semaphore is only valid for threads within a single process—using it across processes results in undefined behavior.
POSIX pthread mutexes and condition variables can be configured for inter-process use by embedding them in shared memory and setting the PTHREAD_PROCESS_SHARED attribute.
Configuring a Process-Shared Mutex:
pthread_mutexattr_t attr;
pthread_mutexattr_init(&attr);
pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
pthread_mutex_t *mutex = /* pointer into shared memory */;
pthread_mutex_init(mutex, &attr);
pthread_mutexattr_destroy(&attr);
Once initialized, processes can lock/unlock the mutex with standard pthread calls.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <pthread.h>
#include <unistd.h>

#define SHM_NAME "/mutex_demo"
#define ITERATIONS 50000

/**
 * Demonstrates process-shared mutex protecting a counter in shared memory.
 */

typedef struct {
    pthread_mutex_t mutex;
    int counter;
} SharedData;

void increment_with_mutex(SharedData *shared, int iterations) {
    for (int i = 0; i < iterations; i++) {
        pthread_mutex_lock(&shared->mutex);
        shared->counter++;
        pthread_mutex_unlock(&shared->mutex);
    }
}

int main() {
    int shm_fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0666);
    ftruncate(shm_fd, sizeof(SharedData));

    SharedData *shared = mmap(NULL, sizeof(SharedData),
                              PROT_READ | PROT_WRITE,
                              MAP_SHARED, shm_fd, 0);
    close(shm_fd);

    // Initialize mutex with PTHREAD_PROCESS_SHARED
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);

    // Optionally make it robust (survives holder crash)
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);

    pthread_mutex_init(&shared->mutex, &attr);
    pthread_mutexattr_destroy(&attr);

    shared->counter = 0;

    printf("Expected: %d\n", 2 * ITERATIONS);

    pid_t pid = fork();

    if (pid == 0) {
        // Child
        increment_with_mutex(shared, ITERATIONS);
        munmap(shared, sizeof(SharedData));
        exit(0);
    }

    // Parent
    increment_with_mutex(shared, ITERATIONS);
    waitpid(pid, NULL, 0);

    printf("Actual: %d\n", shared->counter);

    if (shared->counter == 2 * ITERATIONS) {
        printf("✓ Process-shared mutex works correctly!\n");
    }

    pthread_mutex_destroy(&shared->mutex);
    munmap(shared, sizeof(SharedData));
    shm_unlink(SHM_NAME);
    return 0;
}

/**
 * ROBUST MUTEX HANDLING:
 *
 * If a process holding a robust mutex dies, the next pthread_mutex_lock()
 * returns EOWNERDEAD. The new owner must call:
 *
 *     pthread_mutex_consistent(&mutex);
 *
 * to mark the mutex as consistent again, then unlock normally.
 * This prevents permanent deadlock when a mutex holder crashes.
 *
 * Example:
 *
 *     int rc = pthread_mutex_lock(&shared->mutex);
 *     if (rc == EOWNERDEAD) {
 *         // Previous owner died while holding the lock
 *         // Check/repair shared data state here
 *         pthread_mutex_consistent(&shared->mutex);
 *     }
 *     // ... use shared data ...
 *     pthread_mutex_unlock(&shared->mutex);
 */

Process-shared mutexes can be made 'robust' using pthread_mutexattr_setrobust(). A robust mutex can recover when its holder dies, returning EOWNERDEAD to the next locker. This is critical for shared memory systems where process crashes must not cause permanent deadlocks.
Process-Shared Condition Variables:
Condition variables work similarly:
pthread_condattr_t cond_attr;
pthread_condattr_init(&cond_attr);
pthread_condattr_setpshared(&cond_attr, PTHREAD_PROCESS_SHARED);
pthread_cond_t *cond = /* pointer into shared memory */;
pthread_cond_init(cond, &cond_attr);
pthread_condattr_destroy(&cond_attr);
// Usage (cond and mutex are the shared-memory pointers initialized above):
pthread_mutex_lock(mutex);
while (!condition) {
    pthread_cond_wait(cond, mutex);  // Atomically unlock and wait
}
// condition is true, mutex is held
pthread_mutex_unlock(mutex);

// Signaling:
pthread_mutex_lock(mutex);
condition = true;
pthread_cond_signal(cond);  // Wake one waiter
pthread_mutex_unlock(mutex);
Many shared memory scenarios have asymmetric access patterns: multiple readers can safely access data simultaneously, but writers need exclusive access. Reader-writer locks optimize for this pattern.
POSIX Reader-Writer Locks:
pthread_rwlockattr_t attr;
pthread_rwlockattr_init(&attr);
pthread_rwlockattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
pthread_rwlock_t *rwlock = /* in shared memory */;
pthread_rwlock_init(rwlock, &attr);
pthread_rwlockattr_destroy(&attr);
// Reader:
pthread_rwlock_rdlock(rwlock); // Multiple readers OK
// ... read shared data ...
pthread_rwlock_unlock(rwlock);
// Writer:
pthread_rwlock_wrlock(rwlock); // Exclusive access
// ... modify shared data ...
pthread_rwlock_unlock(rwlock);
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <pthread.h>
#include <unistd.h>
#include <time.h>

#define SHM_NAME "/rwlock_demo"

/**
 * Demonstrates reader-writer lock for shared memory.
 * Multiple readers can access simultaneously; writers get exclusive access.
 */

typedef struct {
    pthread_rwlock_t rwlock;
    int value;
    int read_count;
    int write_count;
} SharedState;

void reader_process(SharedState *state, int id, int iterations) {
    for (int i = 0; i < iterations; i++) {
        pthread_rwlock_rdlock(&state->rwlock);

        // Reading - multiple readers can be here simultaneously
        int val = state->value;
        __sync_fetch_and_add(&state->read_count, 1);

        // Simulate read work
        usleep(100);

        pthread_rwlock_unlock(&state->rwlock);
        usleep(rand() % 1000);
    }
    printf("Reader %d: completed %d reads\n", id, iterations);
}

void writer_process(SharedState *state, int id, int iterations) {
    for (int i = 0; i < iterations; i++) {
        pthread_rwlock_wrlock(&state->rwlock);

        // Writing - exclusive access
        state->value++;
        __sync_fetch_and_add(&state->write_count, 1);

        // Simulate write work
        usleep(500);

        pthread_rwlock_unlock(&state->rwlock);
        usleep(rand() % 2000);
    }
    printf("Writer %d: completed %d writes\n", id, iterations);
}

int main() {
    int shm_fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0666);
    ftruncate(shm_fd, sizeof(SharedState));

    SharedState *state = mmap(NULL, sizeof(SharedState),
                              PROT_READ | PROT_WRITE,
                              MAP_SHARED, shm_fd, 0);
    close(shm_fd);

    // Initialize rwlock for process sharing
    pthread_rwlockattr_t attr;
    pthread_rwlockattr_init(&attr);
    pthread_rwlockattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_rwlock_init(&state->rwlock, &attr);
    pthread_rwlockattr_destroy(&attr);

    state->value = 0;
    state->read_count = 0;
    state->write_count = 0;

    srand(time(NULL));

    printf("Starting reader-writer demo...\n");
    printf("Spawning 3 readers and 1 writer\n\n");

    pid_t pids[4];

    // Fork 3 readers
    for (int i = 0; i < 3; i++) {
        pids[i] = fork();
        if (pids[i] == 0) {
            srand(time(NULL) + i);
            reader_process(state, i + 1, 20);
            munmap(state, sizeof(SharedState));
            exit(0);
        }
    }

    // Fork 1 writer
    pids[3] = fork();
    if (pids[3] == 0) {
        srand(time(NULL) + 100);
        writer_process(state, 1, 10);
        munmap(state, sizeof(SharedState));
        exit(0);
    }

    // Wait for all children
    for (int i = 0; i < 4; i++) {
        waitpid(pids[i], NULL, 0);
    }

    printf("\n--- Results ---\n");
    printf("Final value: %d\n", state->value);
    printf("Total reads: %d\n", state->read_count);
    printf("Total writes: %d\n", state->write_count);

    pthread_rwlock_destroy(&state->rwlock);
    munmap(state, sizeof(SharedState));
    shm_unlink(SHM_NAME);
    return 0;
}

Synchronization bugs are among the most difficult to diagnose—they're often non-deterministic, timing-dependent, and may not appear in testing. Following established best practices significantly reduces risk.
In lock-free programming with CAS (Compare-And-Swap), the ABA problem occurs when a value changes from A→B→A between your read and CAS. The CAS succeeds because the value is A, but the state may be inconsistent. Solutions include version counters or hazard pointers. This is one reason to prefer mutex-based synchronization unless you have specific expertise in lock-free algorithms.
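One common mitigation, sketched below on the assumption that 64-bit atomics are lock-free on the platform (verify with atomic_is_lock_free), pairs the value with a version counter so that an A→B→A cycle still changes the word being compared:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

// Sketch: pack a 32-bit value with a 32-bit version into one atomic word.
// An A->B->A change of the value still bumps the version, so a stale CAS
// fails instead of silently succeeding.
static inline uint64_t pack(uint32_t value, uint32_t version) {
    return ((uint64_t)version << 32) | value;
}

bool versioned_store(_Atomic uint64_t *cell, uint32_t new_value) {
    uint64_t old = atomic_load(cell);
    uint32_t version = (uint32_t)(old >> 32);
    // Succeeds only if neither the value nor the version has changed.
    return atomic_compare_exchange_strong(cell, &old,
                                          pack(new_value, version + 1));
}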
We've explored why synchronization is mandatory for shared memory and the mechanisms available to achieve it safely. Let's consolidate the key takeaways: (1) unsynchronized access to shared memory causes lost updates, torn reads/writes, and inconsistent state; (2) volatile is not a synchronization primitive; (3) C11 atomics fix single-word races but must be lock-free to be safe across processes; (4) POSIX semaphores, named or embedded with pshared=1, are a robust general-purpose choice; (5) pthread mutexes, condition variables, and reader-writer locks work across processes only with PTHREAD_PROCESS_SHARED; and (6) robust mutexes prevent permanent deadlock when a lock holder crashes.
What's Next:
With the challenges of synchronization understood, we turn to the rewards: performance benefits. The next page explores why shared memory, despite its complexity, delivers performance that other IPC mechanisms cannot match—and when that performance advantage justifies the added design burden.
You now understand why shared memory requires explicit synchronization and the mechanisms available to provide it safely. You've learned about race conditions, memory ordering, atomics, semaphores, and process-shared mutexes. Next, we'll explore the performance benefits that justify this complexity.