Shared memory is seductively simple: map the same physical memory into multiple processes and let them communicate directly. No system calls, no kernel involvement, no copying—just raw memory access at full speed. But this simplicity conceals a fundamental problem: the kernel provides no synchronization.
When two processes (or threads) access the same memory location, and at least one is writing, you have a data race. The outcome depends on the precise timing of operations—timing that varies with CPU load, scheduling decisions, and even temperature-induced clock variations. Your code might work perfectly in testing and fail catastrophically in production, or vice versa.
This page explores why synchronization is essential for shared memory, what can go wrong without it, and the mechanisms available to coordinate concurrent access safely.
By the end of this page, you will understand: (1) Why unsynchronized shared memory access is dangerous, (2) Memory ordering and visibility guarantees (or lack thereof), (3) Atomic operations and their limitations, (4) POSIX semaphores for shared memory, (5) Process-shared mutexes and condition variables, and (6) Patterns for reader-writer access.
A race condition occurs when the correctness of a program depends on the relative timing of operations. With shared memory, race conditions manifest in several insidious ways:
1. Lost Updates:
Two processes read a counter, increment it, and write back. Without atomicity, both might read the same value and write the same incremented value—one update is lost.
| Time | Process A | Process B | Counter Value |
|---|---|---|---|
| 1 | Read counter (0) | | 0 |
| 2 | | Read counter (0) | 0 |
| 3 | Increment (→1) | | 0 |
| 4 | | Increment (→1) | 0 |
| 5 | Write counter (1) | | 1 |
| 6 | | Write counter (1) | 1 |

Expected: 2, Actual: 1 — One update lost!
2. Torn Reads/Writes:
When reading or writing data larger than the CPU's natural word size, the operation may not be atomic. A 64-bit value on a 32-bit system requires two memory operations. If a reader interleaves with a writer, it might see half of the old value and half of the new.
// Writer stores 0x0000000100000001
uint64_t value = 0x0000000100000001;
// Process A writes high word first: [NEW_HIGH][OLD_LOW]
// Process B reads in between: sees 0x0000000100000000 (corrupted!)
// Process A writes low word: [NEW_HIGH][NEW_LOW]
3. Inconsistent State:
Complex data structures often have invariants (e.g., a linked list's prev/next pointers must be consistent). A reader observing a partially-completed update sees an inconsistent state, leading to crashes or incorrect behavior.
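To see how quickly an invariant breaks, consider the hypothetical doubly-linked list insertion sketched below (illustrative only, not part of this page's demos; it assumes a->next is non-NULL). Between the last two pointer updates there is a window where forward and backward traversal disagree:

// Sketch: doubly-linked list insertion with a window where the invariant
// "a->next == b implies b->prev == a" does not hold.
typedef struct Node { struct Node *prev, *next; int value; } Node;

void insert_after(Node *a, Node *n) {
    Node *b = a->next;
    n->prev = a;
    n->next = b;
    a->next = n;   // A concurrent reader here sees a->next == n
                   // while b->prev is still a: inconsistent state!
    b->prev = n;
}

The complete demo below shows the simplest of these failures, the lost update, in action: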
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define SHM_NAME "/race_demo"
#define SHM_SIZE sizeof(SharedCounter)
#define ITERATIONS 100000

/**
 * Demonstrates a race condition in shared memory.
 * Two processes increment a counter without synchronization.
 * Expected result: 2 * ITERATIONS = 200000
 * Actual result: Less than expected due to lost updates.
 */

typedef struct {
    volatile int counter;   // volatile doesn't prevent races!
} SharedCounter;

void increment_counter(SharedCounter *shared, int iterations) {
    for (int i = 0; i < iterations; i++) {
        // This is NOT atomic! It's: load → add → store
        int temp = shared->counter;  // Read
        temp = temp + 1;             // Modify
        shared->counter = temp;      // Write

        // Even this "simpler" version is not atomic:
        // shared->counter++;
        // The compiler generates: load, add, store
    }
}

int main() {
    int shm_fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0666);
    ftruncate(shm_fd, SHM_SIZE);

    SharedCounter *shared = mmap(NULL, SHM_SIZE, PROT_READ | PROT_WRITE,
                                 MAP_SHARED, shm_fd, 0);
    close(shm_fd);

    // Initialize
    shared->counter = 0;

    printf("Expected final counter: %d\n", 2 * ITERATIONS);
    printf("Starting race condition test...\n\n");

    pid_t pid = fork();

    if (pid == 0) {
        // Child process
        increment_counter(shared, ITERATIONS);
        munmap(shared, SHM_SIZE);
        exit(0);
    }

    // Parent process
    increment_counter(shared, ITERATIONS);

    // Wait for child
    waitpid(pid, NULL, 0);

    printf("Actual final counter: %d\n", shared->counter);
    printf("Lost updates: %d\n", (2 * ITERATIONS) - shared->counter);

    if (shared->counter < 2 * ITERATIONS) {
        printf("\n⚠️ Race condition demonstrated!\n");
        printf("This is why synchronization is REQUIRED.\n");
    }

    munmap(shared, SHM_SIZE);
    shm_unlink(SHM_NAME);
    return 0;
}

The 'volatile' keyword tells the compiler not to optimize away memory accesses—useful for memory-mapped hardware registers. But it provides NO atomicity, NO memory ordering between different processes or CPUs, and NO protection against race conditions. It is NOT a synchronization primitive.
Beyond race conditions on individual values, modern processors introduce another layer of complexity: memory reordering. Both compilers and CPUs reorder memory operations for performance, and these reorderings can break assumptions in multi-process code.
Compiler Reordering:
Compilers are free to reorder memory accesses if the single-threaded behavior is unchanged. Consider:
data = 42; // Write data
ready = 1; // Signal that data is ready
The compiler might reorder these to:
ready = 1; // Oops! Signaled before data was written
data = 42;
A reader checking ready would see stale data.
CPU Reordering:
Even if the compiler preserves program order, the CPU may still reorder stores and loads as they pass through store buffers and caches.
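The classic store-buffering litmus test sketched below makes this concrete (x, y, r1, and r2 are illustrative shared variables, all initially 0). Intuitively, at least one of r1 and r2 must end up as 1, yet on real hardware both loads can return 0:

// Initially: x = 0, y = 0 (both shared)

// CPU 1:              // CPU 2:
x = 1;                 y = 1;
r1 = y;                r2 = x;

// Permitted outcome on real hardware: r1 == 0 && r2 == 0,
// because each store can still be in its CPU's store buffer
// when the other CPU's load executes.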
Memory Barriers:
To enforce ordering, you need memory barriers (also called fences). These instructions ensure that memory operations issued before the barrier become visible before any issued after it:
// GCC/Clang compiler barrier
asm volatile("" ::: "memory");
// Full hardware memory barrier (x86)
asm volatile("mfence" ::: "memory");
// C11/C++11 atomic with release/acquire semantics
atomic_store_explicit(&ready, 1, memory_order_release);
// ...other CPU...
if (atomic_load_explicit(&ready, memory_order_acquire)) {
// Data written before the release is now visible
}
Memory barriers and ordering are complex, subtle, and error-prone. Instead of using them directly, prefer high-level synchronization primitives (mutexes, semaphores) that encapsulate the correct barriers. Only use explicit atomics and barriers when absolutely necessary for performance, and document the reasoning extensively.
Atomic operations are indivisible—they complete entirely or not at all, with no observable intermediate state. Modern CPUs provide hardware support for atomic read-modify-write operations on aligned words.
C11 Atomics:
C11 introduced the <stdatomic.h> header with portable atomic types and operations:
#include <stdatomic.h>
atomic_int counter; // Atomic integer
// Atomic operations
atomic_fetch_add(&counter, 1); // Atomically: counter += 1, return old value
atomic_compare_exchange_strong(&x, &expected, desired); // CAS
atomic_load(&counter); // Atomic read
atomic_store(&counter, 42); // Atomic write
| Operation | Function | Semantics |
|---|---|---|
| Load | atomic_load(&x) | Read value atomically |
| Store | atomic_store(&x, val) | Write value atomically |
| Exchange | atomic_exchange(&x, val) | Write and return old value |
| Fetch-Add | atomic_fetch_add(&x, val) | Add and return old value |
| Fetch-Sub | atomic_fetch_sub(&x, val) | Subtract and return old value |
| Fetch-Or | atomic_fetch_or(&x, val) | Bitwise OR and return old |
| Fetch-And | atomic_fetch_and(&x, val) | Bitwise AND and return old |
| CAS | atomic_compare_exchange_strong(&x, &expected, desired) | If x==expected, x=desired; else expected=x |
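As a sketch of how CAS is used in practice, the hypothetical bounded_increment below performs a conditional update that no single fetch-and-add can express: it increments a counter only while it is below a cap, retrying whenever another process changes the value first.

#include <stdatomic.h>
#include <stdbool.h>

// Atomically increment *x, but never past 'cap' (sketch).
bool bounded_increment(atomic_int *x, int cap) {
    int expected = atomic_load(x);
    while (expected < cap) {
        // If *x still equals 'expected', store expected+1 and succeed.
        // On failure, 'expected' is reloaded with the current value; retry.
        if (atomic_compare_exchange_weak(x, &expected, expected + 1))
            return true;
    }
    return false;  // Counter already at cap
}

The full demo below applies atomic_fetch_add to the earlier counter example: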
#include <stdio.h>
#include <stdlib.h>
#include <stdatomic.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define SHM_NAME "/atomic_demo"
#define ITERATIONS 100000

/**
 * Demonstrates correct atomic counter increment in shared memory.
 * Compare to the race condition demo—this version produces correct results.
 */

typedef struct {
    atomic_int counter;
} SharedAtomicCounter;

void atomic_increment(SharedAtomicCounter *shared, int iterations) {
    for (int i = 0; i < iterations; i++) {
        // Correct way: atomic fetch-and-add
        atomic_fetch_add(&shared->counter, 1);

        // Alternative with explicit memory ordering:
        // atomic_fetch_add_explicit(&shared->counter, 1, memory_order_relaxed);
    }
}

int main() {
    int shm_fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0666);
    ftruncate(shm_fd, sizeof(SharedAtomicCounter));

    SharedAtomicCounter *shared = mmap(NULL, sizeof(SharedAtomicCounter),
                                       PROT_READ | PROT_WRITE,
                                       MAP_SHARED, shm_fd, 0);
    close(shm_fd);

    // Initialize (before forking!)
    atomic_init(&shared->counter, 0);

    printf("Expected final counter: %d\n", 2 * ITERATIONS);
    printf("Starting atomic counter test...\n\n");

    pid_t pid = fork();

    if (pid == 0) {
        atomic_increment(shared, ITERATIONS);
        munmap(shared, sizeof(SharedAtomicCounter));
        exit(0);
    }

    atomic_increment(shared, ITERATIONS);
    waitpid(pid, NULL, 0);

    int final = atomic_load(&shared->counter);
    printf("Actual final counter: %d\n", final);

    if (final == 2 * ITERATIONS) {
        printf("\n✓ Perfect! Atomics prevent race conditions.\n");
    } else {
        printf("\n⚠️ Unexpected: check if atomic is lock-free on this system.\n");
    }

    munmap(shared, sizeof(SharedAtomicCounter));
    shm_unlink(SHM_NAME);
    return 0;
}

C11 atomics may use locks internally if the type isn't natively atomic on your platform. Check with atomic_is_lock_free(&x). Lock-based 'atomics' are NOT safe for shared memory between unrelated processes—the lock is process-local! For shared memory, verify lock-free status or use POSIX semaphores instead.
Limitations of Atomics:
Single-word operations only: Atomics work for individual variables. You cannot atomically update two related fields (e.g., a pointer and a size).
No waiting: Atomics provide no mechanism to wait for a condition. Spinning (busy-waiting) wastes CPU cycles, as the sketch after this list shows.
Complex algorithms needed: Lock-free algorithms using only atomics are notoriously difficult to design and verify.
For most shared memory use cases, higher-level primitives like semaphores or mutexes are more appropriate.
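To make the "no waiting" limitation concrete, here is roughly what waiting looks like with atomics alone (a sketch; spin_until_ready is a hypothetical helper). It is correct, but it polls: the waiter keeps consuming CPU, whereas sem_wait() would let the kernel put it to sleep until signaled.

#include <stdatomic.h>
#include <sched.h>

// Busy-wait until another process sets 'ready' (sketch).
// Correct, but burns CPU for the entire wait.
void spin_until_ready(atomic_int *ready) {
    while (atomic_load_explicit(ready, memory_order_acquire) == 0) {
        sched_yield();  // Slightly politer than a tight loop; still polling
    }
}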
POSIX semaphores provide a robust, well-tested synchronization mechanism suitable for shared memory IPC. Two varieties exist:
Named Semaphores: Created with a name in the filesystem namespace (like POSIX shared memory). Any process knowing the name can access them.
#include <semaphore.h>
sem_t *sem_open(const char *name, int oflag, mode_t mode, unsigned int value);
int sem_close(sem_t *sem);
int sem_unlink(const char *name);
Unnamed Semaphores: Allocated in shared memory itself. Initialized with pshared=1 for inter-process use.
int sem_init(sem_t *sem, int pshared, unsigned int value);
int sem_destroy(sem_t *sem);
| Operation | Function | Behavior |
|---|---|---|
| Wait (P) | sem_wait(sem) | Decrement; block if value is 0 |
| Try Wait | sem_trywait(sem) | Decrement if >0; else fail with errno EAGAIN |
| Timed Wait | sem_timedwait(sem, &ts) | Wait with timeout |
| Post (V) | sem_post(sem) | Increment; wake one waiter |
| Get Value | sem_getvalue(sem, &val) | Read current counter value |
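As an example, a bounded wait might look like the sketch below (wait_with_timeout is a hypothetical helper). Note that sem_timedwait() takes an absolute CLOCK_REALTIME deadline, not a relative interval:

#include <semaphore.h>
#include <time.h>
#include <errno.h>
#include <stdio.h>

// Wait up to 5 seconds for the semaphore (sketch).
int wait_with_timeout(sem_t *sem) {
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);  // Absolute deadline base
    ts.tv_sec += 5;

    while (sem_timedwait(sem, &ts) == -1) {
        if (errno == EINTR) continue;    // Interrupted by a signal: retry
        if (errno == ETIMEDOUT) {
            fprintf(stderr, "Timed out waiting for semaphore\n");
            return -1;
        }
        perror("sem_timedwait");
        return -1;
    }
    return 0;  // Semaphore acquired
}

The full producer-consumer demo below uses a named semaphore to coordinate two processes: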
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <semaphore.h>
#include <unistd.h>

#define SHM_NAME "/sem_shm_demo"
#define SEM_NAME "/sem_demo"

/**
 * Demonstrates producer-consumer pattern using shared memory
 * and a named semaphore for synchronization.
 */

typedef struct {
    int data;
    int ready;
} SharedBuffer;

int main() {
    // Create shared memory
    int shm_fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0666);
    ftruncate(shm_fd, sizeof(SharedBuffer));

    SharedBuffer *buf = mmap(NULL, sizeof(SharedBuffer),
                             PROT_READ | PROT_WRITE,
                             MAP_SHARED, shm_fd, 0);
    close(shm_fd);

    // Create named semaphore
    // O_CREAT | O_EXCL: Create exclusively
    // Initial value 0: Consumer will wait
    sem_unlink(SEM_NAME);  // Remove any stale semaphore
    sem_t *sem = sem_open(SEM_NAME, O_CREAT | O_EXCL, 0666, 0);
    if (sem == SEM_FAILED) {
        perror("sem_open");
        exit(EXIT_FAILURE);
    }

    // Initialize buffer
    buf->data = 0;
    buf->ready = 0;

    pid_t pid = fork();

    if (pid == 0) {
        // Child: Consumer
        printf("Consumer: waiting for data...\n");

        // Wait on semaphore (blocks until producer posts)
        sem_wait(sem);

        printf("Consumer: received data = %d\n", buf->data);
        printf("Consumer: ready flag = %d\n", buf->ready);

        sem_close(sem);
        munmap(buf, sizeof(SharedBuffer));
        exit(0);
    }

    // Parent: Producer
    sleep(1);  // Simulate work before producing

    printf("Producer: writing data...\n");
    buf->data = 42;
    buf->ready = 1;

    // Post semaphore (wakes consumer)
    sem_post(sem);
    printf("Producer: signaled consumer\n");

    // Wait for child
    waitpid(pid, NULL, 0);

    // Cleanup (only parent)
    sem_close(sem);
    sem_unlink(SEM_NAME);
    munmap(buf, sizeof(SharedBuffer));
    shm_unlink(SHM_NAME);

    printf("Clean exit\n");
    return 0;
}

Unnamed Semaphores in Shared Memory:
For tighter coupling, you can embed the semaphore directly in the shared memory region:
typedef struct {
sem_t sem; // Semaphore embedded in shared memory
int data;
} SharedRegion;
// In creator process:
SharedRegion *region = /* mmap shared memory */;
sem_init(&region->sem, 1, 0);  // pshared=1 for inter-process

// Any process can now use:
sem_wait(&region->sem);
sem_post(&region->sem);
Advantages of embedded semaphores: there is no separate name in the filesystem namespace to manage or sem_unlink(), the semaphore's lifetime is tied to the shared region itself, and one region can hold as many semaphores as the protocol requires.
When using sem_init() for inter-process synchronization, the pshared parameter MUST be non-zero (typically 1). If pshared is 0, the semaphore is only valid for threads within a single process—using it across processes results in undefined behavior.
POSIX pthread mutexes and condition variables can be configured for inter-process use by embedding them in shared memory and setting the PTHREAD_PROCESS_SHARED attribute.
Configuring a Process-Shared Mutex:
pthread_mutexattr_t attr;
pthread_mutexattr_init(&attr);
pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
pthread_mutex_t *mutex = /* pointer into shared memory */;
pthread_mutex_init(mutex, &attr);
pthread_mutexattr_destroy(&attr);
Once initialized, processes can lock/unlock the mutex with standard pthread calls.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <pthread.h>
#include <unistd.h>

#define SHM_NAME "/mutex_demo"
#define ITERATIONS 50000

/**
 * Demonstrates process-shared mutex protecting a counter in shared memory.
 */

typedef struct {
    pthread_mutex_t mutex;
    int counter;
} SharedData;

void increment_with_mutex(SharedData *shared, int iterations) {
    for (int i = 0; i < iterations; i++) {
        pthread_mutex_lock(&shared->mutex);
        shared->counter++;
        pthread_mutex_unlock(&shared->mutex);
    }
}

int main() {
    int shm_fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0666);
    ftruncate(shm_fd, sizeof(SharedData));

    SharedData *shared = mmap(NULL, sizeof(SharedData),
                              PROT_READ | PROT_WRITE,
                              MAP_SHARED, shm_fd, 0);
    close(shm_fd);

    // Initialize mutex with PTHREAD_PROCESS_SHARED
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);

    // Optionally make it robust (survives holder crash)
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);

    pthread_mutex_init(&shared->mutex, &attr);
    pthread_mutexattr_destroy(&attr);

    shared->counter = 0;

    printf("Expected: %d\n", 2 * ITERATIONS);

    pid_t pid = fork();

    if (pid == 0) {
        // Child
        increment_with_mutex(shared, ITERATIONS);
        munmap(shared, sizeof(SharedData));
        exit(0);
    }

    // Parent
    increment_with_mutex(shared, ITERATIONS);
    waitpid(pid, NULL, 0);

    printf("Actual: %d\n", shared->counter);

    if (shared->counter == 2 * ITERATIONS) {
        printf("✓ Process-shared mutex works correctly!\n");
    }

    pthread_mutex_destroy(&shared->mutex);
    munmap(shared, sizeof(SharedData));
    shm_unlink(SHM_NAME);
    return 0;
}

/**
 * ROBUST MUTEX HANDLING:
 *
 * If a process holding a robust mutex dies, the next pthread_mutex_lock()
 * returns EOWNERDEAD. The new owner must call:
 *
 *     pthread_mutex_consistent(&mutex);
 *
 * to mark the mutex as consistent again, then unlock normally.
 * This prevents permanent deadlock when a mutex holder crashes.
 *
 * Example:
 *
 *     int rc = pthread_mutex_lock(&shared->mutex);
 *     if (rc == EOWNERDEAD) {
 *         // Previous owner died while holding the lock
 *         // Check/repair shared data state here
 *         pthread_mutex_consistent(&shared->mutex);
 *     }
 *     // ... use shared data ...
 *     pthread_mutex_unlock(&shared->mutex);
 */

Process-shared mutexes can be made 'robust' using pthread_mutexattr_setrobust(). A robust mutex can recover when its holder dies, returning EOWNERDEAD to the next locker. This is critical for shared memory systems where process crashes must not cause permanent deadlocks.
Process-Shared Condition Variables:
Condition variables work similarly:
pthread_condattr_t cond_attr;
pthread_condattr_init(&cond_attr);
pthread_condattr_setpshared(&cond_attr, PTHREAD_PROCESS_SHARED);
pthread_cond_t *cond = /* pointer into shared memory */;
pthread_cond_init(cond, &cond_attr);
pthread_condattr_destroy(&cond_attr);
// Usage (cond and mutex are the shared-memory pointers initialized above):
pthread_mutex_lock(mutex);
while (!condition) {
    pthread_cond_wait(cond, mutex);  // Atomically unlock and wait
}
// condition is true, mutex is held
pthread_mutex_unlock(mutex);

// Signaling:
pthread_mutex_lock(mutex);
condition = true;
pthread_cond_signal(cond);  // Wake one waiter
pthread_mutex_unlock(mutex);
Many shared memory scenarios have asymmetric access patterns: multiple readers can safely access data simultaneously, but writers need exclusive access. Reader-writer locks optimize for this pattern.
POSIX Reader-Writer Locks:
pthread_rwlockattr_t attr;
pthread_rwlockattr_init(&attr);
pthread_rwlockattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
pthread_rwlock_t *rwlock = /* in shared memory */;
pthread_rwlock_init(rwlock, &attr);
pthread_rwlockattr_destroy(&attr);
// Reader:
pthread_rwlock_rdlock(rwlock); // Multiple readers OK
// ... read shared data ...
pthread_rwlock_unlock(rwlock);
// Writer:
pthread_rwlock_wrlock(rwlock); // Exclusive access
// ... modify shared data ...
pthread_rwlock_unlock(rwlock);
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <pthread.h>
#include <unistd.h>
#include <time.h>

#define SHM_NAME "/rwlock_demo"

/**
 * Demonstrates reader-writer lock for shared memory.
 * Multiple readers can access simultaneously; writers get exclusive access.
 */

typedef struct {
    pthread_rwlock_t rwlock;
    int value;
    int read_count;
    int write_count;
} SharedState;

void reader_process(SharedState *state, int id, int iterations) {
    for (int i = 0; i < iterations; i++) {
        pthread_rwlock_rdlock(&state->rwlock);

        // Reading - multiple readers can be here simultaneously
        int val = state->value;
        __sync_fetch_and_add(&state->read_count, 1);

        // Simulate read work
        usleep(100);

        pthread_rwlock_unlock(&state->rwlock);
        usleep(rand() % 1000);
    }
    printf("Reader %d: completed %d reads\n", id, iterations);
}

void writer_process(SharedState *state, int id, int iterations) {
    for (int i = 0; i < iterations; i++) {
        pthread_rwlock_wrlock(&state->rwlock);

        // Writing - exclusive access
        state->value++;
        __sync_fetch_and_add(&state->write_count, 1);

        // Simulate write work
        usleep(500);

        pthread_rwlock_unlock(&state->rwlock);
        usleep(rand() % 2000);
    }
    printf("Writer %d: completed %d writes\n", id, iterations);
}

int main() {
    int shm_fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0666);
    ftruncate(shm_fd, sizeof(SharedState));

    SharedState *state = mmap(NULL, sizeof(SharedState),
                              PROT_READ | PROT_WRITE,
                              MAP_SHARED, shm_fd, 0);
    close(shm_fd);

    // Initialize rwlock for process sharing
    pthread_rwlockattr_t attr;
    pthread_rwlockattr_init(&attr);
    pthread_rwlockattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_rwlock_init(&state->rwlock, &attr);
    pthread_rwlockattr_destroy(&attr);

    state->value = 0;
    state->read_count = 0;
    state->write_count = 0;

    srand(time(NULL));

    printf("Starting reader-writer demo...\n");
    printf("Spawning 3 readers and 1 writer\n\n");

    pid_t pids[4];

    // Fork 3 readers
    for (int i = 0; i < 3; i++) {
        pids[i] = fork();
        if (pids[i] == 0) {
            srand(time(NULL) + i);
            reader_process(state, i + 1, 20);
            munmap(state, sizeof(SharedState));
            exit(0);
        }
    }

    // Fork 1 writer
    pids[3] = fork();
    if (pids[3] == 0) {
        srand(time(NULL) + 100);
        writer_process(state, 1, 10);
        munmap(state, sizeof(SharedState));
        exit(0);
    }

    // Wait for all children
    for (int i = 0; i < 4; i++) {
        waitpid(pids[i], NULL, 0);
    }

    printf("\n--- Results ---\n");
    printf("Final value: %d\n", state->value);
    printf("Total reads: %d\n", state->read_count);
    printf("Total writes: %d\n", state->write_count);

    pthread_rwlock_destroy(&state->rwlock);
    munmap(state, sizeof(SharedState));
    shm_unlink(SHM_NAME);
    return 0;
}

Synchronization bugs are among the most difficult to diagnose—they're often non-deterministic, timing-dependent, and may not appear in testing. Following established best practices significantly reduces risk.
In lock-free programming with CAS (Compare-And-Swap), the ABA problem occurs when a value changes from A→B→A between your read and CAS. The CAS succeeds because the value is A, but the state may be inconsistent. Solutions include version counters or hazard pointers. This is one reason to prefer mutex-based synchronization unless you have specific expertise in lock-free algorithms.
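One common mitigation, sketched below on the assumption that 64-bit atomics are lock-free on the platform (verify with atomic_is_lock_free), pairs the value with a version counter so that an A→B→A cycle still changes the word being compared:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

// Sketch: pack a 32-bit value with a 32-bit version into one atomic word.
// An A->B->A change of the value still bumps the version, so a stale CAS
// fails instead of silently succeeding.
static inline uint64_t pack(uint32_t value, uint32_t version) {
    return ((uint64_t)version << 32) | value;
}

bool versioned_store(_Atomic uint64_t *cell, uint32_t new_value) {
    uint64_t old = atomic_load(cell);
    uint32_t version = (uint32_t)(old >> 32);
    // Succeeds only if neither the value nor the version has changed.
    return atomic_compare_exchange_strong(cell, &old,
                                          pack(new_value, version + 1));
}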
We've explored why synchronization is mandatory for shared memory and the mechanisms available to achieve it safely. Let's consolidate the key takeaways: (1) unsynchronized access to shared memory causes lost updates, torn reads/writes, and inconsistent state; (2) volatile is not a synchronization primitive; (3) C11 atomics fix single-word races but must be lock-free to be safe across processes; (4) POSIX semaphores, named or embedded with pshared=1, are a robust general-purpose choice; (5) pthread mutexes, condition variables, and reader-writer locks work across processes only with PTHREAD_PROCESS_SHARED; and (6) robust mutexes prevent permanent deadlock when a lock holder crashes.
What's Next:
With the challenges of synchronization understood, we turn to the rewards: performance benefits. The next page explores why shared memory, despite its complexity, delivers performance that other IPC mechanisms cannot match—and when that performance advantage justifies the added design burden.
You now understand why shared memory requires explicit synchronization and the mechanisms available to provide it safely. You've learned about race conditions, memory ordering, atomics, semaphores, and process-shared mutexes. Next, we'll explore the performance benefits that justify this complexity.