Operating SystemsSemaphores & Classic Problems

Other Classic Synchronization Problems

LevelIntermediate

Duration90 mins

TopicSemaphores & Classic Problems

5 / 5

Solution Strategies for Synchronization Problems

From Problems to Solutions: A Systematic Approach

We've explored classic problems—Sleeping Barber, Cigarette Smokers, bounded buffer variations—and identified recurring patterns. Now we synthesize everything into a systematic methodology for solving any synchronization problem.

Expert practitioners don't just know solutions to classic problems; they have strategies that work on novel problems they've never seen before. This page distills those strategies into a framework you can apply when facing unfamiliar synchronization challenges.

Think of this as the culmination of our study: not just what solutions exist, but how to derive solutions from first principles.

Learning Objectives

By the end of this page, you will:

• Apply a five-phase methodology for solving synchronization problems • Use invariants to design and verify correct solutions • Identify and resolve common issues: deadlock, starvation, race conditions • Choose between semaphores, monitors, and other primitives appropriately • Validate solutions through systematic testing and reasoning

The Five-Phase Methodology

Every synchronization problem can be approached through five phases:

Phase 1: Understand and Model

Precisely define the problem (who are the actors? what resources?)
Identify all constraints (exclusion, ordering, capacity, fairness)
Draw diagrams if helpful (state machines, resource graphs)

Phase 2: Identify Properties

State the safety properties (what must never happen)
State the liveness properties (what must eventually happen)
State any fairness requirements (bounded waiting, FIFO order)

Phase 3: Design with Patterns

Recognize which fundamental patterns apply
Compose patterns into a solution structure
Choose synchronization primitives (semaphores, monitors, etc.)

Phase 4: Verify Correctness

Prove safety properties (usually by invariants)
Prove liveness properties (progress arguments)
Check for deadlock, starvation, and race conditions

Phase 5: Implement and Test

Translate the design into code
Test with stress testing, varied interleavings
Use tools: race detectors, model checkers when available

Phase-by-Phase Summary
Phase	Key Question	Output
Understand	What exactly is the problem?	Precise specification, constraints list
Properties	What must be true?	Safety, liveness, fairness requirements
Design	How to realize the properties?	Pattern-based solution architecture
Verify	Is the solution correct?	Proofs or arguments for each property
Implement	Does it work in practice?	Working code, test results

Don't Skip Phase 2

Many bugs stem from unclear requirements. Before writing any code, explicitly list safety properties (mutex, capacity limits, etc.) and liveness properties (deadlock-freedom, eventual progress). These become your verification checklist in Phase 4.

Phase 1: Understanding the Problem Deeply

Correct solutions begin with correct understanding. Many wrong solutions arise from incomplete problem analysis.

Technique 1: Actor-Resource Decomposition

Identify all actors (threads/processes) and resources:

Who are the active entities that execute code?
What passive resources do they access?
What are the access patterns (read, write, both)?
Are actors symmetric (identical) or asymmetric (different roles)?

actor_resource_analysis.md
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
## Example: Sleeping Barber - Actor-Resource Decomposition
 
### Actors:
1. Barber (1 instance, service provider)
2. Customers (N instances, service consumers)
 
### Resources:
1. Barber chair (exclusive, 1 instance)
2. Waiting chairs (shared, N instances)
3. Barber's attention (exclusive, 1 instance)
 
### Access Patterns:
- Customer accesses: waiting chairs (claim/release), barber chair (claim/release)
- Barber accesses: barber chair (service), waiting chairs (check count)
 
### Symmetry:
- Customers are symmetric (all behave identically)
- Barber is asymmetric (different role from customers)
 
### Insight from Analysis:
This structure suggests producer-consumer (customers produce work, 
barber consumes) with bounded waiting capacity and asymmetric roles.

Technique 2: State Machine Modeling

Draw the states each actor can be in, and transitions between states:

What event causes each transition?
Are transitions atomic or interruptible?
What invariants hold in each state?

Converting Mermaid diagram...

Technique 3: Constraint Cataloging

Explicitly list every constraint. Missing a constraint leads to incorrect solutions:

Constraint Types to Consider

•Exclusion constraints: At most K actors in region (mutex: K=1)
•Ordering constraints: Action A must precede action B
•Capacity constraints: Resource has limited slots/capacity
•Matching constraints: Actors need specific combinations of resources
•Timing constraints: Actions must complete within deadlines
•Fairness constraints: No actor should wait indefinitely

Phase 2: Identifying Correctness Properties

A solution is correct if it satisfies all required properties. Properties fall into three categories:

Safety Properties ("Nothing bad happens")

These are conditions that must hold at all times:

Mutual exclusion is never violated
Buffer never exceeds capacity
Invariants are preserved

Safety violations are typically detected as assertion failures or corrupted state.

Liveness Properties ("Something good eventually happens")

These are conditions about eventual progress:

If a thread requests access, it eventually gets it
The system never deadlocks
Work eventually completes

Liveness failures manifest as hangs or starved threads.

Fairness Properties ("Good things happen to everyone")

These are about equitable treatment:

Bounded waiting: every request is satisfied within N operations
FIFO ordering: requests are served in arrival order
Proportional fairness: each class gets its weighted share

Property Examples Across Classic Problems
Problem	Safety Property	Liveness Property	Fairness Property
Producer-Consumer	Buffer never overflows or underflows	Producer/consumer eventually proceeds	No indefinite blocking
Readers-Writers	No concurrent reader-writer access	Writers eventually write	Writers not starved by readers
Dining Philosophers	No two adjacent philosophers eat simultaneously	Every hungry philosopher eventually eats	No philosopher starves
Sleeping Barber	One customer served at a time	Waiting customers eventually served	FIFO preferred but not required

property_specification.c
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
/*
 * Example: Formal Property Specification for Bounded Buffer
 * 
 * These properties guide design and verification.
 */
 
/*
 * SAFETY PROPERTIES
 * ==================
 * 
 * S1: 0 <= count <= BUFFER_SIZE (capacity invariant)
 *     At all times, buffer contents are within bounds.
 * 
 * S2: No concurrent modifications to the same slot
 *     Each buffer slot is accessed by at most one thread at any instant.
 * 
 * S3: Items are consumed in order of production within same producer
 *     (But total order may interleave across producers)
 */
 
// Can express in code as assertions:
void buffer_invariant_check(Buffer* b) {
    assert(b->count >= 0);
    assert(b->count <= BUFFER_SIZE);
    assert(b->head >= 0 && b->head < BUFFER_SIZE);
    assert(b->tail >= 0 && b->tail < BUFFER_SIZE);
}
 
/*
 * LIVENESS PROPERTIES
 * ====================
 * 
 * L1: Deadlock-free - The system always makes progress
 * L2: If a producer calls produce(), it eventually returns
 * L3: If a consumer calls consume(), it eventually returns
 *     (Assuming finite pauses between operations)
 * 
 * FAIRNESS PROPERTIES
 * ===================
 * 
 * F1: Bounded waiting - No producer/consumer waits indefinitely
 *     while others make progress
 * F2: Strong fairness - If a producer/consumer is continuously
 *     waiting, it is eventually served
 */
 
// Liveness is harder to test; often proven by argument or model checking

Safety vs. Liveness Trade-offs

Safety and liveness can conflict. The trivial way to ensure safety ("nothing bad happens") is to do nothing at all—but that violates liveness. A correct solution must balance both.

Example: Preventing deadlock (liveness) might require breaking strict FIFO order (fairness). Understand which properties are mandatory vs. desirable.

Phase 3: Designing the Solution

With problems understood and properties defined, we design the solution by composing patterns and choosing primitives.

Step 3.1: Match Constraints to Patterns

For each constraint, identify the appropriate pattern:

Constraint Type	Pattern	Implementation
Exclusive access	Mutex	Binary semaphore / pthread_mutex
Limited concurrent access	Multiplex	Counting semaphore
Event notification	Signaling	Semaphore initial 0
Mutual waiting	Rendezvous	Two semaphores
All-before-any	Barrier	Counter + semaphore/turnstile
Matching combinations	Pusher	Intermediate coordinator

Step 3.2: Choose Primitives

Different primitives have different trade-offs:

Use Semaphores When

•Simple signaling (event happened)
•Counting resources (N permits)
•Signal persistence matters
•Porting legacy code
•Kernel-level programming

Use Monitors When

•Complex conditions (wait until X AND Y)
•Need while-loop checking
•Object-oriented design
•Java, Python, or similar languages
•Encapsulation is important

Step 3.3: Assemble the Solution

Combine patterns into a complete solution:

Define shared state (counters, flags, queues)
Define synchronization variables (semaphores, mutexes, conditions)
Write each actor's code using the patterns
Establish invariants that relate shared state to semaphore values

solution_assembly_example.c
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
/*
 * Example: Assembling a Readers-Writers Solution
 * 
 * Constraints identified:
 * - C1: Multiple readers can read simultaneously
 * - C2: Writers need exclusive access (no readers or other writers)
 * - C3: Writers should not starve (fairness)
 * 
 * Patterns matched:
 * - Lightswitch for readers (first locks, last unlocks)
 * - Mutex for writer exclusion
 * - Turnstile for writer priority
 */
 
/* Step 1: Define shared state */
int readers = 0;           // Count of active readers
 
/* Step 2: Define synchronization variables */
pthread_mutex_t mutex;     // Protects readers count
sem_t room_empty;          // 1 when no readers/writers, 0 otherwise
sem_t turnstile;           // Writer priority mechanism
 
/* Step 3: Initialize */
void init() {
    pthread_mutex_init(&mutex, NULL);
    sem_init(&room_empty, 0, 1);   // Room starts empty
    sem_init(&turnstile, 0, 1);    // Turnstile open
}
 
/* Step 4: Reader code using Lightswitch pattern */
void reader() {
    // Turnstile: let waiting writers through first
    sem_wait(&turnstile);
    sem_post(&turnstile);
    
    // Lightswitch: first reader locks out writers
    pthread_mutex_lock(&mutex);
    readers++;
    if (readers == 1) {
        sem_wait(&room_empty);  // First reader locks room
    }
    pthread_mutex_unlock(&mutex);
    
    // READ (critical section)
    read_data();
    
    // Lightswitch: last reader allows writers
    pthread_mutex_lock(&mutex);
    readers--;
    if (readers == 0) {
        sem_post(&room_empty);  // Last reader unlocks room
    }
    pthread_mutex_unlock(&mutex);
}
 
/* Step 5: Writer code */
void writer() {
    // Lock turnstile to block new readers
    sem_wait(&turnstile);
    
    // Wait for room to be empty
    sem_wait(&room_empty);
    
    // WRITE (critical section)
    write_data();
    
    // Release room and turnstile
    sem_post(&room_empty);
    sem_post(&turnstile);
}
 
/* 
 * Invariants:
 * - room_empty == 1 iff (readers == 0 AND no writer active)
 * - readers >= 0 always
 * - At most one writer holds room_empty at any time
 */

Phase 4: Verifying Correctness

A solution that looks correct may still have subtle bugs. Verification gives confidence that the solution actually satisfies its properties.

Technique 1: Invariant Reasoning

Define invariants—conditions that hold at all times—and prove they are preserved by every operation:

Establish: Show the invariant holds initially
Preserve: For each operation, show that if the invariant holds before, it holds after
Use: Derive safety properties from the invariant

invariant_reasoning.c
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
/*
 * Example: Proving Bounded Buffer Correctness via Invariants
 */
 
/*
 * INVARIANT I1: 
 *   empty_slots.value + full_slots.value + items_in_flight == BUFFER_SIZE
 * 
 * Where items_in_flight = items currently being written (between
 * sem_wait(empty) and sem_post(full))
 * 
 * PROOF OF I1:
 * 
 * Initial state:
 *   empty_slots = BUFFER_SIZE, full_slots = 0, in_flight = 0
 *   Sum = BUFFER_SIZE ✓
 * 
 * Producer before insert (after sem_wait(empty)):
 *   empty_slots decremented, in_flight incremented
 *   Sum unchanged ✓
 * 
 * Producer after insert (after sem_post(full)):
 *   full_slots incremented, in_flight decremented
 *   Sum unchanged ✓
 * 
 * Consumer operations: symmetric argument ✓
 * 
 * Therefore I1 is preserved by all operations. ∎
 */
 
/*
 * From I1, derive SAFETY PROPERTY S1: 0 <= buffer_count <= BUFFER_SIZE
 * 
 * buffer_count == full_slots.value + items_being_consumed
 * 
 * Since semaphore values are non-negative and <= BUFFER_SIZE (by I1),
 * buffer_count is bounded. ∎
 */

Technique 2: Interleaving Analysis

Consider all possible interleavings of operations. For N threads each with M operations, there are potentially (N×M)! / (M!)^N interleavings. Practically, we focus on critical interleavings:

Identify the "danger zones" where races could occur
Enumerate the orderings of those specific operations
Verify each ordering maintains correctness

interleaving_analysis.md
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
## Interleaving Analysis: Rendezvous Deadlock Check
 
### Operations to analyze:
Thread A: [A1: post(a_arrived)] [A2: wait(b_arrived)]
Thread B: [B1: post(b_arrived)] [B2: wait(a_arrived)]
 
### Case 1: A1 → A2 → B1 → B2
- A posts a_arrived (a_arrived = 1)
- A waits on b_arrived (A blocks, b_arrived = 0)
- B posts b_arrived (b_arrived = 1, A unblocks)
- B waits on a_arrived (B continues, a_arrived = 1)
Result: Both complete ✓
 
### Case 2: A1 → B1 → A2 → B2
- A posts a_arrived (a_arrived = 1)
- B posts b_arrived (b_arrived = 1)
- A waits on b_arrived (A continues, b_arrived was 1)
- B waits on a_arrived (B continues, a_arrived was 1)
Result: Both complete ✓
 
### Case 3: B1 → B2 → A1 → A2
- B posts b_arrived (b_arrived = 1)
- B waits on a_arrived (B blocks, a_arrived = 0)
- A posts a_arrived (a_arrived = 1, B unblocks)
- A waits on b_arrived (A continues, b_arrived = 1)
Result: Both complete ✓
 
### All orderings checked: No deadlock possible ✓

Technique 3: Model Checking

For complex solutions, automated tools can explore all interleavings:

SPIN model checker with Promela language
TLA+ specification and model checking
CHESS for .NET concurrency testing
ThreadSanitizer for runtime race detection

When to Use Formal Methods

For most problems: Hand-written proofs and careful testing are sufficient. For safety-critical systems: Consider formal verification. The investment is worthwhile when bugs have severe consequences (medical devices, avionics, financial systems).

Common Issues and How to Fix Them

When solutions don't work, these are the most common culprits:

Issue 1: Deadlock

Symptoms: All threads blocked, no progress

Causes:

Wait before signal in rendezvous
Circular wait in resource acquisition
Holding mutex across blocking wait

Fixes:

Always signal before wait in rendezvous
Use resource ordering to break circular wait
Release mutex before blocking waits

deadlock_fix.c
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
// DEADLOCK: Circular wait
void transfer(Account* from, Account* to, int amount) {
    lock(&from->mutex);
    lock(&to->mutex);    // If another thread does opposite order: deadlock!
    // transfer...
    unlock(&to->mutex);
    unlock(&from->mutex);
}
 
// FIX: Resource ordering by account ID
void transfer_fixed(Account* from, Account* to, int amount) {
    Account* first = (from->id < to->id) ? from : to;
    Account* second = (from->id < to->id) ? to : from;
    
    lock(&first->mutex);     // Always acquire lower ID first
    lock(&second->mutex);    // Then higher ID
    // transfer...
    unlock(&second->mutex);
    unlock(&first->mutex);
}

Issue 2: Race Condition

Symptoms: Intermittent incorrect results, depends on timing

Causes:

Non-atomic check-then-act
Incomplete critical section protection
Forgotten mutex around shared variable access

Fixes:

Extend critical section to cover related operations
Use atomic operations where available
Audit all shared variable accesses

race_condition_fix.c
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
// RACE: Non-atomic check-then-act
if (counter < MAX) {        // RACE: another thread might
    counter++;              // increment between check and act
}
 
// FIX 1: Extend critical section
mutex_lock(&mutex);
if (counter < MAX) {
    counter++;
}
mutex_unlock(&mutex);
 
// FIX 2: Atomic compare-and-swap
int old;
do {
    old = atomic_load(&counter);
    if (old >= MAX) break;
} while (!atomic_compare_exchange_weak(&counter, &old, old + 1));

Issue 3: Starvation

Symptoms: Some threads never make progress while others do

Causes:

Unfair scheduling or semaphore ordering
Strict priority without aging
Writers vs readers in RW locks without writer preference

Fixes:

Use FIFO semaphores if available
Implement priority aging
Add fairness mechanisms (turnstiles, quotas)

Debugging Checklist

•Are all shared variables protected by synchronization?
•Is the mutex released on ALL code paths (including error paths)?
•Are semaphores initialized to correct values?
•Is the signal-wait order correct for rendezvous?
•Is the mutex released BEFORE blocking waits?
•Are resource acquisitions in consistent order?
•Do all integer counters have correct bounds checking?
•Are spurious wakeups handled (while loops, not if)?

Testing Concurrent Code

Concurrent bugs are notoriously difficult to reproduce. A solution that passes 1000 test runs might fail on run 1001. Effective testing requires special techniques.

Strategy 1: Stress Testing

Run many threads for extended periods to expose races:

More threads = more interleavings explored
Longer runs = rare races more likely to manifest
Vary timing (add random delays) to explore different schedules

stress_test.c
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
/*
 * Stress Test Framework for Concurrent Code
 */
 
#define NUM_THREADS 100
#define OPERATIONS_PER_THREAD 10000
#define TEST_ITERATIONS 1000
 
typedef struct {
    atomic_int produced_count;
    atomic_int consumed_count;
    atomic_int checksum_produced;
    atomic_int checksum_consumed;
} TestStats;
 
void* producer_test(void* arg) {
    TestStats* stats = (TestStats*)arg;
    
    for (int i = 0; i < OPERATIONS_PER_THREAD; i++) {
        int value = random();
        produce(value);
        
        atomic_fetch_add(&stats->produced_count, 1);
        atomic_fetch_add(&stats->checksum_produced, value);
        
        // Random delay to vary interleaving
        if (random() % 10 == 0) {
            usleep(random() % 100);
        }
    }
    return NULL;
}
 
void* consumer_test(void* arg) {
    TestStats* stats = (TestStats*)arg;
    
    for (int i = 0; i < OPERATIONS_PER_THREAD; i++) {
        int value = consume();
        
        atomic_fetch_add(&stats->consumed_count, 1);
        atomic_fetch_add(&stats->checksum_consumed, value);
        
        if (random() % 10 == 0) {
            usleep(random() % 100);
        }
    }
    return NULL;
}
 
void run_stress_test() {
    for (int iter = 0; iter < TEST_ITERATIONS; iter++) {
        TestStats stats = {0};
        pthread_t producers[NUM_THREADS], consumers[NUM_THREADS];
        
        // Launch threads
        for (int i = 0; i < NUM_THREADS; i++) {
            pthread_create(&producers[i], NULL, producer_test, &stats);
            pthread_create(&consumers[i], NULL, consumer_test, &stats);
        }
        
        // Wait for completion
        for (int i = 0; i < NUM_THREADS; i++) {
            pthread_join(producers[i], NULL);
            pthread_join(consumers[i], NULL);
        }
        
        // Verify invariants
        assert(stats.produced_count == stats.consumed_count);
        assert(stats.checksum_produced == stats.checksum_consumed);
        
        printf("Iteration %d: PASS\n", iter);
    }
}

Strategy 2: Dynamic Analysis Tools

ThreadSanitizer (TSan): Detects data races at runtime
Helgrind: Valgrind tool for threading errors
Intel Inspector: Commercial race detector

These tools instrument memory accesses and track happens-before relationships to detect potential races even if they don't manifest in a particular run.

Strategy 3: Deterministic Testing

Use frameworks that control thread scheduling to systematically explore interleavings:

CHESS (Microsoft): Systematically explores schedules
rr (Mozilla): Record and replay for debugging
Custom test harnesses with explicit yield points

Testing is Necessary but Not Sufficient

Testing can show the presence of bugs but not their absence. A passing test suite gives some confidence, but for critical code, combine testing with:

• Code review by concurrency experts • Static analysis tools • Formal verification for the most critical paths • Runtime defensive checks (assertions, invariant monitoring)

Practical Decision Framework

When facing a real synchronization challenge, use this decision tree:

1. Can you avoid sharing?

Thread-local storage, message passing, or immutable data eliminate most synchronization needs
The best synchronization is no synchronization

2. Is it a well-known pattern?

Producer-Consumer → Use language/library queue (Java's BlockingQueue, Python's queue.Queue)
Readers-Writers → Use language/library RWLock
Don't reinvent; use battle-tested implementations

3. Is simple mutual exclusion enough?

If you just need exclusive access, use a mutex
Resist over-engineering; complexity breeds bugs

4. Do you need counting or signaling?

Counting resources → Semaphore
Event notification → Semaphore (init 0) or condition variable

5. Do you need complex conditions?

"Wait until X AND Y" → Monitor with condition variable
Semaphores struggle with compound conditions

Technology Choice Quick Reference
Scenario	First Choice	Why
Simple exclusive access	pthread_mutex / language mutex	Simplest, well-understood
Counting resources	Semaphore	Designed for counting
Wait for condition	Monitor + condition variable	Natural for while loops
Producer-consumer	Library queue class	Already implemented correctly
Readers-writers	Library RWLock	Complex to implement correctly
High-performance SPSC	Lock-free ring buffer	When profiling shows need
Complex coordination	Channels / actors	Higher-level abstraction

Prefer Libraries Over Custom Implementations

For production code, use standard library or well-tested third-party implementations whenever possible. Custom synchronization code is a source of subtle bugs. Implement yourself only when:

• Learning (like in this course!) • Specific requirements not met by libraries • Performance-critical and profiled to matter

Summary and Key Takeaways

We've developed a comprehensive methodology for solving synchronization problems. This systematic approach, combined with pattern recognition from previous sections, equips you to tackle novel challenges.

Key Takeaways

•Follow the five phases: Understand → Properties → Design → Verify → Implement. Skipping phases causes bugs.
•Explicitly state properties: Write down safety, liveness, and fairness requirements before coding.
•Match constraints to patterns: Each constraint maps to a known pattern; compose patterns for full solutions.
•Prove correctness via invariants: Define what's always true; verify operations preserve it.
•Know common issues: Deadlock, race conditions, and starvation have characteristic symptoms and fixes.
•Test rigorously but don't rely solely on testing: Stress tests, tools, and reasoning together provide confidence.
•Prefer standard libraries: Well-tested implementations beat custom code for production use.

Module Complete

Congratulations! You've completed Module 6: Other Classic Problems. You've explored the Sleeping Barber, Cigarette Smokers, bounded buffer variations, common patterns, and now solution strategies. You possess a comprehensive toolkit for concurrent programming.

What's next? Apply these concepts to real systems. The true mastery comes from practice—implementing, debugging, and refining concurrent code in production contexts.

5 / 5

Loading learning content...

Operating SystemsSemaphores & Classic Problems

Other Classic Synchronization Problems

LevelIntermediate

Duration90 mins

TopicSemaphores & Classic Problems

5 / 5

Solution Strategies for Synchronization Problems

From Problems to Solutions: A Systematic Approach

Think of this as the culmination of our study: not just what solutions exist, but how to derive solutions from first principles.

Learning Objectives

By the end of this page, you will:

The Five-Phase Methodology

Every synchronization problem can be approached through five phases:

Phase 1: Understand and Model

Precisely define the problem (who are the actors? what resources?)
Identify all constraints (exclusion, ordering, capacity, fairness)
Draw diagrams if helpful (state machines, resource graphs)

Phase 2: Identify Properties

State the safety properties (what must never happen)
State the liveness properties (what must eventually happen)
State any fairness requirements (bounded waiting, FIFO order)

Phase 3: Design with Patterns

Recognize which fundamental patterns apply
Compose patterns into a solution structure
Choose synchronization primitives (semaphores, monitors, etc.)

Phase 4: Verify Correctness

Prove safety properties (usually by invariants)
Prove liveness properties (progress arguments)
Check for deadlock, starvation, and race conditions

Phase 5: Implement and Test

Translate the design into code
Test with stress testing, varied interleavings
Use tools: race detectors, model checkers when available

Phase-by-Phase Summary
Phase	Key Question	Output
Understand	What exactly is the problem?	Precise specification, constraints list
Properties	What must be true?	Safety, liveness, fairness requirements
Design	How to realize the properties?	Pattern-based solution architecture
Verify	Is the solution correct?	Proofs or arguments for each property
Implement	Does it work in practice?	Working code, test results

Don't Skip Phase 2

Phase 1: Understanding the Problem Deeply

Correct solutions begin with correct understanding. Many wrong solutions arise from incomplete problem analysis.

Technique 1: Actor-Resource Decomposition

Identify all actors (threads/processes) and resources:

Who are the active entities that execute code?
What passive resources do they access?
What are the access patterns (read, write, both)?
Are actors symmetric (identical) or asymmetric (different roles)?

actor_resource_analysis.md
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
## Example: Sleeping Barber - Actor-Resource Decomposition
 
### Actors:
1. Barber (1 instance, service provider)
2. Customers (N instances, service consumers)
 
### Resources:
1. Barber chair (exclusive, 1 instance)
2. Waiting chairs (shared, N instances)
3. Barber's attention (exclusive, 1 instance)
 
### Access Patterns:
- Customer accesses: waiting chairs (claim/release), barber chair (claim/release)
- Barber accesses: barber chair (service), waiting chairs (check count)
 
### Symmetry:
- Customers are symmetric (all behave identically)
- Barber is asymmetric (different role from customers)
 
### Insight from Analysis:
This structure suggests producer-consumer (customers produce work, 
barber consumes) with bounded waiting capacity and asymmetric roles.

Technique 2: State Machine Modeling

Draw the states each actor can be in, and transitions between states:

What event causes each transition?
Are transitions atomic or interruptible?
What invariants hold in each state?

Converting Mermaid diagram...

Technique 3: Constraint Cataloging

Explicitly list every constraint. Missing a constraint leads to incorrect solutions:

Constraint Types to Consider

•Exclusion constraints: At most K actors in region (mutex: K=1)
•Ordering constraints: Action A must precede action B
•Capacity constraints: Resource has limited slots/capacity
•Matching constraints: Actors need specific combinations of resources
•Timing constraints: Actions must complete within deadlines
•Fairness constraints: No actor should wait indefinitely

Phase 2: Identifying Correctness Properties

A solution is correct if it satisfies all required properties. Properties fall into three categories:

Safety Properties ("Nothing bad happens")

These are conditions that must hold at all times:

Mutual exclusion is never violated
Buffer never exceeds capacity
Invariants are preserved

Safety violations are typically detected as assertion failures or corrupted state.

Liveness Properties ("Something good eventually happens")

These are conditions about eventual progress:

If a thread requests access, it eventually gets it
The system never deadlocks
Work eventually completes

Liveness failures manifest as hangs or starved threads.

Fairness Properties ("Good things happen to everyone")

These are about equitable treatment:

Bounded waiting: every request is satisfied within N operations
FIFO ordering: requests are served in arrival order
Proportional fairness: each class gets its weighted share

Property Examples Across Classic Problems
Problem	Safety Property	Liveness Property	Fairness Property
Producer-Consumer	Buffer never overflows or underflows	Producer/consumer eventually proceeds	No indefinite blocking
Readers-Writers	No concurrent reader-writer access	Writers eventually write	Writers not starved by readers
Dining Philosophers	No two adjacent philosophers eat simultaneously	Every hungry philosopher eventually eats	No philosopher starves
Sleeping Barber	One customer served at a time	Waiting customers eventually served	FIFO preferred but not required

property_specification.c
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
/*
 * Example: Formal Property Specification for Bounded Buffer
 * 
 * These properties guide design and verification.
 */
 
/*
 * SAFETY PROPERTIES
 * ==================
 * 
 * S1: 0 <= count <= BUFFER_SIZE (capacity invariant)
 *     At all times, buffer contents are within bounds.
 * 
 * S2: No concurrent modifications to the same slot
 *     Each buffer slot is accessed by at most one thread at any instant.
 * 
 * S3: Items are consumed in order of production within same producer
 *     (But total order may interleave across producers)
 */
 
// Can express in code as assertions:
void buffer_invariant_check(Buffer* b) {
    assert(b->count >= 0);
    assert(b->count <= BUFFER_SIZE);
    assert(b->head >= 0 && b->head < BUFFER_SIZE);
    assert(b->tail >= 0 && b->tail < BUFFER_SIZE);
}
 
/*
 * LIVENESS PROPERTIES
 * ====================
 * 
 * L1: Deadlock-free - The system always makes progress
 * L2: If a producer calls produce(), it eventually returns
 * L3: If a consumer calls consume(), it eventually returns
 *     (Assuming finite pauses between operations)
 * 
 * FAIRNESS PROPERTIES
 * ===================
 * 
 * F1: Bounded waiting - No producer/consumer waits indefinitely
 *     while others make progress
 * F2: Strong fairness - If a producer/consumer is continuously
 *     waiting, it is eventually served
 */
 
// Liveness is harder to test; often proven by argument or model checking

Safety vs. Liveness Trade-offs

Safety and liveness can conflict. The trivial way to ensure safety ("nothing bad happens") is to do nothing at all—but that violates liveness. A correct solution must balance both.

Example: Preventing deadlock (liveness) might require breaking strict FIFO order (fairness). Understand which properties are mandatory vs. desirable.

Phase 3: Designing the Solution

With problems understood and properties defined, we design the solution by composing patterns and choosing primitives.

Step 3.1: Match Constraints to Patterns

For each constraint, identify the appropriate pattern:

Constraint Type	Pattern	Implementation
Exclusive access	Mutex	Binary semaphore / pthread_mutex
Limited concurrent access	Multiplex	Counting semaphore
Event notification	Signaling	Semaphore initial 0
Mutual waiting	Rendezvous	Two semaphores
All-before-any	Barrier	Counter + semaphore/turnstile
Matching combinations	Pusher	Intermediate coordinator

Step 3.2: Choose Primitives

Different primitives have different trade-offs:

Use Semaphores When

•Simple signaling (event happened)
•Counting resources (N permits)
•Signal persistence matters
•Porting legacy code
•Kernel-level programming

Use Monitors When

•Complex conditions (wait until X AND Y)
•Need while-loop checking
•Object-oriented design
•Java, Python, or similar languages
•Encapsulation is important

Step 3.3: Assemble the Solution

Combine patterns into a complete solution:

Define shared state (counters, flags, queues)
Define synchronization variables (semaphores, mutexes, conditions)
Write each actor's code using the patterns
Establish invariants that relate shared state to semaphore values

solution_assembly_example.c
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
/*
 * Example: Assembling a Readers-Writers Solution
 * 
 * Constraints identified:
 * - C1: Multiple readers can read simultaneously
 * - C2: Writers need exclusive access (no readers or other writers)
 * - C3: Writers should not starve (fairness)
 * 
 * Patterns matched:
 * - Lightswitch for readers (first locks, last unlocks)
 * - Mutex for writer exclusion
 * - Turnstile for writer priority
 */
 
/* Step 1: Define shared state */
int readers = 0;           // Count of active readers
 
/* Step 2: Define synchronization variables */
pthread_mutex_t mutex;     // Protects readers count
sem_t room_empty;          // 1 when no readers/writers, 0 otherwise
sem_t turnstile;           // Writer priority mechanism
 
/* Step 3: Initialize */
void init() {
    pthread_mutex_init(&mutex, NULL);
    sem_init(&room_empty, 0, 1);   // Room starts empty
    sem_init(&turnstile, 0, 1);    // Turnstile open
}
 
/* Step 4: Reader code using Lightswitch pattern */
void reader() {
    // Turnstile: let waiting writers through first
    sem_wait(&turnstile);
    sem_post(&turnstile);
    
    // Lightswitch: first reader locks out writers
    pthread_mutex_lock(&mutex);
    readers++;
    if (readers == 1) {
        sem_wait(&room_empty);  // First reader locks room
    }
    pthread_mutex_unlock(&mutex);
    
    // READ (critical section)
    read_data();
    
    // Lightswitch: last reader allows writers
    pthread_mutex_lock(&mutex);
    readers--;
    if (readers == 0) {
        sem_post(&room_empty);  // Last reader unlocks room
    }
    pthread_mutex_unlock(&mutex);
}
 
/* Step 5: Writer code */
void writer() {
    // Lock turnstile to block new readers
    sem_wait(&turnstile);
    
    // Wait for room to be empty
    sem_wait(&room_empty);
    
    // WRITE (critical section)
    write_data();
    
    // Release room and turnstile
    sem_post(&room_empty);
    sem_post(&turnstile);
}
 
/* 
 * Invariants:
 * - room_empty == 1 iff (readers == 0 AND no writer active)
 * - readers >= 0 always
 * - At most one writer holds room_empty at any time
 */

Phase 4: Verifying Correctness

A solution that looks correct may still have subtle bugs. Verification gives confidence that the solution actually satisfies its properties.

Technique 1: Invariant Reasoning

Define invariants—conditions that hold at all times—and prove they are preserved by every operation:

Establish: Show the invariant holds initially
Preserve: For each operation, show that if the invariant holds before, it holds after
Use: Derive safety properties from the invariant

invariant_reasoning.c
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
/*
 * Example: Proving Bounded Buffer Correctness via Invariants
 */
 
/*
 * INVARIANT I1: 
 *   empty_slots.value + full_slots.value + items_in_flight == BUFFER_SIZE
 * 
 * Where items_in_flight = items currently being written (between
 * sem_wait(empty) and sem_post(full))
 * 
 * PROOF OF I1:
 * 
 * Initial state:
 *   empty_slots = BUFFER_SIZE, full_slots = 0, in_flight = 0
 *   Sum = BUFFER_SIZE ✓
 * 
 * Producer before insert (after sem_wait(empty)):
 *   empty_slots decremented, in_flight incremented
 *   Sum unchanged ✓
 * 
 * Producer after insert (after sem_post(full)):
 *   full_slots incremented, in_flight decremented
 *   Sum unchanged ✓
 * 
 * Consumer operations: symmetric argument ✓
 * 
 * Therefore I1 is preserved by all operations. ∎
 */
 
/*
 * From I1, derive SAFETY PROPERTY S1: 0 <= buffer_count <= BUFFER_SIZE
 * 
 * buffer_count == full_slots.value + items_being_consumed
 * 
 * Since semaphore values are non-negative and <= BUFFER_SIZE (by I1),
 * buffer_count is bounded. ∎
 */

Technique 2: Interleaving Analysis

Consider all possible interleavings of operations. For N threads each with M operations, there are potentially (N×M)! / (M!)^N interleavings. Practically, we focus on critical interleavings:

Identify the "danger zones" where races could occur
Enumerate the orderings of those specific operations
Verify each ordering maintains correctness

interleaving_analysis.md
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
## Interleaving Analysis: Rendezvous Deadlock Check
 
### Operations to analyze:
Thread A: [A1: post(a_arrived)] [A2: wait(b_arrived)]
Thread B: [B1: post(b_arrived)] [B2: wait(a_arrived)]
 
### Case 1: A1 → A2 → B1 → B2
- A posts a_arrived (a_arrived = 1)
- A waits on b_arrived (A blocks, b_arrived = 0)
- B posts b_arrived (b_arrived = 1, A unblocks)
- B waits on a_arrived (B continues, a_arrived = 1)
Result: Both complete ✓
 
### Case 2: A1 → B1 → A2 → B2
- A posts a_arrived (a_arrived = 1)
- B posts b_arrived (b_arrived = 1)
- A waits on b_arrived (A continues, b_arrived was 1)
- B waits on a_arrived (B continues, a_arrived was 1)
Result: Both complete ✓
 
### Case 3: B1 → B2 → A1 → A2
- B posts b_arrived (b_arrived = 1)
- B waits on a_arrived (B blocks, a_arrived = 0)
- A posts a_arrived (a_arrived = 1, B unblocks)
- A waits on b_arrived (A continues, b_arrived = 1)
Result: Both complete ✓
 
### All orderings checked: No deadlock possible ✓

Technique 3: Model Checking

For complex solutions, automated tools can explore all interleavings:

SPIN model checker with Promela language
TLA+ specification and model checking
CHESS for .NET concurrency testing
ThreadSanitizer for runtime race detection

When to Use Formal Methods

Common Issues and How to Fix Them

When solutions don't work, these are the most common culprits:

Issue 1: Deadlock

Symptoms: All threads blocked, no progress

Causes:

Wait before signal in rendezvous
Circular wait in resource acquisition
Holding mutex across blocking wait

Fixes:

Always signal before wait in rendezvous
Use resource ordering to break circular wait
Release mutex before blocking waits

deadlock_fix.c
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
// DEADLOCK: Circular wait
void transfer(Account* from, Account* to, int amount) {
    lock(&from->mutex);
    lock(&to->mutex);    // If another thread does opposite order: deadlock!
    // transfer...
    unlock(&to->mutex);
    unlock(&from->mutex);
}
 
// FIX: Resource ordering by account ID
void transfer_fixed(Account* from, Account* to, int amount) {
    Account* first = (from->id < to->id) ? from : to;
    Account* second = (from->id < to->id) ? to : from;
    
    lock(&first->mutex);     // Always acquire lower ID first
    lock(&second->mutex);    // Then higher ID
    // transfer...
    unlock(&second->mutex);
    unlock(&first->mutex);
}

Issue 2: Race Condition

Symptoms: Intermittent incorrect results, depends on timing

Causes:

Non-atomic check-then-act
Incomplete critical section protection
Forgotten mutex around shared variable access

Fixes:

Extend critical section to cover related operations
Use atomic operations where available
Audit all shared variable accesses

race_condition_fix.c
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
// RACE: Non-atomic check-then-act
if (counter < MAX) {        // RACE: another thread might
    counter++;              // increment between check and act
}
 
// FIX 1: Extend critical section
mutex_lock(&mutex);
if (counter < MAX) {
    counter++;
}
mutex_unlock(&mutex);
 
// FIX 2: Atomic compare-and-swap
int old;
do {
    old = atomic_load(&counter);
    if (old >= MAX) break;
} while (!atomic_compare_exchange_weak(&counter, &old, old + 1));

Issue 3: Starvation

Symptoms: Some threads never make progress while others do

Causes:

Unfair scheduling or semaphore ordering
Strict priority without aging
Writers vs readers in RW locks without writer preference

Fixes:

Use FIFO semaphores if available
Implement priority aging
Add fairness mechanisms (turnstiles, quotas)

Debugging Checklist

•Are all shared variables protected by synchronization?
•Is the mutex released on ALL code paths (including error paths)?
•Are semaphores initialized to correct values?
•Is the signal-wait order correct for rendezvous?
•Is the mutex released BEFORE blocking waits?
•Are resource acquisitions in consistent order?
•Do all integer counters have correct bounds checking?
•Are spurious wakeups handled (while loops, not if)?

Testing Concurrent Code

Concurrent bugs are notoriously difficult to reproduce. A solution that passes 1000 test runs might fail on run 1001. Effective testing requires special techniques.

Strategy 1: Stress Testing

Run many threads for extended periods to expose races:

More threads = more interleavings explored
Longer runs = rare races more likely to manifest
Vary timing (add random delays) to explore different schedules

stress_test.c
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
/*
 * Stress Test Framework for Concurrent Code
 */
 
#define NUM_THREADS 100
#define OPERATIONS_PER_THREAD 10000
#define TEST_ITERATIONS 1000
 
typedef struct {
    atomic_int produced_count;
    atomic_int consumed_count;
    atomic_int checksum_produced;
    atomic_int checksum_consumed;
} TestStats;
 
void* producer_test(void* arg) {
    TestStats* stats = (TestStats*)arg;
    
    for (int i = 0; i < OPERATIONS_PER_THREAD; i++) {
        int value = random();
        produce(value);
        
        atomic_fetch_add(&stats->produced_count, 1);
        atomic_fetch_add(&stats->checksum_produced, value);
        
        // Random delay to vary interleaving
        if (random() % 10 == 0) {
            usleep(random() % 100);
        }
    }
    return NULL;
}
 
void* consumer_test(void* arg) {
    TestStats* stats = (TestStats*)arg;
    
    for (int i = 0; i < OPERATIONS_PER_THREAD; i++) {
        int value = consume();
        
        atomic_fetch_add(&stats->consumed_count, 1);
        atomic_fetch_add(&stats->checksum_consumed, value);
        
        if (random() % 10 == 0) {
            usleep(random() % 100);
        }
    }
    return NULL;
}
 
void run_stress_test() {
    for (int iter = 0; iter < TEST_ITERATIONS; iter++) {
        TestStats stats = {0};
        pthread_t producers[NUM_THREADS], consumers[NUM_THREADS];
        
        // Launch threads
        for (int i = 0; i < NUM_THREADS; i++) {
            pthread_create(&producers[i], NULL, producer_test, &stats);
            pthread_create(&consumers[i], NULL, consumer_test, &stats);
        }
        
        // Wait for completion
        for (int i = 0; i < NUM_THREADS; i++) {
            pthread_join(producers[i], NULL);
            pthread_join(consumers[i], NULL);
        }
        
        // Verify invariants
        assert(stats.produced_count == stats.consumed_count);
        assert(stats.checksum_produced == stats.checksum_consumed);
        
        printf("Iteration %d: PASS\n", iter);
    }
}

Strategy 2: Dynamic Analysis Tools

ThreadSanitizer (TSan): Detects data races at runtime
Helgrind: Valgrind tool for threading errors
Intel Inspector: Commercial race detector

These tools instrument memory accesses and track happens-before relationships to detect potential races even if they don't manifest in a particular run.

Strategy 3: Deterministic Testing

Use frameworks that control thread scheduling to systematically explore interleavings:

CHESS (Microsoft): Systematically explores schedules
rr (Mozilla): Record and replay for debugging
Custom test harnesses with explicit yield points

Testing is Necessary but Not Sufficient

Testing can show the presence of bugs but not their absence. A passing test suite gives some confidence, but for critical code, combine testing with:

• Code review by concurrency experts • Static analysis tools • Formal verification for the most critical paths • Runtime defensive checks (assertions, invariant monitoring)

Practical Decision Framework

When facing a real synchronization challenge, use this decision tree:

1. Can you avoid sharing?

Thread-local storage, message passing, or immutable data eliminate most synchronization needs
The best synchronization is no synchronization

2. Is it a well-known pattern?

Producer-Consumer → Use language/library queue (Java's BlockingQueue, Python's queue.Queue)
Readers-Writers → Use language/library RWLock
Don't reinvent; use battle-tested implementations

3. Is simple mutual exclusion enough?

If you just need exclusive access, use a mutex
Resist over-engineering; complexity breeds bugs

4. Do you need counting or signaling?

Counting resources → Semaphore
Event notification → Semaphore (init 0) or condition variable

5. Do you need complex conditions?

"Wait until X AND Y" → Monitor with condition variable
Semaphores struggle with compound conditions

Technology Choice Quick Reference
Scenario	First Choice	Why
Simple exclusive access	pthread_mutex / language mutex	Simplest, well-understood
Counting resources	Semaphore	Designed for counting
Wait for condition	Monitor + condition variable	Natural for while loops
Producer-consumer	Library queue class	Already implemented correctly
Readers-writers	Library RWLock	Complex to implement correctly
High-performance SPSC	Lock-free ring buffer	When profiling shows need
Complex coordination	Channels / actors	Higher-level abstraction

Prefer Libraries Over Custom Implementations

For production code, use standard library or well-tested third-party implementations whenever possible. Custom synchronization code is a source of subtle bugs. Implement yourself only when:

• Learning (like in this course!) • Specific requirements not met by libraries • Performance-critical and profiled to matter

Summary and Key Takeaways

Key Takeaways

•Follow the five phases: Understand → Properties → Design → Verify → Implement. Skipping phases causes bugs.
•Explicitly state properties: Write down safety, liveness, and fairness requirements before coding.
•Match constraints to patterns: Each constraint maps to a known pattern; compose patterns for full solutions.
•Prove correctness via invariants: Define what's always true; verify operations preserve it.
•Know common issues: Deadlock, race conditions, and starvation have characteristic symptoms and fixes.
•Test rigorously but don't rely solely on testing: Stress tests, tools, and reasoning together provide confidence.
•Prefer standard libraries: Well-tested implementations beat custom code for production use.

Module Complete

What's next? Apply these concepts to real systems. The true mastery comes from practice—implementing, debugging, and refining concurrent code in production contexts.

5 / 5