The wait operation is the heart of condition variables. It performs what seems like an impossible task: a thread releases the mutex it holds, goes to sleep, and then reacquires that same mutex—all while ensuring no signals are lost in the process.
This atomic release-and-block operation is deceptively complex. Get it wrong, and you face subtle bugs: lost wakeups where threads sleep forever, race conditions where threads proceed before conditions are met, or deadlocks where everyone waits and no one can proceed.
In this page, we'll dissect the wait operation at multiple levels, from its user-visible semantics down to kernel-level implementation.
Understanding wait thoroughly is essential—it's the foundation upon which all condition variable patterns are built.
By the end of this page, you will understand:

- the precise semantics of the wait operation
- why atomic release-and-block is essential
- how different operating systems implement wait
- the reality of spurious wakeups and how to handle them
- proper error handling and cancellation semantics
Let's begin with a precise definition of what the wait operation must accomplish. In POSIX terminology, the operation is pthread_cond_wait:
```c
int pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex);
```
Precondition: The calling thread must hold mutex locked.
Postcondition: When the function returns, the calling thread holds mutex locked.
Behavior: The wait operation performs these steps as an atomic unit:

1. Add the calling thread to the condition variable's wait queue.
2. Release mutex.
3. Block until another thread signals the condition variable.
4. After waking, reacquire mutex before returning.
The atomicity of steps 1-3 is crucial. There must be no window between releasing the mutex and blocking where a signal could be lost.
```c
// Conceptual breakdown of pthread_cond_wait
// This is NOT actual implementation - it's a mental model

int pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex) {
    // === ATOMIC SECTION BEGIN ===
    // These operations happen atomically:

    // 1. Add self to wait queue
    enqueue(&cond->wait_queue, current_thread);

    // 2. Release the mutex
    pthread_mutex_unlock(mutex);

    // === BLOCK POINT ===
    // 3. Block until signaled
    block_until_signaled(current_thread);

    // === ATOMIC SECTION END ===
    // We are now awake and removed from wait queue

    // 4. Reacquire mutex (may block again here)
    pthread_mutex_lock(mutex);

    return 0; // Success
}

// The "atomic section" is the key:
// From any other thread's perspective, steps 1-3 happen
// instantaneously. There is no observable state where we
// have released the mutex but are not yet in the wait queue.
```

The "atomic section" isn't implemented with atomics like compare-and-swap. Instead, it's protected by kernel-level synchronization. The kernel holds an internal lock on the condition variable while performing the release-and-block, ensuring the atomicity from the perspective of other threads.
To truly understand the wait operation, we must see what goes wrong without atomicity. Consider this broken implementation:
```c
// BROKEN: Non-atomic wait implementation
// This demonstrates why atomicity is essential

void broken_cond_wait(cond_t *cond, mutex_t *mutex) {
    // Step 1: Release mutex
    mutex_unlock(mutex);

    // <<< THE DANGER ZONE >>>
    // Right here, between unlock and adding to queue:
    // - Another thread could acquire the mutex
    // - That thread could change the condition
    // - That thread could call signal()
    // - The signal finds no waiters (we're not in queue yet!)
    // - The signal is LOST

    // Step 2: Add self to wait queue
    enqueue(&cond->wait_queue, current_thread);

    // Step 3: Block
    block();

    // We might NEVER wake up because the signal was lost!
    mutex_lock(mutex);
}

// TIMELINE OF THE BUG:
//
// Time   Waiter Thread        Signaler Thread
// ----   -------------        ---------------
// T1     mutex_unlock()       (blocked on mutex)
// T2     (preempted)          mutex_lock() succeeds!
// T3     -                    modifies_condition()
// T4     -                    cond_signal() - no waiters!
// T5     -                    mutex_unlock()
// T6     enqueue(self)        (done)
// T7     block() - FOREVER!   -
```

The lost wakeup problem:
This is one of the most notorious bugs in concurrent programming. The symptom is severe: a thread blocks forever, waiting for a signal that was delivered before it joined the wait queue.
The bug is non-deterministic—it depends on the exact timing of thread scheduling. It might not appear in development but can emerge under load in production.
Why atomic enqueue-then-release solves it:
When the operations are atomic, the waiter is already on the wait queue at the instant the mutex is released. Any thread that acquires the mutex afterward and calls signal() therefore finds the waiter and wakes it; there is no window in which a signal can vanish.
The key invariant is: a thread is EITHER holding the mutex OR in the wait queue (or waking up). There is never a moment where it's in neither state. This invariant is what makes condition variables correct.
POSIX provides the standard API for condition variables on Unix-like systems. Let's examine pthread_cond_wait in detail.
```c
#include <pthread.h>

// Initialization
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
int shared_data = 0;

// The proper wait pattern
void* waiter_thread(void* arg) {
    pthread_mutex_lock(&mutex);

    // ALWAYS use while, NEVER if
    while (shared_data == 0) {
        // Wait atomically releases mutex and blocks
        int result = pthread_cond_wait(&cond, &mutex);

        // Upon return, we hold the mutex again
        // But the condition might not be true!
        // (spurious wakeup, or another thread consumed it)

        if (result != 0) {
            // Error handling (rare but possible)
            handle_error(result);
        }
    }

    // Condition is NOW guaranteed true
    // (because we just checked it while holding mutex)
    process_data(shared_data);
    shared_data = 0; // Reset for next iteration

    pthread_mutex_unlock(&mutex);
    return NULL;
}

// The signaler pattern
void* signaler_thread(void* arg) {
    pthread_mutex_lock(&mutex);

    // Modify shared state
    shared_data = 42;

    // Signal while holding lock (recommended)
    pthread_cond_signal(&cond);

    pthread_mutex_unlock(&mutex);
    return NULL;
}
```

| Return Value | Meaning | Required Action |
|---|---|---|
| 0 | Success (or spurious wakeup) | Re-check the predicate in while loop |
| EINVAL | Invalid cond or mutex | Programming error; fix the code |
| EPERM | Mutex not owned by caller | Programming error; always lock first |
Key observations:
The return value is rarely checked — In practice, pthread_cond_wait almost always returns 0. The while loop handles spurious wakeups, and serious errors (EINVAL, EPERM) indicate programmer mistakes that should be caught during development.
The mutex must be owned — Calling wait without holding the mutex is undefined behavior. On some systems, it silently corrupts state; on others, it returns EPERM.
The predicate is external — The condition variable doesn't know what you're waiting for. You must manage the predicate (shared_data == 0 in the example) yourself, as the sketch below illustrates.
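Because the predicate lives entirely in your code, a common idiom is to wrap it in a small helper that documents its locking requirement. A minimal sketch (data_ready and consume are illustrative names, not pthread APIs):

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t c = PTHREAD_COND_INITIALIZER;
static int shared_data = 0;

// Caller must hold `m`; the condition variable never sees this logic
static bool data_ready(void) {
    return shared_data != 0;
}

void consume(void) {
    pthread_mutex_lock(&m);
    while (!data_ready()) {        // The predicate lives in our code...
        pthread_cond_wait(&c, &m); // ...the cond var only provides blocking
    }
    shared_data = 0;
    pthread_mutex_unlock(&m);
}
```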
Waiting indefinitely is often inappropriate. What if the condition never becomes true? What if a producer thread crashes? Timed waits provide a solution: wait until either the condition might be true OR a timeout expires.
```c
#include <pthread.h>
#include <time.h>
#include <errno.h>

// POSIX timed wait uses absolute time
int pthread_cond_timedwait(
    pthread_cond_t *cond,
    pthread_mutex_t *mutex,
    const struct timespec *abstime  // Absolute deadline
);

// Example: Wait up to 5 seconds for data
int wait_with_timeout(void) {
    struct timespec deadline;

    // Get current time
    clock_gettime(CLOCK_REALTIME, &deadline);

    // Add 5 seconds
    deadline.tv_sec += 5;

    pthread_mutex_lock(&mutex);

    while (shared_data == 0) {
        int result = pthread_cond_timedwait(&cond, &mutex, &deadline);

        if (result == ETIMEDOUT) {
            // Timeout expired, condition still false
            pthread_mutex_unlock(&mutex);
            return -1; // Indicate timeout
        } else if (result != 0) {
            // Unexpected error (ETIMEDOUT was handled above)
            pthread_mutex_unlock(&mutex);
            return -2; // Indicate error
        }
        // result == 0: normal wakeup, loop will recheck condition
    }

    // Condition is true
    int data = shared_data;
    shared_data = 0;
    pthread_mutex_unlock(&mutex);
    return data;
}
```

POSIX uses absolute time (deadline), not relative time (duration). This is intentional: if you're woken spuriously, you don't want to restart a full timeout. The deadline remains fixed regardless of spurious wakeups. However, this means you must calculate the deadline before entering the wait loop.
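The example above adds whole seconds, which sidesteps a common pitfall: when adding a sub-second duration, tv_nsec can exceed 999,999,999 and must be normalized, or pthread_cond_timedwait may fail with EINVAL. A small helper, sketched under the assumption of CLOCK_REALTIME (the name deadline_from_now_ms is ours):

```c
#include <time.h>

// Compute an absolute CLOCK_REALTIME deadline `ms` milliseconds from now,
// keeping tv_nsec in the valid [0, 999999999] range
static void deadline_from_now_ms(struct timespec *deadline, long ms) {
    clock_gettime(CLOCK_REALTIME, deadline);
    deadline->tv_sec += ms / 1000;
    deadline->tv_nsec += (ms % 1000) * 1000000L;
    if (deadline->tv_nsec >= 1000000000L) {
        deadline->tv_sec += 1;
        deadline->tv_nsec -= 1000000000L;
    }
}
```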
Clock selection:
POSIX allows condition variables to use different clocks:
```c
pthread_condattr_t attr;
pthread_condattr_init(&attr);

// Use monotonic clock (immune to system time changes)
pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);

pthread_cond_t cond;
pthread_cond_init(&cond, &attr);
```
For timeouts, CLOCK_MONOTONIC is usually preferred—you don't want a system time adjustment to cause a 5-second timeout to become 5 hours or expire immediately.
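Putting the pieces together, here is a sketch of a timed wait measured against the monotonic clock (mono_cond, wait_ready_5s, and ready are illustrative names, not standard APIs):

```c
#include <pthread.h>
#include <time.h>
#include <errno.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t mono_cond;
static int ready = 0;

// Call once before any waiter runs
void init_monotonic_cond(void) {
    pthread_condattr_t attr;
    pthread_condattr_init(&attr);
    pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
    pthread_cond_init(&mono_cond, &attr);
    pthread_condattr_destroy(&attr);
}

// Wait up to 5 seconds, immune to wall-clock adjustments
int wait_ready_5s(void) {
    struct timespec deadline;
    // The deadline clock must match the one set on the condattr
    clock_gettime(CLOCK_MONOTONIC, &deadline);
    deadline.tv_sec += 5;

    pthread_mutex_lock(&lock);
    while (!ready) {
        int rc = pthread_cond_timedwait(&mono_cond, &lock, &deadline);
        if (rc == ETIMEDOUT) {
            pthread_mutex_unlock(&lock);
            return -1;  // Timed out; condition still false
        }
    }
    pthread_mutex_unlock(&lock);
    return 0;
}
```

The table below summarizes when a timed wait is worth the extra complexity: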
| Scenario | Recommendation | Rationale |
|---|---|---|
| Thread pool work queue | Bounded timeout | Allows periodic health checks and shutdown polling |
| Producer-consumer buffer | Usually indefinite | Producers should eventually produce; timeout adds complexity |
| Network event waiting | Bounded timeout | Remote peers may fail silently; need eventual detection |
| User request handling | Bounded timeout | Users expect responses in bounded time |
| System shutdown | Short timeout | Don't wait forever for threads that may be stuck |
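As an illustration of the first row, here is a sketch of a thread-pool worker that wakes at least every 100 ms to notice a stop request (all names are illustrative):

```c
#include <pthread.h>
#include <time.h>
#include <stdbool.h>

static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_cond = PTHREAD_COND_INITIALIZER;
static bool stop_requested = false;
static int pending_jobs = 0;

// Returns true if a job was claimed, false if shutdown was requested
bool wait_for_work(void) {
    pthread_mutex_lock(&q_lock);
    while (pending_jobs == 0 && !stop_requested) {
        // Recompute a short deadline each pass: this is polling,
        // not a fixed overall timeout
        struct timespec deadline;
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_nsec += 100 * 1000000L;     // +100 ms
        if (deadline.tv_nsec >= 1000000000L) {  // Normalize overflow
            deadline.tv_sec += 1;
            deadline.tv_nsec -= 1000000000L;
        }
        // ETIMEDOUT just sends us around the loop to recheck both flags
        pthread_cond_timedwait(&q_cond, &q_lock, &deadline);
    }
    bool have_work = (pending_jobs > 0);
    if (have_work) pending_jobs--;
    pthread_mutex_unlock(&q_lock);
    return have_work;
}
```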
Spurious wakeups are a reality that every condition variable programmer must understand. A spurious wakeup occurs when pthread_cond_wait returns even though no thread called pthread_cond_signal or pthread_cond_broadcast.
Why do spurious wakeups happen?
Several factors contribute: guaranteeing exactly-once wakeup delivery on multiprocessors would require expensive synchronization inside the implementation; the underlying blocking primitives (futexes on Linux, kernel events elsewhere) can return early, for example when a POSIX signal interrupts the wait; and tolerating an occasional extra wakeup lets implementations take faster code paths.
The specification deliberately allows spurious wakeups:
From the POSIX specification:
"Spurious wakeups from the pthread_cond_wait() or pthread_cond_timedwait() functions may occur. Since the return from pthread_cond_wait() or pthread_cond_timedwait() does not imply anything about the value of [the predicate], the predicate should be re-evaluated upon return."
This is not a bug or an implementation deficiency—it's a deliberate design choice that keeps implementations simple and fast on multiprocessor hardware, and that forces applications to re-check the predicate in a loop, a habit that Mesa semantics requires anyway.
```c
// The while loop pattern handles spurious wakeups automatically

pthread_mutex_lock(&mutex);

// Good: Loop rechecks after any wakeup
while (!condition_is_satisfied()) {
    pthread_cond_wait(&cond, &mutex);
    // After ANY return (spurious or real), we recheck
}

// Here, condition IS satisfied because:
// 1. We hold the mutex (so no one can change it)
// 2. We just verified it in the while condition

pthread_mutex_unlock(&mutex);

// BAD: Using 'if' instead of 'while'
pthread_mutex_lock(&mutex);
if (!condition_is_satisfied()) { // DANGEROUS!
    pthread_cond_wait(&cond, &mutex);
    // After spurious wakeup, we proceed without rechecking!
}
// BUG: condition might NOT be satisfied!
pthread_mutex_unlock(&mutex);
```

The while loop isn't just protection against spurious wakeups—it protects against ALL forms of unexpected returns: spurious wakeups, broadcast wakeups where another thread consumed the resource, and the Mesa semantics where the condition can change between signal and wakeup. The while loop is your friend; never use 'if'.
Let's examine how operating systems actually implement the wait operation. Understanding the kernel-level mechanics illuminates why condition variables work the way they do.
Linux Implementation using Futexes:
On Linux, condition variables are implemented using futexes (fast userspace mutexes). The key insight is that the wait queue is maintained in the kernel, not in userspace.
```c
// Simplified Linux implementation (conceptual)
// Based on NPTL (Native POSIX Threads Library)

struct pthread_cond_t {
    unsigned int __lock;      // Internal spinlock for cond state
    unsigned int __futex;     // Futex word for blocking
    uint64_t __total_seq;     // Total signals sent
    uint64_t __wakeup_seq;    // Number of wakeups
    uint64_t __woken_seq;     // Number of wakeups consumed
    unsigned int __nwaiters;  // Number of waiting threads
    // ... more fields
};

int pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex) {
    // 1. Lock internal cond state
    spin_lock(&cond->__lock);

    // 2. Record our sequence number (for ordering)
    uint64_t seq = cond->__total_seq;
    uint64_t futex_val = cond->__futex;

    // 3. Increment waiter count
    cond->__nwaiters++;

    // 4. Release internal lock and user mutex atomically
    //    (This is the tricky part)
    spin_unlock(&cond->__lock);
    pthread_mutex_unlock(mutex);

    // 5. Block using futex
    // FUTEX_WAIT only sleeps if __futex still equals futex_val
    // This provides atomicity: if signal happened between
    // our unlock and here, __futex will have changed
    do {
        futex(&cond->__futex, FUTEX_WAIT, futex_val);

        // Check if we should wake
        spin_lock(&cond->__lock);
        if (cond->__wakeup_seq != seq) {
            // A signal was for us
            cond->__woken_seq++;
            spin_unlock(&cond->__lock);
            break;
        }
        // Spurious wakeup, go back to sleep
        spin_unlock(&cond->__lock);
    } while (1);

    // 6. Reacquire user mutex
    pthread_mutex_lock(mutex);
    return 0;
}
```

The futex magic:
The futex() system call with FUTEX_WAIT is the key primitive:
```c
futex(&word, FUTEX_WAIT, expected_value);
```
This atomically checks whether word == expected_value. If so, it puts the thread to sleep; if not, it returns immediately. This provides the atomicity we need: a signal that arrives after the mutex is released changes the futex word, so FUTEX_WAIT declines to sleep and the wakeup is never missed.
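For the curious, glibc exposes no futex() wrapper function; a direct call goes through syscall(). A minimal Linux-only sketch (the wrapper names futex_wait and futex_wake are ours):

```c
#include <linux/futex.h>   // FUTEX_WAIT, FUTEX_WAKE
#include <sys/syscall.h>   // SYS_futex
#include <unistd.h>        // syscall()
#include <stdint.h>

// Sleep only if *word still equals expected; otherwise return
// immediately with errno set to EAGAIN. NULL timeout = wait forever.
static long futex_wait(uint32_t *word, uint32_t expected) {
    return syscall(SYS_futex, word, FUTEX_WAIT, expected, NULL, NULL, 0);
}

// Wake up to nwaiters threads blocked in FUTEX_WAIT on word
static long futex_wake(uint32_t *word, int nwaiters) {
    return syscall(SYS_futex, word, FUTEX_WAKE, nwaiters, NULL, NULL, 0);
}
```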
Windows Implementation:
Windows uses a different approach with the SRW Lock and Condition Variable APIs:
```c
#include <windows.h>

// Windows condition variables (Vista+)
CONDITION_VARIABLE cv;
SRWLOCK srwlock;
BOOL condition = FALSE;

void init(void) {
    InitializeConditionVariable(&cv);
    InitializeSRWLock(&srwlock);
}

// Wait pattern
void waiter(void) {
    AcquireSRWLockExclusive(&srwlock);

    while (!condition) {
        // SleepConditionVariableSRW releases and reacquires lock
        BOOL result = SleepConditionVariableSRW(
            &cv,
            &srwlock,
            INFINITE,  // Timeout (or specific ms)
            0          // Flags (0 for exclusive)
        );

        if (!result && GetLastError() == ERROR_TIMEOUT) {
            // Handle timeout
        }
    }

    // Condition is true
    ReleaseSRWLockExclusive(&srwlock);
}

// Signal pattern
void signaler(void) {
    AcquireSRWLockExclusive(&srwlock);

    condition = TRUE;

    WakeConditionVariable(&cv);       // Wake one
    // or
    // WakeAllConditionVariable(&cv); // Wake all

    ReleaseSRWLockExclusive(&srwlock);
}
```

While the APIs differ, the semantics are similar across platforms. All implementations must provide: atomic release-and-block, proper wakeup delivery, and mutex reacquisition before return. The differences are in timeout handling, error codes, and integration with other OS primitives.
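As a portability note, C11's optional <threads.h> (provided by glibc since 2.28, among others) wraps the same semantics under different names. A minimal sketch:

```c
#include <threads.h>
#include <stdbool.h>

static mtx_t m;
static cnd_t c;
static bool ready = false;

void setup(void) {
    mtx_init(&m, mtx_plain);  // Plain, non-recursive mutex
    cnd_init(&c);
}

void wait_side(void) {
    mtx_lock(&m);
    while (!ready) {
        cnd_wait(&c, &m);  // Same atomic release-and-block semantics
    }
    mtx_unlock(&m);
}

void signal_side(void) {
    mtx_lock(&m);
    ready = true;
    cnd_signal(&c);  // cnd_broadcast(&c) would wake all waiters
    mtx_unlock(&m);
}
```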
Every condition variable maintains a queue of waiting threads. The structure and ordering of this queue has important implications for fairness and performance.
FIFO Queues:
Most implementations use FIFO (first-in, first-out) queues for waiters: the thread that has waited longest is woken first, which provides basic fairness and prevents long waiters from starving.
However, FIFO is not always guaranteed. Some implementations allow out-of-order wakeups for performance reasons.
Priority-Based Queues:
Some real-time systems instead use priority queues, waking the highest-priority waiter first regardless of arrival order. The sketch below shows the simple FIFO case:
```c
// Conceptual wait queue structure

// Simple linked list (common in POSIX implementations)
struct WaitNode {
    pthread_t thread;       // The waiting thread
    struct WaitNode* next;  // Next in queue
    volatile int woken;     // Has this node been signaled?
};

struct ConditionVariable {
    spinlock_t lock;        // Protects queue manipulation
    struct WaitNode* head;  // First waiter (dequeue here)
    struct WaitNode* tail;  // Last waiter (enqueue here)
};

// Adding to wait queue
void enqueue_waiter(struct ConditionVariable* cv, struct WaitNode* node) {
    node->next = NULL;
    node->woken = 0;
    if (cv->tail == NULL) {
        cv->head = cv->tail = node;
    } else {
        cv->tail->next = node;
        cv->tail = node;
    }
}

// Removing from wait queue (for signal)
struct WaitNode* dequeue_waiter(struct ConditionVariable* cv) {
    if (cv->head == NULL) return NULL;
    struct WaitNode* node = cv->head;
    cv->head = node->next;
    if (cv->head == NULL) {
        cv->tail = NULL;
    }
    return node;
}
```

| Queue Type | Complexity | Fairness | Use Case |
|---|---|---|---|
| FIFO List | O(1) enqueue/dequeue | First-come-first-served | General purpose threading |
| Priority Queue | O(log n) operations | Priority-based | Real-time systems |
| Random | O(1) operations | None | Lock-free implementations |
| LIFO Stack | O(1) operations | Last-in-first-out | Cache-friendly but unfair |
Queue steal problem:
In Mesa semantics (signal-and-continue), there's a subtle issue: between when a thread is signaled and when it actually runs and reacquires the mutex, another thread might "steal" the condition.
Example:

1. Thread B waits for an item in a queue.
2. Thread A inserts an item and calls signal; Thread B becomes runnable.
3. Before Thread B reacquires the mutex, Thread C acquires it, sees the item, and consumes it.
4. Thread B finally reacquires the mutex, but the queue is empty again.
This is why the while loop is mandatory—Thread B must recheck the condition.
The interaction between thread cancellation and condition variable wait is complex and often misunderstood. In POSIX systems, pthread_cond_wait is a cancellation point—a location where a pending cancellation request is acted upon and the thread begins terminating.
The cancellation problem:
When a thread is cancelled while waiting on a condition variable, several things must happen: the thread must be removed from the wait queue, its cancellation cleanup handlers must run, and the mutex must be left in a well-defined state.

But which state? The POSIX specification says that if the thread is cancelled during the wait, the mutex is reacquired before the cancellation cleanup handlers run.
```c
#include <pthread.h>
#include <stdio.h>

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
int condition = 0;

// Cleanup handler - called if thread is cancelled
void cleanup_handler(void* arg) {
    pthread_mutex_t* m = (pthread_mutex_t*)arg;
    // The mutex is held when cleanup runs!
    // We must release it
    pthread_mutex_unlock(m);
    printf("Thread cancelled, mutex released\n");
}

void* cancellable_waiter(void* arg) {
    pthread_mutex_lock(&mutex);

    // Push cleanup handler
    // Will be called if cancelled during wait
    pthread_cleanup_push(cleanup_handler, &mutex);

    while (!condition) {
        // Cancellation can occur HERE
        pthread_cond_wait(&cond, &mutex);
        // If cancelled, we never reach this line
        // Instead, cleanup_handler is called with mutex held
    }

    // Pop cleanup handler (0 = don't execute)
    pthread_cleanup_pop(0);

    pthread_mutex_unlock(&mutex);
    return NULL;
}
```

Thread cancellation with condition variables is notoriously tricky. Many codebases avoid thread cancellation entirely, preferring to use a "shutdown" flag that threads check periodically. This is simpler and less error-prone. If you must use cancellation, always register cleanup handlers.
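A minimal sketch of that shutdown-flag alternative, assuming a simple counted work queue (wait_for_item and request_shutdown are our names, not a standard API):

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static bool shutting_down = false;
static int items = 0;

// Returns false if woken for shutdown rather than for work
bool wait_for_item(void) {
    pthread_mutex_lock(&mtx);
    // The shutdown flag is simply part of the wait predicate
    while (items == 0 && !shutting_down) {
        pthread_cond_wait(&cv, &mtx);
    }
    bool got_item = (items > 0);
    if (got_item) items--;
    pthread_mutex_unlock(&mtx);
    return got_item;
}

void request_shutdown(void) {
    pthread_mutex_lock(&mtx);
    shutting_down = true;
    pthread_cond_broadcast(&cv); // Wake every waiter so all can exit
    pthread_mutex_unlock(&mtx);
}
```

Because the flag is part of the predicate, no cleanup handlers are needed: every waiter re-evaluates the predicate after the broadcast and exits cleanly.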
Best practices for cancellation:

- Prefer a shutdown flag that is part of the wait predicate over pthread_cancel; it is simpler and less error-prone
- If you do use cancellation, always register cleanup handlers with pthread_cleanup_push before waiting
- Disable cancellation with pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL) in critical sections that must not be interrupted

The wait operation is the foundation of condition variable synchronization. Let's consolidate the key points:
The canonical wait pattern:
```c
pthread_mutex_lock(&mutex);
while (!predicate) {
    pthread_cond_wait(&cond, &mutex);
}
// predicate is true, proceed
pthread_mutex_unlock(&mutex);
```
This pattern is the safest and most portable way to use condition variables. Memorize it, understand it, and use it consistently.
What's next:
Now that we understand the wait operation, we'll examine its counterpart: the signal operation. We'll see how signals wake waiting threads, the difference between signal and broadcast, and the tradeoffs involved in choosing between them.
You now understand the wait operation in depth: its semantics, implementation, and edge cases. This knowledge is essential for writing correct concurrent code that uses condition variables for state-dependent synchronization.