Of all the bugs that plague concurrent systems, the lost wakeup problem is among the most insidious. It doesn't crash your system immediately. It doesn't produce error messages. It doesn't corrupt data (at least not directly). Instead, it causes processes to sleep forever—waiting for a wakeup that already happened, that they missed, and that will never come again.
The lost wakeup is a race condition between the act of going to sleep and the act of being awakened. If wakeup occurs in the narrow window between a process deciding to sleep and actually sleeping, the wakeup signal is lost. The process sleeps, blissfully unaware that its awaited event already occurred. Without external intervention, it never wakes.
This page provides a comprehensive examination of the lost wakeup problem: what it is, why it happens, how it manifests, and—critically—how to prevent it. Understanding this problem is essential for anyone implementing or using synchronization primitives.
Lost wakeup bugs are notoriously hard to detect and reproduce. They may occur only under specific timing conditions that happen once in millions of executions. A system may run perfectly for months, then mysteriously hang. Understanding the race condition is the only reliable defense.
To understand the lost wakeup, we must examine the non-atomic nature of decision-making and sleeping.
The vulnerable window:
Consider a naive sleep/wakeup implementation:
// Sleeper (Process A)
if (!condition) {   // Step 1: Check condition
    sleep();        // Step 2: Go to sleep
}
// Waker (Process B)
condition = true; // Step 1: Set condition
wakeup(process_A); // Step 2: Wake the sleeper
The problem is the window between checking the condition and actually sleeping. If Process B sets the condition and calls wakeup during this window, the wakeup is sent to a process that isn't sleeping yet—and is therefore lost.
Why this is a race:
The outcome depends on the relative timing of two independent event sequences: Process A's check-then-sleep, and Process B's set-condition-then-wakeup.
If Process B's sequence completes between A's check and A's sleep, the wakeup is lost. This is a classic TOCTTOU (Time Of Check To Time Of Use) vulnerability—the condition was checked but may have changed before action was taken.
The non-atomic operation problem:
The fundamental issue is that "check condition and sleep if false" should be an atomic operation—indivisible, with no opportunity for interference. But in the naive implementation, it's two separate operations:
- `if (!condition)` — reads shared state
- `sleep()` — changes process state

Between these operations, anything can happen: context switches, interrupts, other CPU activity. The wakeup can arrive in this gap and be lost.
Unlike signals (which can be pending) or semaphores (which have counts), a simple wakeup is stateless. If a wakeup is sent to a non-sleeping process, it's discarded—there's no 'pending wakeup' flag. This is why the timing matters: wakeups must arrive when the process is actually sleeping.
Lost wakeups manifest in various ways, all of them bad:
Immediate symptoms:
Hung processes: A process sleeps forever, unable to make progress. In interactive systems, this is immediately noticed; in batch systems, it may go undetected for hours or days.
Deadlock-like behavior: If the sleeping process holds resources, those resources become unavailable. Other processes waiting for those resources also hang, creating a chain of blocked processes.
Performance degradation: In less severe cases, a process may eventually be awakened by another event (spurious wakeup, timer, signal). But the delay causes visible performance problems.
Resource leaks: If a process sleeps forever while holding file descriptors, memory, or locks, those resources are never released.
| Scenario | Consequence | Severity |
|---|---|---|
| Lock acquisition sleep | Lock appears held forever; other acquirers block | Critical - system-wide deadlock possible |
| Producer-consumer queue | Consumer sleeps, queue fills, producers block | Critical - complete pipeline stall |
| I/O completion wait | I/O buffer never consumed; device stalls | High - I/O subsystem failure |
| Timer-based sleep | May eventually wake on timeout; just delayed | Medium - degraded responsiveness |
| Conditional variable wait | Thread misses signal; waits forever or until broadcast | High - thread permanently blocked |
Why detection is hard:
Lost wakeup bugs are difficult to identify because:
Timing-dependent: They require specific interleaving that may occur only under particular load conditions. A system may run correctly in testing and fail only in production under heavy load.
No error signals: Unlike null pointer dereferences or assertion failures, lost wakeups produce no exceptions. The process simply blocks indefinitely.
Obscured by other mechanisms: If there's any timeout or periodic wakeup, the bug manifests as "slow" rather than "stuck," making it harder to identify as a lost wakeup.
Non-reproducible: By the time you notice the hang and attach a debugger, the context that would reveal the race has long since passed.
Lost wakeup bugs have caused major production outages at large companies. Systems that ran perfectly for months suddenly hang under traffic spikes that change timing just enough to expose the race. These are nightmare debugging scenarios—significant engineering effort is spent recreating conditions that trigger the bug.
Let's examine specific code patterns where lost wakeups occur.
Example 1: Naive Lock Implementation
// BROKEN: This lock implementation has a lost wakeup bug

int locked = 0;

void lock() {
    while (locked) {       // Step 1: Check if locked
        sleep();           // Step 2: Sleep if locked
    }
    locked = 1;            // Step 3: Acquire the lock
}

void unlock() {
    locked = 0;            // Step 1: Release the lock
    wakeup_one();          // Step 2: Wake one waiter
}

// RACE SCENARIO:
// Thread A is in lock(), about to sleep:
// 1. A checks: locked == 1 (true, held by B)
// 2. A prepares to call sleep()...
// --- CONTEXT SWITCH to Thread B ---
// 3. B calls unlock(): locked = 0, wakeup_one()
// 4. Wakeup sent, but A isn't sleeping yet - LOST!
// --- CONTEXT SWITCH back to A ---
// 5. A calls sleep() - sleeps forever
// 6. Lock is available, but A is stuck

Example 2: Producer-Consumer Queue
// BROKEN: Producer-consumer with lost wakeup

int count = 0;              // Items in buffer
#define MAX 10

void producer() {
    while (count == MAX) {  // Buffer full?
        sleep();            // Wait for consumer
    }
    buffer[count++] = item; // Add item
    wakeup_consumer();      // Signal consumer
}

void consumer() {
    while (count == 0) {    // Buffer empty?
        sleep();            // Wait for producer
    }
    item = buffer[--count]; // Remove item
    wakeup_producer();      // Signal producer
}

// RACE SCENARIO:
// 1. Consumer checks: count == 0 (true, buffer empty)
// 2. Consumer prepares to sleep...
// --- INTERRUPT: Producer runs ---
// 3. Producer adds item: count = 1
// 4. Producer calls wakeup_consumer()
// 5. Consumer not sleeping! Wakeup LOST!
// --- Back to Consumer ---
// 6. Consumer sleeps - item is waiting, but consumer sleeps forever

Example 3: Condition Variable Misuse
// BROKEN: Condition variable without holding lock during predicate check

pthread_mutex_t mutex;
pthread_cond_t cond;
int ready = 0;

void waiter() {
    if (!ready) {                          // Check WITHOUT lock!
        pthread_mutex_lock(&mutex);        // Then acquire lock
        pthread_cond_wait(&cond, &mutex);  // And wait
        pthread_mutex_unlock(&mutex);
    }
    // process ready condition
}

void signaler() {
    pthread_mutex_lock(&mutex);
    ready = 1;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&mutex);
}

// RACE:
// 1. Waiter sees ready == 0
// 2. Signaler sets ready = 1, signals cond
// 3. Waiter acquires lock, waits - signal already gone!

All three examples share the same flaw: checking a condition and going to sleep are not atomic. Between the check and the sleep, another thread can change the condition and send a wakeup that arrives too early. The solution requires making check-and-sleep atomic.
The solution to lost wakeups is conceptually simple: make the check-and-sleep operation atomic. The condition check and the transition to sleeping state must be indivisible—no wakeup can slip between them.
There are two main approaches:
Approach 1: Hold a lock across check and sleep
Protect the condition variable with a lock. Check the condition while holding the lock. If you need to sleep, atomically release the lock and enter sleep state. The waker must hold the same lock when setting the condition and signaling.
// Waiter
lock(&mutex);
while (!condition) {
    // Atomically: release mutex AND sleep on cond
    // When woken: re-acquire mutex
    cond_wait(&cond, &mutex);
}
// condition is true, mutex is held
unlock(&mutex);
// Signaler
lock(&mutex);
condition = true;
cond_signal(&cond);
unlock(&mutex);
The cond_wait call is the key: it atomically releases the mutex and sleeps. This means no wakeup can slip into the gap between releasing the lock and entering the sleep state: a signaler that acquires the mutex after the waiter calls cond_wait will find the waiter already asleep on the condition variable, and its signal will be delivered.
Approach 2: Set sleeping state before checking condition
Mark yourself as sleeping before checking the condition. Then, if the condition is false, actually sleep. If the condition is true (or becomes true while you were checking), the wakeup will see your sleeping state and either wake you or set your state back to running.
This is the approach used by the Linux kernel's wait_event pattern:
// The correct pattern that prevents lost wakeups

DEFINE_WAIT(wait);

while (!condition) {
    // Set state to SLEEPING and add to wait queue
    // BEFORE checking condition
    prepare_to_wait(&wq, &wait, TASK_INTERRUPTIBLE);

    // Now check condition with state already SLEEPING
    if (!condition) {
        schedule();   // Actually sleep
    }
    // If condition became true before schedule(),
    // wakeup set us back to RUNNING, schedule() returns immediately
}
finish_wait(&wq, &wait);

// Why this works:
// 1. prepare_to_wait sets state = SLEEPING
// 2. If wakeup arrives NOW: sees SLEEPING, sets RUNNING
// 3. schedule() checks state: if RUNNING, returns immediately (no sleep)
// 4. If wakeup arrives LATER: process is actually sleeping, wakeup works
// No window where wakeup can be lost!

The prepare_to_wait function includes a memory barrier to ensure the state change is visible to other CPUs before the condition check. Without this barrier, the compiler or CPU might reorder the state change after the condition check, recreating the lost wakeup window on weakly-ordered architectures.
A subtle but critical aspect of the solution is double-checking the condition. The condition is checked twice: once in the outer while loop before any wait setup, and once more after the process has marked itself as sleeping.
Why both checks?
Check #1 (outer while): If the condition is already true, we don't even enter the waiting logic. This is the fast path—no need to set up wait queues or change state if the condition is immediately satisfied.
Check #2 (inner if): After setting ourselves as sleeping, we recheck. If the condition became true between our outer check and prepare_to_wait, we skip schedule() entirely. Yes, we might have briefly been in sleeping state, but we never actually slept.
// Timeline showing why double-check is necessary

// SCENARIO 1: Condition true before entering loop
// - Outer check: condition == true
// - Skip the entire while body
// - Never call prepare_to_wait or schedule
// - Maximum efficiency for the already-ready case

// SCENARIO 2: Condition becomes true during prepare_to_wait
// Timeline:
// T1: Waiter - while (!cond): cond is false, enters loop
// T2: Waiter - prepare_to_wait() starts
// T3: Signaler - sets cond = true
// T4: Signaler - wake_up(): sees waiter in SLEEPING, sets RUNNING
// T5: Waiter - prepare_to_wait() completes
// T6: Waiter - if (!cond): cond is NOW true, skip schedule()
// T7: Waiter - finish_wait(), proceed
// Wakeup effective even though process never actually slept!

// SCENARIO 3: Condition becomes true during schedule()
// T1: Waiter - prepare_to_wait()
// T2: Waiter - if (!cond): cond still false, call schedule()
// T3: Waiter - schedule() saves context, begins switch
// T4: Signaler - sets cond = true
// T5: Signaler - wake_up(): adds waiter to run queue
// T6: ...later... Waiter scheduled, returns from schedule()
// T7: Waiter - outer while (!cond): cond true now, exit loop
// Normal wakeup operation

Visualizing the protection:
The double-check creates an invariant:
If we call schedule() with sleeping state, then either: (a) The condition was false when we checked (so sleeping is correct), or (b) The wakeup arrived and changed our state back to running (so schedule() returns immediately)
This invariant closes the lost wakeup window. There is no moment where a wakeup can arrive, find the process neither running nor visibly registered as sleeping, and be silently discarded.
You could omit the outer while check and always call prepare_to_wait before checking. But this adds overhead: setting up wait queue entries, performing memory barriers, and potentially cache line traffic—all unnecessary if the condition is already true. The outer check is an optimization for the common case where no waiting is needed.
Condition variables are the abstraction that packages the lost-wakeup-safe pattern into a usable API. They encapsulate the atomic release-and-sleep operation.
The pthread_cond_wait contract:
int pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex);
This function atomically:

1. Releases the mutex
2. Adds the calling thread to the condition variable's wait queue
3. Blocks the thread until signaled

Upon wakeup, the thread re-acquires the mutex before returning.
The atomicity of steps 1-3 is what prevents lost wakeups. The mutex must be held when calling, and it's held when returning—creating a critical section around the entire check-and-wait sequence.
// CORRECT: The canonical condition variable pattern

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
int ready = 0;

void waiter() {
    pthread_mutex_lock(&mutex);           // 1. Acquire lock
    while (!ready) {                      // 2. Check in loop (spurious wakeups!)
        pthread_cond_wait(&cond, &mutex); // 3. Atomic: release & wait & reacquire
    }
    // 4. Condition is true AND lock is held
    process_ready_condition();
    pthread_mutex_unlock(&mutex);         // 5. Release lock
}

void signaler() {
    pthread_mutex_lock(&mutex);           // 1. Acquire same lock
    ready = 1;                            // 2. Change condition
    pthread_cond_signal(&cond);           // 3. Signal waiter
    pthread_mutex_unlock(&mutex);         // 4. Release lock
}

// Why this is safe:
// - Waiter holds mutex when checking 'ready'
// - Signaler must hold mutex to set 'ready' and signal
// - cond_wait atomically releases mutex and sleeps
// - When signaler signals, waiter is either:
//   (a) Not yet waiting (still checking) - will see ready=1
//   (b) Already waiting - will receive signal
// NO window for lost wakeup!

| Mistake | Problem | Fix |
|---|---|---|
| Not holding mutex when calling wait() | Undefined behavior; atomicity broken | Always lock before calling wait |
| Using if instead of while for condition | Spurious wakeups proceed with false condition | Always use while loop |
| Checking condition outside lock | Lost wakeup window | Check condition only while holding lock |
| Signal without holding lock | Semantically wrong; timing races | Hold lock when signaling (or after setting state) |
| Multiple conditions on same cond_t | Signal wakes wrong waiter | Use separate condition variables per condition |
Debate exists about whether to call signal() before or after unlocking. Signaling inside the lock is always safe (waiter can't see inconsistent state). Signaling after unlocking can be slightly more efficient (waiter doesn't immediately block on mutex) but requires careful analysis. When in doubt, signal before unlocking.
Given the difficulty of detecting lost wakeups in production, what strategies help identify them?
Static analysis approaches:
Pattern matching: Look for code that checks a condition then sleeps without locking. Tools like Coverity, CodeQL, and specialized research tools can flag these patterns.
Lock analysis: Verify that all waits are inside critical sections, and all signals are also inside (or immediately after) critical sections on the same lock.
Data race detectors: Lost wakeups often accompany data races. Tools like ThreadSanitizer can identify unsynchronized access to shared condition variables.
Dynamic analysis approaches:
Timeout instrumentation: In debug builds, replace indefinite waits with timed waits and log when a timeout fires with the condition still false:

// Debug version with timeout detection
while (!condition) {
    int ret = wait_event_timeout(wq, condition, 5*HZ); // 5 second timeout
    if (ret == 0 && !condition) {
        printk(KERN_WARNING "Possible lost wakeup: still waiting after 5s\n");
        dump_stack(); // Log where we're stuck
    }
}
Sleep state auditing: Periodically scan all blocked processes. If a process has been sleeping "too long" on a condition that logs show was signaled, investigate.
Instrumented wakeups: Log every sleep and wakeup with timestamps and process IDs. Analyze for wakeups that don't correspond to any sleeping process.
Debugging a suspected lost wakeup:
If a process is stuck sleeping:
Identify what it's waiting for: In Linux, check /proc/PID/stack or use echo t > /proc/sysrq-trigger for all stacks.
Check the wait queue: Is the process actually on the expected wait queue? If yes, is the condition still false? If condition is true but process is sleeping, wakeup was lost.
Review the code path: Trace the code from condition check to sleep. Is there any window where condition could change without proper synchronization?
Check for missing wakeups: Is the signaling code always called when the condition changes? Are there code paths that set the condition but forget to signal?
Stress test: Add intentional delays with random nanosleep() calls in the suspect code path to widen potential race windows.
Lost wakeup bugs are classic Heisenbugs—they disappear when you try to observe them. Adding logging, enabling debugging, or attaching a debugger changes timing and may mask the bug. This is why understanding the race condition theoretically is crucial—you may never reproduce it experimentally.
The lost wakeup problem is a fundamental challenge in concurrent systems. Mastering it is essential for correct synchronization.
You now understand the lost wakeup problem—one of the most insidious concurrency bugs. With this knowledge, you can design synchronization primitives that avoid the race condition. Next, we'll explore the broader implementation challenges of the sleep/wakeup mechanism.