In the previous page, we explored how processes enter sleep—voluntarily yielding the CPU when they cannot make progress. A sleeping process is inert: it consumes no CPU cycles and waits passively for its awakening. But sleep without wakeup would mean processes sleep forever—a useless suspended animation.
The wakeup mechanism is the essential complement to sleep. It is the signal that tells a sleeping process: "The event you awaited has occurred. Time to resume." Wakeup transitions a process from dormancy back to candidacy for CPU time, returning it to the scheduler's consideration.
But wakeup is more than a simple state flip. It must choose the right waiters to rouse, hand them off to the scheduler correctly, and remain safe to call even from interrupt context.
This page provides a comprehensive examination of wakeup—the mechanism that restores sleeping processes to active life.
By the end of this page, you will understand: the fundamental concept and API of wakeup operations; the difference between waking one vs. all waiters; exclusive vs. non-exclusive wakeup; the internal implementation of wakeup; how wakeup integrates with the scheduler; and the critical role of wakeup in avoiding deadlock and ensuring progress.
Wakeup is the operation that transitions a sleeping process back to the runnable state. It is the inverse of sleep and forms the second half of the sleep/wakeup primitive pair.
Conceptually, wakeup does the following: it changes the target process's state from a sleep state back to TASK_RUNNING, places the process on a run queue, and checks whether the newly runnable process should preempt whatever is currently executing there.
The semantic contract:
Wakeup establishes a guarantee: after wakeup is called, the target process will eventually be scheduled (assuming no further sleeps). Wakeup doesn't mean immediate execution, only eligibility for scheduling. The actual moment of resumption depends on the scheduling policy, the awakened process's priority, and the current load on the CPUs.
Unlike sleep, wakeup never blocks the caller. It's a quick state manipulation—typically a few dozen instructions. The waker continues immediately after calling wakeup, regardless of when or whether the awakened process actually runs. This asymmetry is fundamental: sleep blocks, wakeup notifies.
There are fundamentally two ways to specify who should be awakened:
1. Direct wakeup (by task reference): The waker has a direct pointer to the sleeping task's control block (task_struct in Linux) and awakens that specific process:
int wake_up_process(struct task_struct *p);
This is used when the waker already holds a reference to the specific task that should run, such as kernel code that manages a dedicated helper thread.
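For illustration, here is a minimal sketch of this pattern. The names worker_task, work_pending(), process_work(), and enqueue_work() are hypothetical helpers, not real kernel APIs:
// Sketch: direct wakeup of a known kernel thread.
static struct task_struct *worker_task;   // assumed: set when thread is created

static int worker_fn(void *data) {
    while (!kthread_should_stop()) {
        set_current_state(TASK_INTERRUPTIBLE);
        if (!work_pending())          // assumed helper: anything queued?
            schedule();               // sleep until woken directly
        __set_current_state(TASK_RUNNING);
        process_work();               // assumed helper: do the work
    }
    return 0;
}

// Producer side: we hold a pointer to exactly the task to wake
void submit_work(void) {
    enqueue_work();                   // assumed helper: queue an item
    wake_up_process(worker_task);     // direct wakeup by task reference
}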
2. Wait queue wakeup (by event): The waker doesn't know or care which specific processes are waiting. It signals an event, and all relevant waiters are awakened:
int wake_up(wait_queue_head_t *q);
This is used when the identity of the waiters doesn't matter, only the event does, for example an I/O completion on which any number of processes may be blocked.
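A sketch of the event-based pattern, with assumed helpers (read_wq, mark_data_ready, data_ready are illustrative names):
// Sketch: event-based wakeup; the completion side never names a task.
static DECLARE_WAIT_QUEUE_HEAD(read_wq);

// Completion side: signal the event; who is waiting is irrelevant
void data_arrived(void) {
    mark_data_ready();                 // assumed helper: make the condition true
    wake_up(&read_wq);                 // wake whoever waits on this event
}

// Waiter side: sleep until the condition holds
int wait_for_data(void) {
    return wait_event_interruptible(read_wq, data_ready());
}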
Wait queue wakeup strategies:
When waking by event (wait queue), the kernel offers several strategies:
| Function | Behavior | Use Case |
|---|---|---|
| wake_up() | Wake all non-exclusive + one exclusive | Standard wakeup: handles both cases |
| wake_up_all() | Wake every waiter, exclusive or not | Condition affecting all (e.g., shutdown) |
| wake_up_nr(wq, n) | Wake up to n exclusive waiters | Resource with multiple available slots |
| wake_up_interruptible() | Only wake TASK_INTERRUPTIBLE waiters | Don't disturb uninterruptible I/O |
| wake_up_sync() | Wake but don't reschedule immediately | Batching multiple wakeups for efficiency |
| wake_up_locked() | Wake while already holding queue lock | Avoid double-lock when already protected |
The wake_up_nr(wq, n) variant wakes up to n exclusive waiters. This is useful for resources that become available in batches: for example, if 5 buffer slots become free, wake 5 waiters. This is more efficient than wake_up_all when you know how many waiters can actually proceed, as sketched below.
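A minimal sketch of the batch case, assuming a hypothetical buffer_wq and a remove_n_items() helper:
// Sketch: n slots become free at once, so wake up to n exclusive waiters
static DECLARE_WAIT_QUEUE_HEAD(buffer_wq);

void consume_batch(int n) {
    remove_n_items(n);            // assumed helper: frees n buffer slots
    wake_up_nr(&buffer_wq, n);    // at most n exclusive producers proceed
}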
A critical distinction in wait queue design is whether waiters are exclusive or non-exclusive. This affects how many processes are awakened.
Non-exclusive waiters (default): every non-exclusive waiter on the queue is awakened by a wakeup. Use this when all waiters can make progress from the same event.
Exclusive waiters: a standard wakeup stops after waking one exclusive waiter (or up to n, with wake_up_nr). Use this when only a limited number of waiters can make progress.
The thundering herd problem:
If 100 processes wait for a lock and you wake all of them when the lock is released, 99 will immediately go back to sleep (only one can acquire the lock). This wastes CPU cycles on 99 pointless context switches. Exclusive wakeup avoids this—only one is awakened, and that one acquires the lock.
How wake_up handles mixed queues:
A wait queue can contain both exclusive and non-exclusive waiters. The standard wake_up() walks the queue in order, waking every non-exclusive waiter it encounters, and stops after it wakes the first exclusive waiter. This means any waiters queued behind that first exclusive waiter, exclusive or not, stay asleep; exclusive waiters are therefore added at the tail of the queue so they don't shadow non-exclusive waiters in front of them.
The practical implications are easiest to see in a worked example:
// Wait queue with mixed waiters
// Position:  1   2   3   4   5
// Type:      NE  NE  EX  EX  NE
// Process:   P1  P2  P3  P4  P5

// Linux wake_up() behavior:
// 1. NE waiters at front: wake P1, P2 (non-exclusive)
// 2. First EX waiter: wake P3, then STOP
// 3. P4 (EX) and P5 (NE) remain sleeping

// wake_up_all() behavior:
// Wake P1, P2, P3, P4, P5 (everyone)

// wake_up_nr(wq, 2) behavior (for exclusive):
// 1. Wake P1, P2 (non-exclusive, always woken)
// 2. Wake P3 (first exclusive)
// 3. Wake P4 (second exclusive)
// 4. P5 remains (exhausted the n=2 exclusive count)

Setting exclusive mode:
When adding to a wait queue, processes indicate whether they're exclusive:
// Non-exclusive wait (default)
add_wait_queue(wq, &wait);
// Exclusive wait
add_wait_queue_exclusive(wq, &wait);
// Or set the flag directly
wait.flags |= WQ_FLAG_EXCLUSIVE;
prepare_to_wait_exclusive(wq, &wait, TASK_INTERRUPTIBLE);
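Putting these pieces together, a typical exclusive wait loop looks roughly like the sketch below; resource_wq, resource_available(), and claim_resource() are assumed names for illustration:
// Sketch: full exclusive wait loop using the prepare/finish API
static DECLARE_WAIT_QUEUE_HEAD(resource_wq);

int acquire_resource(void) {
    DEFINE_WAIT(wait);

    for (;;) {
        // Re-queue (exclusively) and set our state on every iteration
        prepare_to_wait_exclusive(&resource_wq, &wait, TASK_INTERRUPTIBLE);
        if (resource_available())        // assumed condition check
            break;
        if (signal_pending(current)) {   // interruptible sleep: honor signals
            finish_wait(&resource_wq, &wait);
            return -ERESTARTSYS;
        }
        schedule();                      // actually sleep here
    }
    finish_wait(&resource_wq, &wait);    // dequeue and set TASK_RUNNING
    claim_resource();                    // assumed helper: take the resource
    return 0;
}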
Guidelines for choosing:
| Scenario | Wakeup Mode | Reason |
|---|---|---|
| Lock/mutex acquisition | Exclusive | Only one can hold the lock |
| Condition variable signal | Exclusive | Typically only one waiter can consume each signal |
| Shared resource becomes available | Non-exclusive | Multiple can use simultaneously |
| Broadcast notification | Non-exclusive | All interested parties should know |
| Semaphore with count N | Wake N exclusive | Exactly N can proceed |
Let's examine the internal implementation of wakeup to understand precisely what happens when a process is awakened.
The wakeup call chain (Linux):
// High-level wakeup: wake all non-exclusive + one exclusive
void wake_up(wait_queue_head_t *wq) {
    unsigned long flags;
    spin_lock_irqsave(&wq->lock, flags);
    __wake_up_common(wq, TASK_NORMAL, 1, 0, NULL);
    spin_unlock_irqrestore(&wq->lock, flags);
}

// Core wakeup logic
static void __wake_up_common(wait_queue_head_t *wq,
                             unsigned int mode,   // Which states to wake
                             int nr_exclusive,    // How many exclusive
                             int wake_flags,      // Sync, etc.
                             void *key) {
    wait_queue_entry_t *entry, *next;

    list_for_each_entry_safe(entry, next, &wq->head, entry) {
        unsigned flags = entry->flags;

        // Call the waiter's wake function
        // Default: default_wake_function
        int ret = entry->func(entry, mode, wake_flags, key);
        if (ret < 0)
            break;  // Stop iteration

        // If exclusive, decrement the counter; stop once enough are woken
        if (ret && (flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
            break;  // Woken enough exclusives
    }
}

// The actual process wakeup
int default_wake_function(wait_queue_entry_t *entry, unsigned mode,
                          int sync, void *key) {
    struct task_struct *p = entry->private;
    return try_to_wake_up(p, mode, sync);
}

// Core of process wakeup (simplified)
int try_to_wake_up(struct task_struct *p, unsigned int state,
                   int wake_flags) {
    unsigned long flags;
    int cpu, success = 0;

    raw_spin_lock_irqsave(&p->pi_lock, flags);

    // Check if the process is in an acceptable sleep state
    if (!(p->state & state))
        goto out;  // Not sleeping in a matching state

    // Atomically change state to TASK_RUNNING
    p->state = TASK_RUNNING;
    success = 1;

    // Find a CPU to add to a run queue
    // Could be the current CPU, the previous CPU (for cache affinity),
    // or an idle CPU (for load balancing)
    cpu = select_task_rq(p, p->wake_cpu, wake_flags);

    // Add to that CPU's run queue
    // Uses the task's scheduler class (CFS, RT, etc.)
    activate_task(cpu_rq(cpu), p, ENQUEUE_WAKEUP);

    // Check if we should preempt the current task on that CPU
    check_preempt_curr(cpu_rq(cpu), p, wake_flags);

out:
    raw_spin_unlock_irqrestore(&p->pi_lock, flags);
    return success;
}

Key implementation details:
1. Spin lock protection:
The wait queue spinlock is held during the wakeup traversal. This prevents races with concurrent add/remove operations. The irqsave variant is used because wakeup can be called from interrupt context (e.g., timer handler, I/O completion).
2. list_for_each_entry_safe: This safe iteration macro handles the case where the current entry is removed during iteration—which can happen if the wakeup function removes itself.
3. Custom wake functions:
Each wait entry has a func pointer (usually default_wake_function). This allows custom wakeup logic—for example, poll() and epoll() use custom functions to aggregate multiple events.
4. The mode parameter:
This is a bitmask of states to wake: TASK_INTERRUPTIBLE, TASK_UNINTERRUPTIBLE, or both (TASK_NORMAL). wake_up_interruptible() only wakes TASK_INTERRUPTIBLE processes.
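These variants differ mainly in the state mask and exclusive count they pass down; the kernel's wrappers in <linux/wait.h> are essentially thin macros (shown here slightly simplified):
// Simplified forms of the kernel's wake_up wrapper macros
#define wake_up(x)               __wake_up(x, TASK_NORMAL, 1, NULL)
#define wake_up_nr(x, nr)        __wake_up(x, TASK_NORMAL, nr, NULL)
#define wake_up_all(x)           __wake_up(x, TASK_NORMAL, 0, NULL)  // 0 = no exclusive limit
#define wake_up_interruptible(x) __wake_up(x, TASK_INTERRUPTIBLE, 1, NULL)
Note that nr_exclusive = 0 means "wake all": in __wake_up_common above, !--nr_exclusive never becomes true, so the loop never stops early.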
5. CPU selection for wakeup:
select_task_rq chooses which CPU's run queue receives the awakened process. Factors include cache affinity (the CPU the process last ran on may still hold its data in cache), the current load on each candidate CPU, and whether an idle CPU is available for load balancing.
6. Preemption check: After enqueuing, the kernel checks if the newly runnable process has higher priority than whatever is currently running on the target CPU. If so, it sets a "need resched" flag, causing preemption at the next opportunity.
For very long wait queues, Linux uses a bookmark entry to track progress. This allows the lock to be dropped periodically during wakeup traversal, preventing excessive lock hold times. The bookmark is a dummy entry that marks where to resume iteration.
Wakeup doesn't exist in isolation—it's intimately connected with the scheduler. Understanding this relationship is crucial for performance and correctness.
The scheduler's role in wakeup:
When try_to_wake_up succeeds, it must integrate the newly runnable process with the scheduler. This involves selecting a run queue, enqueuing the task through its scheduler class, adjusting its virtual runtime, and checking whether it should preempt the currently running task.
CFS sleep bonus:
The Completely Fair Scheduler gives sleeping processes a boost when they wake up. The intuition: a process that slept is likely interactive (waiting for user input) and should get responsive scheduling.
// On wakeup, CFS places the task with a slight vruntime advantage,
// capped so a long sleeper cannot fall arbitrarily far behind the queue.
// This ensures recently-sleeping tasks get scheduled quickly.
vruntime = max(vruntime, min_vruntime - sleep_bonus)
This "sleeper fairness" prevents interactive processes from being starved by CPU-bound processes that never sleep.
Wakeup latency:
The time from wakeup call to actual process execution is called wakeup latency. It depends on:
| Factor | Impact |
|---|---|
| Same CPU vs. remote | Same CPU: ~1-5μs; Remote: ~10-50μs (IPI overhead) |
| Current process priority | If lower, preemption adds ~1-10μs |
| Interrupt context | May need to defer to scheduler tick |
| RT vs. normal process | RT gets immediate preemption |
| CPU power state | If idle/deep sleep: ~10-100μs to wake CPU |
When you're about to yield the CPU anyway (e.g., you wake someone then immediately sleep yourself), use wake_up_sync(). This skips the preemption check and avoids unnecessary reschedule IPIs. The awakened process will be scheduled when you naturally yield, saving overhead.
// Suboptimal: might trigger premature preemption
void producer_thread() {
    // Produce data
    buffer[idx] = data;

    // Wake consumer (might preempt us immediately)
    wake_up(&consumer_wq);

    // We were about to sleep anyway!
    wait_event_interruptible(producer_wq, !buffer_full);
}

// Optimal: use sync wakeup
void producer_thread_optimized() {
    buffer[idx] = data;

    // Wake consumer but don't trigger an immediate reschedule
    wake_up_sync(&consumer_wq);

    // Now we sleep - the scheduler will pick the consumer if appropriate
    wait_event_interruptible(producer_wq, !buffer_full);
}

A significant portion of wakeups occur from interrupt context: hardware signals that an awaited event has occurred, and the handler performs the wakeup. Interrupt handlers must follow special rules.
Why interrupts trigger wakeups: the events that sleeping processes wait for (disk I/O completion, timer expiry, arriving network data) are announced by hardware as interrupts, so the interrupt handler is the natural place to call wake_up.
Constraints in interrupt context:
Interrupt handlers run with interrupts disabled or at elevated priority. They must never sleep, must not take locks that can block, and must finish quickly so other interrupts and processes are not delayed.
Safe wakeup from interrupts:
Wakeup is specifically designed to be safe from interrupt context:
// All wakeup variants are interrupt-safe
wake_up(); // Safe
wake_up_interruptible(); // Safe
wake_up_all(); // Safe
wake_up_process(p); // Safe
Internally, these use spin_lock_irqsave to handle the nested interrupt case—an interrupt handler that's itself interrupted.
The deferred work pattern:
Sometimes the interrupt handler needs to do more than just wakeup—it needs to copy data, update statistics, or perform complex logic. But it must stay fast. The solution is deferred work:
// Disk I/O completion interrupt handler (simplified)
irqreturn_t disk_irq_handler(int irq, void *dev_id) {
    struct request *req = get_completed_request(dev_id);

    // Mark the request complete (very fast)
    req->status = SUCCESS;
    req->bytes_transferred = req->length;

    // Wake the waiting process
    wake_up(&req->wait_queue);

    // That's it! The handler exits quickly;
    // the awakened process does any heavy processing.
    return IRQ_HANDLED;
}
Modern kernels support threaded interrupt handlers (request_threaded_irq). The interrupt triggers a kernel thread rather than running in raw interrupt context. This thread can sleep, take blocking locks, and do extensive processing—while still waking other processes. This simplifies driver development at a small latency cost.
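A sketch of the threaded pattern, with hypothetical device helpers (device_raised_irq, process_device_data, and waiters are illustrative names):
// Hard-IRQ half: runs in interrupt context, must be fast
static irqreturn_t fast_check(int irq, void *dev_id) {
    if (!device_raised_irq(dev_id))   // assumed helper: is it our device?
        return IRQ_NONE;              // not ours; let other handlers run
    return IRQ_WAKE_THREAD;           // defer the real work to the thread
}

// Threaded half: runs in a kernel thread, may sleep and take mutexes
static irqreturn_t handler_thread(int irq, void *dev_id) {
    process_device_data(dev_id);      // assumed helper: heavy processing
    wake_up(&waiters);                // and still wake sleeping processes
    return IRQ_HANDLED;
}

// Registration (e.g., in the driver's probe function):
// the kernel creates and manages the handler thread for us
int mydev_setup_irq(int irq, void *dev) {
    return request_threaded_irq(irq, fast_check, handler_thread,
                                IRQF_ONESHOT, "mydev", dev);
}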
A spurious wakeup occurs when a process wakes up even though the condition it was waiting for is not true. This is not a bug—it's an expected possibility that code must handle.
Sources of spurious wakeups include broadcast wakeups (wake_up_all) where another waiter consumes the condition first, races in which the condition becomes false again between the wakeup and the process actually running, multiple events sharing one wait queue, and signal delivery to interruptible sleepers.
if (condition) { ... } else { sleep; } — This pattern FAILS on spurious wakeups. After waking, the condition is not rechecked, leading to proceeding with false assumptions. ALWAYS use while (!condition) { sleep; } to re-check after every wakeup.
// WRONG: Susceptible to spurious wakeups
void consumer_wrong() {
    // Check once, then sleep if not ready
    if (buffer_empty()) {
        wait_event_interruptible(wq, !buffer_empty());
    }
    // BUG: If spuriously awakened but still empty, we proceed incorrectly
    item = buffer[--count];  // Potential underflow!
}

// CORRECT: Always re-check the condition
void consumer_correct() {
    // Loop until the condition is actually true
    while (buffer_empty()) {
        int ret = wait_event_interruptible(wq, !buffer_empty());
        if (ret == -ERESTARTSYS) {
            // Woken by a signal - handle appropriately and bail out
            handle_signal();
            return;
        }
        // If we get here without error, check the condition again
        // The while loop naturally re-checks
    }
    // Guaranteed: buffer is not empty
    item = buffer[--count];  // Safe!
}

// The wait_event macro does this automatically
// This is CORRECT (wait_event loops internally)
void consumer_wait_event() {
    wait_event_interruptible(wq, !buffer_empty());
    // wait_event only returns when the condition is true
    // (or when interrupted by a signal)
    item = buffer[--count];
}

Why spurious wakeups are tolerated:
You might wonder: why not prevent all spurious wakeups? The answer is efficiency and simplicity: guaranteeing condition-accurate, exactly-once wakeups would require the kernel to track and re-verify every waiter's condition on every wakeup, adding cost to a hot path, while letting each waiter re-check its condition in a loop costs almost nothing.
The design philosophy is: make wakeup cheap and simple, require waiters to be robust.
The POSIX spec for pthread_cond_wait() explicitly states that spurious wakeups may occur. This is why all correct condition variable usage wraps the wait in a while loop. This isn't a limitation—it's a deliberate API contract.
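The same discipline applies in user-space C. A minimal sketch of the canonical pattern (all names here are illustrative):
#include <pthread.h>

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
int ready = 0;                         // the condition, protected by 'lock'

void *waiter(void *arg) {
    pthread_mutex_lock(&lock);
    while (!ready)                     // loop: tolerates spurious wakeups
        pthread_cond_wait(&cond, &lock);
    // Here 'ready' is guaranteed true, and we hold the lock
    pthread_mutex_unlock(&lock);
    return NULL;
}

void signaler(void) {
    pthread_mutex_lock(&lock);
    ready = 1;                         // make the condition true first
    pthread_cond_signal(&cond);        // then wake one waiter
    pthread_mutex_unlock(&lock);
}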
The wakeup mechanism completes the sleep/wakeup pair, enabling efficient blocking synchronization. To consolidate the essential concepts: wakeup makes a sleeping process runnable and eligible for scheduling rather than running it immediately; exclusive wakeup prevents thundering herds; every wakeup variant is safe to call from interrupt context; and waiters must re-check their condition in a loop, because spurious wakeups are part of the contract.
You now understand the wakeup mechanism—how sleeping processes are restored to activity, the different wakeup strategies, and the critical integration with the scheduler. Next, we'll examine what happens when wakeup and sleep are improperly coordinated: the infamous lost wakeup problem.