In the previous page, we established why futex exists—to provide blazingly fast userspace synchronization with kernel assistance only when necessary. Now we turn to how: the actual system call interface that the Linux kernel exposes.
The futex() system call is deceptively simple on the surface—a single syscall number with different operations selected by a flag. Yet this minimalist interface enables the construction of every synchronization primitive in modern computing: mutexes, semaphores, condition variables, read-write locks, barriers, and more.
Understanding these operations in depth is essential for anyone building synchronization primitives, debugging threading bugs, or optimizing concurrent systems. The difference between using FUTEX_WAKE and FUTEX_REQUEUE in a condition variable implementation, for instance, can mean a 10x performance difference under high contention.
By the end of this page, you will understand every major futex operation: WAIT, WAKE, REQUEUE, WAKE_OP, and their behavioral variants. You'll know the exact semantics, parameters, and return values, enabling you to reason about (and debug) synchronization at the kernel interface level.
All futex operations flow through a single system call with a multiplexed interface. The operation is selected by the futex_op parameter, with additional parameters interpreted based on the operation.
```c
#include <linux/futex.h>
#include <sys/syscall.h>
#include <stdint.h>
#include <unistd.h>   /* syscall() */

/*
 * The futex() system call
 *
 * long futex(uint32_t *uaddr, int futex_op, uint32_t val,
 *            const struct timespec *timeout,  // or: uint32_t val2
 *            uint32_t *uaddr2, uint32_t val3);
 *
 * Parameters:
 *   uaddr    - Pointer to the futex word (32-bit aligned address in userspace)
 *   futex_op - Operation + flags (e.g., FUTEX_WAIT | FUTEX_PRIVATE_FLAG)
 *   val      - Value (meaning depends on operation)
 *   timeout  - Timeout for wait operations (or val2 for some operations)
 *   uaddr2   - Second futex address (for REQUEUE operations)
 *   val3     - Third value (for CMP_REQUEUE, WAKE_OP)
 *
 * Returns:
 *   >= 0 - Success (meaning depends on operation)
 *   -1   - Error (errno set appropriately)
 */

// Glibc does NOT expose a futex() wrapper, so we use syscall() directly
static inline long futex(uint32_t *uaddr, int futex_op, uint32_t val,
                         const struct timespec *timeout,
                         uint32_t *uaddr2, uint32_t val3)
{
    return syscall(SYS_futex, uaddr, futex_op, val, timeout, uaddr2, val3);
}

/*
 * Operations (as defined in <linux/futex.h>, shown here for reference)
 */
#define FUTEX_WAIT             0  /* Sleep until value changes */
#define FUTEX_WAKE             1  /* Wake sleeping threads */
#define FUTEX_FD               2  /* (Removed in Linux 2.6.26) */
#define FUTEX_REQUEUE          3  /* Move waiters to a different futex */
#define FUTEX_CMP_REQUEUE      4  /* Conditional requeue */
#define FUTEX_WAKE_OP          5  /* Wake + conditional wake at second addr */
#define FUTEX_LOCK_PI          6  /* Priority-inheritance lock */
#define FUTEX_UNLOCK_PI        7  /* Priority-inheritance unlock */
#define FUTEX_TRYLOCK_PI       8  /* Try to acquire PI lock */
#define FUTEX_WAIT_BITSET      9  /* Wait with bitset for selective wakeup */
#define FUTEX_WAKE_BITSET     10  /* Wake with bitset for selective wakeup */
#define FUTEX_WAIT_REQUEUE_PI 11  /* Wait on futex, requeue to PI futex */
#define FUTEX_CMP_REQUEUE_PI  12  /* Requeue to PI futex */

/*
 * Operation modifiers (OR'd with the operation)
 */
#define FUTEX_PRIVATE_FLAG   128  /* Futex is process-private */
#define FUTEX_CLOCK_REALTIME 256  /* Use CLOCK_REALTIME for timeout */

/*
 * Common combined operations
 */
#define FUTEX_WAIT_PRIVATE (FUTEX_WAIT | FUTEX_PRIVATE_FLAG)
#define FUTEX_WAKE_PRIVATE (FUTEX_WAKE | FUTEX_PRIVATE_FLAG)
```

The FUTEX_PRIVATE_FLAG Optimization:
When a futex is known to be accessible only within a single process (not through shared memory), specifying FUTEX_PRIVATE_FLAG enables significant kernel optimizations: the kernel can key its internal hash table on the (mm, virtual address) pair alone, skipping the page lookup needed to derive a shared (inode, offset) key, and it avoids the reference counting on the backing memory that shared futexes require.
Always use the private flag for intra-process synchronization. It's faster and tells the kernel exactly what you need.
Unlike most system calls, glibc does not provide a futex() wrapper function. This is intentional—futex is considered a low-level primitive for library authors, not application developers. You must call it via syscall() or use library implementations (pthread_mutex, etc.) that wrap it for you.
FUTEX_WAIT is the operation that puts a thread to sleep, waiting for a value change. It's the kernel's half of the synchronization handshake—userspace detects contention, then asks the kernel to sleep until woken.
```c
/*
 * FUTEX_WAIT Operation
 *
 * Semantics (atomic check-and-sleep):
 *   1. Read *uaddr
 *   2. If *uaddr == val: sleep until FUTEX_WAKE or timeout
 *   3. If *uaddr != val: return immediately with EAGAIN
 *
 * The comparison and sleep are ATOMIC from the kernel's perspective.
 * This prevents the lost wakeup problem.
 */

#include <errno.h>
#include <time.h>

// Basic FUTEX_WAIT usage
int futex_wait(uint32_t *uaddr, uint32_t expected_val)
{
    // Wait indefinitely if *uaddr == expected_val
    long ret = syscall(SYS_futex, uaddr, FUTEX_WAIT_PRIVATE,
                       expected_val, NULL, NULL, 0);

    if (ret == -1) {
        if (errno == EAGAIN) {
            // Value changed before we could sleep.
            // This is NOT an error - it just means we should retry.
            return 1;  // Value changed
        }
        if (errno == EINTR) {
            // Interrupted by a signal; the caller should typically retry
            return 2;  // Interrupted
        }
        // Other errors are serious (EFAULT, EINVAL, etc.)
        return -1;
    }
    return 0;  // Woken by FUTEX_WAKE
}

// FUTEX_WAIT with timeout
int futex_wait_timeout(uint32_t *uaddr, uint32_t expected_val,
                       unsigned long timeout_ns)
{
    struct timespec ts = {
        .tv_sec  = timeout_ns / 1000000000UL,
        .tv_nsec = timeout_ns % 1000000000UL,
    };

    long ret = syscall(SYS_futex, uaddr, FUTEX_WAIT_PRIVATE,
                       expected_val, &ts, NULL, 0);

    if (ret == -1 && errno == ETIMEDOUT) {
        return 3;  // Timeout occurred
    }
    // ... handle other cases as above
    return ret == 0 ? 0 : -1;
}
```

Why the Atomic Check Matters:
The atomicity of the comparison-and-sleep is the key to futex correctness. Consider what happens without it:
```c
/*
 * THE LOST WAKEUP PROBLEM (what futex prevents)
 *
 * Without atomic check-and-sleep:
 */

// BROKEN (non-atomic) implementation:
void broken_wait(uint32_t *uaddr, uint32_t expected)
{
    // Race window starts here
    if (*uaddr == expected) {        // Check
        // <-- WAKEUP ARRIVES HERE (lost!)
        sleep_until_woken(uaddr);    // Sleep
    }
    // Race window ends
}

// Thread A (waiter):              Thread B (waker):
// -----------------              -----------------
// 1. Check: *lock == 1? YES
//                                 2. Set *lock = 0
//                                 3. FUTEX_WAKE (no waiters!)
// 4. Sleep forever...
//
// Result: Thread A sleeps forever, never woken.
// The wakeup was "lost" because it happened between check and sleep.

/*
 * FUTEX_WAIT prevents this by making check+sleep atomic.
 *
 * Kernel implementation (conceptual):
 */
long futex_wait_kernel(uint32_t *uaddr, uint32_t val)
{
    spin_lock(&futex_hash_bucket->lock);  // Kernel lock held

    // Atomically check the value AND add to the wait queue
    if (get_user(current_val, uaddr) != 0)
        return -EFAULT;

    if (current_val != val) {
        spin_unlock(&futex_hash_bucket->lock);
        return -EAGAIN;  // Value changed, don't sleep
    }

    // Add ourselves to the wait queue while holding the lock
    list_add(&current->futex_entry, &wait_queue);
    spin_unlock(&futex_hash_bucket->lock);

    // Now safe to sleep - any WAKE will find us in the queue
    schedule();  // Sleep until woken
    return 0;
}
```

| Return | errno | Meaning | Typical Response |
|---|---|---|---|
| 0 | N/A | Woken by FUTEX_WAKE | Check condition, proceed or re-wait |
| -1 | EAGAIN | *uaddr != val when called | Retry the operation (value changed) |
| -1 | ETIMEDOUT | Timeout expired | Handle timeout condition |
| -1 | EINTR | Interrupted by signal | Usually retry the wait |
| -1 | EFAULT | uaddr not valid memory | Bug in caller (invalid pointer) |
| -1 | EINVAL | Invalid operation/alignment | Bug in caller (misuse of API) |
FUTEX_WAIT can return even without FUTEX_WAKE being called—due to signals, implementation details, or other reasons. Correct code must ALWAYS recheck the condition after waking and re-wait if necessary. Never assume that wakeup means the condition is satisfied.
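The recheck discipline above can be made concrete with a small Linux-only demo (the helper names `wait_for_ready` and `wake_loop_demo` are ours, not part of any API): the waiter treats every return from FUTEX_WAIT as a hint, not a guarantee, and loops until the flag it cares about is actually set.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

// Shared state: the futex word doubles as the condition flag (0 = not ready)
static _Atomic uint32_t ready = 0;

// Wait until 'ready' becomes nonzero, tolerating spurious wakeups:
// every return from FUTEX_WAIT re-checks the condition before proceeding.
static void wait_for_ready(void)
{
    for (;;) {
        if (atomic_load(&ready) != 0)
            return;                       // Condition satisfied - done
        // Sleep only while the word is still 0; EAGAIN/EINTR just loop
        syscall(SYS_futex, &ready, FUTEX_WAIT | FUTEX_PRIVATE_FLAG,
                0, NULL, NULL, 0);
    }
}

static void *waiter_thread(void *arg)
{
    (void)arg;
    wait_for_ready();
    return (void *)(uintptr_t)atomic_load(&ready);  // Observed value
}

// Returns the value the waiter observed after resuming (expected: 1)
static uint32_t wake_loop_demo(void)
{
    pthread_t t;
    void *observed;

    pthread_create(&t, NULL, waiter_thread, NULL);
    usleep(10000);                         // Give the waiter time to sleep

    atomic_store(&ready, 1);               // Publish the condition first...
    syscall(SYS_futex, &ready, FUTEX_WAKE | FUTEX_PRIVATE_FLAG,
            1, NULL, NULL, 0);             // ...then wake

    pthread_join(t, &observed);
    return (uint32_t)(uintptr_t)observed;
}
```

Note the ordering in the waker: publish the new value, then call FUTEX_WAKE. Because the waiter rechecks, the demo is correct even if the wake races with the waiter going to sleep.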
FUTEX_WAKE is the complement to FUTEX_WAIT—it wakes threads sleeping on a futex. The operation is simple: wake up to N threads from the wait queue associated with the futex address.
```c
/*
 * FUTEX_WAKE Operation
 *
 * Semantics:
 *   Wake up to 'val' threads sleeping on the futex at 'uaddr'.
 *   Threads are woken in FIFO order (first sleeper woken first).
 *
 * Returns:
 *   Number of threads actually woken (0 to val)
 */

#include <limits.h>  /* INT_MAX */

// Wake exactly one waiter (common for mutexes)
int futex_wake_one(uint32_t *uaddr)
{
    long ret = syscall(SYS_futex, uaddr, FUTEX_WAKE_PRIVATE,
                       1,  // Wake at most 1 thread
                       NULL, NULL, 0);
    return (int)ret;  // Returns count of threads woken
}

// Wake all waiters (common for broadcast operations)
int futex_wake_all(uint32_t *uaddr)
{
    long ret = syscall(SYS_futex, uaddr, FUTEX_WAKE_PRIVATE,
                       INT_MAX,  // Wake all threads
                       NULL, NULL, 0);
    return (int)ret;
}

/*
 * Example: Simple mutex unlock with conditional wake
 */
#define UNLOCKED  0
#define LOCKED    1
#define CONTENDED 2

void mutex_unlock(uint32_t *futex_word)
{
    uint32_t prev = atomic_exchange(futex_word, UNLOCKED);

    if (prev == CONTENDED) {
        // There were waiters - wake one
        syscall(SYS_futex, futex_word, FUTEX_WAKE_PRIVATE,
                1, NULL, NULL, 0);
    }
    // If prev == LOCKED (no waiters), skip the syscall entirely
}

/*
 * Kernel implementation (conceptual):
 */
long futex_wake_kernel(uint32_t *uaddr, int nr_wake)
{
    struct futex_hash_bucket *bucket = hash_futex(uaddr);
    int woken = 0;

    spin_lock(&bucket->lock);

    list_for_each_entry(waiter, &bucket->waiters, list) {
        if (waiter->uaddr == uaddr) {
            // Remove from the wait queue
            list_del(&waiter->list);
            // Mark the thread as runnable
            wake_up_process(waiter->task);
            if (++woken >= nr_wake)
                break;
        }
    }

    spin_unlock(&bucket->lock);
    return woken;
}
```

Wake Count Strategies:
The val parameter (number of threads to wake) is a crucial design decision:
| Value | Use Case | Example |
|---|---|---|
| 1 | Mutex unlock, semaphore V(1) | Wake next waiter for exclusive access |
| N | Semaphore V(N), bounded wakeup | Release N permits |
| INT_MAX | Broadcast, barrier release | Wake everyone waiting |
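The INT_MAX case can be sketched in a short Linux-only demo (the helper `broadcast_demo`, the waiter count, and the 20 ms settle delay are our own illustrative choices). Note that the syscall's return value reports only how many threads the kernel actually found asleep at that instant:

```c
#define _GNU_SOURCE
#include <limits.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

#define NWAITERS 3

static _Atomic uint32_t gate = 0;  // 0 = closed, 1 = open

static void *gated_waiter(void *arg)
{
    (void)arg;
    // Recheck loop: any waiter that misses the sleep still sees gate == 1
    while (atomic_load(&gate) == 0) {
        syscall(SYS_futex, &gate, FUTEX_WAIT | FUTEX_PRIVATE_FLAG,
                0, NULL, NULL, 0);
    }
    return NULL;
}

// Open the gate and broadcast; returns how many threads the single
// FUTEX_WAKE call reported as woken (0..NWAITERS, depending on how many
// had actually gone to sleep by then).
static long broadcast_demo(void)
{
    pthread_t t[NWAITERS];
    for (int i = 0; i < NWAITERS; i++)
        pthread_create(&t[i], NULL, gated_waiter, NULL);

    usleep(20000);                     // Let the waiters block (best effort)

    atomic_store(&gate, 1);            // Open the gate first...
    long woken = syscall(SYS_futex, &gate, FUTEX_WAKE | FUTEX_PRIVATE_FLAG,
                         INT_MAX,      // ...then wake everyone at once
                         NULL, NULL, 0);

    for (int i = 0; i < NWAITERS; i++)
        pthread_join(t[i], NULL);      // All resume regardless of 'woken'
    return woken;
}
```

Because each waiter rechecks the gate, every thread terminates even if it never slept, which is why the assertion on the wake count can only be a range, not an exact number.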
The Thundering Herd Problem:
Waking all waiters (INT_MAX) can cause a thundering herd: all threads wake simultaneously, contend for resources, and all but one must go back to sleep. This wastes CPU and increases latency. Use sparingly—typically only for true broadcasts where all waiters should proceed.
If no threads are waiting on the futex, FUTEX_WAKE returns immediately with 0. The kernel quickly checks the hash bucket and returns—there's no work to do. This makes conditional waking (checking for waiters before calling wake) an optimization, not a requirement for correctness.
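A minimal Linux-only sketch of that fast path (the helper name is ours): waking a futex word that no thread has ever slept on simply reports zero threads woken.

```c
#define _GNU_SOURCE
#include <limits.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

// Calling FUTEX_WAKE on a word nobody is sleeping on is cheap and harmless:
// the kernel finds an empty hash bucket and reports 0 threads woken.
static long wake_with_no_waiters(void)
{
    static uint32_t word = 0;   // No thread has ever waited on this
    return syscall(SYS_futex, &word, FUTEX_WAKE | FUTEX_PRIVATE_FLAG,
                   INT_MAX, NULL, NULL, 0);
}
```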
FUTEX_REQUEUE and its safer variant FUTEX_CMP_REQUEUE are sophisticated operations designed specifically for implementing condition variables efficiently. They move waiters from one futex to another without fully waking them.
The Problem with Naive Condition Variables:
Consider implementing pthread_cond_signal() naively:
```c
/*
 * NAIVE (inefficient) condition variable implementation
 */
typedef struct {
    uint32_t futex;   // Waiters sleep on this
    mutex_t *mutex;   // Associated mutex
} naive_cond_t;

void naive_cond_wait(naive_cond_t *cond, mutex_t *mutex)
{
    uint32_t val = cond->futex;

    mutex_unlock(mutex);             // Release the mutex
    futex_wait(&cond->futex, val);   // Sleep on the condvar
    mutex_lock(mutex);               // Re-acquire the mutex
}

void naive_cond_signal(naive_cond_t *cond)
{
    atomic_fetch_add(&cond->futex, 1);  // Change the value
    futex_wake_one(&cond->futex);       // Wake one waiter
}

/*
 * THE PROBLEM:
 *
 * When we wake a waiter, it immediately tries to reacquire the mutex.
 * If the signaler still holds the mutex:
 *
 *   Signaler:                      Waiter:
 *   1. signal() wakes waiter
 *                                  2. Wakes up
 *                                  3. Tries mutex_lock()
 *                                  4. Mutex held - goes back to sleep!
 *   5. Eventually unlocks
 *                                  6. ANOTHER wake needed!
 *
 * This is called "hurry up and wait" - wasteful context switches.
 */
```

FUTEX_REQUEUE Solution:
Instead of waking the waiter (who will just block on the mutex), we requeue them directly to the mutex's wait queue. When the signaler releases the mutex, the waiter is already in the right place—no extra wakeup needed.
```c
/*
 * FUTEX_CMP_REQUEUE Operation
 *
 * futex(uaddr, FUTEX_CMP_REQUEUE, nr_wake, nr_requeue, uaddr2, val3)
 *
 * Semantics (if *uaddr == val3):
 *   1. Wake up to 'nr_wake' threads waiting on uaddr
 *   2. Move up to 'nr_requeue' remaining waiters from uaddr to uaddr2
 *   3. Return the total number of threads woken + requeued
 *
 * If *uaddr != val3: return -EAGAIN (prevents ABA issues)
 *
 * Note: nr_requeue is passed in the 'timeout' argument slot,
 * reinterpreted as an integer.
 */

typedef struct {
    uint32_t seq;    // Sequence number (changes on signal)
    mutex_t *mutex;  // Associated mutex (has its own futex)
} efficient_cond_t;

void cond_wait(efficient_cond_t *cond, mutex_t *mutex)
{
    uint32_t seq = cond->seq;

    mutex_unlock(mutex);

    // Wait for a signal (seq to change)
    while (cond->seq == seq) {
        futex_wait(&cond->seq, seq);
    }

    mutex_lock(mutex);
}

void cond_signal(efficient_cond_t *cond)
{
    uint32_t old_seq = atomic_fetch_add(&cond->seq, 1);

    // Signal wakes exactly one waiter; nothing needs to be requeued
    syscall(SYS_futex, &cond->seq, FUTEX_CMP_REQUEUE_PRIVATE,
            1,                    // nr_wake: wake 1 thread
            0,                    // nr_requeue: move no one
            &cond->mutex->futex,  // uaddr2: mutex's futex
            old_seq + 1);         // val3: expected new sequence value
}

void cond_broadcast(efficient_cond_t *cond)
{
    uint32_t old_seq = atomic_fetch_add(&cond->seq, 1);

    // Wake 1, requeue all others to the mutex
    // (they'll wake one at a time as the mutex is released)
    syscall(SYS_futex, &cond->seq, FUTEX_CMP_REQUEUE_PRIVATE,
            1,                    // nr_wake: wake 1 (avoids thundering herd)
            INT_MAX,              // nr_requeue: move all others
            &cond->mutex->futex,  // uaddr2: mutex's futex
            old_seq + 1);         // val3: expected new sequence value
}
```

| Parameter | Name | Description |
|---|---|---|
| uaddr | Source futex | Futex to take waiters FROM |
| futex_op | Operation | FUTEX_CMP_REQUEUE_PRIVATE |
| val | nr_wake | Max threads to fully wake |
| timeout (as uint32_t) | nr_requeue | Max threads to move (not wake) |
| uaddr2 | Dest futex | Futex to move waiters TO |
| val3 | Expected | Expected value of *uaddr (CMP check) |
FUTEX_CMP_REQUEUE adds an atomic comparison before requeuing. This prevents race conditions where the futex value changed between the caller's decision to requeue and the actual operation. Always prefer CMP_REQUEUE in production code—plain REQUEUE is considered deprecated and unsafe.
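Here is a Linux-only sketch of CMP_REQUEUE in action (the helper names and the retry loop are our own scaffolding, not a production pattern): one thread sleeps on a source futex, the kernel moves it to a destination futex without waking it, and only a wake on the destination releases it.

```c
#define _GNU_SOURCE
#include <limits.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

static _Atomic uint32_t src = 0;   // Waiter initially sleeps here
static _Atomic uint32_t dst = 0;   // ...and is moved here without waking
static _Atomic int go = 0;

static void *requeue_waiter(void *arg)
{
    (void)arg;
    while (!atomic_load(&go)) {
        // Sleep on 'src'; after being requeued, a wake on 'dst' resumes us
        syscall(SYS_futex, &src, FUTEX_WAIT | FUTEX_PRIVATE_FLAG,
                0, NULL, NULL, 0);
    }
    return NULL;
}

// Returns the number of waiters the kernel moved from src to dst (expect 1)
static long cmp_requeue_demo(void)
{
    pthread_t t;
    long moved = 0;

    pthread_create(&t, NULL, requeue_waiter, NULL);

    // Requeue (wake 0, move up to 1) - retry until the waiter is asleep.
    // val3 = 0 must match *src, or the kernel would return EAGAIN.
    while (moved < 1) {
        moved = syscall(SYS_futex, &src,
                        FUTEX_CMP_REQUEUE | FUTEX_PRIVATE_FLAG,
                        0,     // nr_wake: wake nobody
                        1,     // nr_requeue: move up to 1 waiter
                        &dst,  // destination futex
                        0);    // val3: expected value of *src
        if (moved < 1)
            usleep(1000);      // Waiter not asleep yet - try again
    }

    // Release the waiter via the destination futex
    atomic_store(&go, 1);
    atomic_store(&src, 1);     // Make any re-wait on src fail fast (EAGAIN)
    syscall(SYS_futex, &dst, FUTEX_WAKE | FUTEX_PRIVATE_FLAG,
            INT_MAX, NULL, NULL, 0);
    syscall(SYS_futex, &src, FUTEX_WAKE | FUTEX_PRIVATE_FLAG,
            INT_MAX, NULL, NULL, 0);

    pthread_join(t, NULL);
    return moved;
}
```

The key observation: the waiter made its FUTEX_WAIT call on `src`, yet it is a wake on `dst` that resumes it, because the kernel transferred its queue entry.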
FUTEX_WAKE_OP is the most complex futex operation—it performs an atomic read-modify-write on one futex, conditionally wakes on that futex, and optionally wakes on a second futex. This is useful for implementing semaphores with sophisticated waking strategies.
```c
/*
 * FUTEX_WAKE_OP Operation
 *
 * futex(uaddr, FUTEX_WAKE_OP, nr_wake, nr_wake2, uaddr2, op_arg)
 *
 * Semantics (all atomic):
 *   1. Read the old value of *uaddr2
 *   2. Apply operation 'op' to *uaddr2 with argument 'oparg':
 *        *uaddr2 = op(*uaddr2, oparg)
 *   3. Wake up to nr_wake threads on uaddr
 *   4. If comparison 'cmp' succeeds (comparing the old *uaddr2 with cmparg):
 *        also wake up to nr_wake2 threads on uaddr2
 *   5. Return the total number of threads woken
 *
 * The op_arg encodes: operation | comparison | oparg | cmparg
 */

// op_arg encoding (32 bits)
//   Bits 28-31: op     (4 bits)
//   Bits 24-27: cmp    (4 bits)
//   Bits 12-23: oparg  (12 bits)
//   Bits  0-11: cmparg (12 bits)

#define FUTEX_OP(op, oparg, cmp, cmparg) \
    (((op) << 28) | ((cmp) << 24) |      \
     (((oparg) & 0xfff) << 12) | ((cmparg) & 0xfff))

// Operations (applied to *uaddr2)
#define FUTEX_OP_SET  0  /* *uaddr2 = oparg */
#define FUTEX_OP_ADD  1  /* *uaddr2 += oparg */
#define FUTEX_OP_OR   2  /* *uaddr2 |= oparg */
#define FUTEX_OP_ANDN 3  /* *uaddr2 &= ~oparg */
#define FUTEX_OP_XOR  4  /* *uaddr2 ^= oparg */

// Comparisons (compare the old *uaddr2 with cmparg)
#define FUTEX_OP_CMP_EQ 0  /* old == cmparg */
#define FUTEX_OP_CMP_NE 1  /* old != cmparg */
#define FUTEX_OP_CMP_LT 2  /* old <  cmparg (signed) */
#define FUTEX_OP_CMP_LE 3  /* old <= cmparg (signed) */
#define FUTEX_OP_CMP_GT 4  /* old >  cmparg (signed) */
#define FUTEX_OP_CMP_GE 5  /* old >= cmparg (signed) */

/*
 * Example: Semaphore with wake-on-transition
 *
 * We want to wake waiters only when the count transitions from 0 to positive.
 */
typedef struct {
    uint32_t count;       // Semaphore count
    uint32_t wake_futex;  // Separate futex for waking
} smart_sem_t;

void sem_post(smart_sem_t *sem)
{
    // Atomically: count++; if the old count was 0, also wake on count
    uint32_t op_arg = FUTEX_OP(
        FUTEX_OP_ADD, 1,     // *count += 1
        FUTEX_OP_CMP_EQ, 0   // Extra wake if old count == 0
    );

    syscall(SYS_futex, &sem->wake_futex, FUTEX_WAKE_OP_PRIVATE,
            1,            // nr_wake: wake 1 on wake_futex (unconditionally)
            1,            // nr_wake2: wake 1 on count if comparison succeeds
            &sem->count,  // uaddr2: the word that gets modified
            op_arg);
}
```

The power of FUTEX_WAKE_OP is that the read-modify-write and conditional wake are atomic with respect to other futex operations. This enables lock-free implementations of complex primitives without additional synchronization. However, this complexity is rarely needed—most applications do fine with WAIT/WAKE/REQUEUE.
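One consequence worth seeing in isolation: the read-modify-write half runs even when nobody is waiting. A minimal Linux-only sketch (the helper name and the starting value 5 are ours) has the kernel increment the second word as a side effect of a WAKE_OP call that wakes no one:

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

// Even with no waiters, the kernel still performs the atomic op on *uaddr2:
// here 'counter' gets incremented, and the call reports 0 threads woken.
static uint32_t wake_op_rmw_demo(void)
{
    static uint32_t primary = 0;   // uaddr: wake target (nobody waiting)
    static uint32_t counter = 5;   // uaddr2: word the kernel modifies

    syscall(SYS_futex, &primary, FUTEX_WAKE_OP | FUTEX_PRIVATE_FLAG,
            1,                              // nr_wake on primary
            1,                              // nr_wake2 on counter if cmp hits
            &counter,
            FUTEX_OP(FUTEX_OP_ADD, 1,       // counter += 1
                     FUTEX_OP_CMP_EQ, 0));  // Extra wake if old counter == 0
    return counter;                         // Now 6
}
```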
FUTEX_WAIT_BITSET and FUTEX_WAKE_BITSET extend the basic wait/wake interface with a 32-bit mask that enables selective waking of subsets of waiters.
```c
/*
 * FUTEX_WAIT_BITSET / FUTEX_WAKE_BITSET Operations
 *
 * Each waiter specifies a 32-bit bitmask when calling WAIT_BITSET.
 * When WAKE_BITSET is called, only waiters whose bitmask ANDs non-zero
 * with the wake bitmask are woken.
 *
 * This is useful for:
 *   - Read-write locks (wake readers vs writers separately)
 *   - Priority classes (wake high-priority first)
 *   - Condition variables with multiple predicates
 */

// futex(uaddr, FUTEX_WAIT_BITSET, val, timeout, NULL, bitset)
// futex(uaddr, FUTEX_WAKE_BITSET, nr_wake, NULL, NULL, bitset)

#define FUTEX_BITSET_MATCH_ANY 0xffffffff  // Matches all (backward compat)

// Example: Read-write lock with selective wake

#define READER_BIT 0x1  // Wake channel for readers
#define WRITER_BIT 0x2  // Wake channel for writers

typedef struct {
    uint32_t state;  // Lock state
    // High bits: reader count; bit 0: writer held; bit 1: writers waiting
} rwlock_t;

void rwlock_read_lock(rwlock_t *rw)
{
    // ... try to acquire the read lock ...
    // If blocked, wait on the READER_BIT channel
    syscall(SYS_futex, &rw->state, FUTEX_WAIT_BITSET_PRIVATE,
            expected_val, NULL, NULL, READER_BIT);
}

void rwlock_write_unlock(rwlock_t *rw)
{
    // Release the write lock...
    atomic_store(&rw->state, new_state);

    // Wake ALL readers (they can all proceed)
    syscall(SYS_futex, &rw->state, FUTEX_WAKE_BITSET_PRIVATE,
            INT_MAX, NULL, NULL, READER_BIT);
}

void rwlock_read_unlock(rwlock_t *rw)
{
    // Decrement the reader count...
    uint32_t new_state = atomic_fetch_sub(&rw->state, 1) - 1;

    if ((new_state & READER_COUNT_MASK) == 0) {
        // Last reader - wake ONE writer
        syscall(SYS_futex, &rw->state, FUTEX_WAKE_BITSET_PRIVATE,
                1, NULL, NULL, WRITER_BIT);
    }
}

/*
 * Bitset advantages:
 *
 * Without bitset: all waiters share one queue; wake-ups can't discriminate
 * With bitset:    the kernel filters waiters efficiently during the wake
 *
 * Use cases:
 *   - Reader preference vs writer preference in rwlocks
 *   - Multiple wait queues on a single futex word (saves memory)
 *   - Barrier phases (wake only threads in the current phase)
 */
```

FUTEX_WAIT_BITSET also introduces proper absolute timeout support. Unlike FUTEX_WAIT, whose timeout is relative and measured against CLOCK_MONOTONIC, FUTEX_WAIT_BITSET interprets its timeout as an absolute time, and with the FUTEX_CLOCK_REALTIME flag it waits until an absolute wall-clock time. This is essential for implementing pthread_cond_timedwait() correctly.
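A sketch of what such an absolute-deadline wait looks like (Linux-only; the wrapper and demo names are ours): with nobody waking the futex, a deadline 50 ms in the future must end in ETIMEDOUT.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

// Wait on 'uaddr' (while *uaddr == val) until the absolute CLOCK_REALTIME
// deadline passes. Returns 0 if woken, or the errno value on failure.
static int futex_wait_abs_deadline(uint32_t *uaddr, uint32_t val,
                                   const struct timespec *abs_deadline)
{
    long ret = syscall(SYS_futex, uaddr,
                       FUTEX_WAIT_BITSET | FUTEX_PRIVATE_FLAG |
                           FUTEX_CLOCK_REALTIME,
                       val, abs_deadline, NULL,
                       FUTEX_BITSET_MATCH_ANY);  // Match any wake
    return ret == 0 ? 0 : errno;
}

// Demo: nobody wakes us, so a deadline 50 ms out must time out
static int abs_timeout_demo(void)
{
    static uint32_t word = 0;
    struct timespec deadline;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_nsec += 50 * 1000 * 1000;   // +50 ms
    if (deadline.tv_nsec >= 1000000000L) {  // Normalize the timespec
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    return futex_wait_abs_deadline(&word, 0, &deadline);
}
```

This is essentially the shape a pthread_cond_timedwait() implementation needs: the caller computes one fixed deadline, and retries after spurious wakeups reuse the same absolute timespec without any remaining-time arithmetic.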
Robust futex usage requires careful handling of all possible return values and edge cases. Here we consolidate the error semantics and discuss proper handling strategies.
| errno | Operations | Cause | Handling |
|---|---|---|---|
| EAGAIN | WAIT, CMP_REQUEUE | Futex value != expected | Normal: retry with new value |
| ETIMEDOUT | WAIT, WAIT_BITSET | Timeout expired | Normal: handle timeout logic |
| EINTR | WAIT, WAIT_BITSET | Signal interrupted | Normal: typically retry wait |
| EFAULT | All ops | uaddr not accessible | Bug: invalid pointer |
| EINVAL | All ops | Invalid args/alignment | Bug: API misuse |
| ENOSYS | Some ops | Operation not supported | Kernel too old or disabled |
| EDEADLK | PI variants | Would deadlock | Bug: lock ordering violation |
| ESRCH | PI variants | Owner thread doesn't exist | Bug: lock state corruption |
```c
/*
 * Robust FUTEX_WAIT wrapper with proper error handling
 */
#include <assert.h>
#include <errno.h>
#include <signal.h>

typedef enum {
    WAIT_WOKEN,        // Successfully woken by FUTEX_WAKE
    WAIT_CHANGED,      // Value changed, should retry
    WAIT_TIMEOUT,      // Timeout expired
    WAIT_INTERRUPTED,  // Interrupted by a signal
    WAIT_ERROR,        // Unrecoverable error
} wait_result_t;

// 'return_on_intr' selects the EINTR policy: surface it to the caller,
// or silently retry the wait (the right choice depends on the
// application's signal model - see below).
wait_result_t robust_futex_wait(uint32_t *uaddr, uint32_t expected,
                                const struct timespec *timeout,
                                int return_on_intr)
{
    while (1) {
        long ret = syscall(SYS_futex, uaddr, FUTEX_WAIT_PRIVATE,
                           expected, timeout, NULL, 0);

        if (ret == 0) {
            return WAIT_WOKEN;
        }

        switch (errno) {
        case EAGAIN:
            // Value changed between the caller's check and our wait.
            // This is expected - the caller should re-check the condition.
            return WAIT_CHANGED;

        case ETIMEDOUT:
            // Timeout expired
            return WAIT_TIMEOUT;

        case EINTR:
            // Interrupted by a signal. For most uses we should retry,
            // unless the application wants to handle pending signals.
            if (return_on_intr) {
                return WAIT_INTERRUPTED;
            }
            continue;  // Retry the wait

        case EFAULT:
            // This should never happen with valid code:
            // abort in debug builds, return an error in production
            assert(0 && "FUTEX_WAIT: invalid address");
            return WAIT_ERROR;

        case EINVAL:
            // Alignment issue or invalid operation
            assert(0 && "FUTEX_WAIT: invalid arguments");
            return WAIT_ERROR;

        default:
            // Unknown error - log and return
            return WAIT_ERROR;
        }
    }
}

/*
 * Common pattern for a condition wait loop.
 *
 * Note: in this sketch the predicate is evaluated without the mutex held,
 * so it must be safe to read concurrently (e.g. a single atomic flag).
 */
void condition_wait_loop(condition_t *cond, mutex_t *mutex,
                         int (*pred)(void *), void *arg)
{
    uint32_t seq = cond->sequence;

    mutex_unlock(mutex);

    while (1) {
        // Check whether the condition is already satisfied
        if (pred(arg)) {
            break;
        }

        wait_result_t res = robust_futex_wait(&cond->sequence, seq,
                                              NULL, 0);

        switch (res) {
        case WAIT_WOKEN:
        case WAIT_CHANGED:
            // Either woken or the value changed - recheck the condition
            seq = cond->sequence;
            continue;

        case WAIT_TIMEOUT:
            // Shouldn't happen with a NULL timeout
            assert(0);
            break;

        case WAIT_INTERRUPTED:
        case WAIT_ERROR:
            // Handle according to application needs
            break;
        }
    }

    mutex_lock(mutex);
}
```

EINTR handling is subtle. If your application uses signals for async notification (SIGUSR1, SIGALRM), you may want to surface EINTR to the caller. If signals are only for system use (SIGCHLD, SIGPIPE), you typically want to silently retry. Know your application's signal model.
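The EINTR path can be provoked deliberately in a Linux-only sketch (the helper name and the repeating 50 ms timer are our own scaffolding; the repeat interval avoids a lost-signal race if the timer fires before we block): a handler installed without SA_RESTART makes a blocked FUTEX_WAIT return with errno set to EINTR.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <signal.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <unistd.h>

static void noop_handler(int sig) { (void)sig; }  // Just interrupt the syscall

// A signal delivered to a thread blocked in FUTEX_WAIT makes the call
// return -1 with errno == EINTR (the handler must lack SA_RESTART).
static int eintr_demo(void)
{
    static uint32_t word = 0;
    struct sigaction sa;
    sa.sa_handler = noop_handler;
    sa.sa_flags = 0;                 // Crucially, no SA_RESTART
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, NULL);

    // Repeating 50 ms timer: even if the first signal fires before we
    // block, a later one will interrupt the sleeping syscall.
    struct itimerval it = {
        .it_interval = { .tv_sec = 0, .tv_usec = 50000 },
        .it_value    = { .tv_sec = 0, .tv_usec = 50000 },
    };
    setitimer(ITIMER_REAL, &it, NULL);

    long ret = syscall(SYS_futex, &word, FUTEX_WAIT | FUTEX_PRIVATE_FLAG,
                       0, NULL, NULL, 0);
    int saved = (ret == -1) ? errno : 0;

    struct itimerval off = { .it_value = { .tv_sec = 0, .tv_usec = 0 } };
    setitimer(ITIMER_REAL, &off, NULL);  // Disarm the timer
    return saved;
}
```

With SA_RESTART set instead, the kernel would transparently restart the wait and the caller would never observe the interruption, which is exactly why the flag choice belongs to the application's signal model.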
We've covered the complete set of futex operations available to userspace. Let's consolidate the key takeaways.
| Operation | Purpose | When Used |
|---|---|---|
| FUTEX_WAIT | Sleep until woken | Lock contention, condvar wait |
| FUTEX_WAKE | Wake waiters | Lock release, condvar signal |
| FUTEX_CMP_REQUEUE | Transfer waiters | Condvar broadcast (to mutex) |
| FUTEX_WAKE_OP | Compound wake | Complex semaphores |
| FUTEX_WAIT_BITSET | Selective wait | RW locks, multiplexing |
| FUTEX_WAKE_BITSET | Selective wake | RW locks, priority wake |
What's Next:
With the operation interface understood, the next page explores kernel involvement—what actually happens inside the Linux kernel when you make a futex call. We'll trace through the kernel code paths, understand the wait queue hash table, and see how the kernel balances correctness and performance.
You now understand the complete futex system call interface—every major operation, its semantics, parameters, and return values. This knowledge enables you to build synchronization primitives from scratch, debug threading issues at the kernel interface level, and understand what your threading library is doing under the hood.