The final piece of our futex journey takes us into the actual Linux kernel implementation. While we've covered concepts and interfaces, seeing how Linux engineers solved the hard problems—and continue to evolve the implementation—provides invaluable insight.
This page examines the kernel source structure, traces through real code paths, explores priority inheritance futexes (a critical feature for real-time systems), and covers the evolution from futex's introduction in 2002 to modern kernels. By the end, you'll understand not just what futex does, but how Linux makes it happen.
By the end of this page, you will understand the Linux kernel's futex source structure, the implementation of priority inheritance futexes, the evolution of futex across kernel versions, and advanced features like robust futexes and futex2.
The futex implementation lives in the kernel/futex/ directory (since the Linux 5.15/5.16 refactor; previously a single kernel/futex.c). The code is organized into focused modules for maintainability.
```text
# Linux kernel futex source structure (as of Linux 6.x)

kernel/futex/
├── core.c      # Core infrastructure: hash table, key computation
├── futex.h     # Internal header: structs, inline functions
├── pi.c        # Priority inheritance futex implementation
├── requeue.c   # FUTEX_REQUEUE and variants
├── syscalls.c  # System call entry points and dispatch
├── waitwake.c  # FUTEX_WAIT and FUTEX_WAKE
└── Makefile    # Build configuration

# Key files and their responsibilities:

## core.c (~1200 lines)
# - futex_hash_bucket: Wait queue hash table
# - get_futex_key(): Compute futex key from userspace address
# - futex_q: Per-waiter queue entry structure
# - hash_futex(): Map key to bucket
# - cmpxchg_futex_value_locked(): Atomic compare-exchange on the user futex word

## waitwake.c (~500 lines)
# - futex_wait(): The FUTEX_WAIT implementation
# - futex_wake(): The FUTEX_WAKE implementation
# - futex_wait_queue_me(): Add to queue and sleep
# - futex_wake_mark(): Mark waiter for waking

## requeue.c (~700 lines)
# - futex_requeue(): Move waiters between futexes
# - futex_wake_op(): Compound wake operation
# - requeue_pi(): Priority inheritance requeue

## pi.c (~800 lines)
# - futex_lock_pi(): Priority inheritance lock acquire
# - futex_unlock_pi(): Priority inheritance unlock
# - fixup_pi_state_owner(): Handle PI state transfers
# - rt_mutex integration: Uses kernel RT mutexes internally
```

Why the Split?
The original kernel/futex.c grew to over 4000 lines and became difficult to maintain. In the Linux 5.15/5.16 timeframe (2021), the futex maintainers refactored it into the current modular structure, giving each file a clear responsibility.
The Linux kernel source is available at kernel.org or on GitHub. To explore futex, start with kernel/futex/core.c and trace the main system call entry point. Use cscope or ctags for navigation. The code is well-commented—kernel developers expect their code to be read.
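For example, the kernel's own build system can generate the indexes for you (a quick sketch, assuming you run it from the top of a checked-out kernel tree):

```bash
# From the top of a kernel source tree (kernel.org or a GitHub mirror).
make tags      # generate a ctags index for the whole tree
make cscope    # generate a cscope database

# Or index just the futex code for a quick look:
ctags -R kernel/futex/
cscope -bq kernel/futex/*.c kernel/futex/*.h
```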
All futex operations enter the kernel through a single system call handler. Let's trace the entry point and dispatch logic.
```c
/*
 * System call entry point for futex
 * Source: kernel/futex/syscalls.c
 *
 * This is called when userspace executes syscall(SYS_futex, ...)
 */
SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,
                const struct __kernel_timespec __user *, utime,
                u32 __user *, uaddr2, u32, val3)
{
    int ret, cmd = op & FUTEX_CMD_MASK;
    ktime_t t, *tp = NULL;
    struct timespec64 ts;

    /*
     * STEP 1: Handle timeout conversion
     *
     * For wait operations, convert the userspace timespec to kernel ktime.
     * Different operations interpret the timeout differently:
     *  - FUTEX_WAIT:        relative timeout
     *  - FUTEX_WAIT_BITSET: absolute timeout
     *  - FUTEX_LOCK_PI:     absolute timeout
     */
    if (utime && (cmd == FUTEX_WAIT || cmd == FUTEX_LOCK_PI ||
                  cmd == FUTEX_WAIT_BITSET ||
                  cmd == FUTEX_WAIT_REQUEUE_PI)) {
        if (get_timespec64(&ts, utime))
            return -EFAULT;
        if (!timespec64_valid(&ts))
            return -EINVAL;

        t = timespec64_to_ktime(ts);
        if (cmd == FUTEX_WAIT)
            t = ktime_add_safe(ktime_get(), t); // Relative -> absolute
        tp = &t;
    }

    /*
     * STEP 2: Dispatch to operation-specific handler
     */
    return do_futex(uaddr, op, val, tp, uaddr2, (unsigned long)utime, val3);
}

/*
 * Main futex dispatch function
 */
long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
              u32 __user *uaddr2, u32 val2, u32 val3)
{
    int cmd = op & FUTEX_CMD_MASK;
    unsigned int flags = 0;

    /*
     * STEP 3: Parse flags
     */
    if (!(op & FUTEX_PRIVATE_FLAG))
        flags |= FLAGS_SHARED;

    if (op & FUTEX_CLOCK_REALTIME) {
        flags |= FLAGS_CLOCKRT;
        if (cmd != FUTEX_WAIT_BITSET &&
            cmd != FUTEX_WAIT_REQUEUE_PI &&
            cmd != FUTEX_LOCK_PI)
            return -ENOSYS;
    }

    /*
     * STEP 4: Dispatch based on command
     */
    switch (cmd) {
    case FUTEX_WAIT:
        return futex_wait(uaddr, flags, val, timeout, FUTEX_BITSET_MATCH_ANY);
    case FUTEX_WAIT_BITSET:
        return futex_wait(uaddr, flags, val, timeout, val3);
    case FUTEX_WAKE:
        return futex_wake(uaddr, flags, val, FUTEX_BITSET_MATCH_ANY);
    case FUTEX_WAKE_BITSET:
        return futex_wake(uaddr, flags, val, val3);
    case FUTEX_REQUEUE:
        return futex_requeue(uaddr, flags, uaddr2, val, val2, NULL, 0);
    case FUTEX_CMP_REQUEUE:
        return futex_requeue(uaddr, flags, uaddr2, val, val2, &val3, 0);
    case FUTEX_WAKE_OP:
        return futex_wake_op(uaddr, flags, uaddr2, val, val2, val3);
    case FUTEX_LOCK_PI:
        return futex_lock_pi(uaddr, flags, timeout, 0);
    case FUTEX_UNLOCK_PI:
        return futex_unlock_pi(uaddr, flags);
    case FUTEX_TRYLOCK_PI:
        return futex_lock_pi(uaddr, flags, NULL, 1);
    case FUTEX_WAIT_REQUEUE_PI:
        return futex_wait_requeue_pi(uaddr, flags, val, timeout, val3, uaddr2);
    case FUTEX_CMP_REQUEUE_PI:
        return futex_requeue(uaddr, flags, uaddr2, val, val2, &val3, 1);
    }

    return -ENOSYS; // Unknown operation
}
```

The SYSCALL_DEFINE6 macro generates the actual system call handler with proper type checking, tracing hooks, and ABI handling. The '6' means the syscall takes six arguments. The macro is part of Linux's syscall infrastructure, which handles the differences between 32-bit and 64-bit calling conventions.
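For orientation, here is a minimal userspace sketch of how this entry point is reached directly with the raw system call, bypassing glibc's pthread layer. The wrapper names (futex_wait_raw, futex_wake_raw) are ours, not part of any library:

```c
// Minimal sketch: calling the futex syscall directly from userspace.
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

// Sleep until someone wakes us, provided *addr still equals expected.
static long futex_wait_raw(uint32_t *addr, uint32_t expected)
{
    return syscall(SYS_futex, addr, FUTEX_WAIT | FUTEX_PRIVATE_FLAG,
                   expected, NULL, NULL, 0);
}

// Wake up to nr_waiters threads sleeping on addr.
static long futex_wake_raw(uint32_t *addr, int nr_waiters)
{
    return syscall(SYS_futex, addr, FUTEX_WAKE | FUTEX_PRIVATE_FLAG,
                   nr_waiters, NULL, NULL, 0);
}
```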
Priority inversion is a classic problem in real-time systems: a high-priority task waits for a lock held by a low-priority task, which is preempted by a medium-priority task. The high-priority task is effectively blocked by the medium-priority task—a priority inversion.
Priority Inheritance (PI) solves this: when a high-priority task blocks on a lock, the lock holder temporarily inherits the high priority, preventing preemption by medium-priority tasks.
```c
/*
 * Priority Inheritance Futex Implementation
 * Source: kernel/futex/pi.c
 *
 * PI futexes differ from regular futexes:
 *  1. The futex word contains the TID of the owner (not just 0/1/2)
 *  2. The kernel tracks ownership and adjusts priorities
 *  3. Uses the kernel's RT mutex infrastructure internally
 */

/*
 * PI futex word format (32 bits):
 *
 *  Bits 0-29: Owner's TID (thread ID)
 *  Bit 30:    FUTEX_WAITERS - there are waiters
 *  Bit 31:    FUTEX_OWNER_DIED - owner died while holding
 */
#define FUTEX_TID_MASK    0x3fffffff
#define FUTEX_WAITERS     0x40000000
#define FUTEX_OWNER_DIED  0x80000000

/*
 * PI state structure - tracks the priority inheritance chain
 */
struct futex_pi_state {
    struct list_head list;          // All PI states for this task
    struct rt_mutex_base pi_mutex;  // The RT mutex for PI tracking
    struct task_struct *owner;      // Current owner
    refcount_t refcount;            // Reference count
    union futex_key key;            // Which futex this is for
};

/*
 * Acquiring a PI futex (simplified: retries and error handling elided)
 *
 * Much more complex than a regular futex due to priority tracking.
 */
int futex_lock_pi(u32 __user *uaddr, unsigned int flags, ktime_t *time,
                  int trylock)
{
    struct futex_hash_bucket *hb;
    struct futex_q q = futex_q_init;
    struct rt_mutex_waiter rt_waiter;
    int ret;

    /*
     * STEP 1: Get futex key
     */
    ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &q.key);
    if (ret)
        return ret;

    /*
     * STEP 2: Fast path - try to acquire uncontended
     *
     * If the futex word is 0, try to set it to our TID.
     */
    ret = futex_lock_pi_atomic(uaddr, &q, NULL, current->pid, NULL);
    if (ret == 1)
        return 0;   // Got it! Fast path success.
    if (ret < 0)
        return ret; // Error

    /*
     * STEP 3: Slow path - need to wait with PI
     *
     * ret == 0 means someone else holds it.
     */
    hb = hash_futex(&q.key);
    spin_lock(&hb->lock);

    /*
     * STEP 4: Set up PI state
     *
     * Find or create the futex_pi_state for this futex and attach
     * ourselves to it. (In the real code this happens inside a retry
     * of futex_lock_pi_atomic() under the bucket lock.)
     */
    ret = attach_to_pi_state(uaddr, &q, current);   // simplified
    if (ret)
        goto out_unlock;

    /*
     * STEP 5: Now the interesting part - priority inheritance
     *
     * The RT mutex code will:
     *  1. Add us to the RT mutex wait list
     *  2. Boost the owner's priority to ours if we're higher
     *  3. Potentially chain-boost if the owner is itself blocked
     */
    ret = rt_mutex_wait_proxy_lock(&q.pi_state->pi_mutex, time, &rt_waiter);

    /*
     * STEP 6: Woken up - we now own the lock
     *
     * The RT mutex code transferred ownership to us.
     * Update the userspace futex word to our TID.
     */
    fixup_owner(uaddr, &q, current);

out_unlock:
    spin_unlock(&hb->lock);
    return ret;
}

/*
 * Priority inheritance chain example:
 *
 *  Task A (priority 99, highest) wants a lock held by B
 *  Task B (priority 50) wants a lock held by C
 *  Task C (priority 10, lowest) holds the lock B is waiting for
 *
 * Without PI:
 *  C runs at priority 10, gets preempted by everything
 *  A waits potentially forever
 *
 * With PI:
 *  C inherits A's priority 99 through the chain
 *  C runs at priority 99, completes quickly
 *  B inherits A's priority 99 (the higher of 99 and its own 50)
 *  A finally runs
 */
```

| Aspect | Regular Futex | PI Futex |
|---|---|---|
| Futex word content | 0/1/2 (state) | TID of owner + flags |
| Priority tracking | None | Full chain boosting |
| Kernel state | Hash table only | rt_mutex + pi_state |
| Use case | General synchronization | Real-time systems |
| Performance | Faster | More overhead for PI |
| POSIX equivalent | PTHREAD_PRIO_NONE (default protocol) | PTHREAD_PRIO_INHERIT |
PI futexes are significantly more complex and have higher overhead. Use them only when real-time guarantees are needed. For general-purpose synchronization, regular futexes are faster and sufficient.
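For reference, a userspace program opts into PI futexes through the standard POSIX attribute API; glibc then issues FUTEX_LOCK_PI/FUTEX_UNLOCK_PI when the lock is contended. A minimal sketch:

```c
// Minimal sketch: create a priority-inheritance mutex via POSIX attributes.
#include <pthread.h>

pthread_mutex_t pi_lock;

static int init_pi_mutex(void)
{
    pthread_mutexattr_t attr;
    int err = pthread_mutexattr_init(&attr);
    if (err)
        return err;

    // Request the priority-inheritance protocol for this mutex
    err = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (!err)
        err = pthread_mutex_init(&pi_lock, &attr);

    pthread_mutexattr_destroy(&attr);
    return err;
}
```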
What happens if a thread crashes while holding a lock? With naive futexes, other threads wait forever—the lock is never released. Robust futexes solve this: the kernel detects owner death and marks the lock as recoverable.
```c
/*
 * Robust Futex Mechanism
 *
 * The key idea: each thread maintains a list of held robust locks.
 * When the thread exits (normally or via crash), the kernel walks
 * this list and marks each lock as FUTEX_OWNER_DIED.
 */

/*
 * Registering the robust list with the kernel
 *
 * Called early in thread startup (glibc does this automatically
 * for PTHREAD_MUTEX_ROBUST mutexes).
 */
#include <linux/futex.h>

struct robust_list_head {
    struct robust_list *list;            // Head of circular list
    long futex_offset;                   // Offset to futex word in struct
    struct robust_list *list_op_pending; // Currently being acquired/released
};

// System call to register the robust list (invoked via SYS_set_robust_list)
long set_robust_list(struct robust_list_head *head, size_t len);

/*
 * How userspace tracks robust locks
 *
 * Each mutex structure has a 'list' member that links it into
 * the thread's robust list. When acquiring, add to the list.
 * When releasing, remove from the list.
 */
struct robust_mutex {
    struct robust_list list;   // Link in robust list
    uint32_t futex;            // The actual futex word
    // ... other fields
};

// Per-thread list head, registered with set_robust_list() at thread start
static __thread struct robust_list_head __robust_list_head;

void acquire_robust_mutex(struct robust_mutex *m)
{
    // Record as pending (in case we crash during the acquire)
    __robust_list_head.list_op_pending = &m->list;

    // Acquire the lock (PI or regular futex)
    // ... futex operations ...

    // Move from pending to the held list (list helper not shown)
    add_to_robust_list(&m->list);
    __robust_list_head.list_op_pending = NULL;
}

/*
 * Kernel side: what happens on thread exit (simplified)
 * Source: kernel/futex/core.c, called from the exit path
 */
void exit_robust_list(struct task_struct *curr)
{
    struct robust_list_head __user *head;
    struct robust_list __user *entry, *next;
    unsigned int limit = ROBUST_LIST_LIMIT;

    // Get the robust list head for this task
    head = curr->robust_list;
    if (!head)
        return; // No robust list registered

    /*
     * Handle any pending operation first
     *
     * If we crashed during acquire/release, handle that entry specially.
     * (The real code copies these pointers from user memory with
     * get_user(); direct dereference is shown here for brevity.)
     */
    if (head->list_op_pending)
        handle_futex_death((void *)head->list_op_pending + head->futex_offset,
                           curr);

    /*
     * Walk the robust list and mark each lock as OWNER_DIED
     */
    entry = head->list;
    while (entry != (struct robust_list __user *)head && --limit) {
        u32 __user *uaddr = (u32 *)((char *)entry + head->futex_offset);

        handle_futex_death(uaddr, curr);

        // Follow the list
        if (get_user(next, &entry->next))
            break;
        entry = next;
    }
}

/*
 * Marking a lock as owner-died
 */
static void handle_futex_death(u32 __user *uaddr, struct task_struct *curr)
{
    u32 uval, nval;

    // Read the current value
    if (get_user(uval, uaddr))
        return;

    // Only if we actually owned it
    if ((uval & FUTEX_TID_MASK) != curr->pid)
        return;

    // Atomically set FUTEX_OWNER_DIED, keeping the WAITERS bit
    // (the real code re-reads the word and retries if it changed)
    do {
        nval = (uval & FUTEX_WAITERS) | FUTEX_OWNER_DIED;
    } while (cmpxchg_futex_value_locked(uaddr, uval, nval) != uval);

    // Wake any waiters so they can recover
    if (uval & FUTEX_WAITERS)
        futex_wake(uaddr, FLAGS_SHARED, 1, FUTEX_BITSET_MATCH_ANY);
}

/*
 * Userspace recovery
 *
 * When a waiter wakes and sees FUTEX_OWNER_DIED, it can:
 *  1. Take ownership of the lock
 *  2. Run recovery code (check data consistency)
 *  3. Clear FUTEX_OWNER_DIED and FUTEX_WAITERS
 *  4. Continue with the acquired lock
 *
 * pthread_mutex_consistent() performs this recovery.
 */
```

In POSIX, use pthread_mutexattr_setrobust() with PTHREAD_MUTEX_ROBUST. glibc registers the robust list with the kernel automatically. When pthread_mutex_lock() returns EOWNERDEAD, call pthread_mutex_consistent() after restoring state integrity, or pthread_mutex_unlock() if recovery isn't possible.
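Tying this back to POSIX, a minimal sketch of the waiter-side recovery flow might look like this (repair_shared_state() is an assumed application-specific helper, not a library function):

```c
// Minimal sketch: robust mutex creation and EOWNERDEAD recovery.
#include <errno.h>
#include <pthread.h>

extern int repair_shared_state(void);   // assumed helper: nonzero if data was repaired

pthread_mutex_t robust_lock;

void init_robust_mutex(void)
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    pthread_mutex_init(&robust_lock, &attr);
    pthread_mutexattr_destroy(&attr);
}

int lock_with_recovery(void)
{
    int err = pthread_mutex_lock(&robust_lock);
    if (err == EOWNERDEAD) {
        // The previous owner died while holding the lock. We now hold it,
        // but the protected data may be half-updated.
        if (repair_shared_state())
            return pthread_mutex_consistent(&robust_lock);  // mark usable again
        // Can't repair: unlocking now marks the mutex permanently unusable
        pthread_mutex_unlock(&robust_lock);
        return ENOTRECOVERABLE;
    }
    return err;   // 0 on a normal, successful lock
}
```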
Futex has evolved significantly since its introduction in 2002. Understanding this evolution helps appreciate current behavior and anticipate future changes.
| Kernel Version | Year | Features Added |
|---|---|---|
| 2.5.7 | 2002 | Initial futex: WAIT, WAKE, FD |
| 2.5.40 | 2002 | FUTEX_REQUEUE for efficient condvars |
| 2.6.7 | 2004 | FUTEX_CMP_REQUEUE (safer requeue) |
| 2.6.12 | 2005 | FUTEX_WAKE_OP compound operations |
| 2.6.17 | 2006 | Robust futexes (handle owner death) |
| 2.6.18 | 2006 | Priority inheritance (PI) futexes |
| 2.6.25 | 2008 | FUTEX_WAIT_BITSET, FUTEX_WAKE_BITSET |
| 2.6.22 | 2007 | FUTEX_PRIVATE_FLAG (process-private fast path) |
| 2.6.26 | 2008 | Removed FUTEX_FD (inherently racy) |
| 2.6.31 | 2009 | FUTEX_WAIT_REQUEUE_PI, FUTEX_CMP_REQUEUE_PI |
| 3.14 | 2014 | Better NUMA-aware hash tables |
| 4.2 | 2015 | Futex PI improvements |
| 5.15–5.16 | 2021–2022 | Code refactored into kernel/futex/ modules |
| 5.16 | 2022 | futex_waitv(): wait on multiple futexes (first futex2 piece) |
Key Evolutionary Insights:
From simple to complex: Futex started with just wait/wake. Real-world needs drove addition of requeue (for condvars), PI (for real-time), and robust (for fault tolerance).
Correctness and security lessons: FUTEX_FD was removed because it was inherently racy (wakeups could be missed). Lessons like this continue to shape futex development.
Performance refinements: Private futex optimizations, NUMA-aware hashing, and hash table sizing reflect continuous performance tuning.
Ongoing evolution: the futex2 work addresses remaining limitations; multiple-wait (futex_waitv) is already merged, while variable-size futexes are still being developed.
```c
/*
 * Futex2: the next-generation interface
 *
 * The classic futex syscall has limitations that futex2 aims to address.
 * Status: futex_waitv() was merged in Linux 5.16; the remaining pieces
 * are still evolving.
 */

/*
 * LIMITATION 1: Fixed 32-bit futex word
 *
 * Current:  the futex word is always a uint32_t
 * Problem:  64-bit atomic operations are increasingly common
 *
 * Proposed: variable-size futexes (8, 16, 32, 64-bit)
 */
struct futex_waitv {
    uint64_t val;        // Expected value
    uint64_t uaddr;      // Address of the futex (as a 64-bit integer)
    uint32_t flags;      // FUTEX2_SIZE_U8/U16/U32/U64, FUTEX2_PRIVATE, ...
    uint32_t __reserved;
};

/*
 * LIMITATION 2: Can only wait on one futex at a time
 *
 * Current:  one FUTEX_WAIT per syscall
 * Problem:  polling multiple futexes requires extra threads or epoll hacks
 *
 * Merged (Linux 5.16): wait on multiple futexes simultaneously.
 * There is no glibc wrapper; call it via syscall(SYS_futex_waitv, ...).
 */
long futex_waitv(struct futex_waitv *waiters, unsigned int nr_futexes,
                 unsigned int flags, struct timespec *timeout,
                 clockid_t clockid);

/*
 * Usage example: wait for either of two events.
 * (At the time of the 5.16 merge only 32-bit futexes were accepted;
 * the timeout is absolute, against the clock given in the last argument.)
 */
struct futex_waitv events[2] = {
    { .uaddr = (uintptr_t)&event1, .val = 0,
      .flags = FUTEX2_SIZE_U32 | FUTEX2_PRIVATE },
    { .uaddr = (uintptr_t)&event2, .val = 0,
      .flags = FUTEX2_SIZE_U32 | FUTEX2_PRIVATE },
};

ret = syscall(SYS_futex_waitv, events, 2, 0, &timeout, CLOCK_MONOTONIC);
// Returns the index of the futex that woke us, or -1 with errno set
// (for example ETIMEDOUT)

/*
 * LIMITATION 3: Timeout handling quirks
 *
 * Current:  inconsistent timeout semantics across operations
 * Proposed: unified, explicit timeout handling in the newer interfaces
 */

/*
 * STATUS: futex_waitv() (multi-wait) was merged in Linux 5.16, driven
 * largely by Wine/Proton, which needs to wait on multiple Windows-style
 * synchronization objects at once. The rest of futex2 (variable sizes,
 * NUMA awareness) continues to evolve in later kernels.
 */
```

The original futex syscall will never be removed or incompatibly changed: the Linux kernel's userspace ABI is stable. New features arrive as new operations or new syscalls (like futex_waitv), not as changes to existing behavior. Existing code will continue to work indefinitely.
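Because availability depends on the running kernel, portable code typically probes for the new syscall and falls back to plain FUTEX_WAIT when it is absent. A minimal detection sketch, assuming your libc headers define SYS_futex_waitv:

```c
// Hypothetical sketch: detect futex_waitv() support at runtime.
#define _GNU_SOURCE
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

static bool have_futex_waitv(void)
{
#ifdef SYS_futex_waitv
    // Zero waiters is always invalid: a supporting kernel answers EINVAL,
    // while an older kernel answers ENOSYS.
    long ret = syscall(SYS_futex_waitv, NULL, 0u, 0u, NULL, 0);
    return !(ret == -1 && errno == ENOSYS);
#else
    return false;   // built against headers that predate futex_waitv
#endif
}
```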
Most applications use futex indirectly through glibc's NPTL (Native POSIX Thread Library). Understanding how NPTL uses futex completes our picture.
```c
/*
 * How glibc NPTL uses futex
 * Source: glibc/nptl/*.c (simplified)
 */

/*
 * pthread_mutex_t structure (simplified, glibc 2.34+)
 */
typedef struct {
    int __lock;             // Futex word
    unsigned int __count;   // Recursive lock count
    int __owner;            // Owner TID (for error checking)
    unsigned int __nusers;  // Debugging: number of users
    int __kind;             // Mutex type (normal, recursive, etc.)
    // ... additional fields for robust, PI, etc.
} pthread_mutex_t;

/*
 * pthread_mutex_lock implementation (simplified; real glibc stores and
 * compares the kernel TID rather than pthread_self())
 */
int pthread_mutex_lock(pthread_mutex_t *mutex)
{
    int type = mutex->__kind & PTHREAD_MUTEX_KIND_MASK;

    switch (type) {
    case PTHREAD_MUTEX_NORMAL:
        return lll_lock(&mutex->__lock);    // Direct futex path

    case PTHREAD_MUTEX_RECURSIVE:
        if (mutex->__owner == pthread_self()) {
            mutex->__count++;
            return 0;   // Already hold it, just increment the count
        }
        lll_lock(&mutex->__lock);
        mutex->__owner = pthread_self();
        mutex->__count = 1;
        return 0;

    case PTHREAD_MUTEX_ERRORCHECK:
        if (mutex->__owner == pthread_self())
            return EDEADLK;   // Error: would deadlock
        lll_lock(&mutex->__lock);
        mutex->__owner = pthread_self();
        return 0;

    // ... PI and robust variants
    }
}

/*
 * The low-level lock (lll) functions in NPTL directly use futex
 * operations. Shown with C11-style atomics for clarity; glibc uses
 * its own atomic primitives.
 */

// Lock acquire (fast path, inlined)
static inline int lll_lock(int *futex)
{
    int expected = 0;

    // Fast path: 0 -> 1 (locked, no waiters) without entering the kernel
    if (__glibc_likely(atomic_compare_exchange_strong(futex, &expected, 1)))
        return 0;

    lll_lock_wait(futex);   // Slow path
    return 0;
}

// Slow path: futex wait loop
void lll_lock_wait(int *futex)
{
    // Move to the contended state (2). If the old value was 0, we
    // actually acquired the lock while marking it contended.
    int old = atomic_exchange(futex, 2);

    while (old != 0) {
        // Sleep only while the word is still 2; the kernel re-checks it
        syscall(SYS_futex, futex, FUTEX_WAIT | FUTEX_PRIVATE_FLAG, 2, NULL);
        old = atomic_exchange(futex, 2);
    }
}

/*
 * Condition variables and requeue (classic NPTL algorithm, simplified;
 * field names are illustrative)
 *
 * pthread_cond_broadcast() used FUTEX_CMP_REQUEUE: wake one waiter and
 * requeue the rest onto the mutex, so they don't all stampede into the
 * kernel at once.
 */
int pthread_cond_broadcast(pthread_cond_t *cond)
{
    // Bump the wakeup sequence so current waiters become eligible to wake
    atomic_fetch_add(&cond->__futex, 1);

    // Wake 1 thread, requeue all the others onto the mutex's futex.
    // (The timeout argument slot carries the "number to requeue" value.)
    syscall(SYS_futex, &cond->__futex,
            FUTEX_CMP_REQUEUE | FUTEX_PRIVATE_FLAG,
            1,                        // wake one thread
            INT_MAX,                  // requeue all the others
            &cond->__mutex->__lock,   // ...onto the mutex
            cond->__futex);           // expected value (the CMP check)
    return 0;
}
```

NPTL Design Philosophy:
NPTL was designed around futex from the ground up (unlike the older LinuxThreads that bolted synchronization onto process-based threads). Key design choices:
Zero kernel involvement for thread-local operations: Thread creation, exit, and most synchronization are userspace-only until blocking occurs.
Minimal structure sizes: pthread_mutex_t is small enough to embed anywhere without allocation.
Always use FUTEX_PRIVATE_FLAG: NPTL assumes mutexes are private unless PTHREAD_PROCESS_SHARED is set.
Adaptive mutexes: Spin briefly before futex wait (controlled by PTHREAD_MUTEX_ADAPTIVE_NP).
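As a concrete example of that last point, an adaptive mutex is requested through a GNU-specific mutex type (a minimal sketch; requires _GNU_SOURCE):

```c
// Minimal sketch: request an adaptive mutex (GNU extension), which spins
// briefly in userspace before falling back to FUTEX_WAIT.
#define _GNU_SOURCE
#include <pthread.h>

pthread_mutex_t adaptive_lock;

static int init_adaptive_mutex(void)
{
    pthread_mutexattr_t attr;
    int err = pthread_mutexattr_init(&attr);
    if (err)
        return err;

    // PTHREAD_MUTEX_ADAPTIVE_NP: spin a bounded number of times on SMP
    // before sleeping in the kernel
    err = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
    if (!err)
        err = pthread_mutex_init(&adaptive_lock, &attr);

    pthread_mutexattr_destroy(&attr);
    return err;
}
```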
While glibc NPTL is the most common pthread implementation on Linux, alternatives exist: musl libc has its own futex-based pthreads, the Go runtime uses futexes for its internal synchronization, and Rust's standard library implements its own futex-based locks on Linux (the popular parking_lot crate takes a similar approach). All share the same futex philosophy: fast path in userspace, kernel only for blocking.
When synchronization goes wrong, futex-level debugging tools become essential. Let's explore techniques for diagnosing futex-related problems.
```bash
#!/bin/bash
# Debugging futex issues

# ============================================
# TECHNIQUE 1: Find threads blocked on futex
# ============================================

# List all threads in your process
ps -eLf | grep my_application

# Check what each thread is doing
cat /proc/<pid>/task/<tid>/syscall
# Output like: 202 0x7f1234 0x80 0x2 ...
# 202 = SYS_futex (x86-64), followed by its arguments

# More readable with strace
strace -f -e futex -p <pid>
# Shows all futex calls in real time

# ============================================
# TECHNIQUE 2: GDB for lock state inspection
# ============================================

# Attach to a running process
gdb -p <pid>

# Find mutex state
(gdb) print *my_mutex
# Shows the __lock field value:
#   0 = unlocked
#   1 = locked, no waiters
#   2 = locked, has waiters

# Find who holds a PI mutex
(gdb) print my_pi_mutex.__owner
# Shows the TID of the owner

# Get a backtrace of all threads
(gdb) thread apply all bt

# Find threads in futex_wait
(gdb) thread apply all bt | grep futex_wait

# ============================================
# TECHNIQUE 3: bpftrace for futex timing
# ============================================

# Find futex waits longer than one second
sudo bpftrace -e '
tracepoint:syscalls:sys_enter_futex /(args->op & 0x7f) == 0/ {
    @start[tid] = nsecs;
}
tracepoint:syscalls:sys_exit_futex /@start[tid] && nsecs - @start[tid] > 1000000000/ {
    printf("Thread %d waited %d ms in futex\n",
           tid, (nsecs - @start[tid]) / 1000000);
    @[ustack] = count();
    delete(@start[tid]);
}'

# ============================================
# TECHNIQUE 4: lockdep for deadlock detection
# ============================================

# Enable CONFIG_LOCKDEP in the kernel (debug builds).
# Note: lockdep validates kernel-internal locks (including the rt_mutexes
# behind PI futexes); it does not see userspace lock ordering.

# View lock statistics (if enabled)
cat /proc/lockdep_stats

# Optionally make debug machines panic on soft lockups so hangs are caught
echo 1 > /proc/sys/kernel/softlockup_panic

# ============================================
# TECHNIQUE 5: Valgrind helgrind/drd
# ============================================

# Detect data races and lock misuse
valgrind --tool=helgrind ./my_application

# DRD is similar but uses a different detection strategy
valgrind --tool=drd ./my_application

# Common issues detected:
#  - Lock order inconsistency (potential deadlock)
#  - Unlocking a lock that isn't held
#  - Data race: concurrent access without synchronization
```

| Symptom | Likely Cause | Diagnostic | Fix |
|---|---|---|---|
| Thread stuck in futex_wait | Deadlock or lost wakeup | Check lock order, waker logic | Fix lock ordering or wake call |
| High EAGAIN rate | Contention or ABA | Profile fast/slow path | Reduce contention or fix ABA |
| Futex word corruption | Memory corruption | ASAN, watchpoints | Fix buffer overflow/use-after-free |
| PI deadlock | Priority inversion unresolved | Check rt_mutex chain | Verify PI mutex setup |
| Robust futex not cleaning | Missing set_robust_list | Check startup code | Call set_robust_list early |
The #1 cause of futex-related hangs is deadlock from inconsistent lock ordering. Establish and enforce a global lock order in your application. Document it. Use lockdep or ThreadSanitizer to catch violations during development.
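For example, ThreadSanitizer is enabled with a single compiler flag (shown for gcc/clang; adjust for your build system):

```bash
# Build with ThreadSanitizer to catch data races and lock-order inversions
gcc -g -O1 -fsanitize=thread -pthread -o my_application my_application.c

# Run normally; TSan prints a report for each race or lock-order violation
./my_application
```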
We've completed our deep dive into the Linux kernel's futex implementation. Let's consolidate the key knowledge.
Module Complete:
You have now mastered the futex synchronization primitive from every angle: the kernel source layout and syscall dispatch, priority inheritance and robust futexes, the evolution toward futex2, how glibc NPTL builds pthreads on top of futex, and how to debug futex-level problems.
This knowledge positions you to build, debug, and optimize synchronization at the lowest level on Linux systems.
You now have a complete understanding of the futex primitive—from philosophical motivation through kernel implementation detail. This is the foundation upon which all modern Linux threading is built. Whether you're debugging a production deadlock, implementing a custom synchronization primitive, or simply understanding why your mutexes are fast, you now have the knowledge to succeed.