You've learned what spinlocks are and how they work. But knowing how to implement a tool is different from knowing when to use it. Using a spinlock in the wrong context wastes CPU resources, increases power consumption, and can dramatically reduce system throughput. Using blocking locks where spinlocks would excel adds unnecessary latency and overhead.
This page develops your intuition for spinlock selection—not through rules of thumb, but through deep understanding of the underlying trade-offs. By the end, you'll be able to analyze any scenario and confidently choose between spinning and blocking.
By the end of this page, you will understand: the fundamental trade-off equation between spinning and blocking; specific scenarios where spinlocks excel; anti-patterns where spinlocks cause harm; how kernel context differs from userspace; and how modern adaptive locks blend both approaches.
Every synchronization choice comes down to costs. Let's formalize the trade-off precisely.
Let T_critical be the expected time a thread holds the lock (critical section duration).
Let T_context be the cost of a context switch (includes saving state, scheduler work, cache effects, and restoring state).
Let T_spin be the CPU time wasted per unit of spinning time (essentially 1:1 on a dedicated core, or varies with hyperthreading).
Total cost of spinning:
C_spin = T_critical × T_spin × opportunity_cost_factor
Total cost of blocking:
C_block = 2 × T_context + scheduler_overhead + wakeup_latency
Spinlocks win when: C_spin < C_block
| Parameter | Typical Value | Notes |
|---|---|---|
| Context switch (kernel) | 1-5 μs | Minimal path, hot cache |
| Context switch (full) | 5-20 μs | Cold cache, scheduler work |
| Spin iteration (L1 hit) | 10-50 ns | Including PAUSE |
| L2 cache miss | 10-15 ns | ~40 cycles |
| L3 cache miss | 40-50 ns | ~150 cycles |
| Main memory access | 60-100 ns | ~200-300 cycles |
| Cross-NUMA memory | 100-300 ns | Remote memory access |
Given these numbers, if a blocking round trip (two context switches plus wakeup) costs ~5 μs (5,000 ns) and each spin iteration costs ~50 ns:
Break-even occurs at: 5,000 ns / 50 ns = ~100 spin iterations
In time terms: spinning is advantageous if the lock is held for less than ~5-10 μs.
This leads to the first rule: Spinlocks are for short critical sections.
But 'short' needs quantification. Typical guidelines: critical sections of tens to hundreds of nanoseconds are ideal spinlock territory, around 1 μs is still acceptable, and anything approaching the ~5-10 μs break-even point should use a blocking lock.
These are guidelines for low contention. With 10 threads competing for a lock, the 'expected wait time' is 10× the critical section duration. A 1μs critical section becomes 10μs average wait. Always consider contention levels when choosing primitives.
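To make the break-even arithmetic concrete, here is a tiny back-of-envelope helper. It is only a sketch of the cost model above: the constants mirror the assumed ~5 μs blocking round trip and ~50 ns spin iteration from the table, and `prefer_spin` is an invented name, not a real API.

```c
#include <stdbool.h>
#include <stdio.h>

// Assumed costs, taken from the table above; measure on your own hardware.
#define CONTEXT_SWITCH_NS 5000.0   // sleep + wakeup round trip (~5 us)
#define SPIN_ITER_NS        50.0   // one spin iteration, including PAUSE

// Expected cost of spinning: we burn CPU for roughly the whole wait, and
// with N contenders the expected wait is ~N x the critical section length.
static double spin_cost_ns(double critical_ns, int contenders) {
    return critical_ns * contenders;
}

// Expected cost of blocking: the context-switch round trip, folded into one constant.
static double block_cost_ns(void) {
    return CONTEXT_SWITCH_NS;
}

static bool prefer_spin(double critical_ns, int contenders) {
    return spin_cost_ns(critical_ns, contenders) < block_cost_ns();
}

int main(void) {
    printf("1 us CS, 1 waiter   -> spin? %d\n", prefer_spin(1000.0, 1));   // yes
    printf("1 us CS, 10 waiters -> spin? %d\n", prefer_spin(1000.0, 10));  // no: ~10 us expected wait
    printf("break-even spin iterations: %.0f\n", CONTEXT_SWITCH_NS / SPIN_ITER_NS); // ~100
    return 0;
}
```

Swap in numbers measured on your own hardware before trusting the answer.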
Let's examine specific scenarios where spinlocks are the right choice, with concrete justifications.
In operating system kernels, certain code paths cannot block. Interrupt handlers, tasklets, and code running with preemption disabled fall into this category.
Why blocking is impossible: interrupt context has no process context to put to sleep, so there is nothing for the scheduler to suspend and resume; calling anything that might sleep from an interrupt handler, or from a region with preemption disabled, risks crashing or hanging the system.
Spinlocks are the only option here.
```c
// Interrupt handler example - must use spinlock

static DEFINE_SPINLOCK(device_lock);
static struct device_data shared_data;

// This function is called from interrupt context - cannot sleep!
irqreturn_t device_interrupt(int irq, void *dev_id)
{
    unsigned long flags;

    // spin_lock_irqsave: acquire spinlock, disable interrupts
    spin_lock_irqsave(&device_lock, flags);

    // Access shared data - protected by spinlock
    process_device_interrupt(&shared_data);

    spin_unlock_irqrestore(&device_lock, flags);
    return IRQ_HANDLED;
}

// If we used a mutex here:
// mutex_lock() → might sleep → scheduler invoked →
// scheduler tries to switch contexts → crash or hang
```

The scheduler itself needs synchronization. But it cannot block on a lock because blocking requires the scheduler. This is the ultimate chicken-and-egg problem.
The runqueue lock in Linux protects per-CPU scheduler data. It must be a spinlock because blocking on it would mean calling into the scheduler, and the scheduler needs this very lock to do its job:
```c
// Simplified runqueue lock usage in scheduler

struct rq {
    raw_spinlock_t lock;           // Must be spinlock - cannot block
    struct list_head active;       // Runnable processes
    struct task_struct *curr;      // Currently running task
    // ...
};

// Context switch path - cannot use blocking lock here
static void context_switch(struct rq *rq, struct task_struct *prev,
                           struct task_struct *next)
{
    // Already holding rq->lock (spinlock)

    // Perform the actual switch
    switch_to(prev, next);

    // Lock might be released in the switched-to context
}

// Schedule function - finds next task to run
void __schedule(void)
{
    struct rq *rq = this_rq();
    struct task_struct *prev, *next;

    raw_spin_lock(&rq->lock);      // MUST be spinlock

    // Find next task, update accounting
    prev = rq->curr;
    next = pick_next_task(rq);

    if (prev != next) {
        context_switch(rq, prev, next);
    }

    raw_spin_unlock(&rq->lock);
}
```

When locks are acquired very frequently and held for very short durations, even small context switch overhead accumulates.
Example: Per-packet network processing
A network device driver might process millions of packets per second. If each packet requires a brief lock (e.g., updating statistics), context switch overhead at blocking lock granularity becomes prohibitive.
Math: at 1 million packets per second, paying even a ~2 μs context-switch penalty per lock acquisition adds ~2 seconds of overhead per second of traffic, i.e. two whole cores doing nothing but switching; at ~100 ns per locked update, the same work costs about a tenth of one core.
Spinlocks with sub-microsecond hold times keep overhead manageable.
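As a hedged illustration (not taken from any real driver), a userspace analogue of this per-packet pattern using POSIX spinlocks might look like the following; `pkt_stats` and its fields are invented for the example.

```c
#include <pthread.h>
#include <stdint.h>

// Hypothetical per-device packet statistics, shared between threads.
struct pkt_stats {
    pthread_spinlock_t lock;
    uint64_t packets;
    uint64_t bytes;
    uint64_t drops;
};

int pkt_stats_init(struct pkt_stats *s) {
    s->packets = s->bytes = s->drops = 0;
    return pthread_spin_init(&s->lock, PTHREAD_PROCESS_PRIVATE);
}

// Called once per packet: the critical section is a few stores, far below
// the context-switch break-even, so spinning is appropriate.
void pkt_stats_update(struct pkt_stats *s, uint32_t len, int dropped) {
    pthread_spin_lock(&s->lock);
    s->packets++;
    s->bytes += len;
    if (dropped)
        s->drops++;
    pthread_spin_unlock(&s->lock);
}
```

In a real high-rate path you would usually go further and keep per-CPU or per-thread counters that are aggregated on read, so even this tiny lock leaves the hot path.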
Scenario: You have multiple CPUs that need to coordinate, but you know for certain that: the lock (or flag) holder is always actively running on another CPU, it is never preempted or blocked mid-hold, and the hold time is only a handful of instructions.
In this scenario, spinning is efficient because the wait is bounded by a tiny critical section, no context-switch or wakeup latency is ever paid, and the contended cache line stays hot.
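Here is a minimal sketch of that kind of cross-CPU handoff using C11 atomics. The names (`publish`, `await_result`) and the single-producer setup are assumptions for illustration; the point is that the consumer spins only because the producer is known to be running and nearly finished.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

// Cross-CPU handoff: one thread (pinned to another core and known to be
// running) publishes a result; the consumer busy-waits because the wait is
// known to be at most a few hundred nanoseconds.

static _Atomic uint64_t result;
static atomic_bool ready = false;

// Producer: store the data, then set the flag with release ordering so the
// consumer's acquire load is guaranteed to see the result.
void publish(uint64_t value) {
    atomic_store_explicit(&result, value, memory_order_relaxed);
    atomic_store_explicit(&ready, true, memory_order_release);
}

// Consumer: spin on the flag. Safe only because the producer is guaranteed
// to be running on another CPU and finishes almost immediately.
uint64_t await_result(void) {
    while (!atomic_load_explicit(&ready, memory_order_acquire)) {
        // An architecture pause hint (e.g. _mm_pause on x86) would go here.
    }
    return atomic_load_explicit(&result, memory_order_relaxed);
}
```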
Hard real-time systems often prefer spinlocks over blocking locks because spinning has bounded, predictable latency. A context switch introduces variable delays from cache warming, scheduler decisions, and queue management. When microsecond-level predictability matters, spinning's determinism is valuable.
Understanding when NOT to use spinlocks is equally important. These anti-patterns cause real problems in production systems.
If the lock is held for milliseconds or longer, spinning threads waste enormous CPU time. Every spinning thread burns 100% of its core doing nothing useful.
```c
// ANTI-PATTERN: Spinlock with I/O in critical section

spinlock_t bad_lock;

void process_file(const char *path) {
    char buffer[1024];

    spin_lock(&bad_lock);

    // BAD: File I/O can take milliseconds or longer
    FILE *f = fopen(path, "r");
    while (fgets(buffer, sizeof(buffer), f)) {
        process_line(buffer);
    }
    fclose(f);

    spin_unlock(&bad_lock);

    // A thread spinning on this lock might wait 100ms+
    // That's millions of spin iterations = pure waste
}

// CORRECT: Use mutex for I/O-bound operations
pthread_mutex_t good_lock = PTHREAD_MUTEX_INITIALIZER;

void process_file_correct(const char *path) {
    char buffer[1024];

    pthread_mutex_lock(&good_lock);

    // OK: Mutex will put waiting threads to sleep
    FILE *f = fopen(path, "r");
    while (fgets(buffer, sizeof(buffer), f)) {
        process_line(buffer);
    }
    fclose(f);

    pthread_mutex_unlock(&good_lock);
}
```

This is a cardinal sin in kernel programming. If you sleep while holding a spinlock, you've created a logical impossibility:
```c
// FATAL BUG: Sleeping while holding spinlock

spinlock_t lock;

void buggy_function(void) {
    spin_lock(&lock);          // Acquire spinlock

    // BUG: kmalloc with GFP_KERNEL can sleep!
    buffer = kmalloc(size, GFP_KERNEL);

    spin_unlock(&lock);
}

// What happens:
// 1. Thread A acquires spinlock
// 2. Thread A calls kmalloc, which triggers page reclaim
// 3. Page reclaim needs to swap pages to disk - BLOCKS
// 4. Thread A is now asleep while holding spinlock
// 5. Thread B tries to acquire spinlock - SPINS FOREVER
// 6. If B has higher priority than A: priority inversion deadlock
// 7. System hangs

// CORRECT: Use GFP_ATOMIC (non-sleeping) OR use mutex instead
void correct_function(void) {
    spin_lock(&lock);
    buffer = kmalloc(size, GFP_ATOMIC);    // Never sleeps
    spin_unlock(&lock);
}
```

On a single CPU, if preemption is not disabled, a spinlock can deadlock outright:
```c
// DEADLOCK on single-CPU system

void thread_a(void) {
    spin_lock(&lock);       // Acquire lock
    // ... critical section ...
    // PREEMPTED HERE by scheduler
}

void thread_b(void) {
    // Now running (thread A was preempted)
    spin_lock(&lock);       // Lock is held by A
    // B spins waiting for A
    // But A can't run - only one CPU!
    // DEADLOCK
}

// The fix: spinlocks MUST disable preemption
// Linux's spin_lock() includes preempt_disable()
```

When many threads contend for the same spinlock, each waiter burns a CPU. With 32 threads contending for one lock, you could have 31 CPUs spinning uselessly.
Signs of excessive contention: CPU utilization is high while useful throughput stays flat or falls, profiles show a large fraction of cycles inside lock-acquisition loops, and adding cores makes performance worse instead of better.
Solutions: shorten the critical section, split one hot lock into several finer-grained locks, keep data per-CPU or per-thread and aggregate on read, move read-mostly structures to rwlocks or RCU, or switch to a blocking (or queued) lock so waiters at least yield their CPUs.
Spinning prevents CPUs from entering low-power states. On battery-powered devices, a single spinning thread can halve battery life. On servers, it increases cooling requirements and electricity costs. In cloud environments, you're paying for compute you're not using productively.
The choice between spinlocks and blocking locks varies significantly between kernel and userspace code.
The kernel has obligations and capabilities that influence synchronization choices:
Obligations: certain paths (interrupt handlers, the scheduler, any region with preemption or interrupts disabled) must never sleep, and interrupt latency has to stay bounded.
Capabilities: the kernel can disable preemption and interrupts around a critical section, so it can guarantee that a spinlock holder keeps running and releases the lock quickly.
| Situation | Recommended Primitive | Rationale |
|---|---|---|
| Interrupt handler | spinlock_t with IRQ disable | Cannot sleep; must be fast |
| Scheduler code | raw_spinlock_t | Cannot invoke scheduler |
| Device driver (process ctx) | mutex (for I/O), spinlock (for registers) | Match to operation type |
| Read-heavy data | rwlock or RCU | Readers don't block each other |
| Counting resources | semaphore | Can count > 1 |
| Sleeping with lock | mutex only | Spinlocks prohibit sleep |
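To illustrate the 'Read-heavy data' row above, here is a minimal kernel-style RCU sketch. The `config` structure, function names, and update policy are invented for the example, and error paths are kept to a minimum.

```c
#include <linux/errno.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct config {
    int threshold;
};

static struct config __rcu *active_cfg;      // Read-mostly shared pointer
static DEFINE_SPINLOCK(cfg_writer_lock);     // Serializes writers only

// Readers take no lock at all - just a read-side critical section.
int read_threshold(void)
{
    struct config *cfg;
    int val = 0;

    rcu_read_lock();
    cfg = rcu_dereference(active_cfg);
    if (cfg)
        val = cfg->threshold;
    rcu_read_unlock();

    return val;
}

// Writers publish a new copy, then free the old one once readers drain.
int set_threshold(int threshold)
{
    struct config *new_cfg, *old_cfg;

    new_cfg = kmalloc(sizeof(*new_cfg), GFP_KERNEL);
    if (!new_cfg)
        return -ENOMEM;
    new_cfg->threshold = threshold;

    spin_lock(&cfg_writer_lock);
    old_cfg = rcu_dereference_protected(active_cfg,
                                        lockdep_is_held(&cfg_writer_lock));
    rcu_assign_pointer(active_cfg, new_cfg);
    spin_unlock(&cfg_writer_lock);

    synchronize_rcu();                       // Wait out pre-existing readers
    kfree(old_cfg);
    return 0;
}
```

Readers pay essentially nothing, while the writer serializes against other writers with an ordinary spinlock and defers freeing until all pre-existing readers have finished.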
Userspace applications have different constraints:
Challenges: userspace cannot disable preemption, cannot reliably tell whether the current lock holder is running, and a holder can be descheduled at any moment while its waiters keep spinning.
Consequences: a naive userspace spinlock can burn an entire time slice waiting for a preempted holder, so raw spinlocks are rarely the right default in applications; futex-based mutexes with optional adaptive spinning usually are.
```c
// Userspace spinlock dangers illustrated

// Scenario: Thread A holds spinlock, gets preempted
void thread_a(void) {
    spin_lock(&lock);
    // Thread A is preempted here by the kernel scheduler
    // (its 10ms time slice expired)
    process_data();
    spin_unlock(&lock);
}

// Thread B scheduled on same CPU
void thread_b(void) {
    spin_lock(&lock);
    // Spins for up to 10ms until A gets CPU again!
    // ...
}

// Without disabling preemption (only possible in kernel),
// userspace spinlocks can waste entire time slices.

// Better approach: a pthread mutex. The default type blocks efficiently
// via futex; glibc's adaptive type (see below) adds a brief spin first.
pthread_mutex_t adaptive_mutex = PTHREAD_MUTEX_INITIALIZER;

void thread_safe(void) {
    pthread_mutex_lock(&adaptive_mutex);
    // Waiters either spin briefly or sleep in the kernel instead of
    // burning the rest of their time slice
    process_data();
    pthread_mutex_unlock(&adaptive_mutex);
}
```

Modern pthread implementations offer adaptive mutexes (glibc's NPTL exposes this as the PTHREAD_MUTEX_ADAPTIVE_NP mutex type) that spin briefly under contention before falling back to futex-based blocking in the kernel. You get near-spinlock performance for short waits without spinlock risks.
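If you want glibc's spin-then-block behavior explicitly rather than relying on defaults, the adaptive mutex type can be requested through a mutex attribute. A short sketch, assuming glibc/NPTL (PTHREAD_MUTEX_ADAPTIVE_NP is a non-portable extension):

```c
#define _GNU_SOURCE
#include <pthread.h>

static pthread_mutex_t stats_mutex;

// Initialize a mutex with glibc's adaptive (spin-then-block) type.
// PTHREAD_MUTEX_ADAPTIVE_NP is a glibc/NPTL extension, not part of POSIX.
int init_adaptive_mutex(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;

    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ADAPTIVE_NP);

    rc = pthread_mutex_init(&stats_mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

// Callers then use the mutex as usual:
//     pthread_mutex_lock(&stats_mutex);
//     ... short critical section ...
//     pthread_mutex_unlock(&stats_mutex);
```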
Real-world workloads have variable lock hold times. Sometimes critical sections are microseconds; sometimes they're milliseconds. Adaptive locks dynamically choose between spinning and blocking based on observed conditions.
```c
// Conceptual adaptive lock implementation

typedef struct {
    atomic_int locked;
    atomic_int owner_cpu;         // Which CPU holds the lock
    atomic_int owner_sleeping;    // Is holder potentially blocked
    wait_queue_t waiters;
} adaptive_lock_t;

void adaptive_lock(adaptive_lock_t *lock) {
    int spin_count = 0;
    const int MAX_SPIN = 1000;    // Empirically tuned

    while (1) {
        // Try to acquire
        if (try_lock(lock)) {
            return;    // Got it!
        }

        // Decide: spin or block?
        if (spin_count < MAX_SPIN &&
            lock->owner_cpu != current_cpu() &&
            !lock->owner_sleeping) {
            // Owner is running on another CPU - spin a bit
            spin_count++;
            cpu_relax();
        } else {
            // Either we've spun enough, or owner might be blocked
            // Time to sleep
            add_to_wait_queue(&lock->waiters, current_thread());

            // Double-check lock is still held before sleeping
            if (lock->locked) {
                schedule();    // Block until woken
            } else {
                remove_from_wait_queue(&lock->waiters, current_thread());
            }

            spin_count = 0;    // Reset for next attempt
        }
    }
}

void adaptive_unlock(adaptive_lock_t *lock) {
    release_lock(lock);

    // Wake one waiter if any
    wake_up_one(&lock->waiters);
}
```

The Linux kernel mutex (since 2.6.x) implements adaptive spinning along these lines: the fast path is a single atomic compare-and-swap; if that fails and the lock owner is currently running on another CPU, the waiter spins on the owner ('optimistic spinning'); if the owner is not running, or the spinner needs to reschedule, the waiter goes to sleep on the mutex's wait list.
This design achieves near-spinlock latency for short critical sections while being well-behaved under contention or long hold times.
| Condition | Action | Rationale |
|---|---|---|
| Lock is free | CAS acquire, enter CS | No waiting needed |
| Holder running, other CPU | Spin briefly | Will release soon |
| Holder running, same CPU | Block immediately | Spinning would only keep the holder off the CPU |
| Holder is sleeping | Block immediately | Might wait a long time |
| Spin count exceeded | Block | Heuristic: wait is unexpectedly long |
| Just woke from block | Try CAS, then spin | Might succeed immediately |
The Rust ecosystem's 'parking_lot' crate provides adaptive locks with remarkably clever implementations. Its Mutex spins briefly, uses futex-style blocking, maintains fairness, and handles priority inversion—all while being smaller and often faster than pthread mutexes. Study its implementation for state-of-the-art adaptive lock design.
Let's synthesize what we've learned into a practical decision process. When facing a synchronization problem, work through these questions: Can this code sleep at all (interrupt context, scheduler, preemption disabled)? How long is the critical section, and is that duration predictable? Does anything inside it perform I/O or allocation that might block? How many threads will realistically contend? Is this kernel or userspace code? If in doubt, start with a blocking or adaptive lock and measure.
Here's a condensed reference for common scenarios:
| Scenario | Primitive | Why |
|---|---|---|
| Interrupt handler | spin_lock_irqsave() | Can't sleep; need IRQ disable |
| Scheduler internals | raw_spinlock | Can't invoke scheduler |
| Device register access | spin_lock() | Fast, no I/O |
| File/disk operations | mutex | I/O can sleep |
| Network buffer management | spinlock or RCU | Speed-critical, short hold |
| Database transaction | mutex + condition variable | Complex waiting patterns |
| Counter increment | Atomic operations | Lock-free is possible |
| Read-heavy data | rwlock or RCU | Readers shouldn't block readers |
| Userspace general purpose | pthread_mutex | Adaptive implementation |
| Userspace high-performance | parking_lot / futex | Advanced adaptive |
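As a concrete instance of the 'Counter increment' row, a shared counter needs no lock at all. A minimal C11 sketch (the counter and function names are illustrative):

```c
#include <stdatomic.h>
#include <stdint.h>

// Shared event counter updated from many threads without any lock.
static _Atomic uint64_t events;

// Hot path: one atomic add, no lock acquisition, no possibility of
// sleeping or spinning on another thread.
void record_event(void) {
    atomic_fetch_add_explicit(&events, 1, memory_order_relaxed);
}

// Readers that only need a statistical snapshot can use a relaxed load.
uint64_t events_snapshot(void) {
    return atomic_load_explicit(&events, memory_order_relaxed);
}
```

Under very heavy write traffic even this single atomic becomes a cache-line hot spot, at which point per-thread counters summed on read are the usual next step.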
Theory guides initial choices, but real systems are complex. Profile your actual workload. Measure contention levels, hold times, and wait times. Let data drive the final decision. A theoretically suboptimal choice might empirically be fine for your specific patterns.
Even experienced engineers make synchronization primitive selection errors. Here are the most common mistakes and how to avoid them.
Two mistakes come up constantly. Mistake 1: putting a variable-duration operation behind a spinlock, where the common case is fast but a slow path occasionally holds the lock for milliseconds. Mistake 2: busy-looping on try_lock. `while (!try_lock(l)) { }` is identical to a spinlock. Limit attempts or add backoff.

```c
// MISTAKE 1: Spinlock for variable-duration critical section
spinlock_t cache_lock;

void update_cache(int key, int value) {
    spin_lock(&cache_lock);          // Usually fast, but...

    entry = hash_lookup(key);        // Fast: O(1) typically

    if (cache_full()) {
        evict_entries();             // SLOW: might evict thousands of entries!
                                     // Other threads spin for 10+ ms
    }

    insert(key, value);
    spin_unlock(&cache_lock);
}

// FIX: Use mutex, or restructure to do eviction outside the lock

// MISTAKE 2: Busy-loop on try_lock
void bad_try_lock(lock_t *lock) {
    while (!try_lock(lock)) {
        // "I'm using try_lock, not spin_lock, so it's fine" - WRONG
        // This IS spin locking, just spelled differently
    }
}

// FIX: Add backoff, limit attempts, or just use the appropriate lock type
void better_try_lock(lock_t *lock) {
    int attempts = 0;
    while (!try_lock(lock)) {
        if (++attempts > 1000) {
            // Fall back to blocking
            blocking_lock(lock);
            return;
        }
        backoff(attempts);           // Exponential backoff
    }
}
```

The Linux kernel includes 'lockdep', a lock debugging tool that tracks lock dependencies, detects potential deadlocks, and warns about incorrect contexts (like sleeping with spinlock held). Enable CONFIG_LOCKDEP during development; it catches many selection mistakes at runtime.
We've developed a nuanced understanding of when spinlocks are the right tool. The key insights: spin only when the expected wait is shorter than a context switch (roughly single-digit microseconds); never sleep, perform I/O, or make blocking allocations while holding a spinlock; contexts that cannot block, such as interrupt handlers and the scheduler, must use spinlocks; userspace code should normally prefer adaptive mutexes over raw spinlocks; and adaptive locks blend spinning and blocking to capture most of the benefit of both.
What's next:
Now that we know when to use spinlocks, the next page examines spin time considerations—how long to spin before giving up, backoff strategies, and the performance implications of spin tuning. These details separate a correct spinlock from an efficient one.
You now have a framework for deciding when spinlocks are appropriate. You understand the trade-offs, can identify scenarios favoring spinlocks, and can recognize dangerous anti-patterns. Next, we'll explore the nuances of spin duration and backoff strategies.