You've learned what spinlocks are and how they work. But knowing how to implement a tool is different from knowing when to use it. Using a spinlock in the wrong context wastes CPU resources, increases power consumption, and can dramatically reduce system throughput. Using blocking locks where spinlocks would excel adds unnecessary latency and overhead.
This page develops your intuition for spinlock selection—not through rules of thumb, but through deep understanding of the underlying trade-offs. By the end, you'll be able to analyze any scenario and confidently choose between spinning and blocking.
By the end of this page, you will understand: the fundamental trade-off equation between spinning and blocking; specific scenarios where spinlocks excel; anti-patterns where spinlocks cause harm; how kernel context differs from userspace; and how modern adaptive locks blend both approaches.
Every synchronization choice comes down to costs. Let's formalize the trade-off precisely.
Let T_critical be the expected time a thread holds the lock (critical section duration).
Let T_context be the cost of a context switch (includes saving state, scheduler work, cache effects, and restoring state).
Let T_spin be the CPU time wasted per unit of spinning time (essentially 1:1 on a dedicated core, or varies with hyperthreading).
Total cost of spinning:
C_spin = T_critical × T_spin × opportunity_cost_factor
Total cost of blocking:
C_block = 2 × T_context + scheduler_overhead + wakeup_latency
Spinlocks win when: C_spin < C_block
| Parameter | Typical Value | Notes |
|---|---|---|
| Context switch (kernel) | 1-5 μs | Minimal path, hot cache |
| Context switch (full) | 5-20 μs | Cold cache, scheduler work |
| Spin iteration (L1 hit) | 10-50 ns | Including PAUSE |
| L2 cache miss | 10-15 ns | ~40 cycles |
| L3 cache miss | 40-50 ns | ~150 cycles |
| Main memory access | 60-100 ns | ~200-300 cycles |
| Cross-NUMA memory | 100-300 ns | Remote memory access |
Given these numbers, if a blocking round trip (two context switches plus wakeup) costs ~5 μs (5,000 ns) and each spin iteration costs ~50 ns:
Break-even occurs at: 5,000 ns / 50 ns = ~100 spin iterations
In time terms: spinning is advantageous if the lock is held for less than ~5-10 μs.
This leads to the first rule: Spinlocks are for short critical sections.
But 'short' needs quantification. Typical guidelines: critical sections of tens to hundreds of nanoseconds are ideal spinlock territory, around 1 μs is still acceptable, and anything approaching the ~5-10 μs break-even point should use a blocking lock.
These are guidelines for low contention. With 10 threads competing for a lock, the 'expected wait time' is 10× the critical section duration. A 1μs critical section becomes 10μs average wait. Always consider contention levels when choosing primitives.
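To make the break-even arithmetic concrete, here is a tiny back-of-envelope helper. It is only a sketch of the cost model above: the constants mirror the assumed ~5 μs blocking round trip and ~50 ns spin iteration from the table, and `prefer_spin` is an invented name, not a real API.

```c
#include <stdbool.h>
#include <stdio.h>

// Assumed costs, taken from the table above; measure on your own hardware.
#define CONTEXT_SWITCH_NS 5000.0   // sleep + wakeup round trip (~5 us)
#define SPIN_ITER_NS        50.0   // one spin iteration, including PAUSE

// Expected cost of spinning: we burn CPU for roughly the whole wait, and
// with N contenders the expected wait is ~N x the critical section length.
static double spin_cost_ns(double critical_ns, int contenders) {
    return critical_ns * contenders;
}

// Expected cost of blocking: the context-switch round trip, folded into one constant.
static double block_cost_ns(void) {
    return CONTEXT_SWITCH_NS;
}

static bool prefer_spin(double critical_ns, int contenders) {
    return spin_cost_ns(critical_ns, contenders) < block_cost_ns();
}

int main(void) {
    printf("1 us CS, 1 waiter   -> spin? %d\n", prefer_spin(1000.0, 1));   // yes
    printf("1 us CS, 10 waiters -> spin? %d\n", prefer_spin(1000.0, 10));  // no: ~10 us expected wait
    printf("break-even spin iterations: %.0f\n", CONTEXT_SWITCH_NS / SPIN_ITER_NS); // ~100
    return 0;
}
```

Swap in numbers measured on your own hardware before trusting the answer.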
Let's examine specific scenarios where spinlocks are the right choice, with concrete justifications.
In operating system kernels, certain code paths cannot block. Interrupt handlers, tasklets, and code running with preemption disabled fall into this category.
Why blocking is impossible: interrupt context has no process context to put to sleep, so there is nothing for the scheduler to suspend and resume; calling anything that might sleep from an interrupt handler, or from a region with preemption disabled, risks crashing or hanging the system.
Spinlocks are the only option here.
```c
// Interrupt handler example - must use spinlock

static DEFINE_SPINLOCK(device_lock);
static struct device_data shared_data;

// This function is called from interrupt context - cannot sleep!
irqreturn_t device_interrupt(int irq, void *dev_id)
{
    unsigned long flags;

    // spin_lock_irqsave: acquire spinlock, disable interrupts
    spin_lock_irqsave(&device_lock, flags);

    // Access shared data - protected by spinlock
    process_device_interrupt(&shared_data);

    spin_unlock_irqrestore(&device_lock, flags);
    return IRQ_HANDLED;
}

// If we used a mutex here:
// mutex_lock() → might sleep → scheduler invoked →
// scheduler tries to switch contexts → crash or hang
```

The scheduler itself needs synchronization. But it cannot block on a lock because blocking requires the scheduler. This is the ultimate chicken-and-egg problem.
The runqueue lock in Linux protects per-CPU scheduler data. It must be a spinlock because blocking on it would mean calling into the scheduler, and the scheduler needs this very lock to do its job:
```c
// Simplified runqueue lock usage in scheduler

struct rq {
    raw_spinlock_t lock;           // Must be spinlock - cannot block
    struct list_head active;       // Runnable processes
    struct task_struct *curr;      // Currently running task
    // ...
};

// Context switch path - cannot use blocking lock here
static void context_switch(struct rq *rq, struct task_struct *prev,
                           struct task_struct *next)
{
    // Already holding rq->lock (spinlock)

    // Perform the actual switch
    switch_to(prev, next);

    // Lock might be released in the switched-to context
}

// Schedule function - finds next task to run
void __schedule(void)
{
    struct rq *rq = this_rq();
    struct task_struct *prev, *next;

    raw_spin_lock(&rq->lock);      // MUST be spinlock

    // Find next task, update accounting
    prev = rq->curr;
    next = pick_next_task(rq);

    if (prev != next) {
        context_switch(rq, prev, next);
    }

    raw_spin_unlock(&rq->lock);
}
```

When locks are acquired very frequently and held for very short durations, even small context switch overhead accumulates.
Example: Per-packet network processing
A network device driver might process millions of packets per second. If each packet requires a brief lock (e.g., updating statistics), context switch overhead at blocking lock granularity becomes prohibitive.
Math: at 1 million packets per second, paying even a ~2 μs context-switch penalty per lock acquisition adds ~2 seconds of overhead per second of traffic, i.e. two whole cores doing nothing but switching; at ~100 ns per locked update, the same work costs about a tenth of one core.
Spinlocks with sub-microsecond hold times keep overhead manageable.
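As a hedged illustration (not taken from any real driver), a userspace analogue of this per-packet pattern using POSIX spinlocks might look like the following; `pkt_stats` and its fields are invented for the example.

```c
#include <pthread.h>
#include <stdint.h>

// Hypothetical per-device packet statistics, shared between threads.
struct pkt_stats {
    pthread_spinlock_t lock;
    uint64_t packets;
    uint64_t bytes;
    uint64_t drops;
};

int pkt_stats_init(struct pkt_stats *s) {
    s->packets = s->bytes = s->drops = 0;
    return pthread_spin_init(&s->lock, PTHREAD_PROCESS_PRIVATE);
}

// Called once per packet: the critical section is a few stores, far below
// the context-switch break-even, so spinning is appropriate.
void pkt_stats_update(struct pkt_stats *s, uint32_t len, int dropped) {
    pthread_spin_lock(&s->lock);
    s->packets++;
    s->bytes += len;
    if (dropped)
        s->drops++;
    pthread_spin_unlock(&s->lock);
}
```

In a real high-rate path you would usually go further and keep per-CPU or per-thread counters that are aggregated on read, so even this tiny lock leaves the hot path.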
Scenario: You have multiple CPUs that need to coordinate, but you know for certain that: the lock (or flag) holder is always actively running on another CPU, it is never preempted or blocked mid-hold, and the hold time is only a handful of instructions.
In this scenario, spinning is efficient because the wait is bounded by a tiny critical section, no context-switch or wakeup latency is ever paid, and the contended cache line stays hot.
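Here is a minimal sketch of that kind of cross-CPU handoff using C11 atomics. The names (`publish`, `await_result`) and the single-producer setup are assumptions for illustration; the point is that the consumer spins only because the producer is known to be running and nearly finished.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

// Cross-CPU handoff: one thread (pinned to another core and known to be
// running) publishes a result; the consumer busy-waits because the wait is
// known to be at most a few hundred nanoseconds.

static _Atomic uint64_t result;
static atomic_bool ready = false;

// Producer: store the data, then set the flag with release ordering so the
// consumer's acquire load is guaranteed to see the result.
void publish(uint64_t value) {
    atomic_store_explicit(&result, value, memory_order_relaxed);
    atomic_store_explicit(&ready, true, memory_order_release);
}

// Consumer: spin on the flag. Safe only because the producer is guaranteed
// to be running on another CPU and finishes almost immediately.
uint64_t await_result(void) {
    while (!atomic_load_explicit(&ready, memory_order_acquire)) {
        // An architecture pause hint (e.g. _mm_pause on x86) would go here.
    }
    return atomic_load_explicit(&result, memory_order_relaxed);
}
```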
Hard real-time systems often prefer spinlocks over blocking locks because spinning has bounded, predictable latency. A context switch introduces variable delays from cache warming, scheduler decisions, and queue management. When microsecond-level predictability matters, spinning's determinism is valuable.
Understanding when NOT to use spinlocks is equally important. These anti-patterns cause real problems in production systems.
If the lock is held for milliseconds or longer, spinning threads waste enormous CPU time. Every spinning thread burns 100% of its core doing nothing useful.
```c
// ANTI-PATTERN: Spinlock with I/O in critical section

spinlock_t bad_lock;

void process_file(const char *path) {
    char buffer[1024];

    spin_lock(&bad_lock);

    // BAD: File I/O can take milliseconds or longer
    FILE *f = fopen(path, "r");
    while (fgets(buffer, sizeof(buffer), f)) {
        process_line(buffer);
    }
    fclose(f);

    spin_unlock(&bad_lock);

    // A thread spinning on this lock might wait 100ms+
    // That's millions of spin iterations = pure waste
}

// CORRECT: Use mutex for I/O-bound operations
pthread_mutex_t good_lock = PTHREAD_MUTEX_INITIALIZER;

void process_file_correct(const char *path) {
    char buffer[1024];

    pthread_mutex_lock(&good_lock);

    // OK: Mutex will put waiting threads to sleep
    FILE *f = fopen(path, "r");
    while (fgets(buffer, sizeof(buffer), f)) {
        process_line(buffer);
    }
    fclose(f);

    pthread_mutex_unlock(&good_lock);
}
```

This is a cardinal sin in kernel programming. If you sleep while holding a spinlock, you've created a logical impossibility:
```c
// FATAL BUG: Sleeping while holding spinlock

spinlock_t lock;

void buggy_function(void) {
    spin_lock(&lock);          // Acquire spinlock

    // BUG: kmalloc with GFP_KERNEL can sleep!
    buffer = kmalloc(size, GFP_KERNEL);

    spin_unlock(&lock);
}

// What happens:
// 1. Thread A acquires spinlock
// 2. Thread A calls kmalloc, which triggers page reclaim
// 3. Page reclaim needs to swap pages to disk - BLOCKS
// 4. Thread A is now asleep while holding spinlock
// 5. Thread B tries to acquire spinlock - SPINS FOREVER
// 6. If B has higher priority than A: priority inversion deadlock
// 7. System hangs

// CORRECT: Use GFP_ATOMIC (non-sleeping) OR use mutex instead
void correct_function(void) {
    spin_lock(&lock);
    buffer = kmalloc(size, GFP_ATOMIC);    // Never sleeps
    spin_unlock(&lock);
}
```

On a single CPU, if preemption is not disabled, a spinlock can deadlock outright:
```c
// DEADLOCK on single-CPU system

void thread_a(void) {
    spin_lock(&lock);       // Acquire lock
    // ... critical section ...
    // PREEMPTED HERE by scheduler
}

void thread_b(void) {
    // Now running (thread A was preempted)
    spin_lock(&lock);       // Lock is held by A
    // B spins waiting for A
    // But A can't run - only one CPU!
    // DEADLOCK
}

// The fix: spinlocks MUST disable preemption
// Linux's spin_lock() includes preempt_disable()
```

When many threads contend for the same spinlock, each waiter burns a CPU. With 32 threads contending for one lock, you could have 31 CPUs spinning uselessly.
Signs of excessive contention: CPU utilization is high while useful throughput stays flat or falls, profiles show a large fraction of cycles inside lock-acquisition loops, and adding cores makes performance worse instead of better.
Solutions: shorten the critical section, split one hot lock into several finer-grained locks, keep data per-CPU or per-thread and aggregate on read, move read-mostly structures to rwlocks or RCU, or switch to a blocking (or queued) lock so waiters at least yield their CPUs.
Spinning prevents CPUs from entering low-power states. On battery-powered devices, a single spinning thread can halve battery life. On servers, it increases cooling requirements and electricity costs. In cloud environments, you're paying for compute you're not using productively.
The choice between spinlocks and blocking locks varies significantly between kernel and userspace code.
The kernel has obligations and capabilities that influence synchronization choices:
Obligations: certain paths (interrupt handlers, the scheduler, any region with preemption or interrupts disabled) must never sleep, and interrupt latency has to stay bounded.
Capabilities: the kernel can disable preemption and interrupts around a critical section, so it can guarantee that a spinlock holder keeps running and releases the lock quickly.
| Situation | Recommended Primitive | Rationale |
|---|---|---|
| Interrupt handler | spinlock_t with IRQ disable | Cannot sleep; must be fast |
| Scheduler code | raw_spinlock_t | Cannot invoke scheduler |
| Device driver (process ctx) | mutex (for I/O), spinlock (for registers) | Match to operation type |
| Read-heavy data | rwlock or RCU | Readers don't block each other |
| Counting resources | semaphore | Can count > 1 |
| Sleeping with lock | mutex only | Spinlocks prohibit sleep |
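To illustrate the 'Read-heavy data' row above, here is a minimal kernel-style RCU sketch. The `config` structure, function names, and update policy are invented for the example, and error paths are kept to a minimum.

```c
#include <linux/errno.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct config {
    int threshold;
};

static struct config __rcu *active_cfg;      // Read-mostly shared pointer
static DEFINE_SPINLOCK(cfg_writer_lock);     // Serializes writers only

// Readers take no lock at all - just a read-side critical section.
int read_threshold(void)
{
    struct config *cfg;
    int val = 0;

    rcu_read_lock();
    cfg = rcu_dereference(active_cfg);
    if (cfg)
        val = cfg->threshold;
    rcu_read_unlock();

    return val;
}

// Writers publish a new copy, then free the old one once readers drain.
int set_threshold(int threshold)
{
    struct config *new_cfg, *old_cfg;

    new_cfg = kmalloc(sizeof(*new_cfg), GFP_KERNEL);
    if (!new_cfg)
        return -ENOMEM;
    new_cfg->threshold = threshold;

    spin_lock(&cfg_writer_lock);
    old_cfg = rcu_dereference_protected(active_cfg,
                                        lockdep_is_held(&cfg_writer_lock));
    rcu_assign_pointer(active_cfg, new_cfg);
    spin_unlock(&cfg_writer_lock);

    synchronize_rcu();                       // Wait out pre-existing readers
    kfree(old_cfg);
    return 0;
}
```

Readers pay essentially nothing, while the writer serializes against other writers with an ordinary spinlock and defers freeing until all pre-existing readers have finished.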
Userspace applications have different constraints:
Challenges: userspace cannot disable preemption, cannot reliably tell whether the current lock holder is running, and a holder can be descheduled at any moment while its waiters keep spinning.
Consequences: a naive userspace spinlock can burn an entire time slice waiting for a preempted holder, so raw spinlocks are rarely the right default in applications; futex-based mutexes with optional adaptive spinning usually are.
```c
// Userspace spinlock dangers illustrated

// Scenario: Thread A holds spinlock, gets preempted
void thread_a(void) {
    spin_lock(&lock);
    // Thread A is preempted here by the kernel scheduler
    // (its 10ms time slice expired)
    process_data();
    spin_unlock(&lock);
}

// Thread B scheduled on same CPU
void thread_b(void) {
    spin_lock(&lock);
    // Spins for up to 10ms until A gets CPU again!
    // ...
}

// Without disabling preemption (only possible in kernel),
// userspace spinlocks can waste entire time slices.

// Better approach: a pthread mutex. The default type blocks efficiently
// via futex; glibc's adaptive type (see below) adds a brief spin first.
pthread_mutex_t adaptive_mutex = PTHREAD_MUTEX_INITIALIZER;

void thread_safe(void) {
    pthread_mutex_lock(&adaptive_mutex);
    // Waiters either spin briefly or sleep in the kernel instead of
    // burning the rest of their time slice
    process_data();
    pthread_mutex_unlock(&adaptive_mutex);
}
```

Modern pthread implementations offer adaptive mutexes (glibc's NPTL exposes this as the PTHREAD_MUTEX_ADAPTIVE_NP mutex type) that spin briefly under contention before falling back to futex-based blocking in the kernel. You get near-spinlock performance for short waits without spinlock risks.
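If you want glibc's spin-then-block behavior explicitly rather than relying on defaults, the adaptive mutex type can be requested through a mutex attribute. A short sketch, assuming glibc/NPTL (PTHREAD_MUTEX_ADAPTIVE_NP is a non-portable extension):

```c
#define _GNU_SOURCE
#include <pthread.h>

static pthread_mutex_t stats_mutex;

// Initialize a mutex with glibc's adaptive (spin-then-block) type.
// PTHREAD_MUTEX_ADAPTIVE_NP is a glibc/NPTL extension, not part of POSIX.
int init_adaptive_mutex(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;

    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ADAPTIVE_NP);

    rc = pthread_mutex_init(&stats_mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

// Callers then use the mutex as usual:
//     pthread_mutex_lock(&stats_mutex);
//     ... short critical section ...
//     pthread_mutex_unlock(&stats_mutex);
```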
Real-world workloads have variable lock hold times. Sometimes critical sections are microseconds; sometimes they're milliseconds. Adaptive locks dynamically choose between spinning and blocking based on observed conditions.
```c
// Conceptual adaptive lock implementation

typedef struct {
    atomic_int locked;
    atomic_int owner_cpu;         // Which CPU holds the lock
    atomic_int owner_sleeping;    // Is holder potentially blocked
    wait_queue_t waiters;
} adaptive_lock_t;

void adaptive_lock(adaptive_lock_t *lock) {
    int spin_count = 0;
    const int MAX_SPIN = 1000;    // Empirically tuned

    while (1) {
        // Try to acquire
        if (try_lock(lock)) {
            return;    // Got it!
        }

        // Decide: spin or block?
        if (spin_count < MAX_SPIN &&
            lock->owner_cpu != current_cpu() &&
            !lock->owner_sleeping) {
            // Owner is running on another CPU - spin a bit
            spin_count++;
            cpu_relax();
        } else {
            // Either we've spun enough, or owner might be blocked
            // Time to sleep
            add_to_wait_queue(&lock->waiters, current_thread());

            // Double-check lock is still held before sleeping
            if (lock->locked) {
                schedule();    // Block until woken
            } else {
                remove_from_wait_queue(&lock->waiters, current_thread());
            }

            spin_count = 0;    // Reset for next attempt
        }
    }
}

void adaptive_unlock(adaptive_lock_t *lock) {
    release_lock(lock);

    // Wake one waiter if any
    wake_up_one(&lock->waiters);
}
```

The Linux kernel mutex (since 2.6.x) implements adaptive spinning along these lines: the fast path is a single atomic compare-and-swap; if that fails and the lock owner is currently running on another CPU, the waiter spins on the owner ('optimistic spinning'); if the owner is not running, or the spinner needs to reschedule, the waiter goes to sleep on the mutex's wait list.
This design achieves near-spinlock latency for short critical sections while being well-behaved under contention or long hold times.
| Condition | Action | Rationale |
|---|---|---|
| Lock is free | CAS acquire, enter CS | No waiting needed |
| Holder running, other CPU | Spin briefly | Will release soon |
| Holder running, same CPU | Block immediately | Spinning would only keep the holder off the CPU |
| Holder is sleeping | Block immediately | Might wait a long time |
| Spin count exceeded | Block | Heuristic: wait is unexpectedly long |
| Just woke from block | Try CAS, then spin | Might succeed immediately |
The Rust ecosystem's 'parking_lot' crate provides adaptive locks with remarkably clever implementations. Its Mutex spins briefly, uses futex-style blocking, maintains fairness, and handles priority inversion—all while being smaller and often faster than pthread mutexes. Study its implementation for state-of-the-art adaptive lock design.
Let's synthesize what we've learned into a practical decision process. When facing a synchronization problem, work through these questions: Can this code sleep at all (interrupt context, scheduler, preemption disabled)? How long is the critical section, and is that duration predictable? Does anything inside it perform I/O or allocation that might block? How many threads will realistically contend? Is this kernel or userspace code? If in doubt, start with a blocking or adaptive lock and measure.
Here's a condensed reference for common scenarios:
| Scenario | Primitive | Why |
|---|---|---|
| Interrupt handler | spin_lock_irqsave() | Can't sleep; need IRQ disable |
| Scheduler internals | raw_spinlock | Can't invoke scheduler |
| Device register access | spin_lock() | Fast, no I/O |
| File/disk operations | mutex | I/O can sleep |
| Network buffer management | spinlock or RCU | Speed-critical, short hold |
| Database transaction | mutex + condition variable | Complex waiting patterns |
| Counter increment | Atomic operations | Lock-free is possible |
| Read-heavy data | rwlock or RCU | Readers shouldn't block readers |
| Userspace general purpose | pthread_mutex | Adaptive implementation |
| Userspace high-performance | parking_lot / futex | Advanced adaptive |
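As a concrete instance of the 'Counter increment' row, a shared counter needs no lock at all. A minimal C11 sketch (the counter and function names are illustrative):

```c
#include <stdatomic.h>
#include <stdint.h>

// Shared event counter updated from many threads without any lock.
static _Atomic uint64_t events;

// Hot path: one atomic add, no lock acquisition, no possibility of
// sleeping or spinning on another thread.
void record_event(void) {
    atomic_fetch_add_explicit(&events, 1, memory_order_relaxed);
}

// Readers that only need a statistical snapshot can use a relaxed load.
uint64_t events_snapshot(void) {
    return atomic_load_explicit(&events, memory_order_relaxed);
}
```

Under very heavy write traffic even this single atomic becomes a cache-line hot spot, at which point per-thread counters summed on read are the usual next step.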
Theory guides initial choices, but real systems are complex. Profile your actual workload. Measure contention levels, hold times, and wait times. Let data drive the final decision. A theoretically suboptimal choice might empirically be fine for your specific patterns.
Even experienced engineers make synchronization primitive selection errors. Here are the most common mistakes and how to avoid them.
Two mistakes come up constantly. Mistake 1: putting a variable-duration operation behind a spinlock, where the common case is fast but a slow path occasionally holds the lock for milliseconds. Mistake 2: busy-looping on try_lock. `while (!try_lock(l)) { }` is identical to a spinlock. Limit attempts or add backoff.

```c
// MISTAKE 1: Spinlock for variable-duration critical section
spinlock_t cache_lock;

void update_cache(int key, int value) {
    spin_lock(&cache_lock);          // Usually fast, but...

    entry = hash_lookup(key);        // Fast: O(1) typically

    if (cache_full()) {
        evict_entries();             // SLOW: might evict thousands of entries!
                                     // Other threads spin for 10+ ms
    }

    insert(key, value);
    spin_unlock(&cache_lock);
}

// FIX: Use mutex, or restructure to do eviction outside the lock

// MISTAKE 2: Busy-loop on try_lock
void bad_try_lock(lock_t *lock) {
    while (!try_lock(lock)) {
        // "I'm using try_lock, not spin_lock, so it's fine" - WRONG
        // This IS spin locking, just spelled differently
    }
}

// FIX: Add backoff, limit attempts, or just use the appropriate lock type
void better_try_lock(lock_t *lock) {
    int attempts = 0;
    while (!try_lock(lock)) {
        if (++attempts > 1000) {
            // Fall back to blocking
            blocking_lock(lock);
            return;
        }
        backoff(attempts);           // Exponential backoff
    }
}
```

The Linux kernel includes 'lockdep', a lock debugging tool that tracks lock dependencies, detects potential deadlocks, and warns about incorrect contexts (like sleeping with spinlock held). Enable CONFIG_LOCKDEP during development; it catches many selection mistakes at runtime.
We've developed a nuanced understanding of when spinlocks are the right tool. The key insights: spin only when the expected wait is shorter than a context switch (roughly single-digit microseconds); never sleep, perform I/O, or make blocking allocations while holding a spinlock; contexts that cannot block, such as interrupt handlers and the scheduler, must use spinlocks; userspace code should normally prefer adaptive mutexes over raw spinlocks; and adaptive locks blend spinning and blocking to capture most of the benefit of both.
What's next:
Now that we know when to use spinlocks, the next page examines spin time considerations—how long to spin before giving up, backoff strategies, and the performance implications of spin tuning. These details separate a correct spinlock from an efficient one.
You now have a framework for deciding when spinlocks are appropriate. You understand the trade-offs, can identify scenarios favoring spinlocks, and can recognize dangerous anti-patterns. Next, we'll explore the nuances of spin duration and backoff strategies.