A spinlock is just the beginning. Production synchronization requires more: the ability to block when spinning becomes wasteful, wake efficiently when the lock becomes available, handle priority correctly, and provide fairness guarantees. These capabilities transform a simple spinlock into a full-featured mutex (mutual exclusion lock).
In this page, we'll build upon our Test-and-Set foundation to construct complete lock abstractions. We'll examine how operating systems combine hardware primitives with kernel services to create the locks that programmers actually use. Understanding this architecture reveals why different lock types exist and when to choose each.
The goal is architectural understanding—seeing how simple atomic operations compose into sophisticated synchronization primitives that balance competing concerns: latency, throughput, fairness, and efficiency.
By the end of this page, you will understand: (1) The architecture of mutex implementations, (2) How to combine spinning with blocking, (3) Wait queue management and wake-up strategies, (4) Implementing fairness and preventing starvation, and (5) Real-world mutex implementations in Linux and POSIX.
A mutex (mutual exclusion lock) extends the basic spinlock with capabilities that make it suitable for general-purpose synchronization. Unlike spinlocks that waste CPU cycles while waiting, mutexes can block—the waiting thread yields the CPU and sleeps until the lock becomes available.
Components of a Mutex:
```c
// Complete mutex structure
typedef struct mutex {
    // Primary state: 0=unlocked, 1=locked, 2+=locked with waiters
    atomic_int state;

    // Wait queue for blocked threads
    struct list_head wait_queue;

    // Spinlock protecting the wait queue
    spinlock_t wait_lock;

    // Owner thread (for recursive locks and debugging)
    struct thread *owner;

    // Recursion count (for recursive mutexes)
    int recursion_count;

#ifdef DEBUG_MUTEX
    const char *name;
    struct mutex_debug_info debug;
#endif
} mutex_t;

// Wait queue entry for each blocked thread
typedef struct mutex_waiter {
    struct list_node node;     // Linkage in wait queue
    struct thread *thread;     // The waiting thread
    int flags;                 // Priority, timeout info
} mutex_waiter_t;

// Initialize a mutex
void mutex_init(mutex_t *mutex) {
    atomic_store(&mutex->state, 0);
    list_init(&mutex->wait_queue);
    spinlock_init(&mutex->wait_lock);
    mutex->owner = NULL;
    mutex->recursion_count = 0;
}
```
The Three-State Model:
Production mutexes often use a three-state model for efficiency:
- 0: unlocked
- 1: locked, no waiters
- 2: locked, with one or more threads blocked on the wait queue
This distinction matters for the fast path: if state is 1 when unlocking, we know no one needs to be woken and can skip the expensive wake-up path.
Without the 'locked with waiters' state, every unlock would need to check the wait queue (expensive). With it, unlock can check state atomically: if state==1, just set to 0; if state==2, take the slow path and wake a waiter. This optimization is critical for uncontended performance.
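As a minimal sketch of this model (the constant names below are illustrative, not from any particular implementation), the three states and their transitions look like this:

```c
// Illustrative names for the three states of the 'state' field
enum mutex_state {
    MUTEX_UNLOCKED       = 0,   // nobody holds the lock
    MUTEX_LOCKED         = 1,   // held, no thread is blocked waiting
    MUTEX_LOCKED_WAITERS = 2,   // held, at least one thread is on the wait queue
};

// Transitions:
//   lock, uncontended:    0 -> 1   (single CAS, fast path)
//   lock, must block:     1 -> 2   (waiter marks the lock before sleeping)
//   unlock, no waiters:   1 -> 0   (single CAS, fast path; nothing to wake)
//   unlock, with waiters: 2 -> slow path: wake one waiter, then release
```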
Lock/Unlock Protocol Overview:
```
Mutex Lock Protocol:
====================
1. Try fast path: atomic CAS (0 → 1)
   If success: Lock acquired! Set owner, return

2. Slow path: spin briefly (if enabled)
   For N iterations: test-and-test-and-set
   If acquired during spin: Set owner, return

3. Block path: prepare to sleep
   a. Take wait_lock spinlock
   b. Add self to wait queue
   c. Update state to indicate waiters (1 → 2)
   d. Release wait_lock
   e. Call scheduler to sleep
   f. (wake up) → retry acquisition; re-queue if the race is lost

Mutex Unlock Protocol:
======================
1. Clear owner, release memory barrier

2. Fast path check: was state == 1?
   If yes: atomic CAS (1 → 0)
   If success: Done! No waiters to wake

3. Slow path: wake a waiter
   a. Take wait_lock spinlock
   b. Remove first waiter from queue
   c. Release the lock (state = 0); the woken thread restores state = 2
      if it finds other waiters still queued
   d. Release wait_lock
   e. Wake the removed thread
```
The key insight in mutex design is that most lock operations are uncontended. In well-designed systems, locks are held briefly and contention is rare. Therefore, the most important optimization is making the uncontended path as fast as possible.
The Ideal Fast Path:
For an uncontended lock, the entire operation should be a single atomic compare-and-exchange that changes the state from 0 to 1, plus a store recording the owner.
No function calls, no branches taken, no memory allocation, no kernel involvement.
Similarly, an uncontended unlock should be a single atomic compare-and-exchange that changes the state from 1 back to 0, after clearing the owner.
Again, pure user-space, predictable, fast.
```c
// Fast-path optimized mutex
// Designed for the uncontended case to be maximum speed

void mutex_lock(mutex_t *mutex) {
    // FAST PATH: Try to acquire immediately
    // Single atomic operation, no branches in common case
    if (likely(atomic_compare_exchange_strong_explicit(
            &mutex->state,
            &(int){0},              // Expected: unlocked
            1,                      // Desired: locked
            memory_order_acquire,
            memory_order_relaxed))) {
        // Success! Lock acquired on first try
        mutex->owner = current_thread();
        return;
    }

    // SLOW PATH: Lock is contended
    mutex_lock_slowpath(mutex);
}

void mutex_unlock(mutex_t *mutex) {
    mutex->owner = NULL;

    // FAST PATH: Try to release if no waiters
    if (likely(atomic_compare_exchange_strong_explicit(
            &mutex->state,
            &(int){1},              // Expected: locked, no waiters
            0,                      // Desired: unlocked
            memory_order_release,
            memory_order_relaxed))) {
        // Success! No waiters, we're done
        return;
    }

    // SLOW PATH: There are waiters to wake
    mutex_unlock_slowpath(mutex);
}

// likely() macro for branch prediction hint
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
```
Measuring Fast Path Performance:
| Operation | Cycles (x86-64) | Time @ 3GHz |
|---|---|---|
| Lock (CAS success) | 15-25 | 5-8 ns |
| Unlock (CAS success) | 10-20 | 3-7 ns |
| Lock+Unlock round trip | 25-45 | 8-15 ns |
| For comparison: function call | 5-10 | 2-3 ns |
| For comparison: L3 cache miss | 40-60 | 13-20 ns |
The fast path is often inlined at call sites, eliminating function call overhead. The slow path (which handles blocking) remains a separate function to keep the code size of the fast path minimal, improving instruction cache efficiency.
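A sketch of that layout, restating the fast path from the previous listing (the `static inline`/`__attribute__((noinline))` split and the file names are illustrative):

```c
// mutex.h -- fast path, inlined at every call site
static inline void mutex_lock(mutex_t *m) {
    int expected = 0;
    if (likely(atomic_compare_exchange_strong_explicit(
            &m->state, &expected, 1,
            memory_order_acquire, memory_order_relaxed))) {
        m->owner = current_thread();
        return;                      // common case: no call, branch predicted
    }
    mutex_lock_slowpath(m);          // rare case: one out-of-line call
}

// mutex.c -- slow path, deliberately kept out of line
__attribute__((noinline)) void mutex_lock_slowpath(mutex_t *m) {
    /* spinning, queueing, and blocking logic (see the slow path below) */
}
```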
Avoiding Unnecessary Work:
Fast-path design principles:
- Do one atomic read-modify-write and nothing more
- Make no function calls on the common path (inline the fast path at call sites)
- Never touch the wait queue, allocate memory, or enter the kernel unless contended
- Keep all blocking logic in a separate, out-of-line slow-path function
When the fast path fails, we enter the slow path. This is where the interesting design decisions happen: how long to spin before blocking, how to manage the wait queue, and how to wake waiters efficiently.
Adaptive Spinning:
Before blocking, it often makes sense to spin briefly. If the lock holder is actively running and about to release, spinning can be faster than the context switch overhead of blocking and waking.
```c
// Slow path with adaptive spinning
noinline void mutex_lock_slowpath(mutex_t *mutex) {
    int spins = 0;
    const int MAX_SPINS = 100;    // Tunable parameter

    // Phase 1: Adaptive spinning
    while (spins < MAX_SPINS) {
        // Check if lock is available
        if (atomic_load_explicit(&mutex->state, memory_order_relaxed) == 0) {
            // Try to acquire
            if (atomic_compare_exchange_weak_explicit(
                    &mutex->state, &(int){0}, 1,
                    memory_order_acquire, memory_order_relaxed)) {
                mutex->owner = current_thread();
                return;   // Acquired during spin!
            }
        }

        // Check if holder is still running (adaptive part)
        if (mutex->owner && !is_running(mutex->owner)) {
            // Holder is not on CPU - blocking is better than spinning
            break;
        }

        cpu_relax();
        spins++;
    }

    // Phase 2: Prepare to block
    mutex_waiter_t waiter;
    waiter.thread = current_thread();
    waiter.flags = 0;

    // Add ourselves to wait queue
    spin_lock(&mutex->wait_lock);

    // Double-check: lock might have been released
    if (atomic_load_explicit(&mutex->state, memory_order_relaxed) == 0) {
        if (atomic_compare_exchange_strong_explicit(
                &mutex->state, &(int){0},
                list_empty(&mutex->wait_queue) ? 1 : 2,
                memory_order_acquire, memory_order_relaxed)) {
            spin_unlock(&mutex->wait_lock);
            mutex->owner = current_thread();
            return;   // Acquired after taking wait_lock
        }
    }

    // Add to queue and mark that we're waiting
    list_add_tail(&waiter.node, &mutex->wait_queue);

    // Set state to indicate waiters
    int expected = 1;
    atomic_compare_exchange_strong(&mutex->state, &expected, 2);

    spin_unlock(&mutex->wait_lock);

    // Phase 3: Block and wait
    // (A real implementation must also close the window between dropping
    //  wait_lock and sleeping, e.g. with a prepare-to-wait protocol;
    //  that detail is omitted here for clarity.)
    while (1) {
        // Sleep until woken
        scheduler_sleep(&waiter);

        // Woken up - try to acquire.
        // The waker already removed us from the wait queue.
        spin_lock(&mutex->wait_lock);

        // We might have been woken spuriously or lost a race to a barging thread
        if (atomic_load(&mutex->state) == 0) {
            if (atomic_compare_exchange_strong_explicit(
                    &mutex->state, &(int){0},
                    list_empty(&mutex->wait_queue) ? 1 : 2,
                    memory_order_acquire, memory_order_relaxed)) {
                spin_unlock(&mutex->wait_lock);
                mutex->owner = current_thread();
                return;   // Finally acquired!
            }
        }

        // Race lost: put ourselves back on the queue and sleep again
        list_add_tail(&waiter.node, &mutex->wait_queue);
        expected = 1;
        atomic_compare_exchange_strong(&mutex->state, &expected, 2);
        spin_unlock(&mutex->wait_lock);
    }
}
```
Checking if the lock holder is running (is_running()) requires accessing scheduler data, which may itself require synchronization. This check must be very cheap or the benefit of avoiding unnecessary blocking is lost. Some implementations approximate this with heuristics rather than exact checking.
Unlock and Wake-up:
The unlock slow path must efficiently wake exactly one waiter (or all waiters, depending on design):
```c
// Unlock slow path: wake a waiter
noinline void mutex_unlock_slowpath(mutex_t *mutex) {
    mutex_waiter_t *waiter = NULL;

    spin_lock(&mutex->wait_lock);

    // Get first waiter from queue
    if (!list_empty(&mutex->wait_queue)) {
        waiter = list_first_entry(&mutex->wait_queue, mutex_waiter_t, node);
        list_del(&waiter->node);
    }

    // Release the lock. The woken thread restores state=2 when it acquires
    // and finds other waiters still queued (see the lock slow path), so the
    // next unlock still takes the slow path instead of losing a wake-up.
    // (A handoff variant would instead keep the lock held and transfer
    //  ownership directly to the woken thread.)
    atomic_store_explicit(&mutex->state, 0, memory_order_release);

    spin_unlock(&mutex->wait_lock);

    // Wake the waiter after dropping wait_lock, so the critical section
    // stays short and the woken thread doesn't immediately contend on it
    if (waiter) {
        scheduler_wake(waiter->thread);
    }
}
```
Wake Strategies:
There are several approaches to waking waiters; a sketch of the two simplest appears after the table:
| Strategy | Fairness | Throughput | Latency | Use Case |
|---|---|---|---|---|
| Wake-one | Low | High | Variable | General purpose |
| Handoff | High | Medium | Predictable | Fairness critical |
| Wake-all | High-ish | Low | Low (first) | Broadcast signals |
| Batched | Medium | High | Medium | High contention |
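As a sketch of the two simplest strategies (reusing the mutex_t, mutex_waiter_t, list, and scheduler helpers defined earlier on this page; not tied to any particular implementation):

```c
// Wake-one: give a single waiter the chance to acquire.
// This is essentially what the unlock slow path above does.
static void wake_one(mutex_t *mutex) {
    mutex_waiter_t *w = NULL;

    spin_lock(&mutex->wait_lock);
    if (!list_empty(&mutex->wait_queue)) {
        w = list_first_entry(&mutex->wait_queue, mutex_waiter_t, node);
        list_del(&w->node);
    }
    spin_unlock(&mutex->wait_lock);

    if (w)
        scheduler_wake(w->thread);       // wake outside the wait_lock
}

// Wake-all: wake every waiter and let them race for the lock.
// The first to run wins; the rest re-block (the "thundering herd" cost).
static void wake_all(mutex_t *mutex) {
    spin_lock(&mutex->wait_lock);
    while (!list_empty(&mutex->wait_queue)) {
        mutex_waiter_t *w = list_first_entry(&mutex->wait_queue,
                                             mutex_waiter_t, node);
        list_del(&w->node);
        scheduler_wake(w->thread);       // a production version would collect
                                         // waiters and wake them after unlocking
    }
    spin_unlock(&mutex->wait_lock);
}
```

Handoff and batched waking are refinements of wake-one: handoff transfers ownership directly to the woken thread, while batching wakes a small group at a time to amortize wake-up costs under heavy contention.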
Basic TAS spinlocks provide no fairness guarantees—a thread can wait indefinitely while others repeatedly acquire the lock. For many applications, this is unacceptable. Building fair locks using TAS requires additional mechanisms.
The Fairness Problem:
Consider a workload where threads T1-T10 all need the lock. With a basic TAS lock:
- The thread that just released the lock often re-acquires it immediately, because the lock's cache line is already in its cache
- Threads on distant cores or other NUMA nodes keep losing the race and can wait without bound
- Nothing limits how many times one thread can acquire the lock before another gets a turn
This unfairness arises because TAS doesn't remember who arrived first—it's a free-for-all.
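A small experiment makes this concrete. The harness below is illustrative only (thread count, run time, and the exact skew you observe will vary by machine): it hammers a bare TAS lock from several threads for a fixed time and counts acquisitions per thread; the distribution is typically far from even.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

#define NTHREADS 4

static atomic_flag lock = ATOMIC_FLAG_INIT;   // bare TAS spinlock
static atomic_bool stop = false;
static long counts[NTHREADS];

static void *worker(void *arg) {
    long id = (long)arg;
    while (!atomic_load_explicit(&stop, memory_order_relaxed)) {
        while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
            ;                                 // spin: no queue, no fairness
        counts[id]++;                         // tiny critical section
        atomic_flag_clear_explicit(&lock, memory_order_release);
    }
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);

    sleep(1);                                 // let the threads fight for a second
    atomic_store(&stop, true);

    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    for (int i = 0; i < NTHREADS; i++)
        printf("thread %d: %ld acquisitions\n", i, counts[i]);
    return 0;
}
```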
Starvation isn't theoretical. In a 2016 incident, a major website went down because a single thread held a lock while other threads starved. The starving threads included the health check handler, so the load balancer kept sending traffic to the 'dead' server. Fairness matters.
Ticket Locks: Fair TAS-based Locking:
Ticket locks use an atomic fetch-and-increment (which can itself be built from TAS or CAS) to implement a queueing discipline. The idea is borrowed from bakery ticket systems: take a number, wait for your number to be called.
```c
// Ticket Lock: Fair FIFO ordering built from an atomic fetch-and-add
typedef struct {
    atomic_uint next_ticket;    // Next ticket to be issued
    atomic_uint now_serving;    // Ticket currently being served
} ticket_lock_t;

#define TICKET_LOCK_INIT { .next_ticket = 0, .now_serving = 0 }

void ticket_lock(ticket_lock_t *lock) {
    // Atomically get our ticket number
    unsigned int my_ticket = atomic_fetch_add_explicit(
        &lock->next_ticket, 1, memory_order_relaxed);

    // Wait until our number is called
    while (atomic_load_explicit(
               &lock->now_serving, memory_order_acquire) != my_ticket) {
        cpu_relax();
    }
    // Lock acquired!
}

void ticket_unlock(ticket_lock_t *lock) {
    // Advance to next ticket
    atomic_fetch_add_explicit(
        &lock->now_serving, 1, memory_order_release);
}

// Analysis:
// - FIFO fairness: threads served in arrival order
// - No starvation: bounded wait (at most N-1 threads ahead)
// - Simple implementation
//
// Drawbacks:
// - All waiters spin on same cache line (now_serving) - not scalable
// - More expensive than basic TAS in uncontended case
```
MCS Locks: Scalable Fair Locking:
The MCS lock (Mellor-Crummey and Scott, 1991) provides fairness while eliminating the cache line bouncing problem of ticket locks. Each waiter spins on its own private variable.
```c
// MCS Lock: Scalable, fair spinlock
// Each waiter spins on a local variable, eliminating cache line bouncing

typedef struct mcs_node {
    struct mcs_node *next;
    atomic_int locked;            // 0 = has lock, 1 = waiting
} mcs_node_t;

typedef struct {
    _Atomic(mcs_node_t *) tail;   // Tail of queue (or NULL if unlocked)
} mcs_lock_t;

void mcs_lock(mcs_lock_t *lock, mcs_node_t *node) {
    node->next = NULL;
    atomic_store(&node->locked, 1);   // We're waiting

    // Atomically add ourselves to the queue
    mcs_node_t *prev = atomic_exchange(&lock->tail, node);

    if (prev == NULL) {
        // Queue was empty - we have the lock!
        return;
    }

    // Link ourselves to predecessor
    prev->next = node;

    // Spin on OUR OWN locked flag (not shared!)
    while (atomic_load_explicit(&node->locked, memory_order_acquire)) {
        cpu_relax();
    }
    // Predecessor woke us - we have the lock
}

void mcs_unlock(mcs_lock_t *lock, mcs_node_t *node) {
    // Check if we're the only one
    if (node->next == NULL) {
        // Try to atomically clear the tail
        mcs_node_t *expected = node;
        if (atomic_compare_exchange_strong(&lock->tail, &expected, NULL)) {
            // Success - lock is now free
            return;
        }
        // Someone is in the process of queueing behind us
        // Wait for them to link
        while (node->next == NULL) {
            cpu_relax();
        }
    }

    // Wake our successor by clearing their locked flag
    atomic_store_explicit(&node->next->locked, 0, memory_order_release);
}

// Usage requires the caller to provide a per-thread mcs_node:
//   thread_local mcs_node_t my_node;
//   mcs_lock(&lock, &my_node);
//   ... critical section ...
//   mcs_unlock(&lock, &my_node);
```
Comparing fair lock designs:
| Lock Type | Fairness | Cache Behavior | Per-Thread State | Complexity |
|---|---|---|---|---|
| TAS Spinlock | None | All spin on one line | None | Simple |
| Ticket Lock | FIFO | All spin on one line | O(1) | Simple |
| MCS Lock | FIFO | Spin on private line | O(N) total | Moderate |
| CLH Lock | FIFO | Spin on pred's line | O(N) total | Moderate |
Linux combines ticket lock simplicity with MCS scalability in its qspinlock. Under low contention, it behaves like a ticket lock. Under high contention, waiters form an MCS-style queue. This adaptive approach achieves both good fast-path performance and scalable contention handling.
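The kernel's actual qspinlock is considerably more intricate, but the shape of the idea can be sketched like this (conceptual only, reusing the mcs_lock_t/mcs_node_t types and cpu_relax() from above; this is not the Linux implementation):

```c
// Conceptual hybrid: a plain lock word for the light-contention case,
// plus an MCS queue that orders waiters once contention builds up.
typedef struct {
    atomic_int val;          // 0 = free, 1 = held
    mcs_lock_t queue;        // waiters line up here under contention
} hybrid_lock_t;

void hybrid_lock(hybrid_lock_t *lock, mcs_node_t *node) {
    // Light contention: a bounded number of direct attempts on the lock word
    for (int i = 0; i < 64; i++) {
        int expected = 0;
        if (atomic_compare_exchange_weak_explicit(
                &lock->val, &expected, 1,
                memory_order_acquire, memory_order_relaxed))
            return;
        cpu_relax();
    }

    // Heavy contention: join the MCS queue. Only the queue head spins
    // on the lock word, so the cache line stops bouncing.
    mcs_lock(&lock->queue, node);
    int expected = 0;
    while (!atomic_compare_exchange_weak_explicit(
               &lock->val, &expected, 1,
               memory_order_acquire, memory_order_relaxed)) {
        expected = 0;
        cpu_relax();
    }
    mcs_unlock(&lock->queue, node);
}

void hybrid_unlock(hybrid_lock_t *lock) {
    atomic_store_explicit(&lock->val, 0, memory_order_release);
}
```

The point is only the shape: fairness and scalability come from the queue, while the light-contention cost stays close to a plain TAS lock.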
Let's examine how major operating systems and libraries implement mutexes, seeing how they balance the competing concerns we've discussed.
Linux Kernel Mutex:
The Linux kernel mutex (not to be confused with futex) is a sleeping lock optimized for the uncontended case.
```c
// Simplified Linux kernel mutex implementation
// Actual implementation is in kernel/locking/mutex.c

struct mutex {
    atomic_long_t     owner;       // Owner task + flags in low bits
    spinlock_t        wait_lock;   // Protects wait_list
    struct list_head  wait_list;
#ifdef CONFIG_DEBUG_MUTEXES
    void              *magic;
    struct lockdep_map dep_map;
#endif
};

// Flags stored in low bits of 'owner' pointer
#define MUTEX_FLAG_WAITERS  0x01
#define MUTEX_FLAG_HANDOFF  0x02
#define MUTEX_FLAG_PICKUP   0x04

void mutex_lock(struct mutex *lock) {
    // Fast path: single atomic if uncontended
    if (!__mutex_trylock_fast(lock)) {
        // Slow path: optimistic spin or block
        __mutex_lock_slowpath(lock);
    }
}

static __always_inline bool __mutex_trylock_fast(struct mutex *lock) {
    // Attempt CAS: NULL -> current task
    unsigned long curr = (unsigned long)current;
    unsigned long zero = 0UL;

    return atomic_long_try_cmpxchg_acquire(&lock->owner, &zero, curr);
}

void __mutex_lock_slowpath(struct mutex *lock) {
    // 1. Optimistic spinning (MCS-style queue for spinning)
    //    Only if owner is running on a CPU
    // 2. If spinning fails or not appropriate, block
    //    Add to wait_list, set WAITERS flag, sleep
    // 3. Handle handoff for fairness
    //    After sleeping too long, request handoff
}
```
POSIX pthread_mutex:
POSIX threads provide pthread_mutex_t with configurable behavior:
```c
// POSIX mutex with various configurations
#include <pthread.h>

// Normal mutex (default)
pthread_mutex_t normal_mutex = PTHREAD_MUTEX_INITIALIZER;

// Error-checking mutex (detects deadlock attempts)
pthread_mutex_t errorcheck_mutex;
pthread_mutexattr_t attr;
pthread_mutexattr_init(&attr);
pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
pthread_mutex_init(&errorcheck_mutex, &attr);

// Recursive mutex (same thread can lock multiple times)
pthread_mutex_t recursive_mutex;
pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
pthread_mutex_init(&recursive_mutex, &attr);

// Robust mutex (handles owner death)
pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);

// Priority inheritance (prevents priority inversion)
pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);

// Lock operations
int rc = pthread_mutex_lock(&normal_mutex);        // Block until acquired
rc = pthread_mutex_trylock(&normal_mutex);         // Non-blocking attempt
struct timespec ts;
rc = pthread_mutex_timedlock(&normal_mutex, &ts);  // Timeout

// Unlock
pthread_mutex_unlock(&normal_mutex);

// Error checking examples:
// PTHREAD_MUTEX_ERRORCHECK:
//   - Returns EDEADLK if thread tries to lock mutex it owns
//   - Returns EPERM if thread tries to unlock mutex it doesn't own
```
Windows Critical Sections:
Windows provides CRITICAL_SECTION, a hybrid spinlock/mutex:
```c
// Windows Critical Section
#include <windows.h>

CRITICAL_SECTION cs;

// Initialize with spin count
// SpinCount: number of spins before blocking (0-4096)
// Higher on multi-processor systems
InitializeCriticalSectionAndSpinCount(&cs, 4000);

// Acquire
EnterCriticalSection(&cs);
// or non-blocking:
if (TryEnterCriticalSection(&cs)) {
    // acquired
}

// Critical section work...

// Release
LeaveCriticalSection(&cs);

// Cleanup
DeleteCriticalSection(&cs);

// Slim Reader/Writer Lock (SRW Lock) - lighter weight, no spin count
SRWLOCK srw = SRWLOCK_INIT;
AcquireSRWLockExclusive(&srw);
ReleaseSRWLockExclusive(&srw);

// Internal structure (simplified):
typedef struct _RTL_CRITICAL_SECTION {
    PRTL_CRITICAL_SECTION_DEBUG DebugInfo;
    LONG      LockCount;        // -1 = free, 0+ = locked + waiters
    LONG      RecursionCount;   // Recursive acquisition count
    HANDLE    OwningThread;     // Owner thread ID
    HANDLE    LockSemaphore;    // Blocking mechanism (auto-allocated)
    ULONG_PTR SpinCount;        // Spin count before blocking
} RTL_CRITICAL_SECTION;
```
For new code, Microsoft recommends SRW locks over Critical Sections. SRW locks are smaller (just one pointer-sized word), faster for uncontended cases, and provide reader/writer semantics. They don't support recursive locking, which is often a design smell anyway.
When using locks in systems with priority scheduling, a subtle but dangerous problem arises: priority inversion. A high-priority task can be blocked indefinitely by a lower-priority task, violating the fundamental premise of priority scheduling.
The Mars Pathfinder Bug:
In 1997, NASA's Mars Pathfinder mission experienced repeated system resets due to priority inversion. A low-priority thread holding a mutex was preempted by a medium-priority thread, while a high-priority thread waited for the mutex. The high-priority thread starved, triggering a watchdog reset.
Priority Inheritance Protocol:
When a high-priority thread blocks on a mutex, temporarily boost the lock holder's priority to match. This prevents medium-priority threads from preempting the holder.
```c
// Priority Inheritance implementation sketch

void mutex_lock_with_pi(mutex_t *mutex) {
    if (!try_acquire(mutex)) {
        // We're blocked - boost owner's priority
        struct thread *owner = mutex->owner;
        int my_priority = current_thread()->priority;

        if (owner->priority < my_priority) {
            // Boost owner to our priority
            boost_priority(owner, my_priority);
        }

        // Now block
        block_on_mutex(mutex);
    }
    // Acquired! (priority boosting cleaned up by unlock)
}

void mutex_unlock_with_pi(mutex_t *mutex) {
    // Restore original priority (if we were boosted)
    if (current_thread()->base_priority != current_thread()->priority) {
        restore_priority(current_thread());
    }

    // Wake highest-priority waiter
    struct thread *waiter = get_highest_priority_waiter(mutex);
    release_lock(mutex);

    if (waiter) {
        wake(waiter);
    }
}
```
Priority inversion is particularly dangerous in real-time systems where deadlines are hard. POSIX provides PTHREAD_PRIO_INHERIT and PTHREAD_PRIO_PROTECT attributes for mutexes. If your system has priority levels and shared locks, you MUST consider inversion—it will bite you eventually.
We have explored the complete architecture of lock implementations built upon Test-and-Set primitives. From simple spinlocks to sophisticated mutexes with fairness, blocking, and priority handling, we've seen how atomic operations compose into production-grade synchronization. Let's consolidate the key insights:
- Make the uncontended path a single atomic operation and push everything else into an out-of-line slow path
- Spin briefly, and adaptively, before blocking; block once the holder is off the CPU or the spin budget is spent
- Encode "waiters exist" in the lock state so unlock only pays the wake-up cost when someone is actually waiting
- Fairness requires remembering arrival order; ticket and MCS/CLH locks provide FIFO service at different scalability costs
- In priority-scheduled systems, pair locks with priority inheritance (or a similar protocol) to avoid priority inversion
What's Next:
Our final topic in this module addresses a subtle but critical issue: the ABA problem. When using Compare-and-Swap operations (a close relative of Test-and-Set), the possibility of values cycling back to old states creates correctness bugs that are devilishly hard to detect. Understanding and preventing ABA is essential for lock-free programming and certain advanced lock implementations.
You now understand how complete mutex implementations are built from atomic primitives. From fast paths through blocking mechanisms to fairness and priority protocols, you can reason about lock implementations at any level of detail. This knowledge is essential for debugging synchronization issues and designing efficient concurrent systems.