The spinlocks we've studied so far share a critical flaw: unfairness. When a lock is released, any waiting thread might acquire it—there's no guarantee that the longest-waiting thread will succeed. Under high contention, this leads to starvation, where some threads wait indefinitely while others repeatedly acquire and release the lock.
Ticket locks solve this elegantly by borrowing a real-world concept: the numbered ticket system used at bakeries and deli counters. Take a ticket, wait for your number to be called. Simple, fair, and deterministic.
This page explores ticket locks in depth—their elegant design, implementation details, correctness guarantees, and the foundation they provide for even more sophisticated locking mechanisms.
By the end of this page, you will understand: the fairness problem with traditional spinlocks; how ticket locks guarantee FIFO ordering; detailed implementation with atomic operations; performance characteristics and cache behavior; ticket lock limitations and when to use them; and the evolution toward MCS and other queued locks.
Let's examine why traditional spinlocks are unfair and why this matters.
With a test-and-set spinlock, lock release triggers a race. All waiting threads observe the lock becoming free (via cache invalidation) and attempt to acquire it. The winner is determined by factors such as:

- Which CPU wins the cache-coherence arbitration for the lock's cache line
- NUMA proximity: how close each waiter is to the memory holding the lock
- Timing luck at the instant of release
None of these factors relate to how long each thread has been waiting. The result is random: a thread that just arrived might win over one that's been waiting for milliseconds.
```
# Unfair spinlock behavior under contention

Timeline:
T=0:   Thread A acquires lock
T=1ms: Thread B starts waiting
T=2ms: Thread C starts waiting
T=3ms: Thread D starts waiting
T=4ms: Thread A releases lock

# All threads race for the lock
# Due to cache topology, Thread D's CPU happens to
# get exclusive access first

T=4ms: Thread D acquires lock (waited 1ms)
T=5ms: Thread D releases lock

# Thread B races again... but loses to newcomer Thread E

T=5ms: Thread E arrives and immediately acquires (waited 0ms!)
       Thread B has now waited 4ms
       Thread C has waited 3ms

# Threads B and C can starve indefinitely if new arrivals
# are favored by cache topology
```

Why does this unfairness matter?

1. Bounded Wait Times: Without fairness guarantees, worst-case wait time is unbounded. A thread might wait forever if it's systematically unlucky.
2. Predictability: Real-time and deadline-sensitive systems require predictable latency. Starvation is unpredictable.
3. Progress Guarantees: Formal correctness often requires "bounded waiting"—a thread cannot be bypassed indefinitely. Unfair locks violate this.
4. Cache Behavior: Unfair locks can exhibit NUMA bias (threads on certain nodes are favored), creating implicit performance tiers.
On NUMA systems, threads on the same node as the lock's memory have lower latency access. They consistently win races against remote threads. The result: remote threads starve while local threads dominate. This hidden unfairness can cause severe performance disparities—certain processes run fast while others crawl.
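You can observe this skew directly. The following is a small measurement sketch of our own (not from any particular library): each thread hammers a plain test-and-set lock for a fixed wall-clock window, and we count how often each one gets in. Heavily uneven counts are the unfairness described above; a ticket lock in the same harness produces nearly identical counts.

```c
// Hypothetical harness: count acquisitions per thread under a TAS lock.
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

#define NTHREADS 4

static atomic_flag tas_lock = ATOMIC_FLAG_INIT;
static atomic_bool stop;
static long acquisitions[NTHREADS];

static void *worker(void *arg) {
    long id = (long)arg;
    while (!atomic_load_explicit(&stop, memory_order_relaxed)) {
        while (atomic_flag_test_and_set_explicit(&tas_lock,
                                                 memory_order_acquire)) {
            /* spin */
        }
        acquisitions[id]++;                       /* critical section */
        atomic_flag_clear_explicit(&tas_lock, memory_order_release);
    }
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);

    sleep(1);                                     /* contend for one second */
    atomic_store(&stop, true);

    for (int i = 0; i < NTHREADS; i++) {
        pthread_join(t[i], NULL);
        printf("thread %d acquired the lock %ld times\n", i, acquisitions[i]);
    }
    return 0;
}
```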
Ticket locks enforce First-In-First-Out (FIFO) ordering through a simple mechanism:
Two counters:

- next_ticket: the next ticket number to be dispensed
- now_serving: the ticket number currently being served (the current holder)

Lock acquisition:

1. Atomically fetch-and-increment next_ticket to get your unique ticket number
2. Spin until now_serving equals your ticket

Lock release:

- Increment now_serving to call the next ticket

The key insight: Each thread gets a unique, monotonically increasing ticket number. The order of acquisition is determined at arrival time, not at release time. No racing—just orderly waiting.
```
# Ticket lock operation visualization

Initial state: next_ticket = 0, now_serving = 0  (lock is free)

T=0: Thread A arrives
     A does: my_ticket = fetch_add(next_ticket) → gets 0
     now_serving = 0 = my_ticket, so A enters critical section
     State: next_ticket = 1, now_serving = 0

T=1: Thread B arrives
     B does: my_ticket = fetch_add(next_ticket) → gets 1
     now_serving = 0 ≠ 1, so B spins
     State: next_ticket = 2, now_serving = 0

T=2: Thread C arrives
     C does: my_ticket = fetch_add(next_ticket) → gets 2
     now_serving = 0 ≠ 2, so C spins
     State: next_ticket = 3, now_serving = 0

T=3: Thread A releases
     A does: now_serving++ → becomes 1
     B sees now_serving = 1 = my_ticket, B enters
     C still spins (now_serving = 1 ≠ 2)

T=4: Thread B releases
     B does: now_serving++ → becomes 2
     C sees now_serving = 2 = my_ticket, C enters

# FIFO order guaranteed: A → B → C, exactly as they arrived
```

No thundering herd: When the lock is released, only one thread (the next in line) observes its condition satisfied. Others continue spinning on their own condition.
Guaranteed fairness: FIFO ordering is intrinsic to the design. Cannot be violated regardless of cache topology, NUMA effects, or timing.
Simple implementation: Two atomic counters and two operations (fetch-add, load). No complex data structures.
Deterministic wait time: If N threads are ahead of you in the queue, you will acquire the lock after exactly N releases. Bounded waiting is guaranteed.
Ticket locks were analyzed and popularized by Mellor-Crummey and Scott in their influential 1991 paper 'Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors.' The same paper introduced MCS locks, which we'll touch on later. These algorithms remain foundational to modern lock design.
Let's build a complete ticket lock implementation, examining each design decision.
```c
#include <stdatomic.h>

typedef struct {
    atomic_uint next_ticket;   // Next ticket to be dispensed
    atomic_uint now_serving;   // Currently held/served ticket
} ticket_lock_t;

// Initialize to zero (lock is free when next_ticket == now_serving)
#define TICKET_LOCK_INIT { ATOMIC_VAR_INIT(0), ATOMIC_VAR_INIT(0) }

void ticket_lock(ticket_lock_t *lock) {
    // Step 1: Get our ticket number (atomic increment)
    // fetch_add returns the OLD value, then increments
    unsigned int my_ticket = atomic_fetch_add_explicit(
        &lock->next_ticket, 1, memory_order_relaxed);

    // Step 2: Wait until it's our turn
    // Spin until now_serving matches our ticket
    while (atomic_load_explicit(
               &lock->now_serving, memory_order_acquire) != my_ticket) {
        // PAUSE for efficient spinning
        __builtin_ia32_pause();
    }

    // Our ticket is now being served - we hold the lock!
    // The acquire ordering ensures we see all prior writes
}

void ticket_unlock(ticket_lock_t *lock) {
    // Call the next ticket: just increment now_serving
    unsigned int current = atomic_load_explicit(
        &lock->now_serving, memory_order_relaxed);
    atomic_store_explicit(
        &lock->now_serving, current + 1, memory_order_release);

    // The release ordering ensures our writes are visible
    // before the next holder proceeds
}
```

Ticket acquisition (fetch_add): Uses relaxed ordering because it only needs atomicity—each arriving thread must receive a unique ticket. Taking a ticket does not yet grant access to shared data, so no ordering is required here; the spin loop's acquire provides it.

Spin loop (load): Uses acquire ordering because the load that finally observes our ticket must make the previous holder's writes visible to us and must prevent the critical section from being reordered before the lock is actually held.

Release (store): Uses release ordering because all writes performed inside the critical section must become visible before the new now_serving value; the next holder's acquire load then synchronizes with this store.
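Here is a minimal usage sketch, assuming the ticket_lock_t type and functions above plus POSIX threads; the counter, function names, and thread count are ours for illustration. Because the lock is FIFO, all four threads make steady progress rather than some finishing far earlier than others.

```c
// Hypothetical usage sketch: protect a shared counter with the ticket lock.
#include <pthread.h>
#include <stdio.h>

static ticket_lock_t counter_lock = TICKET_LOCK_INIT;
static long shared_counter;

static void *increment_many(void *arg) {
    (void)arg;
    for (int i = 0; i < 1000000; i++) {
        ticket_lock(&counter_lock);    // threads enter in arrival order
        shared_counter++;              // critical section
        ticket_unlock(&counter_lock);
    }
    return NULL;
}

int main(void) {
    pthread_t threads[4];
    for (int i = 0; i < 4; i++)
        pthread_create(&threads[i], NULL, increment_many, NULL);
    for (int i = 0; i < 4; i++)
        pthread_join(threads[i], NULL);
    printf("final count: %ld (expected 4000000)\n", shared_counter);
    return 0;
}
```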
```c
// Optimized ticket lock with proportional backoff

#include <stdatomic.h>

typedef struct {
    _Alignas(64) atomic_uint next_ticket;   // Separate cache line
    _Alignas(64) atomic_uint now_serving;   // Separate cache line
} ticket_lock_t;

// Proportional backoff: wait longer if we're further back in the queue
void ticket_lock_optimized(ticket_lock_t *lock) {
    unsigned int my_ticket = atomic_fetch_add_explicit(
        &lock->next_ticket, 1, memory_order_relaxed);

    // Read current serving number
    unsigned int serving = atomic_load_explicit(
        &lock->now_serving, memory_order_relaxed);

    while (serving != my_ticket) {
        // Calculate how many threads are ahead of us
        unsigned int ahead = my_ticket - serving;

        // Proportional backoff: wait proportional to position in queue
        // Avoid checking too frequently when we're far back
        for (unsigned int i = 0; i < ahead * 100; i++) {
            __builtin_ia32_pause();
        }

        // Check again with acquire ordering (needed when we're next)
        if (ahead == 1) {
            serving = atomic_load_explicit(
                &lock->now_serving, memory_order_acquire);
        } else {
            serving = atomic_load_explicit(
                &lock->now_serving, memory_order_relaxed);
        }
    }

    // Ensure acquire semantics even if the exit-observing load above was
    // relaxed (e.g., the first load, or a relaxed load that jumped straight
    // to our ticket)
    atomic_thread_fence(memory_order_acquire);
}

void ticket_unlock_optimized(ticket_lock_t *lock) {
    // Increment with release ordering
    atomic_fetch_add_explicit(
        &lock->now_serving, 1, memory_order_release);
}
```

The optimized version places next_ticket and now_serving on separate cache lines (using _Alignas(64)). This prevents false sharing: when threads increment next_ticket, they don't invalidate the cache line containing now_serving that other threads are spinning on.
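As a quick sanity check on that layout (a sketch of our own, assuming a 64-byte cache line), C11's offsetof and _Static_assert can verify at compile time that the two counters really do land on different cache lines:

```c
#include <stddef.h>   // offsetof

// Fails to compile if the two counters ever end up on the same 64-byte line
_Static_assert(offsetof(ticket_lock_t, now_serving) -
               offsetof(ticket_lock_t, next_ticket) >= 64,
               "next_ticket and now_serving must be on separate cache lines");
```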
Let's formally verify that ticket locks satisfy the three properties required of any correct lock: mutual exclusion, progress, and bounded waiting.
Claim: At most one thread can be in the critical section at any time.
Proof:
1. A thread enters the critical section only when now_serving == my_ticket
2. fetch_add dispenses each ticket exactly once, so every waiting thread holds a distinct my_ticket value
3. Since now_serving has a single value at any instant, at most one thread's condition can be satisfied

QED ∎
Claim: If no thread is in the critical section and threads are waiting, some thread will enter.
Proof:
1. When the holder releases, it increments now_serving
2. After the increment, now_serving holds the ticket of the next eligible thread
3. That thread is spinning and will observe that now_serving matches its ticket
4. It enters the critical section

QED ∎
Claim: Every thread that wants entry will enter within a bounded number of lock acquisitions by other threads.
Proof:
1. Suppose a thread takes ticket N while now_serving = M; exactly N − M threads are ahead of it
2. Each release increments now_serving by exactly one, and no value is skipped
3. After N − M releases, now_serving equals N, and the thread enters

QED ∎
Stronger guarantee: Ticket locks provide FIFO ordering, which is stricter than just bounded waiting. Not only is wait time bounded, but the order is deterministic—first-come, first-served.
What if ticket numbers overflow? With 32-bit unsigned integers, overflow wraps to 0. As long as the number of concurrent waiters is much less than 2^32, the arithmetic still works correctly due to modular arithmetic. The comparison (now_serving == my_ticket) works because both wrap equally. In practice, 32 bits is more than sufficient.
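A tiny sketch of our own illustrating that claim: unsigned wraparound preserves both the equality test and the "how far ahead of me" distance used by the proportional-backoff variant.

```c
#include <assert.h>
#include <stdio.h>

int main(void) {
    // Pretend the counters have almost wrapped: the holder took the last
    // ticket before overflow, and we took the first one after it.
    unsigned int now_serving = 4294967295u;     // 2^32 - 1
    unsigned int my_ticket   = now_serving + 1; // wraps to 0

    // Queue distance still works thanks to modular arithmetic
    unsigned int ahead = my_ticket - now_serving;
    assert(ahead == 1);                          // exactly one thread ahead

    // After one release, now_serving also wraps and equality holds again
    now_serving = now_serving + 1;               // wraps to 0
    assert(now_serving == my_ticket);

    printf("wraparound preserved ordering: ahead=%u\n", ahead);
    return 0;
}
```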
Ticket locks have different performance characteristics than TAS/TTAS spinlocks. Let's analyze them.
Lock acquisition (fetch_add on next_ticket):

- One atomic read-modify-write; concurrent arrivals serialize on the cache line holding next_ticket

Spinning (reading now_serving):

- No coherence traffic while waiting: each waiter reads its locally cached copy of now_serving

Lock release (incrementing now_serving):

- A single write to now_serving, invalidating all spinners' cache lines

| Phase | TAS Lock | TTAS Lock | Ticket Lock |
|---|---|---|---|
| Acquisition attempt | O(1) | O(1) | O(1) |
| Per spin iteration | O(N) coherence msgs | O(1) local read | O(1) local read |
| During hold (total) | O(N²) msgs | O(1) total | O(1) total |
| On release | O(N) msgs + race | O(N) msgs + race | O(N) msgs, no race |
Ticket locks have a subtle scalability issue: all threads spin on the same memory location (now_serving). When the lock is released:

1. The write invalidates every waiter's cached copy of now_serving
2. All waiters re-fetch the line, but only the next ticket holder proceeds

With many waiters, this creates O(N) cache traffic per release. On systems with hundreds of cores, this limits scalability.
This is the motivation for MCS locks: Each thread spins on a separate, thread-local memory location. Lock handoff updates only the next thread's location, achieving O(1) cache traffic per release.
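To make that concrete, here is a minimal sketch of the MCS idea in C11 atomics (type and function names such as mcs_node_t are ours, not the paper's exact pseudocode): each waiter brings its own queue node and spins only on that node's flag, so a release touches exactly one other cache line.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct mcs_node {
    _Atomic(struct mcs_node *) next;
    atomic_bool locked;             // true while this waiter must keep spinning
} mcs_node_t;

typedef struct {
    _Atomic(mcs_node_t *) tail;     // last waiter in the queue (NULL = free)
} mcs_lock_t;

void mcs_lock(mcs_lock_t *lock, mcs_node_t *me) {
    atomic_store_explicit(&me->next, NULL, memory_order_relaxed);
    atomic_store_explicit(&me->locked, true, memory_order_relaxed);

    // Append ourselves to the queue; the previous tail is our predecessor
    mcs_node_t *pred = atomic_exchange_explicit(&lock->tail, me,
                                                memory_order_acq_rel);
    if (pred != NULL) {
        // Link in, then spin on OUR OWN node -- no shared spinning
        atomic_store_explicit(&pred->next, me, memory_order_release);
        while (atomic_load_explicit(&me->locked, memory_order_acquire)) {
            __builtin_ia32_pause();
        }
    }
    // pred == NULL means the lock was free; we hold it now
}

void mcs_unlock(mcs_lock_t *lock, mcs_node_t *me) {
    mcs_node_t *next = atomic_load_explicit(&me->next, memory_order_acquire);
    if (next == NULL) {
        // No visible successor: try to mark the lock free
        mcs_node_t *expected = me;
        if (atomic_compare_exchange_strong_explicit(
                &lock->tail, &expected, NULL,
                memory_order_release, memory_order_relaxed)) {
            return;  // queue is empty, lock released
        }
        // A successor is enqueueing; wait until it links itself in
        while ((next = atomic_load_explicit(&me->next,
                                            memory_order_acquire)) == NULL) {
            __builtin_ia32_pause();
        }
    }
    // Hand off: touch only the successor's node -> O(1) cache traffic
    atomic_store_explicit(&next->locked, false, memory_order_release);
}
```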
```
# Cache behavior comparison during lock handoff

TICKET LOCK (N=8 waiters):
Release: Write to now_serving
  → Invalidate 8 cache lines (one per waiter)
  → All 8 fetch new value
  → Thread 2 sees its ticket, enters critical section
  → Threads 3-8 see wrong ticket, continue spinning

Total cache traffic: 8 invalidations + 8 fetches = 16 operations

MCS LOCK (N=8 waiters):
Each thread spins on its own queue node
Release: Write to next thread's node
  → Invalidate 1 cache line (only next thread's)
  → Thread 2 sees its flag set, enters critical section
  → Threads 3-8 never see traffic (spinning locally)

Total cache traffic: 1 invalidation + 1 fetch = 2 operations

Result: MCS scales O(1), ticket scales O(N) per release
```

On very large systems (64+ cores), highly contended ticket locks show poor scalability due to the O(N) cache traffic. For such scenarios, MCS locks, qspinlocks, or hierarchical locks designed for the NUMA topology are preferred. Ticket locks remain excellent for moderate core counts and moderate contention.
The Linux kernel's spinlock implementation has evolved through several generations, each addressing limitations of the previous.
Phase 1 (early Linux): Simple TAS spinlock. Unfair, poor scalability.
Phase 2 (2.6.25+): Ticket spinlock. Fair, but O(N) cache traffic per release.
Phase 3 (4.2+): Queued spinlock (qspinlock). Combines ticket-lock fairness with MCS-lock scalability.
qspinlock uses a clever hybrid approach:
```c
// Simplified qspinlock concept (actual implementation is more complex)

#include <stdatomic.h>
#include <stdint.h>

// The lock word packs multiple fields:
// - Locked bit: is the lock held?
// - Pending bit: is there exactly one waiter (before forming queue)?
// - Tail: index to MCS queue tail for multiple waiters

typedef struct {
    union {
        atomic_uint val;
        struct {
            uint8_t  locked;    // Lock is held
            uint8_t  pending;   // One thread waiting (fast path)
            uint16_t tail;      // MCS queue tail (slow path)
        };
    };
} qspinlock_t;

void qspin_lock(qspinlock_t *lock) {
    // Fast path: lock is free, no waiters
    // Single CAS: 0 -> locked
    unsigned int expected = 0;
    if (atomic_compare_exchange_strong(&lock->val, &expected, 1)) {
        return;  // Got it immediately!
    }

    // Medium path: one waiter uses the pending bit
    // Set pending, spin on locked, then acquire
    // ...

    // Slow path: multiple waiters form an MCS queue
    // Each spins on a local queue node, not shared memory
    // ...
}
```

This hybrid design gives qspinlock several advantages:

1. Size: qspinlock fits in 4 bytes, same as ticket lock's minimum. MCS lock requires per-CPU queue nodes.
2. Uncontended performance: Single atomic CAS for acquisition when uncontended—as fast as a simple spinlock.
3. Contended scalability: Under high contention, forms an MCS queue where each thread spins on its own cache line. O(1) cache traffic per handoff.
4. NUMA awareness: Queue formation can prefer local waiters, reducing cross-node traffic.
5. Lock stealing prevention: Unlike pure MCS, prevents certain race conditions during PV (paravirtualized) operation.
| Generation | Type | Fairness | Scalability | Size |
|---|---|---|---|---|
| Early (< 2.6.25) | TAS | None | Poor | 4 bytes |
| 2.6.25 - 4.1 | Ticket | FIFO | O(N) traffic | 4 bytes |
| 4.2+ | qspinlock | FIFO | O(1) traffic | 4 bytes |
qspinlock achieves the holy grail of lock design: simple case fast (single CAS), fairness guaranteed (FIFO ordering), scalable under contention (O(1) cache traffic), and compact representation (4 bytes). The complexity is in the code, not the interface—callers just use spin_lock() and spin_unlock().
When should you use ticket locks in your own code?
1. Moderate contention, fairness required: When you need FIFO guarantees but don't expect hundreds of concurrent waiters.
```c
// Good use case: fair access to a limited resource

#include <stdio.h>

// use_resource() is assumed to be defined elsewhere in the application
void use_resource(void);

ticket_lock_t resource_lock = TICKET_LOCK_INIT;

void access_limited_resource(int thread_id) {
    // Fair ordering ensures no thread starves
    ticket_lock(&resource_lock);

    printf("Thread %d accessing resource (waited fair turn)\n", thread_id);
    use_resource();

    ticket_unlock(&resource_lock);
}

// All threads get their turn in arrival order
// No priority inversion (except from scheduler decisions)
```

2. Reader-writer hint: Ticket locks naturally support reader-writer extensions:
```c
// Ticket-based reader-writer lock (simplified concept)

#include <stdatomic.h>

typedef struct {
    atomic_uint next_ticket;
    atomic_uint now_serving;
    atomic_uint active_readers;
} ticket_rwlock_t;

void read_lock(ticket_rwlock_t *lock) {
    unsigned int my_ticket = atomic_fetch_add(&lock->next_ticket, 1);

    // Wait for our turn
    while (atomic_load(&lock->now_serving) != my_ticket) {
        __builtin_ia32_pause();
    }

    // Increment reader count
    atomic_fetch_add(&lock->active_readers, 1);

    // Allow the next ticket holder to proceed
    // Writers will wait for active_readers == 0
    atomic_fetch_add(&lock->now_serving, 1);
}

void read_unlock(ticket_rwlock_t *lock) {
    atomic_fetch_sub(&lock->active_readers, 1);
}

void write_lock(ticket_rwlock_t *lock) {
    unsigned int my_ticket = atomic_fetch_add(&lock->next_ticket, 1);

    // Wait for our turn
    while (atomic_load(&lock->now_serving) != my_ticket) {
        __builtin_ia32_pause();
    }

    // Wait for all active readers to finish
    while (atomic_load(&lock->active_readers) > 0) {
        __builtin_ia32_pause();
    }

    // Now we hold exclusive access
    // Don't increment now_serving until write_unlock
}

void write_unlock(ticket_rwlock_t *lock) {
    atomic_fetch_add(&lock->now_serving, 1);
}
```

Most threading libraries don't expose ticket locks directly—they use adaptive mutexes. For kernel development, Linux exposes spinlock semantics that use qspinlock internally. If you need explicit fairness in userspace, consider libraries like 'folly' (Facebook's) or implement ticket locks carefully with proper atomic semantics.
Ticket locks represent a significant step forward in spinlock design, introducing fairness guarantees through an elegant ticket-counter mechanism. Let's consolidate the key insights:

- Traditional TAS/TTAS spinlocks turn every release into a race, which permits starvation and unpredictable wait times.
- A ticket lock uses two counters: a fetch-add on next_ticket assigns each arrival a unique ticket, and now_serving hands the lock off in strict FIFO order.
- Correct memory ordering is cheap: relaxed for taking a ticket, acquire on the spin load, release on the handoff.
- The remaining weakness is that all waiters spin on now_serving, costing O(N) cache traffic per release—the limitation that motivated MCS locks and Linux's qspinlock.
Module Complete:
Congratulations! You've completed Module 1: Spinlocks. You've journeyed from the fundamental concept of busy waiting through spinlock implementation, appropriate use scenarios, spin tuning, and finally to fair ticket locks.
You now have a comprehensive understanding of spinlock-based synchronization—the foundation for all other synchronization primitives in operating systems.
You've mastered spinlocks: busy waiting fundamentals, TAS and TTAS implementations, when to use spinlocks, spin timing and backoff strategies, and fairness-guaranteeing ticket locks. This knowledge prepares you for the next modules: hardware atomic operations (Test-and-Set, Compare-and-Swap) and higher-level primitives like mutexes, semaphores, and condition variables.