If there is one atomic operation that towers above all others in importance, it is Compare-and-Swap (CAS). This single primitive is the foundation upon which all lock-free programming is built.
CAS has a remarkable property: it is universal. Given only CAS and atomic load/store, you can implement any other atomic operation and, more importantly, any lock-free or wait-free data structure. This isn't just theoretical—it's a proven result in concurrency theory.
Understanding CAS deeply is essential for any engineer who wants to write, review, or debug high-performance concurrent code.
This page provides that understanding.
This page covers CAS mechanics in detail: how it works, its variants (weak vs strong CAS), common usage patterns, the notorious ABA problem and its solutions, and how CAS enables the entire field of lock-free programming.
CAS atomically performs the following operation:
compare_and_swap(address, expected, desired):
atomic {
if (*address == expected) {
*address = desired
return true // Success: we updated the value
} else {
return false // Failure: value was different
}
}
The three operands:
- address: the memory location to operate on
- expected: the value the caller believes is currently stored at address
- desired: the new value to write if the comparison succeeds
The semantics: if the value at address equals expected, replace it with desired and return success; otherwise, leave the value unchanged and return failure.
```java
// Java: AtomicInteger.compareAndSet
AtomicInteger counter = new AtomicInteger(10);
boolean success = counter.compareAndSet(10, 20);
// If counter was 10, it's now 20 and success is true
// If counter was not 10, it's unchanged and success is false

// Modern Java also has compareAndExchange (returns old value):
int oldValue = counter.compareAndExchange(20, 30);
// oldValue is the value BEFORE the operation
// If oldValue == 20, the swap succeeded
```

```cpp
// C++11: std::atomic<T>::compare_exchange_strong
std::atomic<int> counter{10};
int expected = 10;
bool success = counter.compare_exchange_strong(expected, 20);
// If counter was 10, it's now 20 and success is true
// If counter was not 10, 'expected' is updated to actual value

// C++: compare_exchange_weak (may fail spuriously)
while (!counter.compare_exchange_weak(expected, expected + 1)) {
    // On failure, 'expected' is updated; retry with new value
}
```

```rust
// Rust: compare_exchange
use std::sync::atomic::{AtomicI32, Ordering};
let counter = AtomicI32::new(10);
match counter.compare_exchange(10, 20, Ordering::SeqCst, Ordering::SeqCst) {
    Ok(old) => println!("Swapped, old value: {}", old),
    Err(actual) => println!("Failed, actual: {}", actual),
}
```

```go
// Go: sync/atomic.CompareAndSwapInt64
import "sync/atomic"

var counter int64 = 10
swapped := atomic.CompareAndSwapInt64(&counter, 10, 20)
```

The Critical Insight: Optimistic Concurrency
CAS embodies optimistic concurrency: we assume no conflict and proceed, detecting conflicts only when we try to commit. This contrasts with pessimistic concurrency (locks), where we acquire exclusive access before doing anything.
The optimistic approach has two key advantages: no thread ever blocks while holding a lock, so a stalled or preempted thread cannot prevent others from making progress; and in the common uncontended case there is no acquisition overhead, just a single atomic instruction.
Think of CAS as a single-word transaction. You read a value, compute what you want to change it to, then atomically commit the change—but only if nothing modified it while you were computing. If something did, your 'transaction' aborts, and you retry.
Since CAS can fail (when another thread modified the value), most CAS-based algorithms use a retry loop. This is the fundamental pattern of lock-free programming:
do {
oldValue = load(address)
newValue = compute(oldValue)
} while (!CAS(address, oldValue, newValue))
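As a concrete, runnable version of this pattern, here is the same loop written against Java's java.util.concurrent.atomic API (the class name CasRetryLoop is just for illustration):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntUnaryOperator;

public class CasRetryLoop {
    // Atomically apply an arbitrary function using the CAS retry loop
    static int update(AtomicInteger atom, IntUnaryOperator fn) {
        int oldValue, newValue;
        do {
            oldValue = atom.get();              // 1. read the current value
            newValue = fn.applyAsInt(oldValue); // 2. compute the new value
        } while (!atom.compareAndSet(oldValue, newValue)); // 3. commit, else retry
        return newValue;
    }

    public static void main(String[] args) {
        AtomicInteger counter = new AtomicInteger(10);
        System.out.println(update(counter, x -> x + 1)); // 11
        System.out.println(update(counter, x -> x * 2)); // 22
    }
}
```

If another thread changes the counter between steps 1 and 3, compareAndSet fails and the loop simply recomputes from the fresh value.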
The loop guarantees systemwide progress: a CAS can fail only because another thread's CAS on the same location succeeded, so even under contention some thread always completes its operation.
```typescript
// ATOMIC INCREMENT (how atomic.fetchAdd works internally)
function atomicIncrement(counter: Atomic<number>): number {
  let oldValue: number;
  let newValue: number;
  do {
    oldValue = counter.load();
    newValue = oldValue + 1;
  } while (!counter.compareAndSet(oldValue, newValue));
  return oldValue; // Return old value like fetchAdd does
}

// ATOMIC MAX (update only if new value is greater)
function atomicMax(value: Atomic<number>, candidate: number): number {
  let current: number;
  do {
    current = value.load();
    if (candidate <= current) {
      return current; // Candidate not greater; no update needed
    }
  } while (!value.compareAndSet(current, candidate));
  return candidate; // Successfully updated to the new max
}

// ATOMIC UPDATE WITH ARBITRARY FUNCTION
function atomicUpdate<T>(
  reference: AtomicReference<T>,
  updateFn: (current: T) => T
): T {
  let current: T;
  let next: T;
  do {
    current = reference.load();
    next = updateFn(current);
  } while (!reference.compareAndSet(current, next));
  return next;
}

// Usage: atomically double a value
atomicUpdate(someAtomic, val => val * 2);

// LOCK-FREE COUNTER WITH BOUNDED RETRIES
function incrementWithLimit(
  counter: Atomic<number>,
  maxRetries: number = 100
): boolean {
  for (let retry = 0; retry < maxRetries; retry++) {
    const current = counter.load();
    if (counter.compareAndSet(current, current + 1)) {
      return true; // Success
    }
    // Optionally add exponential backoff here
  }
  return false; // Failed after max retries
}
```

Under extreme contention, CAS loops can retry many times. While the system makes progress (some thread always succeeds), individual threads might be delayed. Consider bounded retries with fallback to locks, or exponential backoff, for high-contention scenarios.
CAS Loop with C++ compare_exchange
C++ has a particularly elegant pattern because compare_exchange_* updates the expected value on failure:
```cpp
#include <atomic>

std::atomic<int> counter{0};

// Increment using CAS loop
void increment() {
    int expected = counter.load();
    // On failure, expected is automatically updated to actual value
    while (!counter.compare_exchange_weak(expected, expected + 1)) {
        // No need to reload; expected already has the current value
    }
}

// Generic update function
template<typename T, typename F>
T atomic_update(std::atomic<T>& atom, F update_fn) {
    T expected = atom.load();
    T desired;
    do {
        desired = update_fn(expected);
    } while (!atom.compare_exchange_weak(expected, desired));
    return desired;
}

// Usage
atomic_update(counter, [](int x) { return x * 2; }); // Double atomically
```

C++11 and some other languages distinguish between strong and weak compare-and-exchange:
Strong CAS (compare_exchange_strong): fails only when the current value genuinely differs from expected. It never fails spuriously; on LL/SC architectures the implementation retries internally to guarantee this.
Weak CAS (compare_exchange_weak): may fail even when the current value equals expected (a spurious failure). In exchange, it maps to a single LL/SC attempt, making it cheaper on those architectures.
Why does weak CAS exist?
On some CPU architectures (notably ARM and POWER), the atomic CAS instruction is implemented using load-linked/store-conditional (LL/SC) instructions. LL/SC can fail spuriously if anything disturbs the monitored location between the load and the store: a write by another core to the same cache line (even to a different variable), an interrupt or context switch, or a cache eviction.
Strong CAS hides this complexity by retrying internally, but that retry loop has overhead. If you're already in a retry loop (which is typical for CAS usage), using weak CAS eliminates a redundant inner loop.
```cpp
// USE WEAK CAS: When you already have a retry loop
void increment_with_weak() {
    int expected = counter.load();
    // Already in a while loop; weak CAS is perfect
    while (!counter.compare_exchange_weak(expected, expected + 1)) {
        // Spurious failure just means we retry, no problem
    }
}

// USE STRONG CAS: When you take different action on failure
bool try_acquire_lock() {
    bool expected = false;
    // We only want ONE attempt, not a retry loop
    // Use strong to avoid spurious failures affecting our logic
    return lock_flag.compare_exchange_strong(expected, true);
}

// USE STRONG CAS: When failure has side effects
bool try_reserve_slot() {
    auto expected = slots.load();
    if (expected.count >= MAX_SLOTS) {
        return false; // No slots available
    }
    Slots desired = expected;
    desired.count++;
    // Use strong: we log failures, don't want spurious logs
    if (slots.compare_exchange_strong(expected, desired)) {
        return true;
    }
    logContention(); // Real contention, not spurious
    return false;
}
```

| Aspect | compare_exchange_strong | compare_exchange_weak |
|---|---|---|
| Spurious failure | Never | Possible |
| Internal retry | Yes | No |
| Overhead | Higher on LL/SC architectures | Lower |
| Use case | Single CAS without retry loop | Inside a retry loop |
| Mental model | Simpler | Requires understanding of spurious failure |
| x86-64 implementation | Identical | Identical (no spurious failures) |
When writing CAS loops (the common case), always use weak CAS. The slight performance improvement on ARM/POWER is free, and on x86-64 there's no difference. Reserve strong CAS for single-shot attempts where spurious failure would complicate your logic.
The ABA problem is the most famous pitfall of CAS-based programming. It occurs when:
1. Thread 1 reads value A from a shared location.
2. Other threads change the value from A to B, and then back to A.
3. Thread 1's CAS succeeds because the value equals A again, even though the location was modified twice in the meantime.
CAS only checks if the value equals the expected value, not whether the value was ever changed. If the value goes A → B → A, CAS happily accepts it as "unchanged."
```typescript
// LOCK-FREE STACK WITH ABA VULNERABILITY
class LockFreeStack<T> {
  head: AtomicReference<Node<T> | null>;

  pop(): T | null {
    while (true) {
      const oldHead = this.head.load(); // Step 1: Read head
      if (oldHead === null) return null;
      const newHead = oldHead.next; // Prepare new head

      // ABA can strike here!
      // Between load and CAS:
      // - Another thread pops oldHead
      // - That thread pushes a NEW node at the SAME memory address
      // - Our CAS sees the "same" pointer and succeeds
      // - But newHead (oldHead.next) is now GARBAGE!

      if (this.head.compareAndSet(oldHead, newHead)) {
        return oldHead.value;
      }
    }
  }
}

// TIMELINE OF ABA DISASTER:
//
// Initial state: head → [A] → [B] → [C] → null
//
// Thread 1: oldHead = [A], newHead = [B] (pauses before CAS)
//
// Thread 2: pop() → removes [A], gets value
//           Stack is now: head → [B] → [C] → null
//           [A] is freed (returned to memory allocator)
//
// Thread 2: pop() → removes [B], gets value
//           Stack is now: head → [C] → null
//           [B] is freed
//
// Thread 2: push(newValue)
//           Allocator reuses [A]'s memory for new node!
//           Stack is now: head → [A'] → [C] → null
//           [A'] has same address as old [A], but different content!
//
// Thread 1: CAS(oldHead=[A], newHead=[B])
//           head ([A']) == expected ([A]) ← Same address!
//           CAS SUCCEEDS!
//           Stack is now: head → [B] ← FREED MEMORY!
//
// CRASH or data corruption follows
```

ABA is not theoretical. It's particularly dangerous in languages with manual memory management (C/C++) where freed memory can be reused. It can cause data corruption, crashes, or security vulnerabilities. Any CAS-based algorithm using pointers must address ABA.
When ABA Is NOT a Problem: if the value itself carries all the meaning (a plain counter or flag), then A → B → A really is equivalent to "unchanged", and a succeeding CAS is harmless. ABA is dangerous when the value stands for something else, typically a pointer whose target may have been freed and its memory reused.
Several techniques address the ABA problem:
1. Tagged Pointers / Version Counters
Package a version counter with the pointer. Each modification increments the counter. CAS checks both pointer AND counter.
struct TaggedPointer {
ptr: *Node // The actual pointer
tag: uint64 // Version counter
}
Even if the pointer returns to the same value, the tag differs, causing CAS to correctly fail.
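Java ships this technique as java.util.concurrent.atomic.AtomicStampedReference, which pairs a reference with an integer stamp and CASes both together. A small demonstration of an A → B → A sequence being correctly rejected:

```java
import java.util.concurrent.atomic.AtomicStampedReference;

public class StampedDemo {
    public static void main(String[] args) {
        String a = "A", b = "B";
        AtomicStampedReference<String> top = new AtomicStampedReference<>(a, 0);

        int[] stampHolder = new int[1];
        String seen = top.get(stampHolder); // read value and stamp together

        // Simulate A -> B -> A by other threads; each change bumps the stamp
        top.compareAndSet(a, b, 0, 1);
        top.compareAndSet(b, a, 1, 2);

        // Our CAS uses the OLD stamp (0): the reference matches, but the
        // stamp doesn't, so the operation correctly fails -- ABA detected.
        boolean ok = top.compareAndSet(seen, "C", stampHolder[0], stampHolder[0] + 1);
        System.out.println(ok); // false
    }
}
```

Without the stamp, a plain AtomicReference CAS here would have succeeded, exactly the ABA hazard described above.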
```typescript
// TAGGED POINTER IMPLEMENTATION
interface TaggedPointer<T> {
  ptr: T | null;
  version: number;
}

class ABAFreeStack<T> {
  // Atomic 128-bit CAS (or simulate with wider atomic types)
  private head: Atomic<TaggedPointer<Node<T>>>;

  constructor() {
    this.head = new Atomic({ ptr: null, version: 0 });
  }

  push(value: T): void {
    const newNode = new Node(value);
    while (true) {
      const current = this.head.load();
      newNode.next = current.ptr;
      const desired: TaggedPointer<Node<T>> = {
        ptr: newNode,
        version: current.version + 1 // Increment version!
      };
      if (this.head.compareAndSet(current, desired)) {
        return;
      }
    }
  }

  pop(): T | null {
    while (true) {
      const current = this.head.load();
      if (current.ptr === null) return null;
      const desired: TaggedPointer<Node<T>> = {
        ptr: current.ptr.next,
        version: current.version + 1 // Increment version!
      };
      // CAS checks both ptr AND version
      // Even if ptr is same address, version will differ
      if (this.head.compareAndSet(current, desired)) {
        return current.ptr.value;
      }
    }
  }
}
```

2. Hazard Pointers
Each thread maintains a list of "hazard pointers"—pointers it is currently using. Before freeing memory, check if any thread's hazard pointers reference it. If so, defer the free.
This guarantees that no memory is freed while any thread might be using it, eliminating the reuse that causes ABA.
```typescript
// HAZARD POINTER CONCEPT (simplified)
class HazardPointer<T> {
  // Per-thread hazard pointer slots
  private static hazardPointers: Map<ThreadId, (T | null)[]> = new Map();
  private static retiredNodes: T[] = [];

  // Protect a pointer while we're using it
  static protect<T>(ptr: T): void {
    const threadHazards = this.getOrCreateHazards();
    threadHazards[0] = ptr; // Set hazard pointer
    // Memory fence ensures the hazard is visible
    memoryBarrier();
  }

  // Release protection when done
  static release(): void {
    const threadHazards = this.getOrCreateHazards();
    threadHazards[0] = null;
  }

  // Retire a node for later deletion
  static retire<T>(node: T): void {
    this.retiredNodes.push(node);
    // Periodically scan and reclaim
    if (this.retiredNodes.length > THRESHOLD) {
      this.reclaimSafe();
    }
  }

  // Reclaim nodes not protected by any hazard pointer
  private static reclaimSafe(): void {
    // Collect all hazard pointers
    const allHazards = new Set<T>();
    for (const hazards of this.hazardPointers.values()) {
      for (const hp of hazards) {
        if (hp !== null) allHazards.add(hp);
      }
    }
    // Only free nodes not in any hazard pointer
    this.retiredNodes = this.retiredNodes.filter(node => {
      if (allHazards.has(node)) {
        return true; // Keep; someone is using it
      } else {
        freeMemory(node); // Safe to free
        return false;
      }
    });
  }
}

// Usage in lock-free stack
class SafeStack<T> {
  pop(): T | null {
    while (true) {
      const oldHead = this.head.load();
      if (oldHead === null) return null;

      // Protect the node before accessing its next pointer
      HazardPointer.protect(oldHead);

      // Re-check that head hasn't changed
      if (this.head.load() !== oldHead) {
        HazardPointer.release();
        continue; // Head changed, retry
      }

      const newHead = oldHead.next; // Now safe to dereference
      if (this.head.compareAndSet(oldHead, newHead)) {
        const value = oldHead.value;
        HazardPointer.release();
        HazardPointer.retire(oldHead); // Defer deletion
        return value;
      }
      HazardPointer.release();
    }
  }
}
```

3. Epoch-Based Reclamation (EBR)
Divide time into epochs (generations). Objects retired in one epoch are only freed after all threads have advanced past that epoch. This ensures no thread is using old objects when they're freed.
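A minimal sketch of the idea in Java (the EpochReclaimer name and the simple two-epoch rule are illustrative assumptions; real implementations such as Rust's crossbeam-epoch track per-thread epochs and quiescent states):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Simplified single-writer sketch of epoch-based reclamation.
class EpochReclaimer<T> {
    private final AtomicLong globalEpoch = new AtomicLong(0);
    private final Map<Long, List<T>> retired = new HashMap<>();

    // Retire an object in the current epoch; it must not be freed yet,
    // because threads in this epoch may still hold references to it.
    void retire(T obj) {
        retired.computeIfAbsent(globalEpoch.get(), e -> new ArrayList<>()).add(obj);
    }

    // Advance the global epoch. Objects retired two epochs ago can no
    // longer be referenced by any thread, so they are safe to reclaim.
    List<T> advance() {
        long e = globalEpoch.incrementAndGet();
        List<T> freeable = retired.remove(e - 2);
        return freeable == null ? List.of() : freeable;
    }
}
```

Because the memory is never reused while any thread might still see the old pointer, the A → B → A address reuse that causes ABA cannot occur.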
4. Reference Counting
Traditional reference counting prevents ABA by not freeing objects that are still referenced. However, atomic reference counting has its own overhead and challenges (cycles, performance).
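A sketch of the CAS loop at the heart of lock-free reference counting (RefCount is a hypothetical class; the subtlety is that a retain must fail once the count has already reached zero, because the object may be mid-reclamation):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: CAS-based reference count starting at one owner.
class RefCount {
    private final AtomicInteger count = new AtomicInteger(1);

    // Try to take a new reference; fails if the object is already dead.
    boolean tryRetain() {
        int c;
        do {
            c = count.get();
            if (c == 0) return false; // too late; object is being reclaimed
        } while (!count.compareAndSet(c, c + 1));
        return true;
    }

    // Drop a reference; returns true when the caller dropped the last one
    // and is therefore responsible for freeing the object.
    boolean release() {
        return count.decrementAndGet() == 0;
    }
}
```

The zero check inside the loop is why a plain fetch-and-add is not enough here: the increment must be conditional on the object still being alive.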
5. Language/Runtime Support
Garbage-collected languages often avoid explicit ABA mitigation because the runtime delays object reclamation until no references exist. However, this is not guaranteed with pooled objects or interned values.
| Solution | Overhead | Complexity | Best For |
|---|---|---|---|
| Tagged Pointers | Medium (wider CAS) | Low | Simple structures, available 128-bit CAS |
| Hazard Pointers | Low runtime, high implementation | High | General purpose, proven correctness |
| Epoch-Based Reclamation | Low amortized | Medium | Read-heavy workloads |
| Reference Counting | High (atomic inc/dec) | Medium | When ownership is clear |
| GC/Managed Runtime | Varies | None (handled by runtime) | Java, C#, Go, etc. |
CAS is the foundation for lock-free data structures. Let's examine some classic examples:
Lock-Free Stack (Treiber Stack)
The Treiber stack is the simplest lock-free data structure—a linked list where push and pop use CAS on the head pointer:
```typescript
class TreiberStack<T> {
  private head: AtomicReference<Node<T> | null> = new AtomicReference(null);

  push(value: T): void {
    const newNode = new Node(value);
    while (true) {
      const oldHead = this.head.load();
      newNode.next = oldHead;
      // Linearization point: CAS that succeeds
      if (this.head.compareAndSet(oldHead, newNode)) {
        return;
      }
      // CAS failed; another push/pop changed head; retry
    }
  }

  pop(): T | null {
    while (true) {
      const oldHead = this.head.load();
      if (oldHead === null) {
        return null; // Stack is empty
      }
      const newHead = oldHead.next;
      // Linearization point: CAS that succeeds
      if (this.head.compareAndSet(oldHead, newHead)) {
        return oldHead.value;
      }
      // CAS failed; another thread modified head; retry
    }
  }

  peek(): T | null {
    const head = this.head.load();
    return head?.value ?? null;
  }

  isEmpty(): boolean {
    return this.head.load() === null;
  }
}
```

Lock-Free Queue (Michael-Scott Queue)
The Michael-Scott queue is a lock-free FIFO queue using two pointers (head and tail). Enqueue appends to tail; dequeue removes from head. It's more complex than the stack because we must coordinate two pointers:
```typescript
class MichaelScottQueue<T> {
  private head: AtomicReference<Node<T>>;
  private tail: AtomicReference<Node<T>>;

  constructor() {
    // Sentinel node simplifies edge cases
    const sentinel = new Node<T>(null as any);
    this.head = new AtomicReference(sentinel);
    this.tail = new AtomicReference(sentinel);
  }

  enqueue(value: T): void {
    const newNode = new Node(value);
    while (true) {
      const currentTail = this.tail.load();
      const next = currentTail.next.load(); // Node.next is atomic

      // Check if tail is still the tail
      if (currentTail === this.tail.load()) {
        if (next === null) {
          // Tail is truly at the end; try to append
          if (currentTail.next.compareAndSet(null, newNode)) {
            // Successfully appended; try to advance tail
            this.tail.compareAndSet(currentTail, newNode);
            return;
          }
        } else {
          // Tail has fallen behind; help advance it
          this.tail.compareAndSet(currentTail, next);
        }
      }
    }
  }

  dequeue(): T | null {
    while (true) {
      const currentHead = this.head.load();
      const currentTail = this.tail.load();
      const next = currentHead.next.load();

      if (currentHead === this.head.load()) {
        if (currentHead === currentTail) {
          if (next === null) {
            return null; // Queue is empty
          }
          // Tail is behind; help advance it
          this.tail.compareAndSet(currentTail, next);
        } else {
          // Read value before CAS (might be overwritten)
          const value = next.value;
          // Try to advance head
          if (this.head.compareAndSet(currentHead, next)) {
            return value;
          }
        }
      }
    }
  }
}
```

Notice how enqueue/dequeue 'help' advance the tail pointer if it has fallen behind. This cooperative approach is common in lock-free algorithms. Every thread ensures the structure is consistent, not just its own operation. This is key to lock-freedom: even if one thread stalls, others can complete operations.
Standard CAS operates on a single word (64 bits on modern systems). Double-Width CAS (DWCAS) atomically operates on two words (128 bits), enabling powerful patterns:
Use Cases:
- Tagged pointers: a pointer plus a version counter updated as one atomic unit (the standard ABA defense)
- Any pair of adjacent words that must change together, such as a pointer and a size or state field
Hardware Support:
| Architecture | Instruction | Notes |
|---|---|---|
| x86-64 | CMPXCHG16B | Requires 16-byte aligned addresses |
| ARM64 | CASP / LDXP+STXP | Pair instructions for 128-bit |
| POWER | Not native | Implemented with LL/SC sequences |
```cpp
#include <atomic>
#include <cstdint>

// 128-bit structure (pointer + counter)
struct alignas(16) TaggedPtr {
    void* ptr;        // 64 bits
    uint64_t counter; // 64 bits

    bool operator==(const TaggedPtr& other) const {
        return ptr == other.ptr && counter == other.counter;
    }
};

class ABAFreeStack {
    std::atomic<TaggedPtr> head;

public:
    void push(Node* node) {
        TaggedPtr old_head = head.load();
        TaggedPtr new_head;
        do {
            node->next = static_cast<Node*>(old_head.ptr);
            new_head.ptr = node;
            new_head.counter = old_head.counter + 1;
        } while (!head.compare_exchange_weak(old_head, new_head));
    }

    Node* pop() {
        TaggedPtr old_head = head.load();
        TaggedPtr new_head;
        do {
            if (old_head.ptr == nullptr) {
                return nullptr;
            }
            new_head.ptr = static_cast<Node*>(old_head.ptr)->next;
            new_head.counter = old_head.counter + 1;
        } while (!head.compare_exchange_weak(old_head, new_head));
        return static_cast<Node*>(old_head.ptr);
    }
};

// Verify 128-bit atomic is lock-free
static_assert(std::atomic<TaggedPtr>::is_always_lock_free);
```

128-bit atomics typically require 16-byte alignment. Unaligned access may compile but cause runtime traps or correctness issues. Always use alignas(16) or equivalent alignment specifiers.
CAS has a remarkable theoretical property: it is universal. This means that any shared data structure with any finite set of operations can be implemented in a lock-free manner using only CAS and atomic load/store.
The Universality Result:
Herlihy's 1991 paper "Wait-Free Synchronization" established the consensus hierarchy—a ranking of synchronization primitives by their power. CAS has consensus number ∞, meaning it can solve consensus among any number of threads.
This turns out to be equivalent to saying CAS can implement any concurrent object. If you can solve consensus, you can build anything.
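The consensus connection is concrete: a single CAS builds a one-shot consensus object for any number of threads (a sketch; the Consensus class name is ours, not a standard API):

```java
import java.util.concurrent.atomic.AtomicReference;

// CAS solves n-thread consensus: every thread proposes a value,
// and all threads decide on the value whose CAS won the race.
class Consensus<T> {
    private final AtomicReference<T> decision = new AtomicReference<>(null);

    T decide(T proposal) {
        decision.compareAndSet(null, proposal); // only the first CAS succeeds
        return decision.get();                  // everyone returns the winner
    }
}
```

The first thread's CAS installs its proposal; every later decide call, from any thread, returns that same winning value.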
Practical Implications:
If you have CAS, you can implement:
Conversely, primitives like test-and-set or fetch-and-add have finite consensus numbers and cannot implement certain lock-free data structures.
| Consensus Number | Primitives | Can Implement |
|---|---|---|
| 1 | Atomic read, atomic write | Only sequential objects |
| 2 | Test-and-Set, Fetch-and-Add, Swap | 2-thread lock-free objects |
| ∞ (Universal) | Compare-and-Swap, LL/SC | Any n-thread lock-free object |
```typescript
// In theory, you can make ANY object lock-free with CAS:

interface Operation<T> {
  apply(state: T): T;
}

class UniversalConstruction<T> {
  private state: AtomicReference<T>;

  constructor(initial: T) {
    this.state = new AtomicReference(initial);
  }

  // Apply ANY operation lock-free
  apply(op: Operation<T>): T {
    while (true) {
      const oldState = this.state.load();
      const newState = op.apply(oldState); // Compute new state
      if (this.state.compareAndSet(oldState, newState)) {
        return newState;
      }
      // Contention; retry with fresh state
    }
  }
}

// Usage: Any operation becomes lock-free
const counter = new UniversalConstruction(0);
counter.apply({ apply: n => n + 1 });            // Increment
counter.apply({ apply: n => n * 2 });            // Double
counter.apply({ apply: n => Math.max(n, 100) }); // Max with 100

// In practice, specialized implementations are more efficient,
// but this proves CAS can do ANYTHING lock-free.
```

While the universal construction proves CAS can implement anything, practical lock-free structures are specialized for efficiency. The Michael-Scott queue isn't just 'UniversalConstruction applied to a queue'—it's an optimized design. But knowing CAS is universal gives confidence that any problem has a lock-free solution.
We've thoroughly explored the most important atomic primitive: its semantics, the strong and weak variants, the CAS retry loop, the ABA problem and its mitigations, double-width CAS, and the universality result that makes CAS the foundation of lock-free programming.
What's Next:
With CAS mastered, we turn to the emerging field of lock-free programming. The next page covers lock-free vs wait-free guarantees, the design of lock-free algorithms, performance considerations, and when lock-free is (and isn't) the right choice.
You now have deep mastery of Compare-and-Swap—the universal building block of lock-free programming. From its basic mechanics through the ABA problem and its solutions, you understand both the power and the pitfalls of CAS-based concurrency.