If there is one atomic operation that towers above all others in importance, it is Compare-and-Swap (CAS). This single primitive is the foundation upon which all lock-free programming is built.
CAS has a remarkable property: it is universal. Given only CAS and atomic load/store, you can implement any other atomic operation and, more importantly, any lock-free or wait-free data structure. This isn't just theoretical—it's a proven result in concurrency theory.
Understanding CAS deeply is essential for any engineer who wants to write, review, or debug high-performance concurrent code.
This page provides that understanding.
This page covers CAS mechanics in detail: how it works, its variants (weak vs strong CAS), common usage patterns, the notorious ABA problem and its solutions, and how CAS enables the entire field of lock-free programming.
CAS atomically performs the following operation:
compare_and_swap(address, expected, desired):
atomic {
if (*address == expected) {
*address = desired
return true // Success: we updated the value
} else {
return false // Failure: value was different
}
}
The three operands:
- address: the memory location to operate on
- expected: the value the caller believes is currently stored at address
- desired: the new value to write if the comparison succeeds
The semantics: if the value at address equals expected, replace it with desired and return success; otherwise, leave the value unchanged and return failure.
```java
// Java: AtomicInteger.compareAndSet
AtomicInteger counter = new AtomicInteger(10);
boolean success = counter.compareAndSet(10, 20);
// If counter was 10, it's now 20 and success is true
// If counter was not 10, it's unchanged and success is false

// Modern Java also has compareAndExchange (returns old value):
int oldValue = counter.compareAndExchange(20, 30);
// oldValue is the value BEFORE the operation
// If oldValue == 20, the swap succeeded
```

```cpp
// C++11: std::atomic<T>::compare_exchange_strong
std::atomic<int> counter{10};
int expected = 10;
bool success = counter.compare_exchange_strong(expected, 20);
// If counter was 10, it's now 20 and success is true
// If counter was not 10, 'expected' is updated to actual value

// C++: compare_exchange_weak (may fail spuriously)
while (!counter.compare_exchange_weak(expected, expected + 1)) {
    // On failure, 'expected' is updated; retry with new value
}
```

```rust
// Rust: compare_exchange
use std::sync::atomic::{AtomicI32, Ordering};
let counter = AtomicI32::new(10);
match counter.compare_exchange(10, 20, Ordering::SeqCst, Ordering::SeqCst) {
    Ok(old) => println!("Swapped, old value: {}", old),
    Err(actual) => println!("Failed, actual: {}", actual),
}
```

```go
// Go: sync/atomic.CompareAndSwapInt64
import "sync/atomic"

var counter int64 = 10
swapped := atomic.CompareAndSwapInt64(&counter, 10, 20)
```

The Critical Insight: Optimistic Concurrency
CAS embodies optimistic concurrency: we assume no conflict and proceed, detecting conflicts only when we try to commit. This contrasts with pessimistic concurrency (locks), where we acquire exclusive access before doing anything.
The optimistic approach has two key advantages: no thread ever blocks while holding a lock, so a stalled or preempted thread cannot prevent others from making progress; and in the common uncontended case there is no acquisition overhead, just a single atomic instruction.
Think of CAS as a single-word transaction. You read a value, compute what you want to change it to, then atomically commit the change—but only if nothing modified it while you were computing. If something did, your 'transaction' aborts, and you retry.
Since CAS can fail (when another thread modified the value), most CAS-based algorithms use a retry loop. This is the fundamental pattern of lock-free programming:
do {
oldValue = load(address)
newValue = compute(oldValue)
} while (!CAS(address, oldValue, newValue))
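As a concrete, runnable version of this pattern, here is the same loop written against Java's java.util.concurrent.atomic API (the class name CasRetryLoop is just for illustration):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntUnaryOperator;

public class CasRetryLoop {
    // Atomically apply an arbitrary function using the CAS retry loop
    static int update(AtomicInteger atom, IntUnaryOperator fn) {
        int oldValue, newValue;
        do {
            oldValue = atom.get();              // 1. read the current value
            newValue = fn.applyAsInt(oldValue); // 2. compute the new value
        } while (!atom.compareAndSet(oldValue, newValue)); // 3. commit, else retry
        return newValue;
    }

    public static void main(String[] args) {
        AtomicInteger counter = new AtomicInteger(10);
        System.out.println(update(counter, x -> x + 1)); // 11
        System.out.println(update(counter, x -> x * 2)); // 22
    }
}
```

If another thread changes the counter between steps 1 and 3, compareAndSet fails and the loop simply recomputes from the fresh value.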
The loop guarantees systemwide progress: a CAS can fail only because another thread's CAS on the same location succeeded, so even under contention some thread always completes its operation.
```typescript
// ATOMIC INCREMENT (how atomic.fetchAdd works internally)
function atomicIncrement(counter: Atomic<number>): number {
  let oldValue: number;
  let newValue: number;
  do {
    oldValue = counter.load();
    newValue = oldValue + 1;
  } while (!counter.compareAndSet(oldValue, newValue));
  return oldValue; // Return old value like fetchAdd does
}

// ATOMIC MAX (update only if new value is greater)
function atomicMax(value: Atomic<number>, candidate: number): number {
  let current: number;
  do {
    current = value.load();
    if (candidate <= current) {
      return current; // Candidate not greater; no update needed
    }
  } while (!value.compareAndSet(current, candidate));
  return candidate; // Successfully updated to the new max
}

// ATOMIC UPDATE WITH ARBITRARY FUNCTION
function atomicUpdate<T>(
  reference: AtomicReference<T>,
  updateFn: (current: T) => T
): T {
  let current: T;
  let next: T;
  do {
    current = reference.load();
    next = updateFn(current);
  } while (!reference.compareAndSet(current, next));
  return next;
}

// Usage: atomically double a value
atomicUpdate(someAtomic, val => val * 2);

// LOCK-FREE COUNTER WITH BOUNDED RETRIES
function incrementWithLimit(
  counter: Atomic<number>,
  maxRetries: number = 100
): boolean {
  for (let retry = 0; retry < maxRetries; retry++) {
    const current = counter.load();
    if (counter.compareAndSet(current, current + 1)) {
      return true; // Success
    }
    // Optionally add exponential backoff here
  }
  return false; // Failed after max retries
}
```

Under extreme contention, CAS loops can retry many times. While the system makes progress (some thread always succeeds), individual threads might be delayed. Consider bounded retries with fallback to locks, or exponential backoff, for high-contention scenarios.
CAS Loop with C++ compare_exchange
C++ has a particularly elegant pattern because compare_exchange_* updates the expected value on failure:
```cpp
#include <atomic>

std::atomic<int> counter{0};

// Increment using CAS loop
void increment() {
    int expected = counter.load();
    // On failure, expected is automatically updated to actual value
    while (!counter.compare_exchange_weak(expected, expected + 1)) {
        // No need to reload; expected already has the current value
    }
}

// Generic update function
template<typename T, typename F>
T atomic_update(std::atomic<T>& atom, F update_fn) {
    T expected = atom.load();
    T desired;
    do {
        desired = update_fn(expected);
    } while (!atom.compare_exchange_weak(expected, desired));
    return desired;
}

// Usage
atomic_update(counter, [](int x) { return x * 2; }); // Double atomically
```

C++11 and some other languages distinguish between strong and weak compare-and-exchange:
Strong CAS (compare_exchange_strong): fails only when the current value genuinely differs from expected. It never fails spuriously; on LL/SC architectures the implementation retries internally to guarantee this.
Weak CAS (compare_exchange_weak): may fail even when the current value equals expected (a spurious failure). In exchange, it maps to a single LL/SC attempt, making it cheaper on those architectures.
Why does weak CAS exist?
On some CPU architectures (notably ARM and POWER), the atomic CAS instruction is implemented using load-linked/store-conditional (LL/SC) instructions. LL/SC can fail spuriously if anything disturbs the monitored location between the load and the store: a write by another core to the same cache line (even to a different variable), an interrupt or context switch, or a cache eviction.
Strong CAS hides this complexity by retrying internally, but that retry loop has overhead. If you're already in a retry loop (which is typical for CAS usage), using weak CAS eliminates a redundant inner loop.
```cpp
// USE WEAK CAS: When you already have a retry loop
void increment_with_weak() {
    int expected = counter.load();
    // Already in a while loop; weak CAS is perfect
    while (!counter.compare_exchange_weak(expected, expected + 1)) {
        // Spurious failure just means we retry, no problem
    }
}

// USE STRONG CAS: When you take different action on failure
bool try_acquire_lock() {
    bool expected = false;
    // We only want ONE attempt, not a retry loop
    // Use strong to avoid spurious failures affecting our logic
    return lock_flag.compare_exchange_strong(expected, true);
}

// USE STRONG CAS: When failure has side effects
bool try_reserve_slot() {
    auto expected = slots.load();
    if (expected.count >= MAX_SLOTS) {
        return false; // No slots available
    }
    Slots desired = expected;
    desired.count++;
    // Use strong: we log failures, don't want spurious logs
    if (slots.compare_exchange_strong(expected, desired)) {
        return true;
    }
    logContention(); // Real contention, not spurious
    return false;
}
```

| Aspect | compare_exchange_strong | compare_exchange_weak |
|---|---|---|
| Spurious failure | Never | Possible |
| Internal retry | Yes | No |
| Overhead | Higher on LL/SC architectures | Lower |
| Use case | Single CAS without retry loop | Inside a retry loop |
| Mental model | Simpler | Requires understanding of spurious failure |
| x86-64 implementation | Identical | Identical (no spurious failures) |
When writing CAS loops (the common case), always use weak CAS. The slight performance improvement on ARM/POWER is free, and on x86-64 there's no difference. Reserve strong CAS for single-shot attempts where spurious failure would complicate your logic.
The ABA problem is the most famous pitfall of CAS-based programming. It occurs when:
1. Thread 1 reads value A from a shared location.
2. Other threads change the value from A to B, and then back to A.
3. Thread 1's CAS succeeds because the value equals A again, even though the location was modified twice in the meantime.
CAS only checks if the value equals the expected value, not whether the value was ever changed. If the value goes A → B → A, CAS happily accepts it as "unchanged."
```typescript
// LOCK-FREE STACK WITH ABA VULNERABILITY
class LockFreeStack<T> {
  head: AtomicReference<Node<T> | null>;

  pop(): T | null {
    while (true) {
      const oldHead = this.head.load(); // Step 1: Read head
      if (oldHead === null) return null;
      const newHead = oldHead.next; // Prepare new head

      // ABA can strike here!
      // Between load and CAS:
      // - Another thread pops oldHead
      // - That thread pushes a NEW node at the SAME memory address
      // - Our CAS sees the "same" pointer and succeeds
      // - But newHead (oldHead.next) is now GARBAGE!

      if (this.head.compareAndSet(oldHead, newHead)) {
        return oldHead.value;
      }
    }
  }
}

// TIMELINE OF ABA DISASTER:
//
// Initial state: head → [A] → [B] → [C] → null
//
// Thread 1: oldHead = [A], newHead = [B] (pauses before CAS)
//
// Thread 2: pop() → removes [A], gets value
//           Stack is now: head → [B] → [C] → null
//           [A] is freed (returned to memory allocator)
//
// Thread 2: pop() → removes [B], gets value
//           Stack is now: head → [C] → null
//           [B] is freed
//
// Thread 2: push(newValue)
//           Allocator reuses [A]'s memory for new node!
//           Stack is now: head → [A'] → [C] → null
//           [A'] has same address as old [A], but different content!
//
// Thread 1: CAS(oldHead=[A], newHead=[B])
//           head ([A']) == expected ([A]) ← Same address!
//           CAS SUCCEEDS!
//           Stack is now: head → [B] ← FREED MEMORY!
//
// CRASH or data corruption follows
```

ABA is not theoretical. It's particularly dangerous in languages with manual memory management (C/C++) where freed memory can be reused. It can cause data corruption, crashes, or security vulnerabilities. Any CAS-based algorithm using pointers must address ABA.
When ABA Is NOT a Problem: if the value itself carries all the meaning (a plain counter or flag), then A → B → A really is equivalent to "unchanged", and a succeeding CAS is harmless. ABA is dangerous when the value stands for something else, typically a pointer whose target may have been freed and its memory reused.
Several techniques address the ABA problem:
1. Tagged Pointers / Version Counters
Package a version counter with the pointer. Each modification increments the counter. CAS checks both pointer AND counter.
struct TaggedPointer {
ptr: *Node // The actual pointer
tag: uint64 // Version counter
}
Even if the pointer returns to the same value, the tag differs, causing CAS to correctly fail.
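Java ships this technique as java.util.concurrent.atomic.AtomicStampedReference, which pairs a reference with an integer stamp and CASes both together. A small demonstration of an A → B → A sequence being correctly rejected:

```java
import java.util.concurrent.atomic.AtomicStampedReference;

public class StampedDemo {
    public static void main(String[] args) {
        String a = "A", b = "B";
        AtomicStampedReference<String> top = new AtomicStampedReference<>(a, 0);

        int[] stampHolder = new int[1];
        String seen = top.get(stampHolder); // read value and stamp together

        // Simulate A -> B -> A by other threads; each change bumps the stamp
        top.compareAndSet(a, b, 0, 1);
        top.compareAndSet(b, a, 1, 2);

        // Our CAS uses the OLD stamp (0): the reference matches, but the
        // stamp doesn't, so the operation correctly fails -- ABA detected.
        boolean ok = top.compareAndSet(seen, "C", stampHolder[0], stampHolder[0] + 1);
        System.out.println(ok); // false
    }
}
```

Without the stamp, a plain AtomicReference CAS here would have succeeded, exactly the ABA hazard described above.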
```typescript
// TAGGED POINTER IMPLEMENTATION
interface TaggedPointer<T> {
  ptr: T | null;
  version: number;
}

class ABAFreeStack<T> {
  // Atomic 128-bit CAS (or simulate with wider atomic types)
  private head: Atomic<TaggedPointer<Node<T>>>;

  constructor() {
    this.head = new Atomic({ ptr: null, version: 0 });
  }

  push(value: T): void {
    const newNode = new Node(value);
    while (true) {
      const current = this.head.load();
      newNode.next = current.ptr;
      const desired: TaggedPointer<Node<T>> = {
        ptr: newNode,
        version: current.version + 1 // Increment version!
      };
      if (this.head.compareAndSet(current, desired)) {
        return;
      }
    }
  }

  pop(): T | null {
    while (true) {
      const current = this.head.load();
      if (current.ptr === null) return null;
      const desired: TaggedPointer<Node<T>> = {
        ptr: current.ptr.next,
        version: current.version + 1 // Increment version!
      };
      // CAS checks both ptr AND version
      // Even if ptr is same address, version will differ
      if (this.head.compareAndSet(current, desired)) {
        return current.ptr.value;
      }
    }
  }
}
```

2. Hazard Pointers
Each thread maintains a list of "hazard pointers"—pointers it is currently using. Before freeing memory, check if any thread's hazard pointers reference it. If so, defer the free.
This guarantees that no memory is freed while any thread might be using it, eliminating the reuse that causes ABA.
```typescript
// HAZARD POINTER CONCEPT (simplified)
class HazardPointer<T> {
  // Per-thread hazard pointer slots
  private static hazardPointers: Map<ThreadId, (T | null)[]> = new Map();
  private static retiredNodes: T[] = [];

  // Protect a pointer while we're using it
  static protect<T>(ptr: T): void {
    const threadHazards = this.getOrCreateHazards();
    threadHazards[0] = ptr; // Set hazard pointer
    // Memory fence ensures the hazard is visible
    memoryBarrier();
  }

  // Release protection when done
  static release(): void {
    const threadHazards = this.getOrCreateHazards();
    threadHazards[0] = null;
  }

  // Retire a node for later deletion
  static retire<T>(node: T): void {
    this.retiredNodes.push(node);
    // Periodically scan and reclaim
    if (this.retiredNodes.length > THRESHOLD) {
      this.reclaimSafe();
    }
  }

  // Reclaim nodes not protected by any hazard pointer
  private static reclaimSafe(): void {
    // Collect all hazard pointers
    const allHazards = new Set<T>();
    for (const hazards of this.hazardPointers.values()) {
      for (const hp of hazards) {
        if (hp !== null) allHazards.add(hp);
      }
    }
    // Only free nodes not in any hazard pointer
    this.retiredNodes = this.retiredNodes.filter(node => {
      if (allHazards.has(node)) {
        return true; // Keep; someone is using it
      } else {
        freeMemory(node); // Safe to free
        return false;
      }
    });
  }
}

// Usage in lock-free stack
class SafeStack<T> {
  pop(): T | null {
    while (true) {
      const oldHead = this.head.load();
      if (oldHead === null) return null;

      // Protect the node before accessing its next pointer
      HazardPointer.protect(oldHead);

      // Re-check that head hasn't changed
      if (this.head.load() !== oldHead) {
        HazardPointer.release();
        continue; // Head changed, retry
      }

      const newHead = oldHead.next; // Now safe to dereference
      if (this.head.compareAndSet(oldHead, newHead)) {
        const value = oldHead.value;
        HazardPointer.release();
        HazardPointer.retire(oldHead); // Defer deletion
        return value;
      }
      HazardPointer.release();
    }
  }
}
```

3. Epoch-Based Reclamation (EBR)
Divide time into epochs (generations). Objects retired in one epoch are only freed after all threads have advanced past that epoch. This ensures no thread is using old objects when they're freed.
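A minimal sketch of the idea in Java (the EpochReclaimer name and the simple two-epoch rule are illustrative assumptions; real implementations such as Rust's crossbeam-epoch track per-thread epochs and quiescent states):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Simplified single-writer sketch of epoch-based reclamation.
class EpochReclaimer<T> {
    private final AtomicLong globalEpoch = new AtomicLong(0);
    private final Map<Long, List<T>> retired = new HashMap<>();

    // Retire an object in the current epoch; it must not be freed yet,
    // because threads in this epoch may still hold references to it.
    void retire(T obj) {
        retired.computeIfAbsent(globalEpoch.get(), e -> new ArrayList<>()).add(obj);
    }

    // Advance the global epoch. Objects retired two epochs ago can no
    // longer be referenced by any thread, so they are safe to reclaim.
    List<T> advance() {
        long e = globalEpoch.incrementAndGet();
        List<T> freeable = retired.remove(e - 2);
        return freeable == null ? List.of() : freeable;
    }
}
```

Because the memory is never reused while any thread might still see the old pointer, the A → B → A address reuse that causes ABA cannot occur.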
4. Reference Counting
Traditional reference counting prevents ABA by not freeing objects that are still referenced. However, atomic reference counting has its own overhead and challenges (cycles, performance).
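A sketch of the CAS loop at the heart of lock-free reference counting (RefCount is a hypothetical class; the subtlety is that a retain must fail once the count has already reached zero, because the object may be mid-reclamation):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: CAS-based reference count starting at one owner.
class RefCount {
    private final AtomicInteger count = new AtomicInteger(1);

    // Try to take a new reference; fails if the object is already dead.
    boolean tryRetain() {
        int c;
        do {
            c = count.get();
            if (c == 0) return false; // too late; object is being reclaimed
        } while (!count.compareAndSet(c, c + 1));
        return true;
    }

    // Drop a reference; returns true when the caller dropped the last one
    // and is therefore responsible for freeing the object.
    boolean release() {
        return count.decrementAndGet() == 0;
    }
}
```

The zero check inside the loop is why a plain fetch-and-add is not enough here: the increment must be conditional on the object still being alive.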
5. Language/Runtime Support
Garbage-collected languages often avoid explicit ABA mitigation because the runtime delays object reclamation until no references exist. However, this is not guaranteed with pooled objects or interned values.
| Solution | Overhead | Complexity | Best For |
|---|---|---|---|
| Tagged Pointers | Medium (wider CAS) | Low | Simple structures, available 128-bit CAS |
| Hazard Pointers | Low runtime, high implementation | High | General purpose, proven correctness |
| Epoch-Based Reclamation | Low amortized | Medium | Read-heavy workloads |
| Reference Counting | High (atomic inc/dec) | Medium | When ownership is clear |
| GC/Managed Runtime | Varies | None (handled by runtime) | Java, C#, Go, etc. |
CAS is the foundation for lock-free data structures. Let's examine some classic examples:
Lock-Free Stack (Treiber Stack)
The Treiber stack is the simplest lock-free data structure—a linked list where push and pop use CAS on the head pointer:
```typescript
class TreiberStack<T> {
  private head: AtomicReference<Node<T> | null> = new AtomicReference(null);

  push(value: T): void {
    const newNode = new Node(value);
    while (true) {
      const oldHead = this.head.load();
      newNode.next = oldHead;
      // Linearization point: CAS that succeeds
      if (this.head.compareAndSet(oldHead, newNode)) {
        return;
      }
      // CAS failed; another push/pop changed head; retry
    }
  }

  pop(): T | null {
    while (true) {
      const oldHead = this.head.load();
      if (oldHead === null) {
        return null; // Stack is empty
      }
      const newHead = oldHead.next;
      // Linearization point: CAS that succeeds
      if (this.head.compareAndSet(oldHead, newHead)) {
        return oldHead.value;
      }
      // CAS failed; another thread modified head; retry
    }
  }

  peek(): T | null {
    const head = this.head.load();
    return head?.value ?? null;
  }

  isEmpty(): boolean {
    return this.head.load() === null;
  }
}
```

Lock-Free Queue (Michael-Scott Queue)
The Michael-Scott queue is a lock-free FIFO queue using two pointers (head and tail). Enqueue appends to tail; dequeue removes from head. It's more complex than the stack because we must coordinate two pointers:
```typescript
class MichaelScottQueue<T> {
  private head: AtomicReference<Node<T>>;
  private tail: AtomicReference<Node<T>>;

  constructor() {
    // Sentinel node simplifies edge cases
    const sentinel = new Node<T>(null as any);
    this.head = new AtomicReference(sentinel);
    this.tail = new AtomicReference(sentinel);
  }

  enqueue(value: T): void {
    const newNode = new Node(value);
    while (true) {
      const currentTail = this.tail.load();
      const next = currentTail.next.load(); // Node.next is atomic

      // Check if tail is still the tail
      if (currentTail === this.tail.load()) {
        if (next === null) {
          // Tail is truly at the end; try to append
          if (currentTail.next.compareAndSet(null, newNode)) {
            // Successfully appended; try to advance tail
            this.tail.compareAndSet(currentTail, newNode);
            return;
          }
        } else {
          // Tail has fallen behind; help advance it
          this.tail.compareAndSet(currentTail, next);
        }
      }
    }
  }

  dequeue(): T | null {
    while (true) {
      const currentHead = this.head.load();
      const currentTail = this.tail.load();
      const next = currentHead.next.load();

      if (currentHead === this.head.load()) {
        if (currentHead === currentTail) {
          if (next === null) {
            return null; // Queue is empty
          }
          // Tail is behind; help advance it
          this.tail.compareAndSet(currentTail, next);
        } else {
          // Read value before CAS (might be overwritten)
          const value = next.value;
          // Try to advance head
          if (this.head.compareAndSet(currentHead, next)) {
            return value;
          }
        }
      }
    }
  }
}
```

Notice how enqueue/dequeue 'help' advance the tail pointer if it has fallen behind. This cooperative approach is common in lock-free algorithms. Every thread ensures the structure is consistent, not just its own operation. This is key to lock-freedom: even if one thread stalls, others can complete operations.
Standard CAS operates on a single word (64 bits on modern systems). Double-Width CAS (DWCAS) atomically operates on two words (128 bits), enabling powerful patterns:
Use Cases:
- Tagged pointers: a pointer plus a version counter updated as one atomic unit (the standard ABA defense)
- Any pair of adjacent words that must change together, such as a pointer and a size or state field
Hardware Support:
| Architecture | Instruction | Notes |
|---|---|---|
| x86-64 | CMPXCHG16B | Requires 16-byte aligned addresses |
| ARM64 | CASP / LDXP+STXP | Pair instructions for 128-bit |
| POWER | Not native | Implemented with LL/SC sequences |
```cpp
#include <atomic>
#include <cstdint>

// 128-bit structure (pointer + counter)
struct alignas(16) TaggedPtr {
    void* ptr;        // 64 bits
    uint64_t counter; // 64 bits

    bool operator==(const TaggedPtr& other) const {
        return ptr == other.ptr && counter == other.counter;
    }
};

class ABAFreeStack {
    std::atomic<TaggedPtr> head;

public:
    void push(Node* node) {
        TaggedPtr old_head = head.load();
        TaggedPtr new_head;
        do {
            node->next = static_cast<Node*>(old_head.ptr);
            new_head.ptr = node;
            new_head.counter = old_head.counter + 1;
        } while (!head.compare_exchange_weak(old_head, new_head));
    }

    Node* pop() {
        TaggedPtr old_head = head.load();
        TaggedPtr new_head;
        do {
            if (old_head.ptr == nullptr) {
                return nullptr;
            }
            new_head.ptr = static_cast<Node*>(old_head.ptr)->next;
            new_head.counter = old_head.counter + 1;
        } while (!head.compare_exchange_weak(old_head, new_head));
        return static_cast<Node*>(old_head.ptr);
    }
};

// Verify 128-bit atomic is lock-free
static_assert(std::atomic<TaggedPtr>::is_always_lock_free);
```

128-bit atomics typically require 16-byte alignment. Unaligned access may compile but cause runtime traps or correctness issues. Always use alignas(16) or equivalent alignment specifiers.
CAS has a remarkable theoretical property: it is universal. This means that any shared data structure with any finite set of operations can be implemented in a lock-free manner using only CAS and atomic load/store.
The Universality Result:
Herlihy's 1991 paper "Wait-Free Synchronization" established the consensus hierarchy—a ranking of synchronization primitives by their power. CAS has consensus number ∞, meaning it can solve consensus among any number of threads.
This turns out to be equivalent to saying CAS can implement any concurrent object. If you can solve consensus, you can build anything.
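The consensus connection is concrete: a single CAS builds a one-shot consensus object for any number of threads (a sketch; the Consensus class name is ours, not a standard API):

```java
import java.util.concurrent.atomic.AtomicReference;

// CAS solves n-thread consensus: every thread proposes a value,
// and all threads decide on the value whose CAS won the race.
class Consensus<T> {
    private final AtomicReference<T> decision = new AtomicReference<>(null);

    T decide(T proposal) {
        decision.compareAndSet(null, proposal); // only the first CAS succeeds
        return decision.get();                  // everyone returns the winner
    }
}
```

The first thread's CAS installs its proposal; every later decide call, from any thread, returns that same winning value.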
Practical Implications:
If you have CAS, you can implement:
Conversely, primitives like test-and-set or fetch-and-add have finite consensus numbers and cannot implement certain lock-free data structures.
| Consensus Number | Primitives | Can Implement |
|---|---|---|
| 1 | Atomic read, atomic write | Only sequential objects |
| 2 | Test-and-Set, Fetch-and-Add, Swap | 2-thread lock-free objects |
| ∞ (Universal) | Compare-and-Swap, LL/SC | Any n-thread lock-free object |
```typescript
// In theory, you can make ANY object lock-free with CAS:

interface Operation<T> {
  apply(state: T): T;
}

class UniversalConstruction<T> {
  private state: AtomicReference<T>;

  constructor(initial: T) {
    this.state = new AtomicReference(initial);
  }

  // Apply ANY operation lock-free
  apply(op: Operation<T>): T {
    while (true) {
      const oldState = this.state.load();
      const newState = op.apply(oldState); // Compute new state
      if (this.state.compareAndSet(oldState, newState)) {
        return newState;
      }
      // Contention; retry with fresh state
    }
  }
}

// Usage: Any operation becomes lock-free
const counter = new UniversalConstruction(0);
counter.apply({ apply: n => n + 1 });            // Increment
counter.apply({ apply: n => n * 2 });            // Double
counter.apply({ apply: n => Math.max(n, 100) }); // Max with 100

// In practice, specialized implementations are more efficient,
// but this proves CAS can do ANYTHING lock-free.
```

While the universal construction proves CAS can implement anything, practical lock-free structures are specialized for efficiency. The Michael-Scott queue isn't just 'UniversalConstruction applied to a queue'—it's an optimized design. But knowing CAS is universal gives confidence that any problem has a lock-free solution.
We've thoroughly explored the most important atomic primitive: its semantics, the strong and weak variants, the CAS retry loop, the ABA problem and its mitigations, double-width CAS, and the universality result that makes CAS the foundation of lock-free programming.
What's Next:
With CAS mastered, we turn to the emerging field of lock-free programming. The next page covers lock-free vs wait-free guarantees, the design of lock-free algorithms, performance considerations, and when lock-free is (and isn't) the right choice.
You now have deep mastery of Compare-and-Swap—the universal building block of lock-free programming. From its basic mechanics through the ABA problem and its solutions, you understand both the power and the pitfalls of CAS-based concurrency.