With our understanding of what atomicity means, we now turn to the concrete atomic primitives that programming languages and hardware provide. These primitives are the vocabulary of concurrent programming—the individual operations from which all correct concurrent algorithms are constructed.
Atomic primitives are not all created equal. They differ in:

- the operations they can express, from a plain read to a full read-modify-write;
- their performance characteristics, especially under contention;
- how directly they map to hardware instructions.
Understanding this taxonomy enables you to choose the right primitive for each situation—balancing correctness, performance, and simplicity.
This page covers the complete spectrum of atomic primitives: atomic loads/stores, fetch-and-X operations, exchange, compare-and-swap, and their variants. You'll learn when to use each, their performance characteristics, and how they map to hardware instructions.
The most fundamental atomic operations are load (read) and store (write). While they may seem trivial, they provide critical guarantees that regular memory access does not.
Atomic Load
An atomic load reads the entire value atomically. No other thread can observe a partially updated value.
value = atomicVar.load();
Guarantees:

- No torn reads: the value is read in a single indivisible step.
- The value observed was written in full by some store; it is never a mix of two writes.
Atomic Store
An atomic store writes the entire value atomically. Other threads see either the old value or the new value—never a mix.
atomicVar.store(newValue);
Guarantees:

- No torn writes: the value is written in a single indivisible step.
- Concurrent readers observe either the complete old value or the complete new value.
```
// Java
AtomicReference<Config> config = new AtomicReference<>(initialConfig);
Config current = config.get(); // Atomic load
config.set(newConfig);         // Atomic store

// C++11
std::atomic<Config*> config{initialConfig};
Config* current = config.load(); // Atomic load (default: seq_cst)
config.store(newConfig);         // Atomic store (default: seq_cst)

// With explicit memory ordering:
Config* current = config.load(std::memory_order_acquire);
config.store(newConfig, std::memory_order_release);

// Rust
use std::sync::atomic::{AtomicPtr, Ordering};
let config: AtomicPtr<Config> = AtomicPtr::new(initial);
let current = config.load(Ordering::Acquire);
config.store(new_config, Ordering::Release);

// Go
import "sync/atomic"
var config atomic.Pointer[Config]
current := config.Load()
config.Store(newConfig)
```

Use Cases for Atomic Load/Store
Configuration Publishing: A writer thread updates a config object; reader threads observe the latest config.
Flag Signaling: One thread sets a flag to signal completion; other threads poll the flag.
Lazy Initialization: Store a computed value once; subsequent reads return the initialized value.
Single-Writer Patterns: When only one thread modifies data and many threads read, atomic load/store suffices.
Atomic load and store cannot implement safe increment, decrement, or conditional update. Even reading a value with load(), computing, and writing with store() creates a window where other threads can interleave. For read-modify-write scenarios, you need fetch-and-X or compare-and-swap.
| Ordering | Load Behavior | Store Behavior | Use Case |
|---|---|---|---|
| Relaxed | No ordering guarantees | No ordering guarantees | Statistics counters where order doesn't matter |
| Acquire | Subsequent reads/writes not moved before | N/A (use Release for store) | Reading shared state after synchronization |
| Release | N/A (use Acquire for load) | Prior writes not moved after | Publishing results for other threads |
| Seq_Cst | Total order with all seq_cst operations | Total order with all seq_cst operations | When in doubt; simplest mental model |
Fetch-and-X operations atomically read a value, perform a computation, and write the result—returning the original value before modification. These are the workhorses of concurrent counter implementation.
The Fetch-and-X Family:
| Operation | Semantics | Use Case |
|---|---|---|
| `fetch_add` | `old = *p; *p += v; return old` | Counters, sequence numbers |
| `fetch_sub` | `old = *p; *p -= v; return old` | Reference counting, resource allocation |
| `fetch_or` | `old = *p; *p \|= v; return old` | Setting flag bits |
| `fetch_and` | `old = *p; *p &= v; return old` | Clearing flag bits |
| `fetch_xor` | `old = *p; *p ^= v; return old` | Toggling flag bits |
| `fetch_nand` | `old = *p; *p = ~(*p & v); return old` | Rare; specialized bit manipulation |
Each operation is atomic from the perspective of all concurrent threads. If 100 threads each call fetch_add(1) on the same variable initialized to 0, the variable will be exactly 100 afterward, and each thread receives a unique sequence number from 0 to 99.
```typescript
// THREAD-SAFE COUNTER
class AtomicCounter {
  private value: AtomicInteger = new AtomicInteger(0);

  // Returns the count BEFORE incrementing (useful for sequence numbers)
  getAndIncrement(): number {
    return this.value.fetchAdd(1);
  }

  // Returns the count AFTER incrementing (useful for total counts)
  incrementAndGet(): number {
    return this.value.fetchAdd(1) + 1;
  }

  // Add arbitrary amount, return previous value
  getAndAdd(delta: number): number {
    return this.value.fetchAdd(delta);
  }
}

// SEQUENCE NUMBER GENERATOR
class SequenceGenerator {
  private sequence: AtomicLong = new AtomicLong(0);

  nextId(): number {
    // Each caller gets a unique ID, regardless of concurrency
    return this.sequence.fetchAdd(1);
  }
}

// REFERENCE COUNTING
class RefCountedHandle<T> {
  private refCount: AtomicInteger;
  private resource: T;

  constructor(resource: T) {
    this.resource = resource;
    this.refCount = new AtomicInteger(1);
  }

  acquire(): RefCountedHandle<T> {
    this.refCount.fetchAdd(1);
    return this;
  }

  release(): void {
    // fetch_sub returns the OLD value
    const oldCount = this.refCount.fetchSub(1);
    if (oldCount === 1) {
      // We decremented from 1 to 0; we're the last holder
      this.cleanup();
    }
  }

  private cleanup(): void {
    // Release the underlying resource
    console.log("Resource released");
  }
}

// BIT FLAGS
class AtomicFlags {
  private flags: AtomicInteger = new AtomicInteger(0);

  static readonly FLAG_INITIALIZED = 0b0001;
  static readonly FLAG_RUNNING = 0b0010;
  static readonly FLAG_SHUTDOWN = 0b0100;

  setFlag(flag: number): number {
    // Set bit(s), return previous state
    return this.flags.fetchOr(flag);
  }

  clearFlag(flag: number): number {
    // Clear bit(s), return previous state
    return this.flags.fetchAnd(~flag);
  }

  toggleFlag(flag: number): number {
    // Toggle bit(s), return previous state
    return this.flags.fetchXor(flag);
  }

  hasFlag(flag: number): boolean {
    return (this.flags.load() & flag) !== 0;
  }
}
```

Fetch-and-X operations return the value BEFORE modification. This is crucial for correctness.
In reference counting, you need to know if the OLD count was 1 (meaning you just decremented to 0). If it returned the NEW value, you'd need a separate check, creating a race condition.
Hardware Support
Fetch-and-X operations map directly to CPU instructions:
| Operation | x86-64 Instruction | ARM64 Instruction |
|---|---|---|
| fetch_add | LOCK XADD | LDADD |
| fetch_sub | LOCK XADD (negative) | LDADD (negative) |
| fetch_or | LOCK OR + loop* | LDSET |
| fetch_and | LOCK AND + loop* | LDCLR (complemented operand) |
*On x86, `LOCK OR` and `LOCK AND` apply only when the old value is discarded; when the return value is needed, bitwise fetch operations compile to a CAS loop. ARM64 (with the LSE extension) has dedicated instructions for all of these operations.
Exchange (also called swap) atomically replaces a value with a new value and returns the old value:
exchange(address, new_value):
atomic {
old = *address
*address = new_value
return old
}
This is conceptually simpler than compare-and-swap because it succeeds unconditionally—there's no comparison. You always set the new value, regardless of the old value.
```typescript
// SIMPLE SPINLOCK (non-fair, minimal)
class SpinLock {
  private locked: AtomicBoolean = new AtomicBoolean(false);

  lock(): void {
    // Try to exchange false → true
    // If we get back false, we acquired the lock
    // If we get back true, someone else has it; retry
    while (this.locked.exchange(true)) {
      // Spin until we acquire
      // Could add yielding or backoff here
    }
  }

  unlock(): void {
    this.locked.store(false);
  }
}

// ATOMIC VALUE REPLACEMENT
class AtomicBox<T> {
  private value: AtomicReference<T>;

  constructor(initial: T) {
    this.value = new AtomicReference(initial);
  }

  // Replace the value, return the previous value
  getAndSet(newValue: T): T {
    return this.value.exchange(newValue);
  }
}

// WORK STEALING
class WorkDeque {
  private head: AtomicReference<Node | null>;

  // Steal from the front (used by other threads)
  steal(): Work | null {
    // Atomically take the head
    const node = this.head.exchange(null);
    if (node !== null) {
      // Got work! Restore the rest of the queue
      this.head.store(node.next);
      return node.work;
    }
    return null;
  }
}

// TOKEN PASSING
class TokenRing {
  private token: AtomicReference<Token | null>;

  // Take the token if available
  acquireToken(): Token | null {
    return this.token.exchange(null);
  }

  // Return the token
  releaseToken(token: Token): void {
    const old = this.token.exchange(token);
    if (old !== null) {
      throw new Error("Duplicate token in ring!");
    }
  }
}
```

Hardware Implementation
x86-64: XCHG instruction is implicitly locked (atomic)
ARM64: SWPAL (swap with acquire-release semantics) or LDXR/STXR loop
Exchange is well-supported on all modern architectures and is typically as fast as fetch_add.
Test-and-Set (TAS) is historically one of the first synchronization primitives. It atomically sets a memory location to true (or 1) and returns the previous value:
test_and_set(address):
atomic {
old = *address
*address = true
return old
}
TAS is essentially exchange(address, true) with a fixed new value. It's a building block for the simplest possible lock.
```typescript
// TEST-AND-SET LOCK (TAS Lock)
class TASLock {
  private state: AtomicBoolean = new AtomicBoolean(false);

  lock(): void {
    // testAndSet returns the OLD value
    // If old value was false, we acquired the lock
    // If old value was true, someone else has it
    while (this.state.testAndSet()) {
      // Spin until we get false back
    }
  }

  unlock(): void {
    this.state.store(false);
  }
}

// EQUIVALENT USING EXCHANGE
class TASLockWithExchange {
  private state: AtomicBoolean = new AtomicBoolean(false);

  lock(): void {
    while (this.state.exchange(true)) {
      // Same behavior as testAndSet
    }
  }

  unlock(): void {
    this.state.store(false);
  }
}
```

Problems with Pure TAS Locks
While TAS locks are simple, they have serious performance problems:
Cache Line Thrashing: Every testAndSet() call writes to the lock, even when the lock is held. On multi-core systems, this bounces the cache line between CPUs constantly.
Bus Contention: Locked instructions generate memory bus traffic, even when they "fail" to acquire.
No Fairness: There's no guarantee that a waiting thread will ever acquire the lock. Under contention, some threads may starve.
The TTAS Optimization
Test-and-Test-and-Set (TTAS) improves on TAS by first spinning on a regular read, only attempting TAS when the lock appears free:
```typescript
// TEST-AND-TEST-AND-SET LOCK (TTAS Lock)
class TTASLock {
  private state: AtomicBoolean = new AtomicBoolean(false);

  lock(): void {
    while (true) {
      // First test: spin on a regular load (cheap, cache-friendly)
      while (this.state.load()) {
        // Lock is held; spin on local cache
        // No bus traffic as long as cache is valid
      }
      // Lock appears free; try to acquire with TAS
      if (!this.state.testAndSet()) {
        // Got the lock!
        return;
      }
      // Someone else got it first; back to spinning
    }
  }

  unlock(): void {
    this.state.store(false);
  }
}
```

TTAS dramatically reduces bus traffic compared to TAS. The inner loop spins on cached data without generating memory traffic. Only when the lock is released (invalidating the cache line) do waiting threads retry acquisition. This can improve performance by 10x or more under contention.
Pointers are fundamental to data structure implementation, and atomic pointer operations are essential for lock-free data structures.
Atomic Pointer Types
Most languages provide atomic wrapper types for pointers:
| Language | Atomic Pointer Type |
|---|---|
| Java | AtomicReference<T> |
| C++ | std::atomic<T*> |
| Rust | AtomicPtr<T> |
| Go | atomic.Pointer[T] |
| C# | Uses Interlocked methods with regular references |
Atomic pointers support all the standard operations: load, store, exchange, and compare-and-swap.
```typescript
// SAFE PUBLICATION PATTERN
// Problem: Publishing a newly constructed object for other threads

interface Config {
  timeout: number;
  maxRetries: number;
}

class ConfigManager {
  private config: AtomicReference<Config>;

  constructor() {
    this.config = new AtomicReference({ timeout: 1000, maxRetries: 3 });
  }

  // Safe: Other threads will see either old or new config, never garbage
  updateConfig(newConfig: Config): void {
    this.config.store(newConfig);
  }

  // Safe: Will read a complete, consistent Config object
  getConfig(): Config {
    return this.config.load();
  }
}

// LOCK-FREE STACK (Treiber Stack)
class LockFreeStack<T> {
  private head: AtomicReference<Node<T> | null> = new AtomicReference(null);

  push(value: T): void {
    const newNode = new Node(value);
    while (true) {
      const oldHead = this.head.load();
      newNode.next = oldHead;
      // CAS: only update if head hasn't changed
      if (this.head.compareAndSet(oldHead, newNode)) {
        return;
      }
      // Head changed; retry
    }
  }

  pop(): T | null {
    while (true) {
      const oldHead = this.head.load();
      if (oldHead === null) return null;
      const newHead = oldHead.next;
      // CAS: only update if head hasn't changed
      if (this.head.compareAndSet(oldHead, newHead)) {
        return oldHead.value;
      }
      // Head changed; retry
    }
  }
}

class Node<T> {
  value: T;
  next: Node<T> | null = null;
  constructor(value: T) {
    this.value = value;
  }
}
```

Atomic pointer CAS is vulnerable to the ABA problem: a pointer changes from A → B → A. CAS sees the expected value A and succeeds, but the underlying object may have been deallocated and reallocated! Solutions include hazard pointers, epoch-based reclamation, or tagged pointers.
Tagged Pointers
One mitigation for the ABA problem is to combine a version counter with the pointer. Since pointers typically don't use all bits (due to alignment), we can pack a counter into the unused bits:
// Using the low bits (pointers are 8-byte aligned)
struct TaggedPtr {
uintptr_t ptr : 61; // Actual pointer (high bits)
uintptr_t tag : 3; // Version counter (low bits)
};
// Using atomic double-width CAS
struct TaggedPtr {
void* ptr; // 64-bit pointer
uint64_t version; // 64-bit counter
}; // Total: 128 bits, atomically updated with CMPXCHG16B on x86-64
Each modification increments the tag/version. CAS compares both pointer and tag, preventing ABA confusion.
Atomic Flag is the most minimal atomic type—it only supports test-and-set and clear operations. Unlike other atomic types, atomic_flag is guaranteed to be lock-free on all platforms.
// C++
std::atomic_flag flag = ATOMIC_FLAG_INIT;
// Returns previous value (false = was clear, true = was set)
bool was_set = flag.test_and_set();
// Unconditionally clear the flag
flag.clear();
This is the only atomic type in C++ that is guaranteed lock-free. Other atomic types (atomic<int>, etc.) may fall back to locks on platforms without hardware support.
```cpp
#include <atomic>

class Spinlock {
  std::atomic_flag flag = ATOMIC_FLAG_INIT;

public:
  void lock() {
    // Keep setting the flag until we find it was clear
    while (flag.test_and_set(std::memory_order_acquire)) {
      // Spin
      // Optional: add pause/yield for power efficiency
#if defined(__x86_64__)
      __builtin_ia32_pause(); // CPU hint: we're spinning
#endif
    }
  }

  void unlock() {
    flag.clear(std::memory_order_release);
  }
};

// Usage
Spinlock lock;

void critical_section() {
  lock.lock();
  // ... protected code ...
  lock.unlock();
}

// RAII wrapper for exception safety
class SpinGuard {
  Spinlock& lock_;

public:
  SpinGuard(Spinlock& lock) : lock_(lock) { lock_.lock(); }
  ~SpinGuard() { lock_.unlock(); }
  SpinGuard(const SpinGuard&) = delete;
  SpinGuard& operator=(const SpinGuard&) = delete;
};
```

Atomic flag doesn't support reading the value without modifying it (before C++20). You can't simply check if the flag is set. C++20 added test() for read-only access. If you need to read without modifying, use atomic<bool> instead, accepting that it may not be lock-free on exotic platforms.
With multiple atomic primitives available, how do you choose the right one? The decision depends on what you're trying to accomplish:
Decision Framework:
```
Do you need to UPDATE the value?
│
├─ NO → Use atomic LOAD
│       (Just reading current value)
│
└─ YES → Is the update UNCONDITIONAL?
    │
    ├─ YES → Is it a simple write?
    │   │
    │   ├─ YES → Use atomic STORE
    │   │
    │   └─ NO (need old value?) → Use EXCHANGE
    │
    └─ NO (update depends on current value)
        │
        ├─ Is it arithmetic? (+, -)
        │   │
        │   └─ YES → Use FETCH_ADD or FETCH_SUB
        │
        ├─ Is it bitwise? (|, &, ^)
        │   │
        │   └─ YES → Use FETCH_OR, FETCH_AND, FETCH_XOR
        │
        └─ Is it conditional? (if value == X, then set to Y)
            │
            └─ YES → Use COMPARE_AND_SWAP (CAS)
```

| Scenario | Best Primitive | Why |
|---|---|---|
| Simple counter increment | fetch_add(1) | Single instruction, returns old value for sequence numbers |
| Reference counting decrement | fetch_sub(1) | Returns old value to detect reaching zero |
| Setting configuration | store() | No old value needed, unconditional update |
| Reading shared state | load() | Atomic snapshot of current value |
| Simple spinlock | exchange(true) | Unconditional set, returns old value to check acquisition |
| Lock-free stack push/pop | compare_exchange | Conditional update only if state unchanged |
| Setting flag bits | fetch_or(bits) | Set specific bits, returns old state |
| Clearing flag bits | fetch_and(~bits) | Clear specific bits, returns old state |
More specific primitives are usually faster than CAS. If you need a counter, use fetch_add—don't implement an increment loop with CAS. The specialized instructions are optimized in hardware. Use CAS only when you need its conditional update capability.
We've surveyed the complete spectrum of atomic primitives. Let's consolidate the key concepts:

- Atomic load and store provide tear-free reads and writes, but cannot express read-modify-write updates.
- Fetch-and-X operations (add, sub, or, and, xor) atomically read, modify, and write, returning the value before modification.
- Exchange replaces a value unconditionally and returns the old one; test-and-set is exchange with a fixed value of true.
- TTAS locks spin on cached reads, avoiding the cache-line thrashing of pure TAS locks.
- Atomic pointers enable lock-free data structures, but pointer CAS must guard against the ABA problem.
- Prefer the most specific primitive available; reserve CAS for genuinely conditional updates.
What's Next:
The next page focuses on the most powerful atomic primitive: Compare-and-Swap (CAS). We'll explore its mechanics, implementation patterns, the notorious ABA problem, and how CAS enables the entire field of lock-free programming.
You now have a comprehensive understanding of atomic primitives—the building blocks from which all concurrent algorithms are constructed. From simple load/store to complex fetch-and-X operations, you know when and how to use each primitive effectively.