With our understanding of what atomicity means, we now turn to the concrete atomic primitives that programming languages and hardware provide. These primitives are the vocabulary of concurrent programming—the individual operations from which all correct concurrent algorithms are constructed.
Atomic primitives are not all created equal. They differ in:

- the operations they can express, from a plain read to a full read-modify-write;
- their performance characteristics, especially under contention;
- how directly they map to hardware instructions.
Understanding this taxonomy enables you to choose the right primitive for each situation—balancing correctness, performance, and simplicity.
This page covers the complete spectrum of atomic primitives: atomic loads/stores, fetch-and-X operations, exchange, compare-and-swap, and their variants. You'll learn when to use each, their performance characteristics, and how they map to hardware instructions.
The most fundamental atomic operations are load (read) and store (write). While they may seem trivial, they provide critical guarantees that regular memory access does not.
Atomic Load
An atomic load reads the entire value atomically. No other thread can observe a partially updated value.
value = atomicVar.load();
Guarantees:

- No torn reads: the value is read in a single indivisible step.
- The value observed was written in full by some store; it is never a mix of two writes.
Atomic Store
An atomic store writes the entire value atomically. Other threads see either the old value or the new value—never a mix.
atomicVar.store(newValue);
Guarantees:

- No torn writes: the value is written in a single indivisible step.
- Concurrent readers observe either the complete old value or the complete new value.
```
// Java
AtomicReference<Config> config = new AtomicReference<>(initialConfig);
Config current = config.get(); // Atomic load
config.set(newConfig);         // Atomic store

// C++11
std::atomic<Config*> config{initialConfig};
Config* current = config.load(); // Atomic load (default: seq_cst)
config.store(newConfig);         // Atomic store (default: seq_cst)

// With explicit memory ordering:
Config* current = config.load(std::memory_order_acquire);
config.store(newConfig, std::memory_order_release);

// Rust
use std::sync::atomic::{AtomicPtr, Ordering};
let config: AtomicPtr<Config> = AtomicPtr::new(initial);
let current = config.load(Ordering::Acquire);
config.store(new_config, Ordering::Release);

// Go
import "sync/atomic"
var config atomic.Pointer[Config]
current := config.Load()
config.Store(newConfig)
```

Use Cases for Atomic Load/Store
Configuration Publishing: A writer thread updates a config object; reader threads observe the latest config.
Flag Signaling: One thread sets a flag to signal completion; other threads poll the flag.
Lazy Initialization: Store a computed value once; subsequent reads return the initialized value.
Single-Writer Patterns: When only one thread modifies data and many threads read, atomic load/store suffices.
Atomic load and store cannot implement safe increment, decrement, or conditional update. Even reading a value with load(), computing, and writing with store() creates a window where other threads can interleave. For read-modify-write scenarios, you need fetch-and-X or compare-and-swap.
| Ordering | Load Behavior | Store Behavior | Use Case |
|---|---|---|---|
| Relaxed | No ordering guarantees | No ordering guarantees | Statistics counters where order doesn't matter |
| Acquire | Subsequent reads/writes not moved before | N/A (use Release for store) | Reading shared state after synchronization |
| Release | N/A (use Acquire for load) | Prior writes not moved after | Publishing results for other threads |
| Seq_Cst | Total order with all seq_cst operations | Total order with all seq_cst operations | When in doubt; simplest mental model |
Fetch-and-X operations atomically read a value, perform a computation, and write the result—returning the original value before modification. These are the workhorses of concurrent counter implementation.
The Fetch-and-X Family:
| Operation | Semantics | Use Case |
|---|---|---|
| `fetch_add` | `old = *p; *p += v; return old` | Counters, sequence numbers |
| `fetch_sub` | `old = *p; *p -= v; return old` | Reference counting, resource allocation |
| `fetch_or` | `old = *p; *p \|= v; return old` | Setting flag bits |
| `fetch_and` | `old = *p; *p &= v; return old` | Clearing flag bits |
| `fetch_xor` | `old = *p; *p ^= v; return old` | Toggling flag bits |
| `fetch_nand` | `old = *p; *p = ~(*p & v); return old` | Rare; specialized bit manipulation |
Each operation is atomic from the perspective of all concurrent threads. If 100 threads each call fetch_add(1) on the same variable initialized to 0, the variable will be exactly 100 afterward, and each thread receives a unique sequence number from 0 to 99.
```typescript
// THREAD-SAFE COUNTER
class AtomicCounter {
  private value: AtomicInteger = new AtomicInteger(0);

  // Returns the count BEFORE incrementing (useful for sequence numbers)
  getAndIncrement(): number {
    return this.value.fetchAdd(1);
  }

  // Returns the count AFTER incrementing (useful for total counts)
  incrementAndGet(): number {
    return this.value.fetchAdd(1) + 1;
  }

  // Add arbitrary amount, return previous value
  getAndAdd(delta: number): number {
    return this.value.fetchAdd(delta);
  }
}

// SEQUENCE NUMBER GENERATOR
class SequenceGenerator {
  private sequence: AtomicLong = new AtomicLong(0);

  nextId(): number {
    // Each caller gets a unique ID, regardless of concurrency
    return this.sequence.fetchAdd(1);
  }
}

// REFERENCE COUNTING
class RefCountedHandle<T> {
  private refCount: AtomicInteger;
  private resource: T;

  constructor(resource: T) {
    this.resource = resource;
    this.refCount = new AtomicInteger(1);
  }

  acquire(): RefCountedHandle<T> {
    this.refCount.fetchAdd(1);
    return this;
  }

  release(): void {
    // fetch_sub returns the OLD value
    const oldCount = this.refCount.fetchSub(1);
    if (oldCount === 1) {
      // We decremented from 1 to 0; we're the last holder
      this.cleanup();
    }
  }

  private cleanup(): void {
    // Release the underlying resource
    console.log("Resource released");
  }
}

// BIT FLAGS
class AtomicFlags {
  private flags: AtomicInteger = new AtomicInteger(0);

  static readonly FLAG_INITIALIZED = 0b0001;
  static readonly FLAG_RUNNING = 0b0010;
  static readonly FLAG_SHUTDOWN = 0b0100;

  setFlag(flag: number): number {
    // Set bit(s), return previous state
    return this.flags.fetchOr(flag);
  }

  clearFlag(flag: number): number {
    // Clear bit(s), return previous state
    return this.flags.fetchAnd(~flag);
  }

  toggleFlag(flag: number): number {
    // Toggle bit(s), return previous state
    return this.flags.fetchXor(flag);
  }

  hasFlag(flag: number): boolean {
    return (this.flags.load() & flag) !== 0;
  }
}
```

Fetch-and-X operations return the value BEFORE modification. This is crucial for correctness.
In reference counting, you need to know if the OLD count was 1 (meaning you just decremented to 0). If it returned the NEW value, you'd need a separate check, creating a race condition.
Hardware Support
Fetch-and-X operations map directly to CPU instructions:
| Operation | x86-64 Instruction | ARM64 Instruction |
|---|---|---|
| fetch_add | LOCK XADD | LDADD |
| fetch_sub | LOCK XADD (negative) | LDADD (negative) |
| fetch_or | LOCK OR + loop* | LDSET |
| fetch_and | LOCK AND + loop* | LDCLR (complemented operand) |
*On x86, `LOCK OR` and `LOCK AND` apply only when the old value is discarded; when the return value is needed, bitwise fetch operations compile to a CAS loop. ARM64 (with the LSE extension) has dedicated instructions for all of these operations.
Exchange (also called swap) atomically replaces a value with a new value and returns the old value:
exchange(address, new_value):
atomic {
old = *address
*address = new_value
return old
}
This is conceptually simpler than compare-and-swap because it succeeds unconditionally—there's no comparison. You always set the new value, regardless of the old value.
```typescript
// SIMPLE SPINLOCK (non-fair, minimal)
class SpinLock {
  private locked: AtomicBoolean = new AtomicBoolean(false);

  lock(): void {
    // Try to exchange false → true
    // If we get back false, we acquired the lock
    // If we get back true, someone else has it; retry
    while (this.locked.exchange(true)) {
      // Spin until we acquire
      // Could add yielding or backoff here
    }
  }

  unlock(): void {
    this.locked.store(false);
  }
}

// ATOMIC VALUE REPLACEMENT
class AtomicBox<T> {
  private value: AtomicReference<T>;

  constructor(initial: T) {
    this.value = new AtomicReference(initial);
  }

  // Replace the value, return the previous value
  getAndSet(newValue: T): T {
    return this.value.exchange(newValue);
  }
}

// WORK STEALING
class WorkDeque {
  private head: AtomicReference<Node | null>;

  // Steal from the front (used by other threads)
  steal(): Work | null {
    // Atomically take the head
    const node = this.head.exchange(null);
    if (node !== null) {
      // Got work! Restore the rest of the queue
      this.head.store(node.next);
      return node.work;
    }
    return null;
  }
}

// TOKEN PASSING
class TokenRing {
  private token: AtomicReference<Token | null>;

  // Take the token if available
  acquireToken(): Token | null {
    return this.token.exchange(null);
  }

  // Return the token
  releaseToken(token: Token): void {
    const old = this.token.exchange(token);
    if (old !== null) {
      throw new Error("Duplicate token in ring!");
    }
  }
}
```

Hardware Implementation
x86-64: XCHG instruction is implicitly locked (atomic)
ARM64: SWPAL (swap with acquire-release semantics) or LDXR/STXR loop
Exchange is well-supported on all modern architectures and is typically as fast as fetch_add.
Test-and-Set (TAS) is historically one of the first synchronization primitives. It atomically sets a memory location to true (or 1) and returns the previous value:
test_and_set(address):
atomic {
old = *address
*address = true
return old
}
TAS is essentially exchange(address, true) with a fixed new value. It's a building block for the simplest possible lock.
```typescript
// TEST-AND-SET LOCK (TAS Lock)
class TASLock {
  private state: AtomicBoolean = new AtomicBoolean(false);

  lock(): void {
    // testAndSet returns the OLD value
    // If old value was false, we acquired the lock
    // If old value was true, someone else has it
    while (this.state.testAndSet()) {
      // Spin until we get false back
    }
  }

  unlock(): void {
    this.state.store(false);
  }
}

// EQUIVALENT USING EXCHANGE
class TASLockWithExchange {
  private state: AtomicBoolean = new AtomicBoolean(false);

  lock(): void {
    while (this.state.exchange(true)) {
      // Same behavior as testAndSet
    }
  }

  unlock(): void {
    this.state.store(false);
  }
}
```

Problems with Pure TAS Locks
While TAS locks are simple, they have serious performance problems:
Cache Line Thrashing: Every testAndSet() call writes to the lock, even when the lock is held. On multi-core systems, this bounces the cache line between CPUs constantly.
Bus Contention: Locked instructions generate memory bus traffic, even when they "fail" to acquire.
No Fairness: There's no guarantee that a waiting thread will ever acquire the lock. Under contention, some threads may starve.
The TTAS Optimization
Test-and-Test-and-Set (TTAS) improves on TAS by first spinning on a regular read, only attempting TAS when the lock appears free:
```typescript
// TEST-AND-TEST-AND-SET LOCK (TTAS Lock)
class TTASLock {
  private state: AtomicBoolean = new AtomicBoolean(false);

  lock(): void {
    while (true) {
      // First test: spin on a regular load (cheap, cache-friendly)
      while (this.state.load()) {
        // Lock is held; spin on local cache
        // No bus traffic as long as cache is valid
      }
      // Lock appears free; try to acquire with TAS
      if (!this.state.testAndSet()) {
        // Got the lock!
        return;
      }
      // Someone else got it first; back to spinning
    }
  }

  unlock(): void {
    this.state.store(false);
  }
}
```

TTAS dramatically reduces bus traffic compared to TAS. The inner loop spins on cached data without generating memory traffic. Only when the lock is released (invalidating the cache line) do waiting threads retry acquisition. This can improve performance by 10x or more under contention.
Pointers are fundamental to data structure implementation, and atomic pointer operations are essential for lock-free data structures.
Atomic Pointer Types
Most languages provide atomic wrapper types for pointers:
| Language | Atomic Pointer Type |
|---|---|
| Java | AtomicReference<T> |
| C++ | std::atomic<T*> |
| Rust | AtomicPtr<T> |
| Go | atomic.Pointer[T] |
| C# | Uses Interlocked methods with regular references |
Atomic pointers support all the standard operations: load, store, exchange, and compare-and-swap.
```typescript
// SAFE PUBLICATION PATTERN
// Problem: Publishing a newly constructed object for other threads

interface Config {
  timeout: number;
  maxRetries: number;
}

class ConfigManager {
  private config: AtomicReference<Config>;

  constructor() {
    this.config = new AtomicReference({ timeout: 1000, maxRetries: 3 });
  }

  // Safe: Other threads will see either old or new config, never garbage
  updateConfig(newConfig: Config): void {
    this.config.store(newConfig);
  }

  // Safe: Will read a complete, consistent Config object
  getConfig(): Config {
    return this.config.load();
  }
}

// LOCK-FREE STACK (Treiber Stack)
class LockFreeStack<T> {
  private head: AtomicReference<Node<T> | null> = new AtomicReference(null);

  push(value: T): void {
    const newNode = new Node(value);
    while (true) {
      const oldHead = this.head.load();
      newNode.next = oldHead;
      // CAS: only update if head hasn't changed
      if (this.head.compareAndSet(oldHead, newNode)) {
        return;
      }
      // Head changed; retry
    }
  }

  pop(): T | null {
    while (true) {
      const oldHead = this.head.load();
      if (oldHead === null) return null;
      const newHead = oldHead.next;
      // CAS: only update if head hasn't changed
      if (this.head.compareAndSet(oldHead, newHead)) {
        return oldHead.value;
      }
      // Head changed; retry
    }
  }
}

class Node<T> {
  value: T;
  next: Node<T> | null = null;
  constructor(value: T) {
    this.value = value;
  }
}
```

Atomic pointer CAS is vulnerable to the ABA problem: a pointer changes from A → B → A. CAS sees the expected value A and succeeds, but the underlying object may have been deallocated and reallocated! Solutions include hazard pointers, epoch-based reclamation, or tagged pointers.
Tagged Pointers
One mitigation for the ABA problem is to combine a version counter with the pointer. Since pointers typically don't use all bits (due to alignment), we can pack a counter into the unused bits:
// Using the low bits (pointers are 8-byte aligned)
struct TaggedPtr {
uintptr_t ptr : 61; // Actual pointer (high bits)
uintptr_t tag : 3; // Version counter (low bits)
};
// Using atomic double-width CAS
struct TaggedPtr {
void* ptr; // 64-bit pointer
uint64_t version; // 64-bit counter
}; // Total: 128 bits, atomically updated with CMPXCHG16B on x86-64
Each modification increments the tag/version. CAS compares both pointer and tag, preventing ABA confusion.
Atomic Flag is the most minimal atomic type—it only supports test-and-set and clear operations. Unlike other atomic types, atomic_flag is guaranteed to be lock-free on all platforms.
// C++
std::atomic_flag flag = ATOMIC_FLAG_INIT;
// Returns previous value (false = was clear, true = was set)
bool was_set = flag.test_and_set();
// Unconditionally clear the flag
flag.clear();
This is the only atomic type in C++ that is guaranteed lock-free. Other atomic types (atomic<int>, etc.) may fall back to locks on platforms without hardware support.
```cpp
#include <atomic>

class Spinlock {
  std::atomic_flag flag = ATOMIC_FLAG_INIT;

public:
  void lock() {
    // Keep setting the flag until we find it was clear
    while (flag.test_and_set(std::memory_order_acquire)) {
      // Spin
      // Optional: add pause/yield for power efficiency
#if defined(__x86_64__)
      __builtin_ia32_pause(); // CPU hint: we're spinning
#endif
    }
  }

  void unlock() {
    flag.clear(std::memory_order_release);
  }
};

// Usage
Spinlock lock;

void critical_section() {
  lock.lock();
  // ... protected code ...
  lock.unlock();
}

// RAII wrapper for exception safety
class SpinGuard {
  Spinlock& lock_;

public:
  SpinGuard(Spinlock& lock) : lock_(lock) { lock_.lock(); }
  ~SpinGuard() { lock_.unlock(); }
  SpinGuard(const SpinGuard&) = delete;
  SpinGuard& operator=(const SpinGuard&) = delete;
};
```

Atomic flag doesn't support reading the value without modifying it (before C++20). You can't simply check if the flag is set. C++20 added test() for read-only access. If you need to read without modifying, use atomic<bool> instead, accepting that it may not be lock-free on exotic platforms.
With multiple atomic primitives available, how do you choose the right one? The decision depends on what you're trying to accomplish:
Decision Framework:
```
Do you need to UPDATE the value?
│
├─ NO → Use atomic LOAD
│       (Just reading current value)
│
└─ YES → Is the update UNCONDITIONAL?
    │
    ├─ YES → Is it a simple write?
    │   │
    │   ├─ YES → Use atomic STORE
    │   │
    │   └─ NO (need old value?) → Use EXCHANGE
    │
    └─ NO (update depends on current value)
        │
        ├─ Is it arithmetic? (+, -)
        │   │
        │   └─ YES → Use FETCH_ADD or FETCH_SUB
        │
        ├─ Is it bitwise? (|, &, ^)
        │   │
        │   └─ YES → Use FETCH_OR, FETCH_AND, FETCH_XOR
        │
        └─ Is it conditional? (if value == X, then set to Y)
            │
            └─ YES → Use COMPARE_AND_SWAP (CAS)
```

| Scenario | Best Primitive | Why |
|---|---|---|
| Simple counter increment | fetch_add(1) | Single instruction, returns old value for sequence numbers |
| Reference counting decrement | fetch_sub(1) | Returns old value to detect reaching zero |
| Setting configuration | store() | No old value needed, unconditional update |
| Reading shared state | load() | Atomic snapshot of current value |
| Simple spinlock | exchange(true) | Unconditional set, returns old value to check acquisition |
| Lock-free stack push/pop | compare_exchange | Conditional update only if state unchanged |
| Setting flag bits | fetch_or(bits) | Set specific bits, returns old state |
| Clearing flag bits | fetch_and(~bits) | Clear specific bits, returns old state |
More specific primitives are usually faster than CAS. If you need a counter, use fetch_add—don't implement an increment loop with CAS. The specialized instructions are optimized in hardware. Use CAS only when you need its conditional update capability.
We've surveyed the complete spectrum of atomic primitives. Let's consolidate the key concepts:

- Atomic load and store provide tear-free reads and writes, but cannot express read-modify-write updates.
- Fetch-and-X operations (add, sub, or, and, xor) atomically read, modify, and write, returning the value before modification.
- Exchange replaces a value unconditionally and returns the old one; test-and-set is exchange with a fixed value of true.
- TTAS locks spin on cached reads, avoiding the cache-line thrashing of pure TAS locks.
- Atomic pointers enable lock-free data structures, but pointer CAS must guard against the ABA problem.
- Prefer the most specific primitive available; reserve CAS for genuinely conditional updates.
What's Next:
The next page focuses on the most powerful atomic primitive: Compare-and-Swap (CAS). We'll explore its mechanics, implementation patterns, the notorious ABA problem, and how CAS enables the entire field of lock-free programming.
You now have a comprehensive understanding of atomic primitives—the building blocks from which all concurrent algorithms are constructed. From simple load/store to complex fetch-and-X operations, you know when and how to use each primitive effectively.