Consider the most innocuous operation in programming: incrementing a counter.
```typescript
count++;
```
This single line of code appears atomic—it looks like one indivisible action. But appearances deceive. At the machine level, this simple increment decomposes into multiple distinct operations:
1. Read the current value of count from memory into a register
2. Add 1 to the value in the register
3. Write the new value back to memory

In a single-threaded world, this decomposition is irrelevant. But the moment two threads execute count++ simultaneously, disaster lurks. Between any two of these steps, another thread can interleave—reading stale data, computing incorrect values, or overwriting updates. This is the lost update problem, and it has caused more production outages than most developers realize.
Atomic operations are the programming primitive that makes this problem tractable. They provide guarantees that certain operations will execute as if they were indivisible—no interleaving, no partial states visible to other threads, no surprises.
By the end of this page, you will understand what atomicity truly means at the hardware and software level, why non-atomic operations are dangerous in concurrent contexts, and how atomic operations provide the foundation for building correct, high-performance concurrent systems.
The term atomic derives from the Greek atomos, meaning "indivisible." In computing, an atomic operation exhibits two critical properties:
1. Indivisibility (All-or-Nothing Execution)
An atomic operation completes entirely or not at all. There is no intermediate state where the operation is "half done." If a system fails during an atomic operation, the operation either completed fully before the failure or is guaranteed not to have taken effect at all.
2. Isolation (No Intermediate States Visible)
No other thread or process can observe the operation in a partially completed state. From the perspective of all other concurrent observers, the operation appears to happen instantaneously—at a single discrete point in time.
These properties combine to create a powerful abstraction: linearizability. An atomic operation appears to take effect atomically at some single instant between its invocation and completion. This instant is called the linearization point.
Think of the linearization point as the precise moment when an atomic operation 'takes effect.' Before that instant, the world is in the old state. After that instant, the world is in the new state. There is never a moment where the operation is visible as 'in progress.'
Formal Definition:
An operation is atomic if and only if:

1. It either completes in its entirety or has no effect at all (indivisibility), and
2. No concurrent observer can see it in a partially completed state; it appears to take effect instantaneously at a single linearization point (isolation).
Important Distinction: Atomicity vs. Thread Safety
Atomicity is a property of individual operations, not of entire algorithms or data structures. A single atomic operation guarantees that operation's correctness. Combining multiple atomic operations does not automatically produce an atomic compound operation. This distinction is the source of many subtle concurrency bugs.
```typescript
// DANGEROUS: Non-atomic compound operation
// Even if balance.get() and balance.set() are individually atomic,
// the compound operation is NOT atomic
function withdraw(amount: number): boolean {
  const current = balance.get();     // Atomic read
  if (current >= amount) {
    balance.set(current - amount);   // Atomic write
    return true;
  }
  return false;
}

// Thread A: current = 100, about to set balance to 50
// Thread B: current = 100, sets balance to 30
// Thread A: sets balance to 50 (Thread B's deduction is lost!)

// SAFE: Single atomic operation
function withdrawAtomic(amount: number): boolean {
  // compareAndSet is atomic: read-compare-write happens as one operation
  while (true) {
    const current = balance.get();
    if (current < amount) return false;
    if (balance.compareAndSet(current, current - amount)) {
      return true;
    }
    // If CAS failed, another thread modified balance; retry
  }
}
```

To truly appreciate atomic operations, we must understand what goes wrong without them. Non-atomic operations open the door to several categories of concurrency bugs:
The Read-Modify-Write Problem (Lost Updates)
When multiple threads perform read-modify-write sequences on shared data, updates can be lost. Consider two threads incrementing a counter from 0:
| Time | Thread A | Thread B | Actual Value |
|---|---|---|---|
| t1 | Read: 0 | | 0 |
| t2 | | Read: 0 | 0 |
| t3 | Add: 0 + 1 = 1 | | 0 |
| t4 | | Add: 0 + 1 = 1 | 0 |
| t5 | Write: 1 | | 1 |
| t6 | | Write: 1 | 1 ← Lost update! |
Both threads thought they were incrementing from 0 to 1. The final value should be 2, but it's 1. One increment was lost.
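The race is easy to reproduce. Below is a sketch using Node.js worker_threads and a SharedArrayBuffer; the worker count and iteration total are arbitrary choices for illustration. With a plain increment the final total routinely falls short, while Atomics.add always lands on the exact count.

```typescript
// A sketch reproducing the lost update problem with Node.js worker_threads.
// Four workers each increment a shared counter 1,000,000 times.
import { Worker } from "node:worker_threads";

const workerSource = `
const { workerData, parentPort } = require("node:worker_threads");
const counter = new Int32Array(workerData.sab);
for (let i = 0; i < 1_000_000; i++) {
  if (workerData.atomic) {
    Atomics.add(counter, 0, 1);    // Atomic read-modify-write
  } else {
    counter[0] = counter[0] + 1;   // Non-atomic: separate read, add, write
  }
}
parentPort.postMessage("done");
`;

async function run(atomic: boolean): Promise<number> {
  const sab = new SharedArrayBuffer(4);
  const workers = Array.from({ length: 4 }, () =>
    new Worker(workerSource, { eval: true, workerData: { sab, atomic } })
  );
  await Promise.all(
    workers.map((w) => new Promise((resolve) => w.once("message", resolve)))
  );
  return new Int32Array(sab)[0];
}

console.log("non-atomic:", await run(false)); // usually well below 4,000,000
console.log("atomic:    ", await run(true));  // always exactly 4,000,000
```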
Lost updates have caused real production disasters: inventory systems selling non-existent stock, bank accounts losing deposits, analytics dashboards showing wrong counts. The Knight Capital incident (2012), in which faulty trading software lost $440 million in 45 minutes, is a stark reminder of how quickly defects in concurrent, high-throughput systems compound.
The Torn Read/Write Problem
Some data types (like 64-bit integers on 32-bit systems) cannot be read or written in a single CPU instruction. A thread reading a 64-bit value might see the new high 32 bits combined with the old low 32 bits—a value that never existed. This is called a torn read.
```typescript
// On a 32-bit system, writing a 64-bit value requires two operations
// Let's say we're updating 'value' from 0x00000000_00000000 to 0xFFFFFFFF_FFFFFFFF

// Thread A (writing):
// Step 1: Write high 32 bits → memory = 0xFFFFFFFF_00000000
// Step 2: Write low 32 bits  → memory = 0xFFFFFFFF_FFFFFFFF

// Thread B (reading between Step 1 and Step 2):
// Reads: 0xFFFFFFFF_00000000 ← A "phantom" value that was never intended!

// In numeric terms:
// Thread A intended: 0 → 18,446,744,073,709,551,615
// Thread B observed: 18,446,744,069,414,584,320 ← WRONG!
```
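Whether a plain 64-bit access tears depends on the platform; the portable fix is to make the access explicitly atomic. In JavaScript/TypeScript, for example, the standard Atomics API guarantees tearing-free 64-bit access on a BigInt64Array. A minimal sketch (the worker wiring is omitted):

```typescript
// Tearing-free 64-bit access via the Atomics API: a minimal sketch.
// (BigInt64Array is signed, so "all 64 bits set" is -1n.)
const sab = new SharedArrayBuffer(8); // assumed to be shared with a worker
const wide = new BigInt64Array(sab);

// Writer thread: one atomic 64-bit store; readers can never see half of it.
Atomics.store(wide, 0, -1n);

// Reader thread: always observes a complete value, either the old 0n or
// the new -1n, never a mixed "phantom" bit pattern like 0xFFFFFFFF_00000000.
const observed = Atomics.load(wide, 0);
```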
The Check-Then-Act Problem (TOCTOU)

Time Of Check to Time Of Use (TOCTOU) bugs occur when a condition is checked and action is taken based on that check—but the condition changes between the check and the action.
```typescript
// DANGEROUS: TOCTOU bug in file operations
async function safeWrite(filePath: string, data: string) {
  // Check if file exists
  if (!await fileExists(filePath)) {
    // ← Another thread creates the file HERE!
    // We think it doesn't exist, but now it does
    await writeFile(filePath, data); // Overwrites unexpected data!
  }
}

// DANGEROUS: TOCTOU in resource allocation
class ConnectionPool {
  private available: number = 10;

  acquire(): Connection | null {
    if (this.available > 0) { // Check
      // ← Another thread acquires connection HERE!
      this.available--;       // Act: But available might now be 0!
      return this.createConnection();
    }
    return null;
  }
}
```

Atomicity doesn't come from thin air—it's ultimately enforced by hardware. Understanding how CPUs provide atomic guarantees illuminates why certain operations are atomic and others are not.
Memory Access Atomicity
Modern CPUs guarantee atomicity for naturally aligned memory accesses of word size (typically 32 or 64 bits). A 64-bit read from an 8-byte aligned address is atomic on 64-bit x86 processors. This means:

- A single aligned load always returns a value that some thread actually wrote; it can never observe a blend of two different writes.
- A single aligned store becomes visible to other cores as a unit, never in pieces.
However, this only applies to single memory accesses. Read-modify-write operations (like increment) require multiple accesses and are NOT inherently atomic.
| Operation Type | x86-64 | ARM64 | RISC-V | Key Constraint |
|---|---|---|---|---|
| Aligned 8-byte read | Atomic | Atomic | Atomic | Address must be 8-byte aligned |
| Aligned 8-byte write | Atomic | Atomic | Atomic | Address must be 8-byte aligned |
| Unaligned read/write | Sometimes atomic* | Not atomic | Not atomic | *On x86, only when the access stays within one cache line |
| Read-modify-write | Not atomic | Not atomic | Not atomic | Requires special instructions |
| 128-bit access | Not atomic | Not atomic | Not atomic | Always requires synchronization |
Bus Locking and Cache Coherence
For operations that span multiple memory accesses, CPUs provide special instructions that enforce atomicity through hardware mechanisms:
1. Bus Locking (Legacy Approach)
Early processors used a LOCK signal that prevented other CPUs from accessing memory during the locked operation. The LOCK prefix on x86 instructions (e.g., LOCK INC) asserts this signal. However, bus locking is expensive—it serializes all memory access across the entire system.
2. Cache Locking (Modern Approach)
Modern CPUs use cache coherence protocols (like MESI or MOESI) to achieve atomicity more efficiently. Instead of locking the entire memory bus:

- The CPU acquires the target cache line in exclusive ownership.
- It performs the entire read-modify-write within its own cache.
- The coherence protocol defers other cores' requests for that line until the operation completes.

Only that one cache line is serialized; unrelated memory traffic proceeds unimpeded.
```asm
; Non-atomic increment (three separate operations)
mov eax, [counter]        ; Read from memory to register
add eax, 1                ; Add 1 in register
mov [counter], eax        ; Write back to memory

; Atomic increment (single locked operation)
lock inc dword [counter]  ; LOCK prefix makes the entire operation atomic

; Atomic compare-and-swap
lock cmpxchg [target], new_value
; If [target] == eax (expected), set [target] = new_value and ZF=1
; Otherwise, eax = [target] and ZF=0

; Atomic exchange
xchg [target], new_value
; Atomically swaps [target] and new_value
; (xchg with a memory operand is implicitly locked; no LOCK prefix needed)
```

CPUs transfer memory in cache line units (typically 64 bytes). When two unrelated atomic variables share a cache line, threads modifying them contend for the same line—even though they're accessing different variables. This 'false sharing' can devastate performance. Padding atomic variables to separate cache lines is a common optimization.
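As an illustration of the padding idea, here is a sketch in TypeScript over shared memory; the 64-byte line size is an assumption (it varies by CPU), and the layout is purely hypothetical:

```typescript
// Cache-line padding for two hot shared counters: a minimal sketch.
const CACHE_LINE_BYTES = 64; // assumed line size; varies by CPU
const STRIDE = CACHE_LINE_BYTES / Int32Array.BYTES_PER_ELEMENT; // 16 slots

// Unpadded: counters at indices 0 and 1 sit 4 bytes apart, on the SAME
// cache line, so threads updating them contend even though the data is
// logically unrelated (false sharing).
const packed = new Int32Array(new SharedArrayBuffer(8));
Atomics.add(packed, 0, 1); // these two updates fight over one line...
Atomics.add(packed, 1, 1);

// Padded: each counter starts a new 64-byte region, so each owns its line.
const padded = new Int32Array(new SharedArrayBuffer(2 * CACHE_LINE_BYTES));
const counterA = 0;      // byte offset 0
const counterB = STRIDE; // byte offset 64: a different cache line

Atomics.add(padded, counterA, 1); // no longer invalidates counterB's line
Atomics.add(padded, counterB, 1);
```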
While hardware provides the foundation, programming languages and libraries provide higher-level abstractions that make atomic operations accessible and portable.
Atomic Types in Modern Languages
Most modern languages provide atomic wrapper types that guarantee atomic operations on the wrapped value. These types expose operations that map to underlying hardware atomics.
```
// Java: java.util.concurrent.atomic
import java.util.concurrent.atomic.AtomicInteger;
AtomicInteger counter = new AtomicInteger(0);
counter.incrementAndGet();            // Atomic increment, returns new value
counter.compareAndSet(10, 20);        // Atomic CAS

// C++11: <atomic>
#include <atomic>
std::atomic<int> counter{0};
counter++;                            // Atomic increment
counter.compare_exchange_strong(expected, desired);

// Rust: std::sync::atomic
use std::sync::atomic::{AtomicUsize, Ordering};
let counter = AtomicUsize::new(0);
counter.fetch_add(1, Ordering::SeqCst);      // Atomic increment
counter.compare_exchange(10, 20, Ordering::SeqCst, Ordering::SeqCst);

// Go: sync/atomic
import "sync/atomic"
var counter int64 = 0
atomic.AddInt64(&counter, 1)                 // Atomic increment
atomic.CompareAndSwapInt64(&counter, 10, 20)

// C#: System.Threading.Interlocked
using System.Threading;
int counter = 0;
Interlocked.Increment(ref counter);          // Atomic increment
Interlocked.CompareExchange(ref counter, 20, 10); // Atomic CAS
```
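JavaScript/TypeScript belongs on this list too: the standard Atomics object provides the same operations over integer typed arrays backed by a SharedArrayBuffer. A minimal sketch:

```typescript
// TypeScript/JavaScript: the standard Atomics API on shared memory
const counter = new Int32Array(new SharedArrayBuffer(4));

Atomics.add(counter, 0, 1);                  // Atomic increment, returns the OLD value
Atomics.compareExchange(counter, 0, 10, 20); // Atomic CAS: if [0] === 10, set it to 20
Atomics.load(counter, 0);                    // Atomic read of a complete value
```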
Memory Ordering and Visibility

Atomic operations alone don't guarantee that other memory accesses are properly ordered relative to them. Modern CPUs and compilers aggressively reorder instructions for performance. Memory ordering specifications control how atomic operations synchronize memory visibility.
The C++11 memory model (adopted by many languages) defines several ordering levels:
| Ordering | Guarantee | Performance |
|---|---|---|
| `relaxed` | Atomicity only; no ordering | Fastest |
| `acquire` | Reads after this see writes before matching release | Moderate |
| `release` | Writes before this are visible to matching acquire | Moderate |
| `acq_rel` | Both acquire and release semantics | Moderate |
| `seq_cst` | Total order across all seq_cst operations | Slowest |
We'll explore memory ordering in depth later. For now, understand that atomicity and ordering are separate concerns, and getting ordering wrong can cause bugs even with atomic operations.
When in doubt, use the strongest ordering guarantee (sequential consistency). Weaker orderings are optimizations that require deep understanding of memory models. Incorrect memory ordering causes bugs that are nearly impossible to reproduce and debug.
Atomic operations can be categorized by what they do and how they achieve atomicity:
1. Simple Atomic Loads and Stores
The most basic atomic operations: reading and writing a value atomically. These guarantee no torn reads or writes, but they do NOT prevent race conditions in read-modify-write sequences.
```typescript
// Atomic load: Reads the current value atomically
const value = atomicVar.load(); // Guaranteed to see a complete value

// Atomic store: Writes a value atomically
atomicVar.store(42); // Other threads will see 42 or the previous value, never garbage

// These are sufficient for:
// - Publishing fully-constructed data (store)
// - Reading published data (load)
// - Flag-based signaling

// These are NOT sufficient for:
// - Incrementing counters (read-modify-write)
// - Conditional updates (check-then-act)
```

2. Read-Modify-Write (RMW) Operations
These operations atomically read a value, perform a computation, and write the result—all as a single atomic step. The three parts are indivisible.
- Fetch-and-Add (`fetch_add`) — Atomically adds a value and returns the previous value. Ideal for counters and sequence generators.
- Fetch-and-Subtract (`fetch_sub`) — Atomically subtracts a value and returns the previous value. Used for decreasing reference counts.
- Fetch-and-Or (`fetch_or`) — Atomically ORs bits and returns the previous value. Used for setting flags.
- Fetch-and-And (`fetch_and`) — Atomically ANDs bits and returns the previous value. Used for clearing flags.
- Exchange (`exchange`) — Atomically replaces a value and returns the previous value. Useful for implementing locks.
- Compare-and-Exchange (`compare_exchange`) — The king of atomic operations. Conditionally updates only if the current value matches expected. Enables all lock-free algorithms.
```typescript
// Atomic counter using fetch_add
class AtomicCounter {
  private value = new Atomic<number>(0);

  increment(): number {
    // Returns OLD value, but value is atomically incremented
    return this.value.fetchAdd(1);
  }

  get(): number {
    return this.value.load();
  }
}

// Atomic flag using exchange
class SpinLock {
  private locked = new Atomic<boolean>(false);

  lock(): void {
    // Keep trying until we successfully change false → true
    while (this.locked.exchange(true)) {
      // Another thread holds the lock; spin
    }
  }

  unlock(): void {
    this.locked.store(false);
  }
}

// Atomic reference counting with fetch_sub
class RefCounted<T> {
  private refCount = new Atomic<number>(1);
  private resource: T;

  release(): void {
    // Decrement and get OLD value
    if (this.refCount.fetchSub(1) === 1) {
      // We were the last reference; cleanup
      this.destroy();
    }
  }
}
```
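The SpinLock above uses a hypothetical Atomic&lt;boolean&gt; wrapper; for comparison, here is a sketch of the same exchange-based lock with the real Atomics API (production code would park on Atomics.wait rather than burning CPU in a spin loop):

```typescript
// Exchange-based spin lock over shared memory: a minimal sketch.
const lockWord = new Int32Array(new SharedArrayBuffer(4)); // 0 = free, 1 = held

function lock(): void {
  // Keep trying until we atomically swap 0 → 1.
  while (Atomics.exchange(lockWord, 0, 1) === 1) {
    // Another thread holds the lock; spin.
  }
}

function unlock(): void {
  Atomics.store(lockWord, 0, 0); // Release; a spinning thread can now acquire.
}
```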
CAS is so fundamental that it deserves special attention. It atomically compares a memory location against an expected value and, only if they match, writes a new value:
```
CAS(address, expected, new_value):
    atomic {
        if (*address == expected) {
            *address = new_value
            return true
        } else {
            return false
        }
    }
```
CAS is universal—any other atomic operation can be implemented using CAS (though perhaps less efficiently than hardware-native operations). It's the foundation of lock-free programming, enabling algorithms that make progress even when individual threads are delayed or suspended.
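To make that universality concrete, here is a sketch of fetch-and-add built from nothing but a CAS retry loop, using the real Atomics.compareExchange (a hardware-native fetch_add such as Atomics.add would be more efficient; this is purely illustrative):

```typescript
// fetch_add implemented purely with CAS: a sketch of CAS universality.
function fetchAddViaCAS(arr: Int32Array, index: number, delta: number): number {
  while (true) {
    const current = Atomics.load(arr, index);
    // "If the slot still holds `current`, replace it with current + delta."
    const witnessed = Atomics.compareExchange(arr, index, current, current + delta);
    if (witnessed === current) {
      return current; // Success: return the OLD value, just like fetch_add.
    }
    // Another thread changed the value between our load and the CAS; retry.
  }
}
```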
We'll dive deep into CAS implementation and applications in a dedicated section.
Both atomic operations and locks (mutexes) solve concurrency problems, but they have different characteristics and use cases.
Locks (Mutexes)

- Block competing threads: only one thread executes the critical section at a time
- Can protect arbitrarily large critical sections spanning many variables
- Risk deadlock, priority inversion, and convoying; a suspended lock-holder stalls everyone waiting

Atomic Operations

- Never block: a thread that loses a race simply retries, so a suspended thread cannot stall others
- Operate on a single memory word at a time; composing them into larger invariants is hard
- Eliminate deadlock entirely, but demand much more careful correctness reasoning
Start with locks. They're easier to reason about and harder to get wrong. Only switch to lock-free atomics when profiling proves that lock contention is a bottleneck AND you're prepared to invest in verifying correctness. Premature optimization with atomics has caused more bugs than it has solved performance problems.
| Aspect | Mutex Lock | Atomic Operation |
|---|---|---|
| Uncontended latency | ~25-50 ns | ~5-10 ns |
| Contended latency | Microseconds (context switch) | Variable (retry loops) |
| Memory overhead | ~40-80 bytes per lock | Same as underlying type |
| Scalability | Degrades with thread count | Better under high contention |
| Progress guarantee | Blocking | Lock-free or wait-free |
| Fairness | Configurable (fair locks) | Typically unfair |
Atomic operations are frequently misunderstood. Let's address the most dangerous misconceptions:
- "`volatile` provides atomicity" — In languages like C/C++, `volatile` only prevents compiler optimizations. It does NOT provide atomicity or memory ordering. In Java, `volatile` provides atomic reads/writes and visibility guarantees, but NOT atomic read-modify-write.
```
// WRONG: volatile is NOT atomic in C/C++!
volatile int counter = 0;

void increment() {
    counter++; // NOT ATOMIC! Compiles to multiple instructions.
    // volatile only prevents compiler from optimizing away reads/writes
}

// CORRECT: Use std::atomic
#include <atomic>
std::atomic<int> counter{0};

void increment() {
    counter++; // ATOMIC! Uses hardware atomic instructions.
}

// In Java, volatile IS atomic for read/write, but NOT for compound operations:
// Java:
volatile int counter = 0;
counter++; // NOT atomic! Read-modify-write is still non-atomic.

// Java atomic:
AtomicInteger counter = new AtomicInteger(0);
counter.incrementAndGet(); // ATOMIC!
```

Different languages have vastly different atomicity semantics. What's atomic in Java may not be atomic in C++. What's safe in Go may be unsafe in Rust without explicit Ordering. Always consult your language's memory model documentation.
We've established the conceptual foundation for atomic operations. Let's consolidate our understanding:
- Atomicity means indivisibility plus isolation: an atomic operation appears to take effect at a single instant, its linearization point.
- count++ is NOT atomic; it's read-modify-write under the hood.
- Combining individually atomic operations does not yield an atomic compound operation.
- Hardware enforces atomicity through locked instructions and cache coherence protocols.
- CAS is universal: any atomic operation can be built from it, and it is the foundation of lock-free algorithms.
- Atomicity and memory ordering are separate concerns; both must be correct.
- Prefer locks by default; adopt lock-free atomics only when profiling justifies the added complexity.

What's Next:
Now that we understand what atomic operations are and why they matter, we'll examine the specific atomic primitives available in modern programming. The next page covers atomic types, their operations, and how to use them correctly in practice.
You now understand the fundamental concept of atomicity—what makes operations atomic, why non-atomic operations are dangerous, and how hardware and software provide atomic guarantees. This foundation is essential for understanding the atomic primitives and lock-free techniques we'll explore next.