Consider the most innocuous operation in programming: incrementing a counter.
```typescript
count++;
```
This single line of code appears atomic—it looks like one indivisible action. But appearances deceive. At the machine level, this simple increment decomposes into multiple distinct operations:
1. Read the current value of count from memory into a register
2. Add 1 to the value in the register
3. Write the new value back to memory

In a single-threaded world, this decomposition is irrelevant. But the moment two threads execute count++ simultaneously, disaster lurks. Between any two of these steps, another thread can interleave—reading stale data, computing incorrect values, or overwriting updates. This is the lost update problem, and it has caused more production outages than most developers realize.
Atomic operations are the programming primitive that makes this problem tractable. They provide guarantees that certain operations will execute as if they were indivisible—no interleaving, no partial states visible to other threads, no surprises.
By the end of this page, you will understand what atomicity truly means at the hardware and software level, why non-atomic operations are dangerous in concurrent contexts, and how atomic operations provide the foundation for building correct, high-performance concurrent systems.
The term atomic derives from the Greek atomos, meaning "indivisible." In computing, an atomic operation exhibits two critical properties:
1. Indivisibility (All-or-Nothing Execution)
An atomic operation completes entirely or not at all. There is no intermediate state where the operation is "half done." If a system fails during an atomic operation, the operation either completed fully before the failure or is guaranteed not to have taken effect at all.
2. Isolation (No Intermediate States Visible)
No other thread or process can observe the operation in a partially completed state. From the perspective of all other concurrent observers, the operation appears to happen instantaneously—at a single discrete point in time.
These properties combine to create a powerful abstraction: linearizability. An atomic operation appears to take effect atomically at some single instant between its invocation and completion. This instant is called the linearization point.
Think of the linearization point as the precise moment when an atomic operation 'takes effect.' Before that instant, the world is in the old state. After that instant, the world is in the new state. There is never a moment where the operation is visible as 'in progress.'
Formal Definition:
An operation is atomic if and only if:

1. It either completes in its entirety or has no effect at all (indivisibility), and
2. No concurrent observer can see it in a partially completed state; it appears to take effect instantaneously at a single linearization point (isolation).
Important Distinction: Atomicity vs. Thread Safety
Atomicity is a property of individual operations, not of entire algorithms or data structures. A single atomic operation guarantees that operation's correctness. Combining multiple atomic operations does not automatically produce an atomic compound operation. This distinction is the source of many subtle concurrency bugs.
```typescript
// DANGEROUS: Non-atomic compound operation
// Even if balance.get() and balance.set() are individually atomic,
// the compound operation is NOT atomic
function withdraw(amount: number): boolean {
  const current = balance.get();     // Atomic read
  if (current >= amount) {
    balance.set(current - amount);   // Atomic write
    return true;
  }
  return false;
}

// Thread A: current = 100, about to set balance to 50
// Thread B: current = 100, sets balance to 30
// Thread A: sets balance to 50 (Thread B's deduction is lost!)

// SAFE: Single atomic operation
function withdrawAtomic(amount: number): boolean {
  // compareAndSet is atomic: read-compare-write happens as one operation
  while (true) {
    const current = balance.get();
    if (current < amount) return false;
    if (balance.compareAndSet(current, current - amount)) {
      return true;
    }
    // If CAS failed, another thread modified balance; retry
  }
}
```

To truly appreciate atomic operations, we must understand what goes wrong without them. Non-atomic operations open the door to several categories of concurrency bugs:
The Read-Modify-Write Problem (Lost Updates)
When multiple threads perform read-modify-write sequences on shared data, updates can be lost. Consider two threads incrementing a counter from 0:
| Time | Thread A | Thread B | Actual Value |
|---|---|---|---|
| t1 | Read: 0 | | 0 |
| t2 | | Read: 0 | 0 |
| t3 | Add: 0 + 1 = 1 | | 0 |
| t4 | | Add: 0 + 1 = 1 | 0 |
| t5 | Write: 1 | | 1 |
| t6 | | Write: 1 | 1 ← Lost update! |
Both threads thought they were incrementing from 0 to 1. The final value should be 2, but it's 1. One increment was lost.
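The race is easy to reproduce. Below is a sketch using Node.js worker_threads and a SharedArrayBuffer; the worker count and iteration total are arbitrary choices for illustration. With a plain increment the final total routinely falls short, while Atomics.add always lands on the exact count.

```typescript
// A sketch reproducing the lost update problem with Node.js worker_threads.
// Four workers each increment a shared counter 1,000,000 times.
import { Worker } from "node:worker_threads";

const workerSource = `
const { workerData, parentPort } = require("node:worker_threads");
const counter = new Int32Array(workerData.sab);
for (let i = 0; i < 1_000_000; i++) {
  if (workerData.atomic) {
    Atomics.add(counter, 0, 1);    // Atomic read-modify-write
  } else {
    counter[0] = counter[0] + 1;   // Non-atomic: separate read, add, write
  }
}
parentPort.postMessage("done");
`;

async function run(atomic: boolean): Promise<number> {
  const sab = new SharedArrayBuffer(4);
  const workers = Array.from({ length: 4 }, () =>
    new Worker(workerSource, { eval: true, workerData: { sab, atomic } })
  );
  await Promise.all(
    workers.map((w) => new Promise((resolve) => w.once("message", resolve)))
  );
  return new Int32Array(sab)[0];
}

console.log("non-atomic:", await run(false)); // usually well below 4,000,000
console.log("atomic:    ", await run(true));  // always exactly 4,000,000
```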
Lost updates have caused real production disasters: inventory systems selling non-existent stock, bank accounts losing deposits, analytics dashboards showing wrong counts. The Knight Capital incident (2012), in which faulty trading software lost $440 million in 45 minutes, is a stark reminder of how quickly defects in concurrent, high-throughput systems compound.
The Torn Read/Write Problem
Some data types (like 64-bit integers on 32-bit systems) cannot be read or written in a single CPU instruction. A thread reading a 64-bit value might see the new high 32 bits combined with the old low 32 bits—a value that never existed. This is called a torn read.
```typescript
// On a 32-bit system, writing a 64-bit value requires two operations
// Let's say we're updating 'value' from 0x00000000_00000000 to 0xFFFFFFFF_FFFFFFFF

// Thread A (writing):
// Step 1: Write high 32 bits → memory = 0xFFFFFFFF_00000000
// Step 2: Write low 32 bits  → memory = 0xFFFFFFFF_FFFFFFFF

// Thread B (reading between Step 1 and Step 2):
// Reads: 0xFFFFFFFF_00000000 ← A "phantom" value that was never intended!

// In numeric terms:
// Thread A intended: 0 → 18,446,744,073,709,551,615
// Thread B observed: 18,446,744,069,414,584,320 ← WRONG!
```
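Whether a plain 64-bit access tears depends on the platform; the portable fix is to make the access explicitly atomic. In JavaScript/TypeScript, for example, the standard Atomics API guarantees tearing-free 64-bit access on a BigInt64Array. A minimal sketch (the worker wiring is omitted):

```typescript
// Tearing-free 64-bit access via the Atomics API: a minimal sketch.
// (BigInt64Array is signed, so "all 64 bits set" is -1n.)
const sab = new SharedArrayBuffer(8); // assumed to be shared with a worker
const wide = new BigInt64Array(sab);

// Writer thread: one atomic 64-bit store; readers can never see half of it.
Atomics.store(wide, 0, -1n);

// Reader thread: always observes a complete value, either the old 0n or
// the new -1n, never a mixed "phantom" bit pattern like 0xFFFFFFFF_00000000.
const observed = Atomics.load(wide, 0);
```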
The Check-Then-Act Problem (TOCTOU)

Time Of Check to Time Of Use (TOCTOU) bugs occur when a condition is checked and action is taken based on that check—but the condition changes between the check and the action.
```typescript
// DANGEROUS: TOCTOU bug in file operations
async function safeWrite(filePath: string, data: string) {
  // Check if file exists
  if (!await fileExists(filePath)) {
    // ← Another thread creates the file HERE!
    // We think it doesn't exist, but now it does
    await writeFile(filePath, data); // Overwrites unexpected data!
  }
}

// DANGEROUS: TOCTOU in resource allocation
class ConnectionPool {
  private available: number = 10;

  acquire(): Connection | null {
    if (this.available > 0) { // Check
      // ← Another thread acquires connection HERE!
      this.available--;       // Act: But available might now be 0!
      return this.createConnection();
    }
    return null;
  }
}
```

Atomicity doesn't come from thin air—it's ultimately enforced by hardware. Understanding how CPUs provide atomic guarantees illuminates why certain operations are atomic and others are not.
Memory Access Atomicity
Modern CPUs guarantee atomicity for naturally aligned memory accesses of word size (typically 32 or 64 bits). A 64-bit read from an 8-byte aligned address is atomic on 64-bit x86 processors. This means:

- A single aligned load always returns a value that some thread actually wrote; it can never observe a blend of two different writes.
- A single aligned store becomes visible to other cores as a unit, never in pieces.
However, this only applies to single memory accesses. Read-modify-write operations (like increment) require multiple accesses and are NOT inherently atomic.
| Operation Type | x86-64 | ARM64 | RISC-V | Key Constraint |
|---|---|---|---|---|
| Aligned 8-byte read | Atomic | Atomic | Atomic | Address must be 8-byte aligned |
| Aligned 8-byte write | Atomic | Atomic | Atomic | Address must be 8-byte aligned |
| Unaligned read/write | Sometimes atomic* | Not atomic | Not atomic | *On x86, only when the access stays within one cache line |
| Read-modify-write | Not atomic | Not atomic | Not atomic | Requires special instructions |
| 128-bit access | Not atomic | Not atomic | Not atomic | Always requires synchronization |
Bus Locking and Cache Coherence
For operations that span multiple memory accesses, CPUs provide special instructions that enforce atomicity through hardware mechanisms:
1. Bus Locking (Legacy Approach)
Early processors used a LOCK signal that prevented other CPUs from accessing memory during the locked operation. The LOCK prefix on x86 instructions (e.g., LOCK INC) asserts this signal. However, bus locking is expensive—it serializes all memory access across the entire system.
2. Cache Locking (Modern Approach)
Modern CPUs use cache coherence protocols (like MESI or MOESI) to achieve atomicity more efficiently. Instead of locking the entire memory bus:

- The CPU acquires the target cache line in exclusive ownership.
- It performs the entire read-modify-write within its own cache.
- The coherence protocol defers other cores' requests for that line until the operation completes.

Only that one cache line is serialized; unrelated memory traffic proceeds unimpeded.
```asm
; Non-atomic increment (three separate operations)
mov eax, [counter]        ; Read from memory to register
add eax, 1                ; Add 1 in register
mov [counter], eax        ; Write back to memory

; Atomic increment (single locked operation)
lock inc dword [counter]  ; LOCK prefix makes the entire operation atomic

; Atomic compare-and-swap
lock cmpxchg [target], new_value
; If [target] == eax (expected), set [target] = new_value and ZF=1
; Otherwise, eax = [target] and ZF=0

; Atomic exchange
xchg [target], new_value
; Atomically swaps [target] and new_value
; (xchg with a memory operand is implicitly locked; no LOCK prefix needed)
```

CPUs transfer memory in cache line units (typically 64 bytes). When two unrelated atomic variables share a cache line, threads modifying them contend for the same line—even though they're accessing different variables. This 'false sharing' can devastate performance. Padding atomic variables to separate cache lines is a common optimization.
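As an illustration of the padding idea, here is a sketch in TypeScript over shared memory; the 64-byte line size is an assumption (it varies by CPU), and the layout is purely hypothetical:

```typescript
// Cache-line padding for two hot shared counters: a minimal sketch.
const CACHE_LINE_BYTES = 64; // assumed line size; varies by CPU
const STRIDE = CACHE_LINE_BYTES / Int32Array.BYTES_PER_ELEMENT; // 16 slots

// Unpadded: counters at indices 0 and 1 sit 4 bytes apart, on the SAME
// cache line, so threads updating them contend even though the data is
// logically unrelated (false sharing).
const packed = new Int32Array(new SharedArrayBuffer(8));
Atomics.add(packed, 0, 1); // these two updates fight over one line...
Atomics.add(packed, 1, 1);

// Padded: each counter starts a new 64-byte region, so each owns its line.
const padded = new Int32Array(new SharedArrayBuffer(2 * CACHE_LINE_BYTES));
const counterA = 0;      // byte offset 0
const counterB = STRIDE; // byte offset 64: a different cache line

Atomics.add(padded, counterA, 1); // no longer invalidates counterB's line
Atomics.add(padded, counterB, 1);
```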
While hardware provides the foundation, programming languages and libraries provide higher-level abstractions that make atomic operations accessible and portable.
Atomic Types in Modern Languages
Most modern languages provide atomic wrapper types that guarantee atomic operations on the wrapped value. These types expose operations that map to underlying hardware atomics.
```
// Java: java.util.concurrent.atomic
import java.util.concurrent.atomic.AtomicInteger;
AtomicInteger counter = new AtomicInteger(0);
counter.incrementAndGet();            // Atomic increment, returns new value
counter.compareAndSet(10, 20);        // Atomic CAS

// C++11: <atomic>
#include <atomic>
std::atomic<int> counter{0};
counter++;                            // Atomic increment
counter.compare_exchange_strong(expected, desired);

// Rust: std::sync::atomic
use std::sync::atomic::{AtomicUsize, Ordering};
let counter = AtomicUsize::new(0);
counter.fetch_add(1, Ordering::SeqCst);      // Atomic increment
counter.compare_exchange(10, 20, Ordering::SeqCst, Ordering::SeqCst);

// Go: sync/atomic
import "sync/atomic"
var counter int64 = 0
atomic.AddInt64(&counter, 1)                 // Atomic increment
atomic.CompareAndSwapInt64(&counter, 10, 20)

// C#: System.Threading.Interlocked
using System.Threading;
int counter = 0;
Interlocked.Increment(ref counter);          // Atomic increment
Interlocked.CompareExchange(ref counter, 20, 10); // Atomic CAS
```
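JavaScript/TypeScript belongs on this list too: the standard Atomics object provides the same operations over integer typed arrays backed by a SharedArrayBuffer. A minimal sketch:

```typescript
// TypeScript/JavaScript: the standard Atomics API on shared memory
const counter = new Int32Array(new SharedArrayBuffer(4));

Atomics.add(counter, 0, 1);                  // Atomic increment, returns the OLD value
Atomics.compareExchange(counter, 0, 10, 20); // Atomic CAS: if [0] === 10, set it to 20
Atomics.load(counter, 0);                    // Atomic read of a complete value
```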
Memory Ordering and Visibility

Atomic operations alone don't guarantee that other memory accesses are properly ordered relative to them. Modern CPUs and compilers aggressively reorder instructions for performance. Memory ordering specifications control how atomic operations synchronize memory visibility.
The C++11 memory model (adopted by many languages) defines several ordering levels:
| Ordering | Guarantee | Performance |
|---|---|---|
| `relaxed` | Atomicity only; no ordering | Fastest |
| `acquire` | Reads after this see writes before matching release | Moderate |
| `release` | Writes before this are visible to matching acquire | Moderate |
| `acq_rel` | Both acquire and release semantics | Moderate |
| `seq_cst` | Total order across all seq_cst operations | Slowest |
We'll explore memory ordering in depth later. For now, understand that atomicity and ordering are separate concerns, and getting ordering wrong can cause bugs even with atomic operations.
When in doubt, use the strongest ordering guarantee (sequential consistency). Weaker orderings are optimizations that require deep understanding of memory models. Incorrect memory ordering causes bugs that are nearly impossible to reproduce and debug.
Atomic operations can be categorized by what they do and how they achieve atomicity:
1. Simple Atomic Loads and Stores
The most basic atomic operations: reading and writing a value atomically. These guarantee no torn reads or writes, but they do NOT prevent race conditions in read-modify-write sequences.
```typescript
// Atomic load: Reads the current value atomically
const value = atomicVar.load(); // Guaranteed to see a complete value

// Atomic store: Writes a value atomically
atomicVar.store(42); // Other threads will see 42 or the previous value, never garbage

// These are sufficient for:
// - Publishing fully-constructed data (store)
// - Reading published data (load)
// - Flag-based signaling

// These are NOT sufficient for:
// - Incrementing counters (read-modify-write)
// - Conditional updates (check-then-act)
```

2. Read-Modify-Write (RMW) Operations
These operations atomically read a value, perform a computation, and write the result—all as a single atomic step. The three parts are indivisible.
- Fetch-and-Add (`fetch_add`) — Atomically adds a value and returns the previous value. Ideal for counters and sequence generators.
- Fetch-and-Subtract (`fetch_sub`) — Atomically subtracts a value and returns the previous value. Used for decreasing reference counts.
- Fetch-and-Or (`fetch_or`) — Atomically ORs bits and returns the previous value. Used for setting flags.
- Fetch-and-And (`fetch_and`) — Atomically ANDs bits and returns the previous value. Used for clearing flags.
- Exchange (`exchange`) — Atomically replaces a value and returns the previous value. Useful for implementing locks.
- Compare-and-Exchange (`compare_exchange`) — The king of atomic operations. Conditionally updates only if the current value matches expected. Enables all lock-free algorithms.
```typescript
// Atomic counter using fetch_add
class AtomicCounter {
  private value = new Atomic<number>(0);

  increment(): number {
    // Returns OLD value, but value is atomically incremented
    return this.value.fetchAdd(1);
  }

  get(): number {
    return this.value.load();
  }
}

// Atomic flag using exchange
class SpinLock {
  private locked = new Atomic<boolean>(false);

  lock(): void {
    // Keep trying until we successfully change false → true
    while (this.locked.exchange(true)) {
      // Another thread holds the lock; spin
    }
  }

  unlock(): void {
    this.locked.store(false);
  }
}

// Atomic reference counting with fetch_sub
class RefCounted<T> {
  private refCount = new Atomic<number>(1);
  private resource: T;

  release(): void {
    // Decrement and get OLD value
    if (this.refCount.fetchSub(1) === 1) {
      // We were the last reference; cleanup
      this.destroy();
    }
  }
}
```
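The SpinLock above uses a hypothetical Atomic&lt;boolean&gt; wrapper; for comparison, here is a sketch of the same exchange-based lock with the real Atomics API (production code would park on Atomics.wait rather than burning CPU in a spin loop):

```typescript
// Exchange-based spin lock over shared memory: a minimal sketch.
const lockWord = new Int32Array(new SharedArrayBuffer(4)); // 0 = free, 1 = held

function lock(): void {
  // Keep trying until we atomically swap 0 → 1.
  while (Atomics.exchange(lockWord, 0, 1) === 1) {
    // Another thread holds the lock; spin.
  }
}

function unlock(): void {
  Atomics.store(lockWord, 0, 0); // Release; a spinning thread can now acquire.
}
```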
CAS is so fundamental that it deserves special attention. It atomically compares a memory location against an expected value and, only if they match, writes a new value:
```
CAS(address, expected, new_value):
    atomic {
        if (*address == expected) {
            *address = new_value
            return true
        } else {
            return false
        }
    }
```
CAS is universal—any other atomic operation can be implemented using CAS (though perhaps less efficiently than hardware-native operations). It's the foundation of lock-free programming, enabling algorithms that make progress even when individual threads are delayed or suspended.
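To make that universality concrete, here is a sketch of fetch-and-add built from nothing but a CAS retry loop, using the real Atomics.compareExchange (a hardware-native fetch_add such as Atomics.add would be more efficient; this is purely illustrative):

```typescript
// fetch_add implemented purely with CAS: a sketch of CAS universality.
function fetchAddViaCAS(arr: Int32Array, index: number, delta: number): number {
  while (true) {
    const current = Atomics.load(arr, index);
    // "If the slot still holds `current`, replace it with current + delta."
    const witnessed = Atomics.compareExchange(arr, index, current, current + delta);
    if (witnessed === current) {
      return current; // Success: return the OLD value, just like fetch_add.
    }
    // Another thread changed the value between our load and the CAS; retry.
  }
}
```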
We'll dive deep into CAS implementation and applications in a dedicated section.
Both atomic operations and locks (mutexes) solve concurrency problems, but they have different characteristics and use cases.
Locks (Mutexes)

- Block competing threads: only one thread executes the critical section at a time
- Can protect arbitrarily large critical sections spanning many variables
- Risk deadlock, priority inversion, and convoying; a suspended lock-holder stalls everyone waiting

Atomic Operations

- Never block: a thread that loses a race simply retries, so a suspended thread cannot stall others
- Operate on a single memory word at a time; composing them into larger invariants is hard
- Eliminate deadlock entirely, but demand much more careful correctness reasoning
Start with locks. They're easier to reason about and harder to get wrong. Only switch to lock-free atomics when profiling proves that lock contention is a bottleneck AND you're prepared to invest in verifying correctness. Premature optimization with atomics has caused more bugs than it has solved performance problems.
| Aspect | Mutex Lock | Atomic Operation |
|---|---|---|
| Uncontended latency | ~25-50 ns | ~5-10 ns |
| Contended latency | Microseconds (context switch) | Variable (retry loops) |
| Memory overhead | ~40-80 bytes per lock | Same as underlying type |
| Scalability | Degrades with thread count | Better under high contention |
| Progress guarantee | Blocking | Lock-free or wait-free |
| Fairness | Configurable (fair locks) | Typically unfair |
Atomic operations are frequently misunderstood. Let's address the most dangerous misconceptions:
- "`volatile` provides atomicity" — In languages like C/C++, `volatile` only prevents compiler optimizations. It does NOT provide atomicity or memory ordering. In Java, `volatile` provides atomic reads/writes and visibility guarantees, but NOT atomic read-modify-write.
```
// WRONG: volatile is NOT atomic in C/C++!
volatile int counter = 0;

void increment() {
    counter++; // NOT ATOMIC! Compiles to multiple instructions.
    // volatile only prevents compiler from optimizing away reads/writes
}

// CORRECT: Use std::atomic
#include <atomic>
std::atomic<int> counter{0};

void increment() {
    counter++; // ATOMIC! Uses hardware atomic instructions.
}

// In Java, volatile IS atomic for read/write, but NOT for compound operations:
// Java:
volatile int counter = 0;
counter++; // NOT atomic! Read-modify-write is still non-atomic.

// Java atomic:
AtomicInteger counter = new AtomicInteger(0);
counter.incrementAndGet(); // ATOMIC!
```

Different languages have vastly different atomicity semantics. What's atomic in Java may not be atomic in C++. What's safe in Go may be unsafe in Rust without explicit Ordering. Always consult your language's memory model documentation.
We've established the conceptual foundation for atomic operations. Let's consolidate our understanding:
- Atomicity means indivisibility plus isolation: an atomic operation appears to take effect at a single instant, its linearization point.
- count++ is NOT atomic; it's read-modify-write under the hood.
- Combining individually atomic operations does not yield an atomic compound operation.
- Hardware enforces atomicity through locked instructions and cache coherence protocols.
- CAS is universal: any atomic operation can be built from it, and it is the foundation of lock-free algorithms.
- Atomicity and memory ordering are separate concerns; both must be correct.
- Prefer locks by default; adopt lock-free atomics only when profiling justifies the added complexity.

What's Next:
Now that we understand what atomic operations are and why they matter, we'll examine the specific atomic primitives available in modern programming. The next page covers atomic types, their operations, and how to use them correctly in practice.
You now understand the fundamental concept of atomicity—what makes operations atomic, why non-atomic operations are dangerous, and how hardware and software provide atomic guarantees. This foundation is essential for understanding the atomic primitives and lock-free techniques we'll explore next.