A memory model is a formal specification that defines how threads interact through memory in a concurrent program. It answers critical questions: When one thread writes to a variable, when (if ever) is that write visible to other threads? What reorderings of memory operations are permitted by the compiler and hardware?
Without understanding memory models, it's impossible to write correct concurrent code. The breakdown of Double-Checked Locking in its naive form is a direct consequence of the gap between what programmers assume about memory behavior and what memory models actually guarantee.
In this page, we explore the memory models of Java, C++, and other languages, understanding the primitives they provide to control memory visibility and ordering. This knowledge is essential for implementing DCL correctly—and for all concurrent programming.
By the end of this page, you will understand what memory models are, why they're necessary, and the specific guarantees provided by volatile (Java), atomic (C++), and memory barriers. You'll also understand the "happens-before" relationship and why it's the key to making DCL work.
Modern computers don't execute instructions in the simple, sequential manner that source code suggests. Several layers of optimization can reorder and transform memory operations:
1. Compiler Optimizations
Compilers analyze code and reorder instructions for better CPU utilization. They may:

- Reorder independent memory operations to hide instruction latency
- Hoist loads out of loops and cache values in registers
- Eliminate loads and stores that appear redundant under single-threaded analysis
2. CPU Instruction Reordering
CPUs execute instructions out-of-order to maximize throughput. While they maintain the illusion of sequential execution for a single thread, operations may complete in a different order than issued.
3. CPU Cache Hierarchies
Each CPU core has its own L1/L2 cache. Writes to memory may sit in a core's cache and not be immediately visible to other cores. Cache coherency protocols eventually synchronize caches, but the timing is not guaranteed without explicit barriers.
4. Store Buffers and Invalidation Queues
CPUs use store buffers to decouple execution from memory writes. A core might "see" its own store immediately but other cores don't see it until the buffer drains.
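The store-buffer effect can be probed with the classic litmus test: Thread A runs `x = 1; r1 = y` while Thread B runs `y = 1; r2 = x`. Sequentially, at least one of `r1`, `r2` must be 1; with store buffers, both can be 0. The sketch below (class and helper names are illustrative, not from any library) runs the test repeatedly; whether the "impossible" outcome actually appears depends on hardware, JIT compilation, and timing.

```java
import java.util.concurrent.CountDownLatch;

public class StoreBufferLitmus {
    static int x, y;    // shared, deliberately non-volatile
    static int r1, r2;  // results; read only after join(), which gives happens-before

    /** Runs one iteration of the litmus test; returns {r1, r2}. */
    static int[] once() throws InterruptedException {
        x = 0; y = 0;
        CountDownLatch start = new CountDownLatch(1);
        Thread a = new Thread(() -> { awaitQuietly(start); x = 1; r1 = y; });
        Thread b = new Thread(() -> { awaitQuietly(start); y = 1; r2 = x; });
        a.start(); b.start();
        start.countDown();   // release both threads at (roughly) the same time
        a.join(); b.join();
        return new int[]{r1, r2};
    }

    static void awaitQuietly(CountDownLatch latch) {
        try { latch.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    public static void main(String[] args) throws InterruptedException {
        int bothZero = 0, iterations = 500;
        for (int i = 0; i < iterations; i++) {
            int[] r = once();
            if (r[0] == 0 && r[1] == 0) bothZero++;  // forbidden under sequential consistency
        }
        // No expected count is claimed here: observing bothZero > 0 typically
        // requires many iterations, and is more likely on weakly ordered CPUs.
        System.out.println("r1==0 && r2==0 observed " + bothZero + " / " + iterations);
    }
}
```

Making `x` and `y` volatile would forbid the (0, 0) outcome entirely, which is exactly the kind of guarantee the rest of this page explains.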
| Reordering Type | x86/x64 | ARM | POWER | Description |
|---|---|---|---|---|
| Store-Store | No | Yes | Yes | Stores can complete out of order |
| Load-Load | No | Yes | Yes | Loads can complete out of order |
| Load-Store | No | Yes | Yes | Load can move after later store |
| Store-Load | Yes | Yes | Yes | Store can move after later load |
The Fundamental Problem
All these optimizations are transparent to single-threaded code—the final result is always as if instructions executed in order. But multi-threaded code observes intermediate states. What's harmless for one thread becomes dangerous when another thread sees partial results.
Memory models exist to provide programmers with tools to control these behaviors when necessary for correctness. They define:

- Which reorderings the compiler and hardware are permitted to perform
- When a write made by one thread becomes visible to other threads
- What synchronization primitives are available to constrain reordering and visibility
Memory models balance performance and programmability. Allowing reorderings enables significant hardware and compiler optimizations. Providing synchronization primitives gives programmers control when needed. The default permits reordering; explicit synchronization constrains it.
The core concept in most concurrent memory models is the happens-before relationship. This is a partial ordering on memory operations that defines visibility guarantees.
Definition: If operation A happens-before operation B, then:

- The memory effects of A are visible to B
- A is ordered before B in any consistent execution
Key insight: Happens-before is not about wall-clock time. Two operations might execute at the same physical time, yet still have a happens-before relationship because of synchronization.
```
// ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
// PROGRAM ORDER (within a single thread)
// ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
// Thread A:
x = 1; // (1)
y = 2; // (2)

// (1) happens-before (2) due to program order.
// Any read of both x and y by Thread A will see these values.

// ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
// SYNCHRONIZATION ORDER (between threads)
// ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
// Thread A:
x = 1;         // (1)
release(lock); // (2) Release lock

// Thread B:
acquire(lock); // (3) Acquire same lock
print(x);      // (4)

// (2) happens-before (3) due to lock synchronization.
// Since (1) HB (2) and (2) HB (3) and (3) HB (4):
// By transitivity, (1) happens-before (4).
// Therefore, Thread B is GUARANTEED to see x == 1.

// ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
// NO HAPPENS-BEFORE (the dangerous case)
// ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
// Thread A:
x = 1;       // (1)
flag = true; // (2) Regular (non-volatile) write

// Thread B:
if (flag) {  // (3) Regular (non-volatile) read
    print(x); // (4)
}

// WITHOUT proper synchronization:
// There is NO happens-before between (2) and (3)!
// Thread B might see flag == true but x == 0 (uninitialized).
// This is EXACTLY the DCL bug.
```

To make DCL work, we need a happens-before edge from the constructor completion to the first unsynchronized read of the instance reference. Volatile writes provide this guarantee; regular writes do not.
The Java Memory Model, defined in JSR-133 (2004) and incorporated into Java 5.0, provides precise semantics for concurrent Java programs. Understanding JMM is essential for writing correct concurrent Java code.
Key JMM Guarantees:
1. Synchronized Block Semantics
When a thread enters a synchronized block, it:

- Acquires the monitor lock
- Invalidates locally cached values, so subsequent reads observe values at least as fresh as those at the most recent release of that monitor

When a thread exits a synchronized block, it:

- Flushes all of its writes so they are visible to the next thread that acquires the monitor
- Releases the monitor lock
All actions inside one synchronized block on a monitor happen-before actions in any later synchronized block on the same monitor.
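As a minimal sketch of these rules (class and method names here are illustrative), the monitor release at the end of `publish()` happens-before the monitor acquire at the start of `tryRead()`, so the plain fields are safely published:

```java
public class SynchronizedPublish {
    private final Object monitor = new Object();
    private int data;        // plain fields: safe only because every
    private boolean ready;   // access happens under the same monitor

    public void publish(int value) {
        synchronized (monitor) {   // exiting this block flushes the writes
            data = value;
            ready = true;
        }
    }

    public Integer tryRead() {
        synchronized (monitor) {   // entering sees all prior critical sections
            return ready ? data : null;
        }
    }
}
```

Removing either synchronized block would break the happens-before chain: a reader could then see `ready == true` while `data` still holds a stale value.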
2. Volatile Variable Semantics
A volatile read or write is like acquiring/releasing a lock on the variable:

- A volatile write has release semantics: all writes before it in program order are visible to any thread that subsequently reads that volatile
- A volatile read has acquire semantics: no reads or writes after it can be reordered before it
- Volatile operations on the same variable cannot be reordered with each other
Critical for DCL: Volatile prevents reordering across the volatile access and ensures visibility across threads.
```java
/**
 * Volatile provides happens-before edges
 */
public class VolatileSemantics {
    private int nonVolatileData = 0;
    private volatile boolean dataReady = false; // volatile!

    // Thread A: Writer
    public void writer() {
        nonVolatileData = 42; // (1) Regular write
        dataReady = true;     // (2) Volatile write
        // JMM guarantees:
        // (1) happens-before (2) due to program order
        // All writes before a volatile write are visible
        // to any thread that subsequently reads that volatile
    }

    // Thread B: Reader
    public void reader() {
        if (dataReady) {                  // (3) Volatile read
            int value = nonVolatileData;  // (4) Regular read
            System.out.println(value);
            // GUARANTEED to print 42 (if dataReady was true)
            // JMM guarantees:
            // (2) happens-before (3) due to volatile semantics
            // Since (1) HB (2) and (2) HB (3) and (3) HB (4):
            // By transitivity, (1) happens-before (4)
            // Therefore, the read at (4) sees the write at (1)
        }
    }
}

/**
 * WITHOUT volatile, this could fail:
 */
public class BrokenVisibility {
    private int nonVolatileData = 0;
    private boolean dataReady = false; // NOT volatile!

    public void writer() {
        nonVolatileData = 42;
        dataReady = true;
    }

    public void reader() {
        if (dataReady) {
            int value = nonVolatileData;
            System.out.println(value);
            // MIGHT print 0! No happens-before guarantee.
        }
    }
}
```

3. Final Field Semantics
Java provides special guarantees for final fields that are useful for immutable objects:

- If an object is properly constructed (its reference does not escape during construction), any thread that obtains a reference to it sees correctly initialized values for its final fields, without synchronization
- The guarantee extends to objects reachable through those final fields (e.g., the elements of a final array)
This is a weaker but sufficient guarantee for some use cases. It's the basis for the "Initialization-on-demand holder idiom" alternative to DCL.
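A sketch of that holder idiom follows (class names here are illustrative). The JVM guarantees that a class is initialized exactly once, on first use, with class initialization itself acting as the synchronization point, so no volatile or explicit locking appears in user code:

```java
public class HolderSingleton {
    private HolderSingleton() { }   // prevent outside instantiation

    // The nested class is not initialized until getInstance() first
    // touches it; the JVM's class-initialization locking provides the
    // happens-before edge from construction to every reader.
    private static class Holder {
        static final HolderSingleton INSTANCE = new HolderSingleton();
    }

    public static HolderSingleton getInstance() {
        return Holder.INSTANCE;     // lazy, thread-safe, lock-free on the fast path
    }
}
```

This achieves the same lazy initialization as DCL with none of its subtlety, which is why it is often recommended over DCL in Java.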
4. JMM and Object Construction
The JMM specifically allows the reordering that breaks naive DCL:
```java
instance = new Singleton();
```
This can be reordered to:
```
temp = allocate(Singleton.class);  // Allocate memory
instance = temp;                   // Publish reference
Singleton.<init>(temp);            // Run constructor
```
This reordering doesn't change single-threaded semantics (the caller still gets a valid object), but it allows other threads to observe the partially constructed object via the early publication of instance.
Before Java 5.0 and JSR-133, even volatile didn't provide sufficient guarantees to make DCL work. The fix required both language changes and JVM implementation updates. Code written for Java 1.4 or earlier cannot use DCL safely.
C++11 introduced a formal memory model with fine-grained control over memory ordering. Unlike Java's binary distinction (volatile vs. non-volatile), C++ provides a spectrum of ordering constraints.
Memory Ordering Options:
| Order | Description | Use Case |
|---|---|---|
| memory_order_relaxed | No ordering guarantees | Counters, statistics where order doesn't matter |
| memory_order_acquire | Prevents reads from moving before this read | Reading a "ready" flag |
| memory_order_release | Prevents writes from moving after this write | Writing a "ready" flag after setup |
| memory_order_acq_rel | Both acquire and release semantics | Read-modify-write operations |
| memory_order_seq_cst | Total ordering across all threads | When you need global ordering |
```cpp
#include <atomic>
#include <thread>

/**
 * C++11 Memory Ordering Demonstration
 */
class VisibilityExample {
    int data = 0;                    // Regular variable
    std::atomic<bool> ready{false};  // Atomic variable

public:
    void writer() {
        data = 42; // (1) Regular write
        // Release semantics: All writes before this are visible
        // to any thread that acquires this atomic
        ready.store(true, std::memory_order_release); // (2)
    }

    void reader() {
        // Acquire semantics: All reads after this see writes
        // released by the matching release operation
        if (ready.load(std::memory_order_acquire)) { // (3)
            int value = data; // (4)
            // GUARANTEED to see data == 42
            // (1) happens-before (2) due to program order
            // (2) synchronizes-with (3) due to release-acquire
            // (3) happens-before (4) due to program order
            // By transitivity: (1) happens-before (4)
        }
    }
};

/**
 * Common mistake: Using relaxed ordering
 */
class BrokenExample {
    int data = 0;
    std::atomic<bool> ready{false};

public:
    void writer() {
        data = 42;
        ready.store(true, std::memory_order_relaxed); // BROKEN!
    }

    void reader() {
        if (ready.load(std::memory_order_relaxed)) { // BROKEN!
            int value = data;
            // May see data == 0! Relaxed provides no ordering.
        }
    }
};

/**
 * Sequential consistency: The safest (and slowest) option
 */
class SeqCstExample {
    int data = 0;
    std::atomic<bool> ready{false}; // Default is seq_cst

public:
    void writer() {
        data = 42;
        ready.store(true); // Uses memory_order_seq_cst by default
    }

    void reader() {
        if (ready.load()) { // Uses memory_order_seq_cst by default
            int value = data;
            // GUARANTEED to see data == 42
            // seq_cst is equivalent to release-acquire plus global ordering
        }
    }
};
```

Before C++11, there was no portable way to implement DCL correctly. The language simply didn't define multi-threaded semantics. The addition of std::atomic and std::memory_order finally made correct DCL possible in standard C++.
Memory barriers (also called fences) are low-level primitives that control the ordering of memory operations. They are the building blocks on which volatile and atomic are implemented.
Types of Memory Barriers:
1. Store Barrier (StoreStore fence) Ensures all stores before the barrier complete before any stores after the barrier.
2. Load Barrier (LoadLoad fence) Ensures all loads before the barrier complete before any loads after the barrier.
3. Full Barrier (StoreLoad fence) The strongest barrier. Ensures all loads and stores before the barrier complete before any loads or stores after the barrier.
4. Acquire Barrier Loads can't move before this point. Used after loading a flag or lock acquisition.
5. Release Barrier Stores can't move after this point. Used before storing a flag or lock release.
```java
/**
 * How volatile in Java translates to barriers
 */

// Java:
volatile int flag;

// Writing to volatile (release semantics):
// Before:  x = 1; y = 2;
// Barrier: [StoreStore] + [LoadStore]
// After:   flag = value;
// Result:  x and y writes complete before flag write is visible

// Reading from volatile (acquire semantics):
// Before:  value = flag;
// Barrier: [LoadLoad] + [LoadStore]
// After:   a = x; b = y;
// Result:  flag read completes before x and y reads

/**
 * Why DCL needs these barriers
 */

// Thread A (creating singleton):
// 1. Allocate memory
// 2. Initialize fields: obj.field1 = ...; obj.field2 = ...;
// 3. [Release Barrier] — ensures field writes complete
// 4. Write reference: instance = obj;

// Thread B (reading singleton):
// 1. Read reference: temp = instance;
// 2. [Acquire Barrier] — ensures subsequent reads see init
// 3. If temp != null, use fields: temp.field1, temp.field2

// The release-acquire pair creates the happens-before edge
// that guarantees Thread B sees the initialized fields.

/**
 * x86 architecture (strong memory model):
 * - StoreStore, LoadLoad, LoadStore barriers are free (no-ops)
 * - Only StoreLoad requires an actual fence instruction (MFENCE)
 * - This is why DCL bugs often don't manifest on x86
 */

/**
 * ARM architecture (weak memory model):
 * - All barrier types may require actual instructions
 * - DMB (Data Memory Barrier) instructions are needed
 * - DCL bugs are much more likely to manifest on ARM
 */
```

If you're deploying to ARM (mobile devices, Raspberry Pi, AWS Graviton, Apple Silicon), test there too. Code that works on x86 may fail on ARM due to its weaker memory model. Many production DCL bugs were only discovered when services moved to ARM-based cloud instances.
Now let's apply our understanding of memory models to see exactly how volatile/atomic fixes the DCL pattern.
The Core Fix:
The instance reference must be volatile (Java) or atomic (C++). This ensures the following:

When Thread A writes the reference after construction:

- All constructor writes happen-before the volatile/atomic write (release semantics)
- The field initializations cannot be reordered after the publication of the reference

When Thread B reads the reference:

- The read has acquire semantics, creating a happens-before edge from Thread A's write
- If B observes a non-null reference, B is guaranteed to see every field the constructor wrote
```java
/**
 * CORRECT Double-Checked Locking in Java 5+
 * The key fix is: volatile keyword on instance
 */
public class Singleton {
    // volatile ensures happens-before edge from construction to read
    private static volatile Singleton instance = null; // ← THE FIX

    private Singleton() {
        // Constructor initializes all fields
    }

    public static Singleton getInstance() {
        // First check: Uses volatile read semantics
        // If non-null, has happens-before edge from the write
        if (instance == null) {
            synchronized (Singleton.class) {
                // Second check: Protected by lock
                if (instance == null) {
                    // Construction completes...
                    // Volatile write ensures all constructor
                    // writes happen-before this assignment
                    instance = new Singleton(); // volatile write
                }
            }
        }
        // Volatile read: sees fully constructed object or null
        return instance;
    }
}

/**
 * Trace through with happens-before:
 *
 * Thread A creates singleton:
 *   A1: field1 = value1;  // Constructor writes
 *   A2: field2 = value2;  // Constructor writes
 *   A3: instance = this;  // Volatile write
 *   [All of A1, A2 happen-before A3 due to program order]
 *   [A3 publishes with release semantics]
 *
 * Thread B reads singleton:
 *   B1: temp = instance;  // Volatile read
 *   [A3 happens-before B1 due to volatile semantics]
 *   [Therefore A1, A2 happen-before B1 by transitivity]
 *   B2: temp.field1;      // Sees value1
 *   B3: temp.field2;      // Sees value2
 */
```

Notice that the only change required is marking the instance field as volatile (Java) or using std::atomic (C++). The algorithm structure is unchanged. But this minimal change is absolutely essential—without it, the code is broken.
We've covered the memory model foundations necessary to understand and implement DCL correctly:

- Why compilers, CPUs, caches, and store buffers reorder memory operations
- The happens-before relationship and how transitivity builds visibility guarantees
- The Java Memory Model: synchronized, volatile, and final field semantics
- The C++11 memory model and its spectrum of memory orderings
- Memory barriers, and why x86's strong model hides bugs that ARM's weak model exposes
- How a single volatile/atomic reference repairs the naive DCL pattern
What's Next
The next page presents the complete, safe implementation of DCL in multiple languages, along with alternative approaches (like holder patterns) that avoid the complexity of DCL entirely.
You now understand the memory model foundations that make DCL work (or fail). This knowledge applies far beyond DCL—to all concurrent programming where threads communicate through shared memory.