A memory model is a formal specification that defines how threads interact through memory in a concurrent program. It answers critical questions: When one thread writes to a variable, when (if ever) is that write visible to other threads? What reorderings of memory operations are permitted by the compiler and hardware?
Without understanding memory models, it's impossible to write correct concurrent code. The breakdown of Double-Checked Locking in its naive form is a direct consequence of the gap between what programmers assume about memory behavior and what memory models actually guarantee.
In this page, we explore the memory models of Java, C++, and other languages, understanding the primitives they provide to control memory visibility and ordering. This knowledge is essential for implementing DCL correctly—and for all concurrent programming.
By the end of this page, you will understand what memory models are, why they're necessary, and the specific guarantees provided by volatile (Java), atomic (C++), and memory barriers. You'll also understand the "happens-before" relationship and why it's the key to making DCL work.
Modern computers don't execute instructions in the simple, sequential manner that source code suggests. Several layers of optimization can reorder and transform memory operations:
1. Compiler Optimizations
Compilers analyze code and reorder instructions for better CPU utilization. They may:

- Reorder independent memory operations to hide instruction latency
- Hoist loads out of loops and cache values in registers
- Eliminate loads and stores that appear redundant under single-threaded analysis
2. CPU Instruction Reordering
CPUs execute instructions out-of-order to maximize throughput. While they maintain the illusion of sequential execution for a single thread, operations may complete in a different order than issued.
3. CPU Cache Hierarchies
Each CPU core has its own L1/L2 cache. Writes to memory may sit in a core's cache and not be immediately visible to other cores. Cache coherency protocols eventually synchronize caches, but the timing is not guaranteed without explicit barriers.
4. Store Buffers and Invalidation Queues
CPUs use store buffers to decouple execution from memory writes. A core might "see" its own store immediately but other cores don't see it until the buffer drains.
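The store-buffer effect can be probed with the classic litmus test: Thread A runs `x = 1; r1 = y` while Thread B runs `y = 1; r2 = x`. Sequentially, at least one of `r1`, `r2` must be 1; with store buffers, both can be 0. The sketch below (class and helper names are illustrative, not from any library) runs the test repeatedly; whether the "impossible" outcome actually appears depends on hardware, JIT compilation, and timing.

```java
import java.util.concurrent.CountDownLatch;

public class StoreBufferLitmus {
    static int x, y;    // shared, deliberately non-volatile
    static int r1, r2;  // results; read only after join(), which gives happens-before

    /** Runs one iteration of the litmus test; returns {r1, r2}. */
    static int[] once() throws InterruptedException {
        x = 0; y = 0;
        CountDownLatch start = new CountDownLatch(1);
        Thread a = new Thread(() -> { awaitQuietly(start); x = 1; r1 = y; });
        Thread b = new Thread(() -> { awaitQuietly(start); y = 1; r2 = x; });
        a.start(); b.start();
        start.countDown();   // release both threads at (roughly) the same time
        a.join(); b.join();
        return new int[]{r1, r2};
    }

    static void awaitQuietly(CountDownLatch latch) {
        try { latch.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    public static void main(String[] args) throws InterruptedException {
        int bothZero = 0, iterations = 500;
        for (int i = 0; i < iterations; i++) {
            int[] r = once();
            if (r[0] == 0 && r[1] == 0) bothZero++;  // forbidden under sequential consistency
        }
        // No expected count is claimed here: observing bothZero > 0 typically
        // requires many iterations, and is more likely on weakly ordered CPUs.
        System.out.println("r1==0 && r2==0 observed " + bothZero + " / " + iterations);
    }
}
```

Making `x` and `y` volatile would forbid the (0, 0) outcome entirely, which is exactly the kind of guarantee the rest of this page explains.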
| Reordering Type | x86/x64 | ARM | POWER | Description |
|---|---|---|---|---|
| Store-Store | No | Yes | Yes | Stores can complete out of order |
| Load-Load | No | Yes | Yes | Loads can complete out of order |
| Load-Store | No | Yes | Yes | Load can move after later store |
| Store-Load | Yes | Yes | Yes | Store can move after later load |
The Fundamental Problem
All these optimizations are transparent to single-threaded code—the final result is always as if instructions executed in order. But multi-threaded code observes intermediate states. What's harmless for one thread becomes dangerous when another thread sees partial results.
Memory models exist to provide programmers with tools to control these behaviors when necessary for correctness. They define:

- Which reorderings the compiler and hardware are permitted to perform
- When a write made by one thread becomes visible to other threads
- What synchronization primitives are available to constrain reordering and visibility
Memory models balance performance and programmability. Allowing reorderings enables significant hardware and compiler optimizations. Providing synchronization primitives gives programmers control when needed. The default permits reordering; explicit synchronization constrains it.
The core concept in most concurrent memory models is the happens-before relationship. This is a partial ordering on memory operations that defines visibility guarantees.
Definition: If operation A happens-before operation B, then:

- The memory effects of A are visible to B
- A is ordered before B in any consistent execution
Key insight: Happens-before is not about wall-clock time. Two operations might execute at the same physical time, yet still have a happens-before relationship because of synchronization.
```
// ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
// PROGRAM ORDER (within a single thread)
// ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
// Thread A:
x = 1; // (1)
y = 2; // (2)

// (1) happens-before (2) due to program order.
// Any read of both x and y by Thread A will see these values.

// ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
// SYNCHRONIZATION ORDER (between threads)
// ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
// Thread A:
x = 1;         // (1)
release(lock); // (2) Release lock

// Thread B:
acquire(lock); // (3) Acquire same lock
print(x);      // (4)

// (2) happens-before (3) due to lock synchronization.
// Since (1) HB (2) and (2) HB (3) and (3) HB (4):
// By transitivity, (1) happens-before (4).
// Therefore, Thread B is GUARANTEED to see x == 1.

// ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
// NO HAPPENS-BEFORE (the dangerous case)
// ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
// Thread A:
x = 1;       // (1)
flag = true; // (2) Regular (non-volatile) write

// Thread B:
if (flag) {  // (3) Regular (non-volatile) read
    print(x); // (4)
}

// WITHOUT proper synchronization:
// There is NO happens-before between (2) and (3)!
// Thread B might see flag == true but x == 0 (uninitialized).
// This is EXACTLY the DCL bug.
```

To make DCL work, we need a happens-before edge from the constructor completion to the first unsynchronized read of the instance reference. Volatile writes provide this guarantee; regular writes do not.
The Java Memory Model, defined in JSR-133 (2004) and incorporated into Java 5.0, provides precise semantics for concurrent Java programs. Understanding JMM is essential for writing correct concurrent Java code.
Key JMM Guarantees:
1. Synchronized Block Semantics
When a thread enters a synchronized block, it:

- Acquires the monitor lock
- Invalidates locally cached values, so subsequent reads observe values at least as fresh as those at the most recent release of that monitor

When a thread exits a synchronized block, it:

- Flushes all of its writes so they are visible to the next thread that acquires the monitor
- Releases the monitor lock
All actions inside one synchronized block on a monitor happen-before actions in any later synchronized block on the same monitor.
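As a minimal sketch of these rules (class and method names here are illustrative), the monitor release at the end of `publish()` happens-before the monitor acquire at the start of `tryRead()`, so the plain fields are safely published:

```java
public class SynchronizedPublish {
    private final Object monitor = new Object();
    private int data;        // plain fields: safe only because every
    private boolean ready;   // access happens under the same monitor

    public void publish(int value) {
        synchronized (monitor) {   // exiting this block flushes the writes
            data = value;
            ready = true;
        }
    }

    public Integer tryRead() {
        synchronized (monitor) {   // entering sees all prior critical sections
            return ready ? data : null;
        }
    }
}
```

Removing either synchronized block would break the happens-before chain: a reader could then see `ready == true` while `data` still holds a stale value.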
2. Volatile Variable Semantics
A volatile read or write is like acquiring/releasing a lock on the variable:

- A volatile write has release semantics: all writes before it in program order are visible to any thread that subsequently reads that volatile
- A volatile read has acquire semantics: no reads or writes after it can be reordered before it
- Volatile operations on the same variable cannot be reordered with each other
Critical for DCL: Volatile prevents reordering across the volatile access and ensures visibility across threads.
```java
/**
 * Volatile provides happens-before edges
 */
public class VolatileSemantics {
    private int nonVolatileData = 0;
    private volatile boolean dataReady = false; // volatile!

    // Thread A: Writer
    public void writer() {
        nonVolatileData = 42; // (1) Regular write
        dataReady = true;     // (2) Volatile write
        // JMM guarantees:
        // (1) happens-before (2) due to program order
        // All writes before a volatile write are visible
        // to any thread that subsequently reads that volatile
    }

    // Thread B: Reader
    public void reader() {
        if (dataReady) {                  // (3) Volatile read
            int value = nonVolatileData;  // (4) Regular read
            System.out.println(value);
            // GUARANTEED to print 42 (if dataReady was true)
            // JMM guarantees:
            // (2) happens-before (3) due to volatile semantics
            // Since (1) HB (2) and (2) HB (3) and (3) HB (4):
            // By transitivity, (1) happens-before (4)
            // Therefore, the read at (4) sees the write at (1)
        }
    }
}

/**
 * WITHOUT volatile, this could fail:
 */
public class BrokenVisibility {
    private int nonVolatileData = 0;
    private boolean dataReady = false; // NOT volatile!

    public void writer() {
        nonVolatileData = 42;
        dataReady = true;
    }

    public void reader() {
        if (dataReady) {
            int value = nonVolatileData;
            System.out.println(value);
            // MIGHT print 0! No happens-before guarantee.
        }
    }
}
```

3. Final Field Semantics
Java provides special guarantees for final fields that are useful for immutable objects:

- If an object is properly constructed (its reference does not escape during construction), any thread that obtains a reference to it sees correctly initialized values for its final fields, without synchronization
- The guarantee extends to objects reachable through those final fields (e.g., the elements of a final array)
This is a weaker but sufficient guarantee for some use cases. It's the basis for the "Initialization-on-demand holder idiom" alternative to DCL.
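A sketch of that holder idiom follows (class names here are illustrative). The JVM guarantees that a class is initialized exactly once, on first use, with class initialization itself acting as the synchronization point, so no volatile or explicit locking appears in user code:

```java
public class HolderSingleton {
    private HolderSingleton() { }   // prevent outside instantiation

    // The nested class is not initialized until getInstance() first
    // touches it; the JVM's class-initialization locking provides the
    // happens-before edge from construction to every reader.
    private static class Holder {
        static final HolderSingleton INSTANCE = new HolderSingleton();
    }

    public static HolderSingleton getInstance() {
        return Holder.INSTANCE;     // lazy, thread-safe, lock-free on the fast path
    }
}
```

This achieves the same lazy initialization as DCL with none of its subtlety, which is why it is often recommended over DCL in Java.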
4. JMM and Object Construction
The JMM specifically allows the reordering that breaks naive DCL:
```java
instance = new Singleton();
```
This can be reordered to:
```
temp = allocate(Singleton.class);  // Allocate memory
instance = temp;                   // Publish reference
Singleton.<init>(temp);            // Run constructor
```
This reordering doesn't change single-threaded semantics (the caller still gets a valid object), but it allows other threads to observe the partially constructed object via the early publication of instance.
Before Java 5.0 and JSR-133, even volatile didn't provide sufficient guarantees to make DCL work. The fix required both language changes and JVM implementation updates. Code written for Java 1.4 or earlier cannot use DCL safely.
C++11 introduced a formal memory model with fine-grained control over memory ordering. Unlike Java's binary distinction (volatile vs. non-volatile), C++ provides a spectrum of ordering constraints.
Memory Ordering Options:
| Order | Description | Use Case |
|---|---|---|
| memory_order_relaxed | No ordering guarantees | Counters, statistics where order doesn't matter |
| memory_order_acquire | Prevents reads from moving before this read | Reading a "ready" flag |
| memory_order_release | Prevents writes from moving after this write | Writing a "ready" flag after setup |
| memory_order_acq_rel | Both acquire and release semantics | Read-modify-write operations |
| memory_order_seq_cst | Total ordering across all threads | When you need global ordering |
```cpp
#include <atomic>
#include <thread>

/**
 * C++11 Memory Ordering Demonstration
 */
class VisibilityExample {
    int data = 0;                    // Regular variable
    std::atomic<bool> ready{false};  // Atomic variable

public:
    void writer() {
        data = 42; // (1) Regular write
        // Release semantics: All writes before this are visible
        // to any thread that acquires this atomic
        ready.store(true, std::memory_order_release); // (2)
    }

    void reader() {
        // Acquire semantics: All reads after this see writes
        // released by the matching release operation
        if (ready.load(std::memory_order_acquire)) { // (3)
            int value = data; // (4)
            // GUARANTEED to see data == 42
            // (1) happens-before (2) due to program order
            // (2) synchronizes-with (3) due to release-acquire
            // (3) happens-before (4) due to program order
            // By transitivity: (1) happens-before (4)
        }
    }
};

/**
 * Common mistake: Using relaxed ordering
 */
class BrokenExample {
    int data = 0;
    std::atomic<bool> ready{false};

public:
    void writer() {
        data = 42;
        ready.store(true, std::memory_order_relaxed); // BROKEN!
    }

    void reader() {
        if (ready.load(std::memory_order_relaxed)) { // BROKEN!
            int value = data;
            // May see data == 0! Relaxed provides no ordering.
        }
    }
};

/**
 * Sequential consistency: The safest (and slowest) option
 */
class SeqCstExample {
    int data = 0;
    std::atomic<bool> ready{false}; // Default is seq_cst

public:
    void writer() {
        data = 42;
        ready.store(true); // Uses memory_order_seq_cst by default
    }

    void reader() {
        if (ready.load()) { // Uses memory_order_seq_cst by default
            int value = data;
            // GUARANTEED to see data == 42
            // seq_cst is equivalent to release-acquire plus global ordering
        }
    }
};
```

Before C++11, there was no portable way to implement DCL correctly. The language simply didn't define multi-threaded semantics. The addition of std::atomic and std::memory_order finally made correct DCL possible in standard C++.
Memory barriers (also called fences) are low-level primitives that control the ordering of memory operations. They are the building blocks on which volatile and atomic are implemented.
Types of Memory Barriers:
1. Store Barrier (StoreStore fence) Ensures all stores before the barrier complete before any stores after the barrier.
2. Load Barrier (LoadLoad fence) Ensures all loads before the barrier complete before any loads after the barrier.
3. Full Barrier (StoreLoad fence) The strongest barrier. Ensures all loads and stores before the barrier complete before any loads or stores after the barrier.
4. Acquire Barrier Loads can't move before this point. Used after loading a flag or lock acquisition.
5. Release Barrier Stores can't move after this point. Used before storing a flag or lock release.
```java
/**
 * How volatile in Java translates to barriers
 */

// Java:
volatile int flag;

// Writing to volatile (release semantics):
// Before:  x = 1; y = 2;
// Barrier: [StoreStore] + [LoadStore]
// After:   flag = value;
// Result:  x and y writes complete before flag write is visible

// Reading from volatile (acquire semantics):
// Before:  value = flag;
// Barrier: [LoadLoad] + [LoadStore]
// After:   a = x; b = y;
// Result:  flag read completes before x and y reads

/**
 * Why DCL needs these barriers
 */

// Thread A (creating singleton):
// 1. Allocate memory
// 2. Initialize fields: obj.field1 = ...; obj.field2 = ...;
// 3. [Release Barrier] — ensures field writes complete
// 4. Write reference: instance = obj;

// Thread B (reading singleton):
// 1. Read reference: temp = instance;
// 2. [Acquire Barrier] — ensures subsequent reads see init
// 3. If temp != null, use fields: temp.field1, temp.field2

// The release-acquire pair creates the happens-before edge
// that guarantees Thread B sees the initialized fields.

/**
 * x86 architecture (strong memory model):
 * - StoreStore, LoadLoad, LoadStore barriers are free (no-ops)
 * - Only StoreLoad requires an actual fence instruction (MFENCE)
 * - This is why DCL bugs often don't manifest on x86
 */

/**
 * ARM architecture (weak memory model):
 * - All barrier types may require actual instructions
 * - DMB (Data Memory Barrier) instructions are needed
 * - DCL bugs are much more likely to manifest on ARM
 */
```

If you're deploying to ARM (mobile devices, Raspberry Pi, AWS Graviton, Apple Silicon), test there too. Code that works on x86 may fail on ARM due to its weaker memory model. Many production DCL bugs were only discovered when services moved to ARM-based cloud instances.
Now let's apply our understanding of memory models to see exactly how volatile/atomic fixes the DCL pattern.
The Core Fix:
The instance reference must be volatile (Java) or atomic (C++). This ensures the following:

When Thread A writes the reference after construction:

- All constructor writes happen-before the volatile/atomic write (release semantics)
- The field initializations cannot be reordered after the publication of the reference

When Thread B reads the reference:

- The read has acquire semantics, creating a happens-before edge from Thread A's write
- If B observes a non-null reference, B is guaranteed to see every field the constructor wrote
```java
/**
 * CORRECT Double-Checked Locking in Java 5+
 * The key fix is: volatile keyword on instance
 */
public class Singleton {
    // volatile ensures happens-before edge from construction to read
    private static volatile Singleton instance = null; // ← THE FIX

    private Singleton() {
        // Constructor initializes all fields
    }

    public static Singleton getInstance() {
        // First check: Uses volatile read semantics
        // If non-null, has happens-before edge from the write
        if (instance == null) {
            synchronized (Singleton.class) {
                // Second check: Protected by lock
                if (instance == null) {
                    // Construction completes...
                    // Volatile write ensures all constructor
                    // writes happen-before this assignment
                    instance = new Singleton(); // volatile write
                }
            }
        }
        // Volatile read: sees fully constructed object or null
        return instance;
    }
}

/**
 * Trace through with happens-before:
 *
 * Thread A creates singleton:
 *   A1: field1 = value1;  // Constructor writes
 *   A2: field2 = value2;  // Constructor writes
 *   A3: instance = this;  // Volatile write
 *   [All of A1, A2 happen-before A3 due to program order]
 *   [A3 publishes with release semantics]
 *
 * Thread B reads singleton:
 *   B1: temp = instance;  // Volatile read
 *   [A3 happens-before B1 due to volatile semantics]
 *   [Therefore A1, A2 happen-before B1 by transitivity]
 *   B2: temp.field1;      // Sees value1
 *   B3: temp.field2;      // Sees value2
 */
```

Notice that the only change required is marking the instance field as volatile (Java) or using std::atomic (C++). The algorithm structure is unchanged. But this minimal change is absolutely essential—without it, the code is broken.
We've covered the memory model foundations necessary to understand and implement DCL correctly:

- Why compilers, CPUs, caches, and store buffers reorder memory operations
- The happens-before relationship and how transitivity builds visibility guarantees
- The Java Memory Model: synchronized, volatile, and final field semantics
- The C++11 memory model and its spectrum of memory orderings
- Memory barriers, and why x86's strong model hides bugs that ARM's weak model exposes
- How a single volatile/atomic reference repairs the naive DCL pattern
What's Next
The next page presents the complete, safe implementation of DCL in multiple languages, along with alternative approaches (like holder patterns) that avoid the complexity of DCL entirely.
You now understand the memory model foundations that make DCL work (or fail). This knowledge applies far beyond DCL—to all concurrent programming where threads communicate through shared memory.