Imagine you're a senior engineer at a major e-commerce platform. It's Black Friday, and your checkout service is handling 50,000 requests per second. Everything looks fine in the logs until customer support is flooded with complaints: customers are seeing each other's shopping carts, orders are being placed with wrong quantities, and account balances are mysteriously changing.
You check the code. It looks perfectly correct. Every individual function does exactly what it should. The unit tests pass. But in production, under heavy concurrent load, the system exhibits ghostly, unpredictable behavior that vanishes when you try to reproduce it in development.
Welcome to the world of thread safety violations—one of the most insidious categories of bugs in software engineering.
By the end of this page, you will understand what thread safety truly means at a foundational level, why it's essential for any concurrent system, and the fundamental concepts that underpin all thread-safe design. You'll develop the mental model needed to reason about concurrent code and recognize potential safety violations before they become production disasters.
Thread safety is one of those terms that's often used loosely, leading to confusion. Let's establish a rigorous definition.
Thread Safety Definition: A class, method, or data structure is thread-safe if it guarantees correct behavior when accessed from multiple threads simultaneously, without requiring any additional synchronization or coordination from the caller.
The key phrase here is "guarantees correct behavior"—but what does "correct" mean in a concurrent context? This leads us to a more formal definition known as correctness under concurrent execution.
Brian Goetz, author of Java Concurrency in Practice, offers perhaps the most widely cited definition: "A class is thread-safe if it behaves correctly when accessed from multiple threads, regardless of the scheduling or interleaving of the execution of those threads by the runtime environment, and with no additional synchronization or other coordination on the part of the calling code."
Let's dissect this definition into its essential components:
1. Behaves Correctly
The class upholds its invariants—the properties that must always be true about its state. For example, a Counter class that can go negative when it shouldn't, or a BankAccount that allows the balance to become inconsistent across threads, is not behaving correctly.
2. Under Any Scheduling or Interleaving
This is the critical part. Modern operating systems can interrupt thread execution at any point and switch to another thread. A thread-safe class must work correctly regardless of when those switches happen and how the threads' operations end up interleaved.
3. Without Additional Synchronization from the Caller
The thread safety is encapsulated within the class. Callers shouldn't need to wrap every call in a lock or coordinate access between threads—that's the responsibility of the class itself.
```typescript
// NOT THREAD-SAFE: This counter can produce incorrect results under concurrent access
class UnsafeCounter {
  private count: number = 0;

  // Problem: read-modify-write is three separate operations
  increment(): void {
    this.count = this.count + 1;
    // Thread A reads 5, Thread B reads 5
    // Both write 6, losing an increment!
  }

  getCount(): number {
    return this.count;
  }
}

// THREAD-SAFE (conceptually): Proper synchronization ensures correctness
class ThreadSafeCounter {
  private count: number = 0;
  private lock: Mutex = new Mutex(); // Conceptual mutex

  async increment(): Promise<void> {
    await this.lock.acquire(); // Only one thread can hold the lock
    try {
      this.count = this.count + 1; // Now atomic from callers' perspective
    } finally {
      this.lock.release();
    }
  }

  async getCount(): Promise<number> {
    await this.lock.acquire();
    try {
      return this.count;
    } finally {
      this.lock.release();
    }
  }
}
```

Thread safety isn't binary: there's a spectrum of safety guarantees that different classes and functions can provide. Understanding these levels helps you design appropriate solutions for specific concurrency requirements.
Joshua Bloch, in Effective Java, describes five levels of thread safety, a classification that is widely used in the Java community:
| Level | Description | Example | Caller Responsibility |
|---|---|---|---|
| Immutable | Objects that cannot be modified after construction. Always thread-safe. | String, Integer, LocalDate | None—guaranteed safe |
| Unconditionally Thread-Safe | Mutable objects with sufficient internal synchronization. Safe for all usage patterns. | ConcurrentHashMap, AtomicInteger | None—guaranteed safe |
| Conditionally Thread-Safe | Some operations require external synchronization. Safety depends on usage. | Collections.synchronizedList() during iteration | Must synchronize for specific operations |
| Thread-Compatible | No internal synchronization, but can be used safely with external locks. | ArrayList, HashMap | Must synchronize all access |
| Thread-Hostile | Cannot be used safely even with external synchronization. | Classes with static mutable state, poorly designed APIs | Redesign required |
Let's examine each level in detail:
Immutable objects are the gold standard of thread safety. Since their state cannot change after construction, there's no possibility of one thread's modifications affecting another thread. Immutable objects can be freely shared across threads with zero synchronization overhead.
Key characteristics:
- State cannot be modified after construction
- All fields are final
- The object is properly constructed (no reference to this escapes during construction)

Immutability is the simplest and most robust path to thread safety. When designing concurrent systems, always ask: "Can this object be immutable?" The performance cost of creating new objects is often far less than the complexity cost of managing mutable shared state.
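As a minimal sketch of what an immutable class can look like in Java, consider the example below. The class name CartSnapshot and its fields are hypothetical and chosen only for illustration. The class is final, its fields are final, the constructor takes a defensive copy, and "modification" returns a new object instead of changing the existing one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical example class: an immutable snapshot of a shopping cart.
final class CartSnapshot {
    private final String customerId;
    private final List<String> items;

    CartSnapshot(String customerId, List<String> items) {
        this.customerId = customerId;
        // Defensive copy: later changes to the caller's list cannot leak in.
        this.items = Collections.unmodifiableList(new ArrayList<>(items));
    }

    String customerId() {
        return customerId;
    }

    // The returned list is unmodifiable; calling add() on it throws
    // UnsupportedOperationException.
    List<String> items() {
        return items;
    }

    // "Modification" builds a new snapshot and leaves this one untouched.
    CartSnapshot withItem(String item) {
        List<String> copy = new ArrayList<>(items);
        copy.add(item);
        return new CartSnapshot(customerId, copy);
    }
}
```

Because no method changes the state of an existing CartSnapshot, instances of a class like this can be handed to any number of threads without locks.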
Unconditionally thread-safe classes are mutable objects that encapsulate all necessary synchronization internally. Clients can use them concurrently without any external coordination. This is the level most "thread-safe" classes aim for.
Examples:
- ConcurrentHashMap: All operations are thread-safe, including compound operations like putIfAbsent
- AtomicLong: Lock-free atomic operations
- BlockingQueue: Thread-safe producer-consumer primitives

The conditionally thread-safe level is often misunderstood and leads to bugs. Some operations require external synchronization, but not all. The class documentation should clearly specify which operations are safe and which require coordination.
Classic example: Iterating over Collections.synchronizedList()
```java
import java.util.Collections;
import java.util.List;
import java.util.ArrayList;

public class ConditionalSafetyExample {
    // This list is "conditionally thread-safe"
    private List<String> syncList = Collections.synchronizedList(new ArrayList<>());

    // SAFE: Individual operations are synchronized
    public void addItem(String item) {
        syncList.add(item); // Thread-safe - uses internal lock
    }

    public int getSize() {
        return syncList.size(); // Thread-safe - uses internal lock
    }

    // UNSAFE: Compound operation without synchronization
    public void addIfAbsentBroken(String item) {
        // BUG: Between contains() and add(), another thread could add the same item
        if (!syncList.contains(item)) {
            syncList.add(item); // Not atomic!
        }
    }

    // UNSAFE: Iteration without synchronization
    public void printAllBroken() {
        // BUG: Another thread could modify list during iteration
        for (String item : syncList) { // ConcurrentModificationException possible!
            System.out.println(item);
        }
    }

    // SAFE: Proper synchronization during iteration
    public void printAllCorrect() {
        synchronized (syncList) { // Must hold lock for entire iteration
            for (String item : syncList) {
                System.out.println(item);
            }
        }
    }
}
```

Most classes in standard libraries fall into the thread-compatible category. They're not internally synchronized, but they can be used safely if the caller provides proper synchronization. This is the default for most classes because internal locking would impose a cost on every caller, including the many that never share the object between threads.
Examples: ArrayList, HashMap, StringBuilder
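To illustrate how a thread-compatible class can be used safely, here is a small sketch in which every access to an ArrayList goes through one lock object. The wrapper name GuardedList and its methods are hypothetical. The point is only that the caller, not the ArrayList, supplies the synchronization.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper: all access to the ArrayList is guarded by one lock.
class GuardedList {
    private final Object lock = new Object();
    // ArrayList is thread-compatible: it has no internal synchronization.
    private final List<String> items = new ArrayList<>();

    void add(String item) {
        synchronized (lock) {
            items.add(item);
        }
    }

    // The check and the add happen under a single lock acquisition,
    // so no other thread can slip in between them.
    boolean addIfAbsent(String item) {
        synchronized (lock) {
            if (items.contains(item)) {
                return false;
            }
            items.add(item);
            return true;
        }
    }

    int size() {
        synchronized (lock) {
            return items.size();
        }
    }
}
```

The design choice here is to keep the list private so that no caller can bypass the lock. If the raw ArrayList escaped, the guarantee would be lost.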
Thread-hostile classes cannot be made thread-safe even with external synchronization, often due to static mutable state that is modified without synchronization or to poorly designed APIs.
Thread-hostile classes should be redesigned or avoided in concurrent contexts.
Thread safety problems arise from three fundamental sources. Understanding these pillars is essential for both diagnosing issues and designing thread-safe code.
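The three pillars are:
- Atomicity: an operation completes as a single indivisible step, with no intermediate state visible to other threads.
- Visibility: a write made by one thread becomes visible to other threads that later read the same variable.
- Ordering: operations are observed in an order consistent with the program, even though compilers and processors may reorder instructions.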
Violating any of these pillars can cause thread safety failures.
For example:
- Atomicity violation: an increment (count++) involves read, add, and write, three separate operations that another thread can interleave.
- Visibility violation: Thread A sets running = false, but Thread B's loop on while(running) never terminates because it reads a stale cached value.

Each programming language/platform has a memory model that defines exactly what guarantees are provided about visibility and ordering. Java has the Java Memory Model (JMM), C++ has its memory model (since C++11), and so on. Thread-safe code depends on understanding and respecting these memory model guarantees.
```java
public class VisibilityProblem {
    // Without 'volatile', this change may NEVER be seen by the reader thread!
    private boolean running = true; // BUG: Missing volatile
    private int value = 0;

    // Writer thread
    public void stop() {
        value = 42;      // This write might not be visible either
        running = false; // This might be cached in register, never written to main memory
    }

    // Reader thread - may loop FOREVER even after stop() is called
    public void run() {
        while (running) {
            // JVM is free to optimize this to: if (running) { while(true) {} }
            // because it sees 'running' is never modified in this thread's view
        }
        System.out.println("Value: " + value); // Might print 0!
    }
}
```

```java
// CORRECT: Using volatile for visibility
public class VisibilityFixed {
    private volatile boolean running = true; // volatile ensures visibility
    private volatile int value = 0;          // visibility guaranteed

    public void stop() {
        value = 42;      // Happens-before the write to running
        running = false; // This write is guaranteed to be visible
    }

    public void run() {
        while (running) {
            // Loop will terminate when stop() is called
        }
        System.out.println("Value: " + value); // Guaranteed to print 42
    }
}
```

Thread safety isn't just "another concern" to add to your checklist. It's fundamentally more challenging than sequential programming. Understanding why it's difficult helps you approach concurrent code with appropriate rigor.
The core challenge: In sequential programming, you reason about one execution path. In concurrent programming, you must reason about all possible interleavings of multiple execution paths.
Consider two threads executing just two instructions each. The number of possible interleavings is:
```text
Thread A: A1, A2
Thread B: B1, B2

Possible interleavings (6 total):
1. A1 → A2 → B1 → B2
2. A1 → B1 → A2 → B2
3. A1 → B1 → B2 → A2
4. B1 → A1 → A2 → B2
5. B1 → A1 → B2 → A2
6. B1 → B2 → A1 → A2

For n threads with m operations each: (n * m)! / (m!)^n interleavings

Example: 3 threads × 10 operations each = 30!/(10!)³ ≈ 5.5 × 10¹² interleavings
```

This is why testing CANNOT prove thread safety: you can only test a tiny fraction of possible executions.

This combinatorial explosion of possible states makes concurrent programming uniquely challenging: bugs are non-deterministic, hard to reproduce, and can survive extensive testing.
"We tested it extensively and found no concurrency bugs" is NOT evidence of thread safety. Due to the astronomical number of possible interleavings, testing can only prove the presence of bugs, never their absence. Thread safety must be established through careful design and reasoning, not just testing.
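The interleaving formula (n * m)! / (m!)^n can be checked with a few lines of code. The following sketch uses BigInteger to avoid overflow. The class and method names are illustrative only.

```java
import java.math.BigInteger;

class InterleavingCount {
    // Computes k! using arbitrary-precision arithmetic.
    static BigInteger factorial(int k) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= k; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    // Number of interleavings of `threads` threads with `opsPerThread`
    // operations each: (n * m)! / (m!)^n
    static BigInteger count(int threads, int opsPerThread) {
        BigInteger numerator = factorial(threads * opsPerThread);
        BigInteger denominator = factorial(opsPerThread).pow(threads);
        return numerator.divide(denominator);
    }

    public static void main(String[] args) {
        System.out.println(count(2, 2));  // prints 6
        System.out.println(count(3, 10)); // prints 5550996791340
    }
}
```

Even at a (purely hypothetical) rate of one million tested interleavings per second, covering the 3-thread, 10-operation case exhaustively would take about two months, and realistic programs have far more than 30 operations.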
Real-world example of non-determinism:
A team deployed a payment processing system that passed all tests—including concurrent stress tests. After 3 months in production, at 2:47 AM during a traffic spike, a customer was charged twice for a single purchase. The conditions that triggered the bug: exactly 47 concurrent requests, a garbage collection pause of 23ms on a specific server, and network latency variance that caused two threads to hit a specific code path within 0.1ms of each other.
No test could have predicted this. The bug required reasoning about the code's invariants under all possible interleavings—which revealed that a compound operation wasn't properly atomic.
Thread-safe design is fundamentally about preserving invariants in the presence of concurrent access. An invariant is a condition that must always be true about an object's state. Let's formalize this concept.
Class Invariant: A predicate that holds true before and after every public method execution.
For example, a BoundedBuffer class might have these invariants:
- count >= 0 && count <= capacity
- writeIndex always points to the next empty slot
- When count == 0, the buffer is empty
- When count == capacity, the buffer is full

In single-threaded code, you only need to ensure invariants hold after each method completes. In concurrent code, you must ensure invariants hold even while multiple methods execute simultaneously.
```java
public class InvariantViolation {
    // Invariant: items.length == capacity, 0 <= count <= capacity
    private final Object[] items;
    private final int capacity;
    private int count = 0;
    private int putIndex = 0;
    private int takeIndex = 0;

    public InvariantViolation(int capacity) {
        this.capacity = capacity;
        this.items = new Object[capacity];
    }

    // BUG: Invariant can be violated between check and modification.
    // Note: as written, Java would also throw IllegalMonitorStateException,
    // because wait() and notifyAll() require holding the object's monitor.
    // The method is left unsynchronized only to illustrate the race.
    public void put(Object item) throws InterruptedException {
        // WINDOW OF VULNERABILITY BEGINS
        while (count == capacity) { // <- Thread A checks: count=10, capacity=10
            wait();                 // <- Thread A waits
        }
        // Thread B could decrement count here, but Thread C might also be
        // waking up and about to put. Both think there's space!
        items[putIndex] = item;     // <- Both threads might write to same slot!
        putIndex = (putIndex + 1) % capacity;
        count++;                    // <- count could exceed capacity!
        // WINDOW OF VULNERABILITY ENDS
        notifyAll();
    }

    // CORRECT: Synchronized to protect invariant
    public synchronized void putSafe(Object item) throws InterruptedException {
        while (count == capacity) {
            wait(); // Releases lock while waiting
        }
        // No other thread can execute putSafe/takeSafe while we hold the lock
        items[putIndex] = item;
        putIndex = (putIndex + 1) % capacity;
        count++;
        notifyAll();
    }
}
```

Thread Safety Contracts
Beyond invariants, thread-safe classes should have clear contracts that document:
What consistency guarantees are provided?
What synchronization is required from the caller (if any)?
What are the thread-safety boundaries?
Documentation example:
```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * A thread-safe counter with atomic increment and decrement operations.
 *
 * <h2>Thread Safety</h2>
 * This class is <b>unconditionally thread-safe</b>. All individual operations
 * are atomic with respect to concurrent access.
 *
 * <h3>Atomicity</h3>
 * Each method (increment, decrement, getValue) is atomic. However, compound operations
 * combining multiple methods are NOT atomic without external synchronization.
 *
 * For example, the following is NOT safe:
 * <pre>
 * // BUG: Not atomic - another thread could modify between calls
 * if (counter.getValue() > 0) {
 *     counter.decrement();
 * }
 * </pre>
 *
 * <h3>Visibility</h3>
 * All writes are immediately visible to subsequent reads from any thread.
 *
 * <h3>Ordering</h3>
 * Operations by a single thread appear in program order. Operations across
 * threads are linearizable - they appear to occur in some total order consistent
 * with the real-time ordering of operations.
 *
 * @ThreadSafe
 */
public class AtomicCounter {
    private final AtomicLong value = new AtomicLong(0);

    /** Atomically increments and returns the new value. */
    public long increment() {
        return value.incrementAndGet();
    }

    /** Atomically decrements and returns the new value. */
    public long decrement() {
        return value.decrementAndGet();
    }

    /** Returns the current value. */
    public long getValue() {
        return value.get();
    }

    /**
     * Atomically decrements if value is positive.
     * <p>
     * Use this instead of check-then-decrement for safe conditional decrement.
     *
     * @return true if decrement was performed, false if value was already 0
     */
    public boolean decrementIfPositive() {
        while (true) {
            long current = value.get();
            if (current <= 0) {
                return false;
            }
            if (value.compareAndSet(current, current - 1)) {
                return true;
            }
            // CAS failed due to concurrent modification - retry
        }
    }
}
```

Before moving forward in our exploration of thread safety, let's address and dispel several common misconceptions that lead developers astray.
Even simple-looking operations like i++ are compound (read + modify + write). Let's examine some of these misconceptions in detail:

The belief that simple operations are atomic is one of the most dangerous misconceptions. In most languages:
- count++ is NOT atomic (read, add, write)
- longValue = 42L may NOT be atomic on 32-bit systems (two 32-bit writes)
- reference = newObject may NOT be atomic in some memory models
- list.add(item) is NOT atomic (resize, copy, increment size)

Only operations explicitly guaranteed as atomic by your language/platform are safe.
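The difference between a plain increment and an operation the platform guarantees to be atomic can be observed with a small experiment. This is a sketch only: IncrementDemo is a hypothetical class, and the thread and iteration counts are arbitrary. It increments a plain int and a java.util.concurrent.atomic.AtomicInteger side by side from several threads.

```java
import java.util.concurrent.atomic.AtomicInteger;

class IncrementDemo {
    private int plain = 0;                                    // plain++ is read-modify-write
    private final AtomicInteger atomic = new AtomicInteger(); // incrementAndGet() is atomic

    void incrementBoth() {
        plain++;                  // may lose updates under contention
        atomic.incrementAndGet(); // never loses updates
    }

    int plainValue() {
        return plain;
    }

    int atomicValue() {
        return atomic.get();
    }

    // Runs `threads` workers, each calling incrementBoth() `perThread` times.
    static IncrementDemo run(int threads, int perThread) throws InterruptedException {
        IncrementDemo demo = new IncrementDemo();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    demo.incrementBoth();
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            worker.join(); // join() also makes the workers' writes visible here
        }
        return demo;
    }

    public static void main(String[] args) throws InterruptedException {
        IncrementDemo demo = run(4, 100_000);
        System.out.println("atomic: " + demo.atomicValue()); // always 400000
        System.out.println("plain:  " + demo.plainValue());  // may be less than 400000
    }
}
```

The atomic counter always ends at the exact total. The plain counter may or may not fall short on any given run, which is itself a reminder that a passing run proves nothing about thread safety.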
Using thread-safe collections everywhere does NOT make your code thread-safe. A ConcurrentHashMap guarantees individual operations are atomic, but sequences of operations (get, modify, put) are not atomic. Two thread-safe operations do not compose into a single thread-safe operation.
```java
import java.util.concurrent.ConcurrentHashMap;

public class CompositionFallacy {
    // ConcurrentHashMap is a thread-safe data structure
    private ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();

    // BUG: Not thread-safe despite using ConcurrentHashMap!
    public void incrementBroken(String key) {
        Integer current = counts.get(key); // Thread A gets 5
                                           // Thread B gets 5
        if (current == null) {
            counts.put(key, 1);            // Both might put 1
        } else {
            counts.put(key, current + 1);  // Both put 6, should be 7!
        }
    }

    // CORRECT: Use atomic operation
    public void incrementCorrect(String key) {
        counts.merge(key, 1, Integer::sum); // Atomic compute operation
        // Or use compute:
        // counts.compute(key, (k, v) -> v == null ? 1 : v + 1);
    }

    // ALSO CORRECT: Compare-and-swap loop
    public void incrementCAS(String key) {
        while (true) {
            Integer current = counts.get(key);
            Integer next = (current == null) ? 1 : current + 1;
            if (current == null) {
                if (counts.putIfAbsent(key, next) == null) return;
            } else {
                if (counts.replace(key, current, next)) return;
            }
            // CAS failed - retry
        }
    }
}
```

We've established the foundational concepts of thread safety that will guide all our subsequent discussions: a precise definition, the levels of thread safety, the three pillars of atomicity, visibility, and ordering, and the role of invariants and contracts.
What's next:
Now that we understand what thread safety means and why it's challenging, we'll explore the root cause of most thread safety problems: shared mutable state. The next page examines why sharing and mutability together create the conditions for concurrency bugs, and how this insight guides our design decisions.
You now have a rigorous understanding of what thread safety means, its levels, the three pillars that underpin it, and why concurrent reasoning is fundamentally challenging. This foundation prepares you to explore the specific problems—shared mutable state and race conditions—and the design patterns that address them.