When code works correctly in isolation but fails mysteriously when multiple threads execute it simultaneously, we have a thread safety problem. These failures are among the most insidious bugs in software—they're non-deterministic, hard to reproduce, often platform-dependent, and can lurk for years before manifesting in production.
Thread safety is the property that guarantees code behaves correctly regardless of how many threads are executing it simultaneously, and regardless of how the operating system interleaves their execution. Achieving thread safety requires understanding what makes code unsafe, and systematically applying techniques to eliminate those hazards.
By the end of this page, you will understand: (1) the precise definition of thread safety, (2) the taxonomy of thread safety levels, (3) common sources of thread-unsafe behavior, (4) techniques for achieving thread safety, (5) how to evaluate code for thread safety, and (6) practical patterns for building thread-safe systems.
Thread safety has multiple definitions, each emphasizing different aspects:
Behavioral Definition:
A function or class is thread-safe if and only if it behaves correctly when called simultaneously from multiple threads, regardless of the scheduling or interleaving of those threads.
Invariant-Preserving Definition:
Code is thread-safe if it maintains all invariants and postconditions even when multiple threads access it concurrently.
Practical Definition:
Thread-safe code can be used from multiple threads without additional synchronization by the caller.
The key insight is that thread safety is about correctness under all possible interleavings. The scheduler can switch threads at any point—between any two machine instructions—and thread-safe code must work correctly in all cases.
```c
#include <pthread.h>
#include <stdatomic.h>

// NOT thread-safe: shared mutable state without protection
static int global_counter = 0;

void increment() {
    global_counter++;  // Data race!
    // Actually three operations:
    // 1. Read global_counter
    // 2. Add 1
    // 3. Write global_counter
    // Threads can interleave any of these
}

// Consider two threads calling increment():
// Thread A: read(0)
// Thread B: read(0)
// Thread A: add(1)
// Thread B: add(1)
// Thread A: write(1)
// Thread B: write(1)
// Result: counter = 1 (should be 2!)

// Thread-safe version using mutex
static int safe_counter = 0;
static pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;

void safe_increment() {
    pthread_mutex_lock(&counter_mutex);
    safe_counter++;  // Protected by mutex
    pthread_mutex_unlock(&counter_mutex);
}

// Thread-safe version using atomics
static atomic_int atomic_counter = 0;

void atomic_increment() {
    atomic_fetch_add(&atomic_counter, 1);  // Single atomic operation
}

// IMPORTANT: Thread safety is a property of usage, not just implementation.
// This is thread-safe for individual reads but NOT for read-modify-write:
static atomic_int value = 0;

void NOT_safe_usage() {
    // Even with atomics, this is NOT thread-safe!
    int old = atomic_load(&value);
    // Another thread could change value here!
    atomic_store(&value, old + 1);
    // This is a TOCTOU (time-of-check to time-of-use) bug
}

void SAFE_usage() {
    atomic_fetch_add(&value, 1);  // Atomically reads, adds, writes
}
```

Using atomic variables doesn't automatically make code thread-safe. If your logic requires multiple operations to appear atomic (read-modify-write, compare-then-act, put-if-absent), you need either compound atomic operations (CAS loops, `fetch_add`) or explicit locks. Individual atomic loads and stores can still suffer from race conditions in the aggregate.
Thread safety isn't binary—there's a spectrum of safety levels, each with different guarantees and implications. Understanding this taxonomy helps you choose appropriate designs and document APIs correctly.
| Level | Description | Examples |
|---|---|---|
| Immutable/Constant | Data never changes after construction; inherently thread-safe | String literals, const objects, final fields |
| Thread-Safe | All public operations are safe to call concurrently; internal synchronization | ConcurrentHashMap, thread-safe singletons |
| Conditionally Safe | Safe if certain conditions are met (different objects, or specific operations) | Different threads using different std::vector instances |
| Thread-Compatible | No internal sync, but external sync makes it safe | std::vector, std::string (C++) |
| Thread-Hostile | Cannot be made safe even with external synchronization | Functions modifying global state that can't be locked |
```cpp
#include <cstring>
#include <mutex>
#include <string>
#include <vector>

// Level 1: IMMUTABLE - Inherently thread-safe
class ImmutableConfig {
    const std::string name_;
    const int value_;
public:
    ImmutableConfig(std::string name, int value)
        : name_(std::move(name)), value_(value) {}

    // All operations are read-only - always safe
    const std::string& name() const { return name_; }
    int value() const { return value_; }
};

// Level 2: THREAD-SAFE - Safe for concurrent access
class ThreadSafeCounter {
    mutable std::mutex mutex_;
    int count_ = 0;
public:
    void increment() {
        std::lock_guard lock(mutex_);
        ++count_;
    }
    int get() const {
        std::lock_guard lock(mutex_);
        return count_;
    }
};

// Level 3: CONDITIONALLY SAFE - Safe under specific conditions
// std::vector: safe if different threads access different instances,
// or if all threads only read (no modification)
class Reader {
    const std::vector<int>& data_;  // reference to immutable data
public:
    explicit Reader(const std::vector<int>& data) : data_(data) {}

    // Safe: only reading, data doesn't change
    int sum() const {
        int total = 0;
        for (int x : data_) total += x;
        return total;
    }
};

// Level 4: THREAD-COMPATIBLE - Needs external synchronization
class MessageQueue {
    std::vector<std::string> messages_;
    // No internal mutex - caller must synchronize
public:
    void push(std::string msg) { messages_.push_back(std::move(msg)); }
    std::string pop() {
        if (messages_.empty()) return "";
        std::string msg = std::move(messages_.back());
        messages_.pop_back();
        return msg;
    }
};

// Usage with external synchronization
class SafeMessageQueue {
    MessageQueue queue_;
    std::mutex mutex_;
public:
    void push(std::string msg) {
        std::lock_guard lock(mutex_);
        queue_.push(std::move(msg));
    }
    std::string pop() {
        std::lock_guard lock(mutex_);
        return queue_.pop();
    }
};

// Level 5: THREAD-HOSTILE - Avoid!
// Example: classic strtok(), which keeps hidden static state.
// Can't be made safe without reimplementing.
char* strtok_toxic(char* str, const char* delim) {
    // strtok's internal static buffer = thread-hostile:
    // any attempt to lock in the caller doesn't help,
    // because the static state lives inside strtok itself
    return strtok(str, delim);
}
```

Every public class and function should document its thread safety level. Use comments like '@threadsafe', '@thread-compatible', or '@not-thread-safe'. This helps users understand what synchronization (if any) they must provide. The C++ Standard Library uses 'basic exception safety' terminology; consider adopting similar standardized phrasing for thread safety.
Understanding common thread safety hazards helps you identify and prevent them. Here are the primary sources of thread-unsafe behavior:
Non-atomic compound operations are the most common hazard: operations such as `counter++`, `list.append(item)`, or `map.put(key, value)` involve multiple steps that can be interleaved.
```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

// HAZARD 1: Check-Then-Act
static int tickets = 100;

// WRONG: Race condition between check and decrement
int buy_ticket_unsafe() {
    if (tickets > 0) {
        // Another thread could decrement tickets here!
        tickets--;
        return 1;  // Success
    }
    return 0;  // Failure
}

// CORRECT: Atomic compare-and-swap loop
static atomic_int atomic_tickets = 100;

int buy_ticket_safe() {
    int old;
    do {
        old = atomic_load(&atomic_tickets);
        if (old <= 0) return 0;  // No tickets
    } while (!atomic_compare_exchange_weak(&atomic_tickets, &old, old - 1));
    return 1;  // Success
}

// HAZARD 2: Lazy initialization
static ExpensiveObject *singleton = NULL;

// WRONG: Multiple threads may create multiple instances
ExpensiveObject *get_instance_unsafe() {
    if (singleton == NULL) {
        singleton = create_expensive_object();
    }
    return singleton;
}

// CORRECT: Double-checked locking with proper memory barriers
static _Atomic(ExpensiveObject *) atomic_singleton = NULL;
static pthread_mutex_t init_mutex = PTHREAD_MUTEX_INITIALIZER;

ExpensiveObject *get_instance_safe() {
    ExpensiveObject *result =
        atomic_load_explicit(&atomic_singleton, memory_order_acquire);
    if (result == NULL) {
        pthread_mutex_lock(&init_mutex);
        result = atomic_load_explicit(&atomic_singleton, memory_order_relaxed);
        if (result == NULL) {
            result = create_expensive_object();
            atomic_store_explicit(&atomic_singleton, result,
                                  memory_order_release);
        }
        pthread_mutex_unlock(&init_mutex);
    }
    return result;
}

// HAZARD 3: Escaped 'this' reference
typedef struct {
    int value;
    pthread_t thread;
    // ...
} SelfStartingObject;

// WRONG: Object registered/used before construction completes
SelfStartingObject *create_self_starting_object_unsafe() {
    SelfStartingObject *obj = malloc(sizeof(SelfStartingObject));
    // Starting thread BEFORE initialization is complete!
    pthread_create(&obj->thread, NULL, object_thread, obj);
    // Thread might run while we're still initializing...
    obj->value = compute_initial_value();  // Too late!
    return obj;
}

// CORRECT: Complete initialization before any concurrent access
SelfStartingObject *create_self_starting_object_safe() {
    SelfStartingObject *obj = malloc(sizeof(SelfStartingObject));
    // Initialize FIRST
    obj->value = compute_initial_value();
    // Memory barrier ensures initialization is visible before the thread starts
    atomic_thread_fence(memory_order_release);
    // NOW start the thread
    pthread_create(&obj->thread, NULL, object_thread, obj);
    return obj;
}
```

Even without data races, thread safety can fail due to visibility issues. Without proper memory barriers (synchronization), writes by one thread may not be visible to other threads due to CPU caching and compiler optimizations. Using mutexes provides implicit barriers; with atomics, you must choose appropriate memory orderings (acquire/release).
There are several fundamental approaches to achieving thread safety, each with different tradeoffs:
Immutability is the simplest and most powerful technique. If objects never change after construction, they're inherently thread-safe—no synchronization needed.
```cpp
class ImmutablePoint {
    const int x_;
    const int y_;
public:
    ImmutablePoint(int x, int y) : x_(x), y_(y) {}

    int x() const { return x_; }
    int y() const { return y_; }

    // "Modification" creates a new object
    ImmutablePoint moved(int dx, int dy) const {
        return ImmutablePoint(x_ + dx, y_ + dy);
    }
};

// Safe to share across any number of threads
// Zero synchronization overhead
```

Advantages: No locks, no contention, no deadlocks, no races
Disadvantages: Memory overhead for creating new objects on 'modification'; not always practical
Prefer techniques in this order: (1) Immutability—no moving parts. (2) Thread confinement—no sharing. (3) Synchronization—controlled sharing. (4) Lock-free algorithms—complex but high-performance. Each step toward complexity should be justified by a concrete performance requirement.
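Thread confinement, second on that list, is the only technique in the ordering that gets no example elsewhere on this page, so here is a minimal sketch using `thread_local` storage. The names (`confined_work`, `demo_confinement`) and iteration counts are illustrative, not part of any standard API: each thread increments a private copy of the counter, so no synchronization is needed at all.

```cpp
#include <thread>
#include <vector>

// Each thread gets its own zero-initialized copy of this variable.
thread_local int local_counter = 0;

// Increments only the calling thread's copy and returns its private total.
int confined_work(int iterations) {
    for (int i = 0; i < iterations; ++i)
        ++local_counter;  // no lock: no other thread can see this variable
    return local_counter;
}

// Spawn several threads; each one sees exactly its own increments,
// never a mix of updates from other threads.
void demo_confinement(int num_threads) {
    std::vector<std::thread> threads;
    for (int i = 0; i < num_threads; ++i) {
        threads.emplace_back([] {
            confined_work(1000);  // each thread's private total reaches 1000
        });
    }
    for (auto& t : threads) t.join();
}
```

The tradeoff is that confined state cannot be shared: if threads eventually need a combined result, each must hand its private total to an aggregation step, which is where synchronization reappears.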
Beyond making individual functions safe, designing a thread-safe API requires thinking about how operations compose. A collection of individually thread-safe methods can still be used unsafely if the API design encourages check-then-act patterns.
```cpp
#include <functional>
#include <map>
#include <mutex>
#include <optional>
#include <string>

// PROBLEMATIC API: Encourages unsafe patterns
class BadMap {
    std::mutex mutex_;
    std::map<std::string, int> data_;
public:
    bool contains(const std::string& key) {
        std::lock_guard lock(mutex_);
        return data_.count(key) > 0;
    }
    int get(const std::string& key) {
        std::lock_guard lock(mutex_);
        return data_[key];
    }
    void put(const std::string& key, int value) {
        std::lock_guard lock(mutex_);
        data_[key] = value;
    }
};

// UNSAFE USAGE (even though each method is thread-safe):
void unsafe_usage(BadMap& map) {
    if (!map.contains("key")) {      // Check...
        // Another thread could insert "key" here!
        map.put("key", compute());   // ...Act
    }
}

// BETTER API: Compound operations
class GoodMap {
    std::mutex mutex_;
    std::map<std::string, int> data_;
public:
    // put-if-absent: atomic check-and-insert
    bool put_if_absent(const std::string& key, int value) {
        std::lock_guard lock(mutex_);
        auto [it, inserted] = data_.emplace(key, value);
        return inserted;
    }

    // get-or-default: never fails
    int get_or_default(const std::string& key, int default_value) {
        std::lock_guard lock(mutex_);
        auto it = data_.find(key);
        return it != data_.end() ? it->second : default_value;
    }

    // compute: atomic read-modify-write
    int compute(const std::string& key,
                std::function<int(std::optional<int>)> func) {
        std::lock_guard lock(mutex_);
        auto it = data_.find(key);
        std::optional<int> old_value =
            it != data_.end() ? std::optional{it->second} : std::nullopt;
        int new_value = func(old_value);
        data_[key] = new_value;
        return new_value;
    }
};

// SAFE USAGE:
void safe_usage(GoodMap& map) {
    map.put_if_absent("key", compute());  // Atomic
    // Or with compute:
    map.compute("counter", [](auto old) { return old.value_or(0) + 1; });
}
```

Java's ConcurrentHashMap is a model of thread-safe API design. It provides atomic compound operations: putIfAbsent(), computeIfAbsent(), computeIfPresent(), compute(), merge(), replace(). These eliminate the need for external synchronization in almost all use cases. When designing concurrent collections, study this API for inspiration.
Testing thread safety is notoriously difficult because bugs are non-deterministic. A program might run correctly a million times and fail on the million-and-first run. Traditional testing approaches are insufficient—specialized tools and techniques are required.
```cpp
// Thread Safety Annotations (Clang)
// Compile with -Wthread-safety. CAPABILITY, GUARDED_BY, and REQUIRES are
// macros wrapping Clang's thread-safety attributes (see the mutex.h example
// in the Clang Thread Safety Analysis documentation).

#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

class CAPABILITY("mutex") ThreadSafeCounter {
    mutable std::mutex mutex_;
    int count_ GUARDED_BY(mutex_) = 0;

public:
    void increment() REQUIRES(!mutex_) {
        std::lock_guard lock(mutex_);
        count_++;  // Safe: lock is held
    }

    int get() const REQUIRES(!mutex_) {
        std::lock_guard lock(mutex_);
        return count_;
    }

    // This would cause a compile-time warning:
    // int unsafeGet() const { return count_; }  // reading count_ without lock!
};

// Stress test example
void stress_test_counter() {
    ThreadSafeCounter counter;
    const int NUM_THREADS = 10;
    const int INCREMENTS_PER_THREAD = 100000;

    std::vector<std::thread> threads;
    for (int i = 0; i < NUM_THREADS; i++) {
        threads.emplace_back([&counter]() {
            for (int j = 0; j < INCREMENTS_PER_THREAD; j++) {
                counter.increment();
            }
        });
    }
    for (auto& t : threads) {
        t.join();
    }

    int expected = NUM_THREADS * INCREMENTS_PER_THREAD;
    int actual = counter.get();
    assert(actual == expected);  // If not thread-safe, this can fail
}

// Compile with ThreadSanitizer:
//   clang++ -fsanitize=thread -g -O1 test.cpp
// Run the test - TSan reports any data races found
```

A test passing does not prove thread safety. Due to the non-determinism of scheduling, a race condition might never trigger during testing but appear in production under different load. Use multiple strategies: static analysis + dynamic analysis (TSan) + stress testing + code review. Only their combination provides reasonable confidence.
Certain patterns recur frequently in thread-safe programming. Knowing these patterns helps you recognize safe designs and apply proven solutions.
| Pattern | Description | When to Use |
|---|---|---|
| Monitor | Encapsulate shared state with a mutex; all methods acquire lock | General-purpose synchronized data structures |
| Read-Write Lock | Multiple readers OR one writer; shared_mutex or RWLock | Read-heavy workloads with rare writes |
| Copy-on-Write | Modifications copy the data, leaving existing readers unaffected | Configuration, immutable-style with rare updates |
| Double-Checked Locking | Check without lock, lock and re-check if needed | Lazy singleton initialization (with proper memory barriers!) |
| Thread-Per-Connection | Each connection handled by a dedicated thread | Traditional server design; natural isolation |
| Worker Pool | Fixed pool of threads processing a shared queue | High-throughput servers; controlled resource usage |
| Active Object | Object with its own thread; requests queued and processed serially | Avoiding synchronization by having each object single-threaded |
```cpp
#include <condition_variable>
#include <functional>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Pattern: Copy-on-Write
// (uses the free-function shared_ptr atomics; C++20 offers
// std::atomic<std::shared_ptr> as a replacement)
class CopyOnWriteList {
    std::shared_ptr<const std::vector<int>> data_;
    std::mutex mutex_;

public:
    CopyOnWriteList() : data_(std::make_shared<std::vector<int>>()) {}

    // Read: No lock needed - we get an immutable snapshot
    std::shared_ptr<const std::vector<int>> snapshot() const {
        return std::atomic_load(&data_);
    }

    // Write: Lock, copy, modify, replace
    void add(int value) {
        std::lock_guard lock(mutex_);
        auto new_data = std::make_shared<std::vector<int>>(*data_);
        new_data->push_back(value);
        std::atomic_store(
            &data_,
            std::shared_ptr<const std::vector<int>>(std::move(new_data)));
    }
};

// Pattern: Active Object
class ActiveObject {
    std::queue<std::function<void()>> tasks_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::thread worker_;
    bool running_ = true;

public:
    ActiveObject() : worker_([this]{ run(); }) {}

    ~ActiveObject() {
        {
            std::lock_guard lock(mutex_);
            running_ = false;
        }
        cv_.notify_one();
        worker_.join();
    }

    // Enqueue a task to be executed by the active object's thread
    void enqueue(std::function<void()> task) {
        {
            std::lock_guard lock(mutex_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void run() {
        while (true) {
            std::function<void()> task;
            {
                std::unique_lock lock(mutex_);
                cv_.wait(lock, [this]{ return !running_ || !tasks_.empty(); });
                if (!running_ && tasks_.empty()) return;
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // Execute without holding the lock
        }
    }
};
```

Thread safety is fundamental to correct concurrent programming. Let's consolidate the essential concepts.
You now understand thread safety in depth—what it means, how to achieve it, and how to verify it. Thread safety is a critical skill for building reliable concurrent software. Next, we'll explore reentrancy—a related but distinct concept that's especially important for signal handlers and recursive code paths.