When code works correctly in isolation but fails mysteriously when multiple threads execute it simultaneously, we have a thread safety problem. These failures are among the most insidious bugs in software—they're non-deterministic, hard to reproduce, often platform-dependent, and can lurk for years before manifesting in production.
Thread safety is the property that guarantees code behaves correctly regardless of how many threads are executing it simultaneously, and regardless of how the operating system interleaves their execution. Achieving thread safety requires understanding what makes code unsafe, and systematically applying techniques to eliminate those hazards.
By the end of this page, you will understand: (1) the precise definition of thread safety, (2) the taxonomy of thread safety levels, (3) common sources of thread-unsafe behavior, (4) techniques for achieving thread safety, (5) how to evaluate code for thread safety, and (6) practical patterns for building thread-safe systems.
Thread safety has multiple definitions, each emphasizing different aspects:
Behavioral Definition:
A function or class is thread-safe if and only if it behaves correctly when called simultaneously from multiple threads, regardless of the scheduling or interleaving of those threads.
Invariant-Preserving Definition:
Code is thread-safe if it maintains all invariants and postconditions even when multiple threads access it concurrently.
Practical Definition:
Thread-safe code can be used from multiple threads without additional synchronization by the caller.
The key insight is that thread safety is about correctness under all possible interleavings. The scheduler can switch threads at any point—between any two machine instructions—and thread-safe code must work correctly in all cases.
```c
#include <pthread.h>
#include <stdatomic.h>

// NOT thread-safe: shared mutable state without protection
static int global_counter = 0;

void increment() {
    global_counter++;  // Data race!
    // Actually three operations:
    // 1. Read global_counter
    // 2. Add 1
    // 3. Write global_counter
    // Threads can interleave any of these
}

// Consider two threads calling increment():
// Thread A: read(0)
// Thread B: read(0)
// Thread A: add(1)
// Thread B: add(1)
// Thread A: write(1)
// Thread B: write(1)
// Result: counter = 1 (should be 2!)

// Thread-safe version using mutex
static int safe_counter = 0;
static pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;

void safe_increment() {
    pthread_mutex_lock(&counter_mutex);
    safe_counter++;  // Protected by mutex
    pthread_mutex_unlock(&counter_mutex);
}

// Thread-safe version using atomics
static atomic_int atomic_counter = 0;

void atomic_increment() {
    atomic_fetch_add(&atomic_counter, 1);  // Single atomic operation
}

// IMPORTANT: Thread safety is a property of usage, not just implementation.
// This is thread-safe for individual reads but NOT for read-modify-write:
static atomic_int value = 0;

void NOT_safe_usage() {
    // Even with atomics, this is NOT thread-safe!
    int old = atomic_load(&value);
    // Another thread could change value here!
    atomic_store(&value, old + 1);
    // This is a TOCTOU (time-of-check to time-of-use) bug
}

void SAFE_usage() {
    atomic_fetch_add(&value, 1);  // Atomically reads, adds, writes
}
```

Using atomic variables doesn't automatically make code thread-safe. If your logic requires multiple operations to appear atomic (read-modify-write, compare-then-act, put-if-absent), you need either compound atomic operations (CAS loops, `fetch_add`) or explicit locks. Individual atomic loads and stores can still suffer from race conditions in the aggregate.
Thread safety isn't binary—there's a spectrum of safety levels, each with different guarantees and implications. Understanding this taxonomy helps you choose appropriate designs and document APIs correctly.
| Level | Description | Examples |
|---|---|---|
| Immutable/Constant | Data never changes after construction; inherently thread-safe | String literals, const objects, final fields |
| Thread-Safe | All public operations are safe to call concurrently; internal synchronization | ConcurrentHashMap, thread-safe singletons |
| Conditionally Safe | Safe if certain conditions are met (different objects, or specific operations) | Different threads using different std::vector instances |
| Thread-Compatible | No internal sync, but external sync makes it safe | std::vector, std::string (C++) |
| Thread-Hostile | Cannot be made safe even with external synchronization | Functions modifying global state that can't be locked |
```cpp
#include <cstring>
#include <mutex>
#include <string>
#include <vector>

// Level 1: IMMUTABLE - Inherently thread-safe
class ImmutableConfig {
    const std::string name_;
    const int value_;
public:
    ImmutableConfig(std::string name, int value)
        : name_(std::move(name)), value_(value) {}

    // All operations are read-only - always safe
    const std::string& name() const { return name_; }
    int value() const { return value_; }
};

// Level 2: THREAD-SAFE - Safe for concurrent access
class ThreadSafeCounter {
    mutable std::mutex mutex_;
    int count_ = 0;
public:
    void increment() {
        std::lock_guard lock(mutex_);
        ++count_;
    }
    int get() const {
        std::lock_guard lock(mutex_);
        return count_;
    }
};

// Level 3: CONDITIONALLY SAFE - Safe under specific conditions
// std::vector: safe if different threads access different instances,
// or if all threads only read (no modification)
class Reader {
    const std::vector<int>& data_;  // reference to immutable data
public:
    explicit Reader(const std::vector<int>& data) : data_(data) {}

    // Safe: only reading, data doesn't change
    int sum() const {
        int total = 0;
        for (int x : data_) total += x;
        return total;
    }
};

// Level 4: THREAD-COMPATIBLE - Needs external synchronization
class MessageQueue {
    std::vector<std::string> messages_;
    // No internal mutex - caller must synchronize
public:
    void push(std::string msg) { messages_.push_back(std::move(msg)); }
    std::string pop() {
        if (messages_.empty()) return "";
        std::string msg = std::move(messages_.back());
        messages_.pop_back();
        return msg;
    }
};

// Usage with external synchronization
class SafeMessageQueue {
    MessageQueue queue_;
    std::mutex mutex_;
public:
    void push(std::string msg) {
        std::lock_guard lock(mutex_);
        queue_.push(std::move(msg));
    }
    std::string pop() {
        std::lock_guard lock(mutex_);
        return queue_.pop();
    }
};

// Level 5: THREAD-HOSTILE - Avoid!
// Example: classic strtok(), which keeps hidden static state.
// Can't be made safe without reimplementing.
char* strtok_toxic(char* str, const char* delim) {
    // strtok's internal static buffer = thread-hostile:
    // any attempt to lock in the caller doesn't help,
    // because the static state lives inside strtok itself
    return strtok(str, delim);
}
```

Every public class and function should document its thread safety level. Use comments like '@threadsafe', '@thread-compatible', or '@not-thread-safe'. This helps users understand what synchronization (if any) they must provide. The C++ Standard Library uses 'basic exception safety' terminology; consider adopting similar standardized phrasing for thread safety.
Understanding common thread safety hazards helps you identify and prevent them. Here are the primary sources of thread-unsafe behavior:
Non-atomic compound operations are the most common hazard: operations such as `counter++`, `list.append(item)`, or `map.put(key, value)` involve multiple steps that can be interleaved.
```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

// HAZARD 1: Check-Then-Act
static int tickets = 100;

// WRONG: Race condition between check and decrement
int buy_ticket_unsafe() {
    if (tickets > 0) {
        // Another thread could decrement tickets here!
        tickets--;
        return 1;  // Success
    }
    return 0;  // Failure
}

// CORRECT: Atomic compare-and-swap loop
static atomic_int atomic_tickets = 100;

int buy_ticket_safe() {
    int old;
    do {
        old = atomic_load(&atomic_tickets);
        if (old <= 0) return 0;  // No tickets
    } while (!atomic_compare_exchange_weak(&atomic_tickets, &old, old - 1));
    return 1;  // Success
}

// HAZARD 2: Lazy initialization
static ExpensiveObject *singleton = NULL;

// WRONG: Multiple threads may create multiple instances
ExpensiveObject *get_instance_unsafe() {
    if (singleton == NULL) {
        singleton = create_expensive_object();
    }
    return singleton;
}

// CORRECT: Double-checked locking with proper memory barriers
static _Atomic(ExpensiveObject *) atomic_singleton = NULL;
static pthread_mutex_t init_mutex = PTHREAD_MUTEX_INITIALIZER;

ExpensiveObject *get_instance_safe() {
    ExpensiveObject *result =
        atomic_load_explicit(&atomic_singleton, memory_order_acquire);
    if (result == NULL) {
        pthread_mutex_lock(&init_mutex);
        result = atomic_load_explicit(&atomic_singleton, memory_order_relaxed);
        if (result == NULL) {
            result = create_expensive_object();
            atomic_store_explicit(&atomic_singleton, result,
                                  memory_order_release);
        }
        pthread_mutex_unlock(&init_mutex);
    }
    return result;
}

// HAZARD 3: Escaped 'this' reference
typedef struct {
    int value;
    pthread_t thread;
    // ...
} SelfStartingObject;

// WRONG: Object registered/used before construction completes
SelfStartingObject *create_self_starting_object_unsafe() {
    SelfStartingObject *obj = malloc(sizeof(SelfStartingObject));
    // Starting thread BEFORE initialization is complete!
    pthread_create(&obj->thread, NULL, object_thread, obj);
    // Thread might run while we're still initializing...
    obj->value = compute_initial_value();  // Too late!
    return obj;
}

// CORRECT: Complete initialization before any concurrent access
SelfStartingObject *create_self_starting_object_safe() {
    SelfStartingObject *obj = malloc(sizeof(SelfStartingObject));
    // Initialize FIRST
    obj->value = compute_initial_value();
    // Memory barrier ensures initialization is visible before the thread starts
    atomic_thread_fence(memory_order_release);
    // NOW start the thread
    pthread_create(&obj->thread, NULL, object_thread, obj);
    return obj;
}
```

Even without data races, thread safety can fail due to visibility issues. Without proper memory barriers (synchronization), writes by one thread may not be visible to other threads due to CPU caching and compiler optimizations. Using mutexes provides implicit barriers; with atomics, you must choose appropriate memory orderings (acquire/release).
There are several fundamental approaches to achieving thread safety, each with different tradeoffs:
Immutability is the simplest and most powerful technique. If objects never change after construction, they're inherently thread-safe—no synchronization needed.
```cpp
class ImmutablePoint {
    const int x_;
    const int y_;
public:
    ImmutablePoint(int x, int y) : x_(x), y_(y) {}

    int x() const { return x_; }
    int y() const { return y_; }

    // "Modification" creates a new object
    ImmutablePoint moved(int dx, int dy) const {
        return ImmutablePoint(x_ + dx, y_ + dy);
    }
};

// Safe to share across any number of threads
// Zero synchronization overhead
```

Advantages: No locks, no contention, no deadlocks, no races
Disadvantages: Memory overhead for creating new objects on 'modification'; not always practical
Prefer techniques in this order: (1) Immutability—no moving parts. (2) Thread confinement—no sharing. (3) Synchronization—controlled sharing. (4) Lock-free algorithms—complex but high-performance. Each step toward complexity should be justified by a concrete performance requirement.
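Thread confinement, second on that list, is the only technique in the ordering that gets no example elsewhere on this page, so here is a minimal sketch using `thread_local` storage. The names (`confined_work`, `demo_confinement`) and iteration counts are illustrative, not part of any standard API: each thread increments a private copy of the counter, so no synchronization is needed at all.

```cpp
#include <thread>
#include <vector>

// Each thread gets its own zero-initialized copy of this variable.
thread_local int local_counter = 0;

// Increments only the calling thread's copy and returns its private total.
int confined_work(int iterations) {
    for (int i = 0; i < iterations; ++i)
        ++local_counter;  // no lock: no other thread can see this variable
    return local_counter;
}

// Spawn several threads; each one sees exactly its own increments,
// never a mix of updates from other threads.
void demo_confinement(int num_threads) {
    std::vector<std::thread> threads;
    for (int i = 0; i < num_threads; ++i) {
        threads.emplace_back([] {
            confined_work(1000);  // each thread's private total reaches 1000
        });
    }
    for (auto& t : threads) t.join();
}
```

The tradeoff is that confined state cannot be shared: if threads eventually need a combined result, each must hand its private total to an aggregation step, which is where synchronization reappears.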
Beyond making individual functions safe, designing a thread-safe API requires thinking about how operations compose. A collection of individually thread-safe methods can still be used unsafely if the API design encourages check-then-act patterns.
```cpp
#include <functional>
#include <map>
#include <mutex>
#include <optional>
#include <string>

// PROBLEMATIC API: Encourages unsafe patterns
class BadMap {
    std::mutex mutex_;
    std::map<std::string, int> data_;
public:
    bool contains(const std::string& key) {
        std::lock_guard lock(mutex_);
        return data_.count(key) > 0;
    }
    int get(const std::string& key) {
        std::lock_guard lock(mutex_);
        return data_[key];
    }
    void put(const std::string& key, int value) {
        std::lock_guard lock(mutex_);
        data_[key] = value;
    }
};

// UNSAFE USAGE (even though each method is thread-safe):
void unsafe_usage(BadMap& map) {
    if (!map.contains("key")) {      // Check...
        // Another thread could insert "key" here!
        map.put("key", compute());   // ...Act
    }
}

// BETTER API: Compound operations
class GoodMap {
    std::mutex mutex_;
    std::map<std::string, int> data_;
public:
    // put-if-absent: atomic check-and-insert
    bool put_if_absent(const std::string& key, int value) {
        std::lock_guard lock(mutex_);
        auto [it, inserted] = data_.emplace(key, value);
        return inserted;
    }

    // get-or-default: never fails
    int get_or_default(const std::string& key, int default_value) {
        std::lock_guard lock(mutex_);
        auto it = data_.find(key);
        return it != data_.end() ? it->second : default_value;
    }

    // compute: atomic read-modify-write
    int compute(const std::string& key,
                std::function<int(std::optional<int>)> func) {
        std::lock_guard lock(mutex_);
        auto it = data_.find(key);
        std::optional<int> old_value =
            it != data_.end() ? std::optional{it->second} : std::nullopt;
        int new_value = func(old_value);
        data_[key] = new_value;
        return new_value;
    }
};

// SAFE USAGE:
void safe_usage(GoodMap& map) {
    map.put_if_absent("key", compute());  // Atomic
    // Or with compute:
    map.compute("counter", [](auto old) { return old.value_or(0) + 1; });
}
```

Java's ConcurrentHashMap is a model of thread-safe API design. It provides atomic compound operations: putIfAbsent(), computeIfAbsent(), computeIfPresent(), compute(), merge(), replace(). These eliminate the need for external synchronization in almost all use cases. When designing concurrent collections, study this API for inspiration.
Testing thread safety is notoriously difficult because bugs are non-deterministic. A program might run correctly a million times and fail on the million-and-first run. Traditional testing approaches are insufficient—specialized tools and techniques are required.
```cpp
// Thread Safety Annotations (Clang)
// Compile with -Wthread-safety. CAPABILITY, GUARDED_BY, and REQUIRES are
// macros wrapping Clang's thread-safety attributes (see the mutex.h example
// in the Clang Thread Safety Analysis documentation).

#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

class CAPABILITY("mutex") ThreadSafeCounter {
    mutable std::mutex mutex_;
    int count_ GUARDED_BY(mutex_) = 0;

public:
    void increment() REQUIRES(!mutex_) {
        std::lock_guard lock(mutex_);
        count_++;  // Safe: lock is held
    }

    int get() const REQUIRES(!mutex_) {
        std::lock_guard lock(mutex_);
        return count_;
    }

    // This would cause a compile-time warning:
    // int unsafeGet() const { return count_; }  // reading count_ without lock!
};

// Stress test example
void stress_test_counter() {
    ThreadSafeCounter counter;
    const int NUM_THREADS = 10;
    const int INCREMENTS_PER_THREAD = 100000;

    std::vector<std::thread> threads;
    for (int i = 0; i < NUM_THREADS; i++) {
        threads.emplace_back([&counter]() {
            for (int j = 0; j < INCREMENTS_PER_THREAD; j++) {
                counter.increment();
            }
        });
    }
    for (auto& t : threads) {
        t.join();
    }

    int expected = NUM_THREADS * INCREMENTS_PER_THREAD;
    int actual = counter.get();
    assert(actual == expected);  // If not thread-safe, this can fail
}

// Compile with ThreadSanitizer:
//   clang++ -fsanitize=thread -g -O1 test.cpp
// Run the test - TSan reports any data races found
```

A test passing does not prove thread safety. Due to the non-determinism of scheduling, a race condition might never trigger during testing but appear in production under different load. Use multiple strategies: static analysis + dynamic analysis (TSan) + stress testing + code review. Only their combination provides reasonable confidence.
Certain patterns recur frequently in thread-safe programming. Knowing these patterns helps you recognize safe designs and apply proven solutions.
| Pattern | Description | When to Use |
|---|---|---|
| Monitor | Encapsulate shared state with a mutex; all methods acquire lock | General-purpose synchronized data structures |
| Read-Write Lock | Multiple readers OR one writer; shared_mutex or RWLock | Read-heavy workloads with rare writes |
| Copy-on-Write | Modifications copy the data, leaving existing readers unaffected | Configuration, immutable-style with rare updates |
| Double-Checked Locking | Check without lock, lock and re-check if needed | Lazy singleton initialization (with proper memory barriers!) |
| Thread-Per-Connection | Each connection handled by a dedicated thread | Traditional server design; natural isolation |
| Worker Pool | Fixed pool of threads processing a shared queue | High-throughput servers; controlled resource usage |
| Active Object | Object with its own thread; requests queued and processed serially | Avoiding synchronization by having each object single-threaded |
```cpp
#include <condition_variable>
#include <functional>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Pattern: Copy-on-Write
// (uses the free-function shared_ptr atomics; C++20 offers
// std::atomic<std::shared_ptr> as a replacement)
class CopyOnWriteList {
    std::shared_ptr<const std::vector<int>> data_;
    std::mutex mutex_;

public:
    CopyOnWriteList() : data_(std::make_shared<std::vector<int>>()) {}

    // Read: No lock needed - we get an immutable snapshot
    std::shared_ptr<const std::vector<int>> snapshot() const {
        return std::atomic_load(&data_);
    }

    // Write: Lock, copy, modify, replace
    void add(int value) {
        std::lock_guard lock(mutex_);
        auto new_data = std::make_shared<std::vector<int>>(*data_);
        new_data->push_back(value);
        std::atomic_store(
            &data_,
            std::shared_ptr<const std::vector<int>>(std::move(new_data)));
    }
};

// Pattern: Active Object
class ActiveObject {
    std::queue<std::function<void()>> tasks_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::thread worker_;
    bool running_ = true;

public:
    ActiveObject() : worker_([this]{ run(); }) {}

    ~ActiveObject() {
        {
            std::lock_guard lock(mutex_);
            running_ = false;
        }
        cv_.notify_one();
        worker_.join();
    }

    // Enqueue a task to be executed by the active object's thread
    void enqueue(std::function<void()> task) {
        {
            std::lock_guard lock(mutex_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void run() {
        while (true) {
            std::function<void()> task;
            {
                std::unique_lock lock(mutex_);
                cv_.wait(lock, [this]{ return !running_ || !tasks_.empty(); });
                if (!running_ && tasks_.empty()) return;
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // Execute without holding the lock
        }
    }
};
```

Thread safety is fundamental to correct concurrent programming. Let's consolidate the essential concepts.
You now understand thread safety in depth—what it means, how to achieve it, and how to verify it. Thread safety is a critical skill for building reliable concurrent software. Next, we'll explore reentrancy—a related but distinct concept that's especially important for signal handlers and recursive code paths.