In 2003, the Northeast Blackout affected 55 million people across the United States and Canada. Investigation revealed that a software bug—involving unsynchronized access to shared state in the alarm monitoring system—prevented operators from seeing critical alerts as the grid began to fail. The bug had existed for years but manifested only under a specific combination of conditions.
This is the nature of data races: they lurk silently in code, working correctly most of the time, until conditions align to produce catastrophic failure. A data race can corrupt data, crash a program, or silently produce wrong results—often without leaving any trace of what went wrong.
Data races are the single most important bug class in concurrent programming, and understanding them deeply is essential for any engineer working with threads.
By the end of this page, you will understand precisely what a data race is, why data races are so dangerous, how to identify them in code, and the fundamental principle for preventing them. You'll develop the ability to spot potential races before they become bugs.
A data race occurs when all of the following conditions are met: two or more threads access the same memory location, at least one of those accesses is a write, and no synchronization orders the accesses.
The formal definition:
A data race is a situation where two memory accesses in different threads target the same location, at least one is a write, and there is no happens-before relationship between them.
Note what is not required for a data race: the accesses do not need to happen at the same instant, the threads do not need to run on different cores, and the program does not need to visibly misbehave—a race is defined by the absence of synchronization, not by an observed failure.
```c
// Example 1: Classic data race - simultaneous read and write
int counter = 0; // Shared

void thread1() {
    counter = counter + 1; // Read, modify, write - not atomic!
}

void thread2() {
    counter = counter + 1; // Same: read, modify, write
}
// Both threads read and write 'counter' without synchronization
// Result after both complete: could be 1 (lost update) or 2

// Example 2: Flag-based race - looks safe but isn't
int data = 0;  // Shared
int ready = 0; // Shared flag

void producer() {
    data = 42;  // Write data
    ready = 1;  // Signal reader
}

void consumer() {
    while (ready == 0)
        ; // Wait for signal
    printf("%d", data); // Read data
}
// Data race on 'ready': writer and reader without synchronization
// Data race on 'data': no happens-before between write and read
// Consumer might see ready=1 but data=0 due to reordering!

// Example 3: Read-read - NOT a data race
int shared = 42;

void reader1() { printf("%d", shared); }
void reader2() { printf("%d", shared); }
// Both only read - no data race (but no useful communication either)
```

These terms are often confused but are different:
Data race: Unsynchronized concurrent access to shared memory (a low-level concept about memory operations).
Race condition: A correctness bug where program behavior depends on timing (a high-level concept about program logic).
You can have race conditions without data races (using locks incorrectly) and data races without race conditions (if the program happens to work despite undefined behavior).
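The first half of that claim can be made concrete in code. The sketch below (names such as `withdraw_racy` and `balance` are our own, not from any particular codebase) is data-race free—every access to `balance` happens under a mutex—yet it still contains a race condition, because the check and the act occur under separate lock acquisitions:

```c
#include <pthread.h>
#include <stdbool.h>

static int balance = 100;
static pthread_mutex_t balance_lock = PTHREAD_MUTEX_INITIALIZER;

// Data-race free: 'balance' is only ever read under the lock.
static int check_balance(void) {
    pthread_mutex_lock(&balance_lock);
    int b = balance;
    pthread_mutex_unlock(&balance_lock);
    return b;
}

// Data-race free: 'balance' is only ever written under the lock.
static void debit(int amount) {
    pthread_mutex_lock(&balance_lock);
    balance -= amount;
    pthread_mutex_unlock(&balance_lock);
}

// Race condition: the lock is released between the check and the debit,
// so two threads can both pass the check and overdraw the account.
static bool withdraw_racy(int amount) {
    if (check_balance() >= amount) { // check
        debit(amount);               // act - state may have changed in between
        return true;
    }
    return false;
}

// Fix: hold the lock across the entire check-then-act sequence.
static bool withdraw_correct(int amount) {
    bool ok = false;
    pthread_mutex_lock(&balance_lock);
    if (balance >= amount) {
        balance -= amount;
        ok = true;
    }
    pthread_mutex_unlock(&balance_lock);
    return ok;
}
```

The fix is not more atomicity at the memory level but a wider critical section: the invariant "the balance never goes negative" must be checked and updated in one indivisible step.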
Data races aren't just bugs—in C and C++, they're undefined behavior. This has profound implications that many developers don't fully appreciate.
Undefined behavior means anything can happen:
The language specification places no constraints on program behavior when a data race exists. The compiler is free to assume data races don't exist and optimize accordingly. This can cause:
Value corruption — Torn reads/writes where a 64-bit value is partially updated may produce values that were never written by any thread.
Optimization breakage — The compiler might cache a value in a register, never re-reading from memory even though another thread changed it.
Infinite loops — A loop that checks a flag might be optimized to check once (since 'no race exists,' the flag can't change).
Time travel — Due to compiler reordering, effects of a data race may appear to happen before their causes.
Speculative execution effects — Side-channel information leakage, as demonstrated by Spectre vulnerabilities.
```c
// Example: Compiler optimization breaks racy code

int done = 0; // Shared flag (non-atomic, non-volatile)

void wait_for_done() {
    while (!done) {
        // Wait...
    }
    // Continue after signal
}

// What the compiler might generate (optimized):
void wait_for_done_optimized() {
    if (!done) {
        while (1) { } // Infinite loop!
    }
    // Compiler hoists 'done' read outside loop
    // Reasoning: "done can't change (no race allowed)"
}

// The compiler's optimization is CORRECT per the standard:
// - Program has a data race → undefined behavior
// - Compiler may assume any behavior for undefined behavior
// - Hoisting the read is a valid transformation

// Fix: Use atomic types with proper memory ordering
#include <stdatomic.h>
atomic_int done_safe = 0;

void wait_for_done_correct() {
    while (!atomic_load(&done_safe)) {
        // Now compiler knows another thread might change done_safe
    }
}
```

Many developers attempt to fix data races with 'volatile'. In C/C++, volatile prevents compiler optimization of the variable but provides NO synchronization or memory ordering. A volatile variable can still be subject to CPU reordering and cache incoherence. Volatile does not fix data races—use atomics or proper synchronization.
The non-determinism problem:
Even when data races don't trigger dramatic effects, they introduce non-determinism:
This makes data race bugs extremely difficult to diagnose. A 'Heisenbug' that disappears when you try to observe it is often a data race.
The most classic data race pattern is the lost update, where two threads try to update the same value and one update is lost.
Consider incrementing a counter:
counter = counter + 1;
This looks like one operation, but it's actually three:
1. Read the current value of counter into a register
2. Add 1 to the value in the register
3. Write the register back to counter

With two threads executing concurrently, the interleaving can cause updates to be lost:
| Time | Thread 1 | Thread 2 | counter value |
|---|---|---|---|
| T1 | Read counter → 0 | – | 0 |
| T2 | – | Read counter → 0 | 0 |
| T3 | Add 1 → 1 | – | 0 |
| T4 | – | Add 1 → 1 | 0 |
| T5 | Write 1 → counter | – | 1 |
| T6 | – | Write 1 → counter | 1 |
Expected result: 2 (both increments applied)
Actual result: 1 (Thread 2's increment overwrote Thread 1's)
The update from Thread 1 was lost because Thread 2 read the old value before Thread 1's write was visible.
This generalizes to any read-modify-write pattern: incrementing a counter, adjusting an account balance, appending to a list, or toggling a flag.
In all these cases, the pattern is the same: read, compute, write—without atomicity, the race exists.
```c
// Demonstration: Lost updates in practice
#include <stdio.h>
#include <pthread.h>

int counter = 0;
#define ITERATIONS 1000000

void* increment(void* arg) {
    for (int i = 0; i < ITERATIONS; i++) {
        counter = counter + 1; // Data race!
    }
    return NULL;
}

int main() {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("Expected: %d\n", 2 * ITERATIONS); // 2,000,000
    printf("Actual: %d\n", counter);          // Often < 2,000,000
    return 0;
}

// Typical output (varies on each run):
// Expected: 2000000
// Actual: 1387429
//
// Hundreds of thousands of updates lost to the race!
```

Lost updates are prevented by making the read-modify-write operation atomic. Using atomic_fetch_add (C11), __sync_fetch_and_add (GCC), or InterlockedIncrement (Windows) ensures the entire operation completes without interruption. Alternatively, a mutex can protect the entire critical section.
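To see the atomic fix in action, here is a hedged sketch (the harness name `run_two_threads` is our own) of the same two-thread increment using C11 `atomic_fetch_add`; with it, the result is exactly 2 × ITERATIONS on every run:

```c
#include <stdatomic.h>
#include <pthread.h>

#define ITERATIONS 1000000

atomic_int safe_counter = 0;

void* increment_atomic(void* arg) {
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        atomic_fetch_add(&safe_counter, 1); // Atomic read-modify-write: no lost updates
    }
    return NULL;
}

// Spawn two incrementing threads and return the final count.
int run_two_threads(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment_atomic, NULL);
    pthread_create(&t2, NULL, increment_atomic, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return atomic_load(&safe_counter); // Always exactly 2 * ITERATIONS
}
```

Each `atomic_fetch_add` performs the read, add, and write as one indivisible hardware operation, so no interleaving can lose an update.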
Torn reads/writes occur when a multi-byte value is partially updated by one thread while another thread reads it, resulting in a chimeric value that was never written.
When tearing can occur:
The C standard does not guarantee that any type is accessed atomically. Whether tearing occurs depends on:
Data size vs. native word size — Writing a 64-bit value on a 32-bit CPU requires two memory operations. Another thread may see half old, half new.
Alignment — Misaligned data may cross cache lines, requiring multiple memory accesses.
Hardware architecture — Even correctly-sized, aligned data may not be atomic on all hardware.
Example: 64-bit value on 32-bit CPU
| Time | Thread 1 (Writer) | Thread 2 (Reader) | Value in Memory |
|---|---|---|---|
| Initial | – | – | 0x0000000000000000 |
| T1 | Write low 32 bits of 0xFFFFFFFFFFFFFFFF | – | 0x00000000FFFFFFFF |
| T2 | – | Read 64-bit value | Reads 0x00000000FFFFFFFF |
| T3 | Write high 32 bits | – | 0xFFFFFFFFFFFFFFFF |
The reader sees 0x00000000FFFFFFFF—a value that Thread 1 never intended to write! This is a torn read.
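The interleaving in the table can be simulated deterministically. This sketch (the type and function names are ours) models a 32-bit CPU by storing the 64-bit value as two explicit 32-bit halves and reading between the two stores:

```c
#include <stdint.h>

// Simulate a 32-bit CPU writing a 64-bit value as two separate stores.
// 'lo' and 'hi' stand in for the two halves of the value in memory.
struct split64 {
    uint32_t lo;
    uint32_t hi;
};

static uint64_t read64(const struct split64* v) {
    return ((uint64_t)v->hi << 32) | v->lo;
}

// Writer performs T1 and T3 from the table; the read happens at T2,
// between the two stores, and observes a value no thread ever wrote.
static uint64_t torn_read_demo(void) {
    struct split64 v = { .lo = 0, .hi = 0 };  // Initial: 0x0000000000000000
    v.lo = 0xFFFFFFFFu;                       // T1: low 32 bits written
    uint64_t torn = read64(&v);               // T2: reader sees half-updated value
    v.hi = 0xFFFFFFFFu;                       // T3: high 32 bits written
    return torn;                              // 0x00000000FFFFFFFF - chimeric
}
```

On real hardware the tear happens only when the reader's timing lands in the window between the two stores, which is why torn reads are so rarely caught in testing.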
Real-world consequences: a torn pointer can send the program to an address no thread ever stored, and a torn timestamp or account balance is simply a wrong number—typically with no clue left behind that tearing was the cause.
What guarantees atomicity?
In practice, most modern 64-bit platforms guarantee atomic access for naturally-aligned 64-bit values. But this is a hardware/ABI guarantee, not a language guarantee. Always use atomic types when multiple threads access the same data.
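C11 lets you both obtain the language-level guarantee and query whether the hardware backs it. A brief sketch (the timestamp names are illustrative) using `_Atomic` and `atomic_is_lock_free`:

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdbool.h>

// An _Atomic 64-bit value: the implementation guarantees tear-free access,
// either via native atomic instructions or an internal lock.
static _Atomic int64_t shared_timestamp = 0;

// Ask whether this atomic is lock-free, i.e. backed by hardware
// atomic instructions rather than a hidden lock.
static bool timestamp_is_lock_free(void) {
    return atomic_is_lock_free(&shared_timestamp);
}

static void publish_timestamp(int64_t t) {
    atomic_store(&shared_timestamp, t); // Never tears, on any platform
}

static int64_t read_timestamp(void) {
    return atomic_load(&shared_timestamp); // Always a value some thread wrote
}
```

On typical 64-bit platforms `timestamp_is_lock_free()` returns true, but the code remains correct even where it doesn't—the implementation falls back to locking internally.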
```c
// Example: Torn reads with a struct

struct Coordinate {
    int x;
    int y;
};

struct Coordinate position = {0, 0};

// Thread 1: Updates position atomically (conceptually)
void update_position(int new_x, int new_y) {
    position.x = new_x;
    position.y = new_y; // Not atomic with x write!
}

// Thread 2: Reads position
void read_position() {
    int x = position.x;
    int y = position.y; // May see old x, new y or vice versa
    printf("Position: (%d, %d)\n", x, y);
}

// If Thread 1 writes (10, 20) while position is (0, 0):
// Thread 2 might read:
//   (0, 0)   - saw nothing yet
//   (10, 0)  - saw x update, not y
//   (0, 20)  - saw y update, not x (reordering!)
//   (10, 20) - saw both
//
// The (10, 0) and (0, 20) cases are torn reads of the struct

// Fix: Use a mutex to make the entire update atomic
pthread_mutex_t pos_lock = PTHREAD_MUTEX_INITIALIZER;

void update_position_safe(int new_x, int new_y) {
    pthread_mutex_lock(&pos_lock);
    position.x = new_x;
    position.y = new_y;
    pthread_mutex_unlock(&pos_lock);
}
```

Even if individual fields are atomically accessible, updating multiple fields is never atomic without synchronization. A struct, array, or any compound data structure requires explicit protection (mutex, RCU, or lock-free design) for thread-safe updates.
Data races are notoriously difficult to detect through testing alone because they depend on thread scheduling, which varies non-deterministically. Fortunately, powerful tools exist:
1. ThreadSanitizer (TSan)
Instrumentation-based detection built into Clang and GCC. Tracks all memory accesses and synchronization to detect races at runtime.
Enable with the compiler flag `-fsanitize=thread`.

2. Helgrind (Valgrind tool)
Dynamic race detection using Valgrind's binary instrumentation.
Run with `valgrind --tool=helgrind ./program`.

3. Intel Inspector
Commercial race detection for Intel platforms.
4. Static Analysis
Tools like Coverity, PVS-Studio, and Clang's static analyzer can identify potential races without running code.
```shell
# Compile with ThreadSanitizer
$ clang -fsanitize=thread -g -O1 race_program.c -o race_program

# Run - TSan will report any races detected
$ ./race_program

==================
WARNING: ThreadSanitizer: data race (pid=12345)
  Write of size 4 at 0x... by thread T2:
    #0 increment race_program.c:10 (race_program+0x...)
    #1 thread_start ...

  Previous write of size 4 at 0x... by thread T1:
    #0 increment race_program.c:10 (race_program+0x...)
    #1 thread_start ...

  Location is global 'counter' of size 4 at 0x...

  Thread T2 (tid=..., running) created by main thread at:
    #0 pthread_create ...
    #1 main race_program.c:20 (race_program+0x...)

SUMMARY: ThreadSanitizer: data race race_program.c:10 in increment
==================
```

Integrate ThreadSanitizer into your continuous integration pipeline. Run tests with TSan enabled on every commit. A race that manifests once will be caught before it reaches production. The ~10× slowdown is acceptable for CI, and catching races early is invaluable.
Limitations of race detection: dynamic tools only report races in code paths your tests actually execute, instrumentation slows execution significantly (roughly 10× for TSan) and inflates memory use, and unconventional synchronization (custom lock-free code, signaling through external processes) can produce missed races or false positives.
Despite limitations, race detectors catch the vast majority of races in practice and are essential tools for concurrent programming.
The fundamental principle for preventing data races:
If a memory location is accessed by multiple threads and at least one access is a write, all accesses must be synchronized.
There are several strategies to achieve this:
Strategy 1: Mutual Exclusion (Locks)
Protect all accesses to shared data with a mutex. Only one thread can hold the lock at a time, preventing concurrent access.
Strategy 2: Atomic Operations
Use atomic types and operations (C11 _Atomic, C++ std::atomic) for simple shared state. The hardware guarantees atomicity.
Strategy 3: Message Passing
Instead of sharing memory, have threads communicate by sending messages (channels, queues). Shared state is eliminated.
Strategy 4: Immutability
Data that is never modified after creation is inherently safe for concurrent reads. Many functional programming patterns leverage this.
Strategy 5: Thread Confinement
Design so that each piece of data is only accessed by one thread. No sharing = no races.
```c
#include <stdatomic.h>
#include <pthread.h>

// ===== Strategy 1: Mutex =====
int counter_mutex = 0;
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void increment_with_mutex() {
    pthread_mutex_lock(&lock);
    counter_mutex++; // Protected by lock
    pthread_mutex_unlock(&lock);
}

// ===== Strategy 2: Atomics =====
atomic_int counter_atomic = 0;

void increment_with_atomic() {
    atomic_fetch_add(&counter_atomic, 1); // Single atomic operation
}

// ===== Strategy 3: Thread Confinement =====
__thread int thread_local_counter = 0; // Each thread has its own

void increment_local() {
    thread_local_counter++; // No sharing = no race
}

// ===== Strategy 4: Immutable Data =====
struct Config { int timeout; };

// Once created, 'config' is never modified
const struct Config* get_config() {
    static const struct Config config = { .timeout = 30 };
    return &config; // Safe to read from any thread
}
```

Mutexes are the go-to choice for protecting complex shared state. Atomics are ideal for simple counters, flags, and pointers. Message passing works well for producer-consumer patterns. Immutability shines for configuration and reference data. Good concurrent design often combines multiple strategies.
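The message-passing strategy deserves its own sketch. The bounded channel below is a minimal illustration (all names are our own) built from a mutex and two condition variables; threads communicate only through `channel_send` and `channel_recv`, so the sole shared state lives inside the channel, where the lock protects it:

```c
#include <pthread.h>

#define QUEUE_CAP 16

// A tiny blocking queue: threads pass values through it
// instead of touching each other's variables directly.
struct channel {
    int items[QUEUE_CAP];
    int head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
};

void channel_init(struct channel* ch) {
    ch->head = ch->tail = ch->count = 0;
    pthread_mutex_init(&ch->lock, NULL);
    pthread_cond_init(&ch->not_empty, NULL);
    pthread_cond_init(&ch->not_full, NULL);
}

void channel_send(struct channel* ch, int value) {
    pthread_mutex_lock(&ch->lock);
    while (ch->count == QUEUE_CAP)
        pthread_cond_wait(&ch->not_full, &ch->lock); // Block until space
    ch->items[ch->tail] = value;
    ch->tail = (ch->tail + 1) % QUEUE_CAP;
    ch->count++;
    pthread_cond_signal(&ch->not_empty);
    pthread_mutex_unlock(&ch->lock);
}

int channel_recv(struct channel* ch) {
    pthread_mutex_lock(&ch->lock);
    while (ch->count == 0)
        pthread_cond_wait(&ch->not_empty, &ch->lock); // Block until data
    int value = ch->items[ch->head];
    ch->head = (ch->head + 1) % QUEUE_CAP;
    ch->count--;
    pthread_cond_signal(&ch->not_full);
    pthread_mutex_unlock(&ch->lock);
    return value;
}
```

A producer thread calls `channel_send` and a consumer calls `channel_recv`; this is the producer-consumer pattern with the synchronization hidden behind the channel API.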
Different programming languages handle data races differently:
C and C++
Data races are undefined behavior. The program may do anything—crash, produce wrong results, appear to work, or format your hard drive (in theory). The language provides atomics and threading primitives, but race prevention is entirely the programmer's responsibility.
Java
Data races produce weak but defined behavior. The Java Memory Model specifies what happens: you may see stale or partly-updated values, but certain cases (like torn long/double writes) are prevented for volatile variables. Race prevention is still your job, but undefined behavior is avoided.
Go
The Go race detector catches races at runtime, and races are considered bugs. Go does not declare them undefined behavior in the C/C++ sense, but a racy program can still corrupt multiword values such as slices and interface headers, so the behavior is unpredictable. Go encourages message passing via channels: "Do not communicate by sharing memory; share memory by communicating."
Rust
Data races are impossible (in safe Rust). The ownership and borrowing system prevents multiple threads from having mutable access to the same data simultaneously. This is checked at compile time—if your code compiles, it's data-race free.
| Language | Data Race Semantics | Prevention Mechanism | Detection |
|---|---|---|---|
| C/C++ | Undefined behavior | Programmer discipline + atomics/mutexes | TSan, static analysis |
| Java | Weak but defined | volatile, synchronized, java.util.concurrent | Race detectors exist |
| Go | Unsafe but not UB | Channels preferred; mutexes available | Built-in race detector |
| Rust | Compile-time prevented | Ownership system prevents at compile time | N/A (can't compile) |
| Python | GIL prevents true races* | Global Interpreter Lock (for CPython) | Limited parallelism |
| JavaScript | Single-threaded** | Event loop; SharedArrayBuffer needs Atomics | N/A for main thread |
*Python's GIL prevents simultaneous bytecode execution but doesn't prevent race conditions in logic.
**JavaScript's main thread is single-threaded, but Web Workers can share memory via SharedArrayBuffer, requiring explicit atomics.
Rust's approach is groundbreaking: the compiler statically ensures data-race freedom. If you access data from multiple threads, you must use synchronization types (Arc<Mutex<T>>, channels, etc.), or the code won't compile. This eliminates an entire class of bugs at zero runtime cost.
Data races have caused numerous real-world failures. Understanding these helps appreciate why race prevention matters.
Therac-25 (1985-1987)
A radiation therapy machine that massively overdosed patients due to software race conditions. When operators typed commands quickly, race conditions between the user interface and hardware control software could set lethal radiation doses. Six patients received massive overdoses; three died. The root cause included missing synchronization between concurrent software routines.
2003 Northeast Blackout
As mentioned earlier, a race condition in the alarm system software prevented operators from seeing cascading failures in the power grid. The race had existed for years but triggered only under specific conditions—55 million people lost power.
Knight Capital (2012)
A trading firm lost $440 million in 45 minutes due to software deployment errors that activated old, disabled code. While not purely a data race, the incident involved unsynchronized state between components, which behaved as if obsolete orders were still valid. Improper handling of shared state contributed to the catastrophic behavior.
Linux Kernel Privilege Escalation (CVE-2016-5195, 'Dirty COW')
A race condition in the Linux kernel's copy-on-write mechanism allowed unprivileged users to gain root access. The race had existed for nearly a decade. Attackers could race two threads to write to memory they shouldn't have access to, exploiting a narrow window in the kernel's memory handling.
The Therac-25 case makes it clear: data races aren't just programming errors—they can kill people. In safety-critical systems, race prevention isn't optional. Even in non-safety-critical systems, races cause data corruption, financial loss, and security vulnerabilities.
Writing race-free concurrent code requires disciplined practices:
1. Document thread-safety invariants
For every shared variable, document: which threads access it, whether any access is a write, and which synchronization mechanism (a specific lock, atomic operations, or confinement to one thread) protects it.
2. Minimize sharing
The safest race is one that can't happen because data isn't shared. Design for thread confinement and message passing where possible.
3. Hold locks for the minimum time
Long critical sections increase contention and the window for mistakes. Enter, do the minimum necessary, exit.
4. Use established patterns
Producer-consumer, readers-writers, and other patterns have well-understood solutions. Use them rather than inventing ad-hoc synchronization.
5. Review concurrent code carefully
Every concurrent code review should ask: Which data is shared between threads? Is every access to that data synchronized? Do all accesses use the same lock? Is any lock held longer than necessary, and could the locking order deadlock?
```c
// Example: Well-documented synchronization

// This shared state is accessed by multiple threads.
// All fields are protected by 'state_lock'.
// The lock MUST be held for any read or write.
struct SharedState {
    int count;          // Protected by state_lock
    char* buffer;       // Protected by state_lock
    size_t buffer_size; // Protected by state_lock
} shared_state;

pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER; // Protects shared_state

// This counter is updated by multiple threads.
// Uses atomic operations; no lock required.
// Memory order: seq_cst for simplicity (could relax if needed).
atomic_int request_count = 0;

// This is thread-local - no synchronization needed.
__thread int thread_id = 0;

// Thread-safe increment with documentation
void increment_count() {
    pthread_mutex_lock(&state_lock); // Acquire lock before accessing shared_state
    shared_state.count++;            // Safe: lock held
    pthread_mutex_unlock(&state_lock);
}
```

If you rigorously follow synchronization discipline—protecting every shared variable, using atomics correctly, running race detectors—your program will be data-race-free. This earns you DRF-SC: you can reason about your program using the simple sequential consistency model, regardless of the underlying hardware.
We've examined data races—the fundamental hazard that makes concurrent programming challenging: the formal definition (conflicting, unsynchronized accesses with no happens-before ordering), why races are undefined behavior in C and C++, the lost-update and torn-read failure modes, detection with tools like ThreadSanitizer, and the prevention strategies of locks, atomics, message passing, immutability, and thread confinement.
What's next:
We've seen that data races occur when concurrent accesses aren't properly synchronized. But what exactly needs synchronization? The next page examines critical regions—the sections of code where race conditions can occur, and the formal requirements for safely executing them.
You now understand data races deeply: what they are, why they're dangerous, how to detect them, and how to prevent them. This knowledge is foundational—every subsequent topic in concurrent programming (critical regions, locks, atomics, lock-free algorithms) is about preventing or managing races correctly.