Race conditions appear in countless forms across software systems. While the underlying principles remain constant—concurrent access to shared mutable state without proper synchronization—the specific manifestations vary dramatically.
This page presents a comprehensive catalog of race condition examples, organized by category and level of abstraction. Each example illustrates not just what goes wrong, but why the race exists and how it could be prevented. Developing this pattern recognition is essential for identifying races in your own code and systems.
These are not hypothetical scenarios—each represents patterns that have caused real production failures, security vulnerabilities, or data corruption in deployed systems.
By the end of this page, you will:

1. Recognize common race condition patterns across multiple domains
2. Understand why specific coding patterns lead to races
3. Identify race-prone situations in new code
4. Apply pattern knowledge to prevent races in your own designs
5. Build intuition for where races hide in complex systems
The simplest and most common race conditions involve counters, accumulators, and statistics—any variable that multiple threads update based on its current value.
```python
# VULNERABLE: Counter race in a web server request counter

class RequestMetrics:
    def __init__(self):
        self.request_count = 0
        self.total_response_time = 0.0

    def record_request(self, response_time_ms):
        # RACE: Read-modify-write on request_count
        self.request_count = self.request_count + 1
        # RACE: Read-modify-write on total_response_time
        self.total_response_time = self.total_response_time + response_time_ms

    def get_average_response_time(self):
        # RACE: Reading two related values non-atomically
        # count could increase between reading total and count
        if self.request_count == 0:
            return 0
        return self.total_response_time / self.request_count  # May be inconsistent!

# Multiple threads handling requests simultaneously will lose updates
```

Why this fails: Each increment involves reading the current value, adding one, and writing back. If two threads read the same value (e.g., 42), they both compute 43 and write 43—one increment is lost.
Real impact: In a high-traffic web server, the request count might report 800,000 requests when there were actually 1,000,000. Performance metrics become unreliable.
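A minimal fix, sketched in Java with java.util.concurrent.atomic (the class and method names here are illustrative, not part of the original example): atomic adders make each update a single indivisible read-modify-write, though reading two related values together still yields only an approximate average.

```java
import java.util.concurrent.atomic.LongAdder;

// Sketch: atomic counters eliminate the lost-update race.
class RequestMetrics {
    private final LongAdder requestCount = new LongAdder();
    private final LongAdder totalResponseTimeMs = new LongAdder();

    void recordRequest(long responseTimeMs) {
        requestCount.increment();              // Atomic read-modify-write
        totalResponseTimeMs.add(responseTimeMs);
    }

    double averageResponseTimeMs() {
        long count = requestCount.sum();
        if (count == 0) return 0;
        // The two sums are still read non-atomically, so the average may be
        // slightly stale; for metrics that is usually acceptable.
        return (double) totalResponseTimeMs.sum() / count;
    }
}
```

LongAdder trades exact point-in-time reads for lower contention than a plain AtomicLong under heavy write load, which fits the metrics use case.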
```java
// VULNERABLE: Classic bank account race

public class BankAccount {
    private double balance;

    public BankAccount(double initialBalance) {
        this.balance = initialBalance;
    }

    public void withdraw(double amount) {
        // RACE: Check-then-act without atomicity
        if (balance >= amount) {
            // VULNERABILITY WINDOW: Another thread could withdraw here
            balance = balance - amount;  // May overdraw!
        }
    }

    public void transfer(BankAccount destination, double amount) {
        // RACE: Multiple operations that should be atomic
        if (this.balance >= amount) {
            this.balance -= amount;         // What if we fail after this?
            destination.balance += amount;  // And before this?
        }
        // Non-atomic transfer: money could disappear if interrupted!
    }
}
```

The transfer() race is particularly severe: if the thread is interrupted between the two balance updates, money is destroyed—debited from one account but never credited to the other. In a real financial system, this would leave the books unbalanced, triggering auditing nightmares.
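One hedged fix sketch: make withdraw() atomic with synchronized, and have transfer() acquire both account locks in a consistent global order so two opposing transfers cannot deadlock. The identityHashCode ordering below is an illustrative choice; a real system would order by a stable account ID and handle ties.

```java
public class BankAccount {
    private double balance;

    public synchronized boolean withdraw(double amount) {
        if (balance >= amount) {   // Check and act under one lock
            balance -= amount;
            return true;
        }
        return false;
    }

    public void transfer(BankAccount destination, double amount) {
        // Lock both accounts in a global order to avoid deadlock when two
        // threads transfer in opposite directions at once.
        BankAccount first =
            System.identityHashCode(this) < System.identityHashCode(destination)
                ? this : destination;
        BankAccount second = (first == this) ? destination : this;
        synchronized (first) {
            synchronized (second) {
                if (this.balance >= amount) {
                    this.balance -= amount;
                    destination.balance += amount;  // Both updates inside the locks
                }
            }
        }
    }
}
```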
Lazy initialization patterns—where resources are created on first use—are fertile ground for race conditions.
```java
// VULNERABLE: Classic broken double-checked locking

public class Singleton {
    private static Singleton instance;  // NOT volatile!

    public static Singleton getInstance() {
        if (instance == null) {                  // First check (no lock)
            synchronized (Singleton.class) {
                if (instance == null) {          // Second check (with lock)
                    instance = new Singleton();  // PROBLEM HERE
                }
            }
        }
        return instance;
    }

    private Singleton() {
        // Complex initialization...
    }
}

/*
 * WHY THIS FAILS:
 *
 * The line "instance = new Singleton()" is NOT atomic. It involves:
 *   1. Allocate memory for Singleton
 *   2. Invoke constructor to initialize fields
 *   3. Assign reference to instance
 *
 * The compiler/CPU can REORDER steps 2 and 3!
 *
 * Thread A: Allocates memory, assigns to instance (step 3)
 * Thread A: ...about to run constructor (step 2)
 * Thread B: if (instance == null) -> FALSE (sees non-null reference)
 * Thread B: return instance  // Returns UNINITIALIZED object!
 * Thread B: Tries to use object, crashes or corrupts data
 */
```

The fix: Mark instance as volatile. This prevents the reordering and ensures that when a thread sees a non-null instance, the object is fully constructed.
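A sketch of both standard repairs, assuming nothing beyond what the example shows: the volatile keyword on the instance field, and the initialization-on-demand holder idiom that sidesteps the problem entirely.

```java
class Singleton {
    private static volatile Singleton instance;  // volatile forbids the reordering

    static Singleton getInstance() {
        if (instance == null) {
            synchronized (Singleton.class) {
                if (instance == null) {
                    instance = new Singleton();  // Safe publication via volatile
                }
            }
        }
        return instance;
    }

    private Singleton() { /* complex initialization... */ }
}

// Often simpler: the initialization-on-demand holder idiom. The JVM's
// class-initialization locking guarantees Holder runs exactly once.
class LazySingleton {
    private static class Holder {
        static final LazySingleton INSTANCE = new LazySingleton();
    }

    static LazySingleton getInstance() { return Holder.INSTANCE; }

    private LazySingleton() {}
}
```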
```python
# VULNERABLE: Lazy cache population race

class ExpensiveComputationCache:
    def __init__(self):
        self._cache = {}

    def get_result(self, key):
        if key not in self._cache:
            # RACE: Multiple threads may compute simultaneously
            result = self._expensive_compute(key)  # Takes 5 seconds
            # RACE: Multiple threads may write different results
            # if computation is not deterministic
            self._cache[key] = result
        return self._cache[key]

    def _expensive_compute(self, key):
        # Stand-in for an expensive operation...
        return hash(key)

# Problem 1: Multiple threads waste resources computing the same key
# Problem 2: Individual dict operations are atomic in CPython (thanks to the
#            GIL), but the check-then-insert sequence spans many operations
#            and is not
# Problem 3: If _expensive_compute has side effects, they happen multiple times
```

Lazy initialization races appear in singletons, caches, connection pools, configuration loading, and plugin systems. The pattern 'check if initialized, if not initialize' is almost always racy without synchronization. Use language-provided mechanisms: Java's synchronized/volatile, Python's threading.Lock, or lazy initialization frameworks that handle this correctly.
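For comparison, a minimal sketch of the same cache in Java (the type parameters and stand-in computation are assumptions): ConcurrentHashMap.computeIfAbsent makes check-and-populate a single atomic step and runs the mapping function at most once per key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class ExpensiveComputationCache {
    private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();

    String getResult(String key) {
        // Atomic check-and-populate: the mapping function runs at most once
        // per key, and concurrent callers for that key block until it
        // finishes. No duplicate work, no lost writes.
        return cache.computeIfAbsent(key, this::expensiveCompute);
    }

    private String expensiveCompute(String key) {
        return key.toUpperCase();  // Stand-in for a 5-second computation
    }
}
```

One caveat from the ConcurrentHashMap contract: the mapping function should be short and must not update other entries in the same map, or it can block unrelated updates.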
Data structures—lists, maps, trees—maintain internal invariants that can be violated by concurrent modification.
```java
// VULNERABLE: Concurrent ArrayList modification

List<String> list = new ArrayList<>();

// Thread 1: Adding elements
void addItems() {
    for (int i = 0; i < 10000; i++) {
        list.add("item-" + i);  // RACE: ArrayList is not thread-safe
    }
}

// Thread 2: Iterating
void processItems() {
    for (String item : list) {  // RACE: ConcurrentModificationException
        process(item);          // Or worse: silent corruption
    }
}

// Thread 3: Removing elements
void removeItems() {
    for (int i = 0; i < list.size(); i++) {  // RACE: size() changes during loop
        if (shouldRemove(list.get(i))) {
            list.remove(i);  // RACE: indices shift, may skip/double-process
            i--;             // Naive "fix" still races
        }
    }
}

/*
 * FAILURE MODES:
 * - ArrayIndexOutOfBoundsException
 * - ConcurrentModificationException
 * - Null elements where there shouldn't be
 * - Missing elements (never processed)
 * - Corrupted internal array (size != actual elements)
 */
```
```java
// VULNERABLE: HashMap corruption leading to infinite loop

Map<Integer, String> map = new HashMap<>();

// Multiple threads inserting simultaneously
void insertMany() {
    for (int i = 0; i < 100000; i++) {
        // RACE: HashMap.put() is NOT thread-safe
        map.put(i, "value-" + i);
    }
}

/*
 * WHAT CAN HAPPEN:
 *
 * HashMap internally uses a linked list for hash collisions.
 * Concurrent insertions can corrupt the linked list into a CYCLE.
 *
 * When get() or iteration traverses a cyclic list:
 *   while (e != null) { e = e.next; }  // INFINITE LOOP
 *
 * The thread hangs forever, consuming 100% CPU.
 *
 * This is not hypothetical—it has crashed production systems,
 * including instances of the HotSpot JVM itself during startup.
 *
 * Java HashMap javadoc explicitly warns:
 * "If multiple threads access a hash map concurrently, and at least
 * one of the threads modifies the map structurally, it must be
 * synchronized externally."
 */
```

The HashMap infinite loop race has taken down countless production services. Because the loop consumes CPU without making progress or throwing exceptions, it often isn't detected until the system becomes unresponsive. Use ConcurrentHashMap for concurrent access, or ensure external synchronization.
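A short sketch of the standard remedies (the wrapper class is only for illustration): the java.util.concurrent collections keep their internal invariants intact under concurrent access.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

class SharedCollections {
    // Safe for concurrent put/get; iteration is weakly consistent:
    // it never throws ConcurrentModificationException and never cycles.
    final Map<Integer, String> map = new ConcurrentHashMap<>();

    // Safe for concurrent add/iterate; each mutation copies the backing
    // array, so it suits read-mostly workloads (cheap reads, O(n) writes).
    final List<String> list = new CopyOnWriteArrayList<>();

    void insert(int i) {
        map.put(i, "value-" + i);  // Atomic: no bucket corruption
        list.add("item-" + i);     // Atomic copy-on-write append
    }
}
```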
The check-then-act pattern—testing a condition and acting on it—is inherently racy if the condition can change between check and action.
```java
// VULNERABLE: Naive put-if-absent

Map<String, ExpensiveObject> cache =
    Collections.synchronizedMap(new HashMap<>());

public ExpensiveObject getOrCreate(String key) {
    // Each operation is atomic, but the COMBINATION is not!
    if (!cache.containsKey(key)) {  // Check
        // VULNERABILITY WINDOW
        // Another thread could insert the same key here!
        ExpensiveObject obj = new ExpensiveObject();  // Wasted work
        cache.put(key, obj);  // Act: may overwrite other thread's value
    }
    return cache.get(key);  // RACE: May return null if removed between put and get!
}

/*
 * PROBLEMS:
 * 1. Two threads may both see key missing, both create objects
 * 2. One thread's object gets overwritten and lost
 * 3. If ExpensiveObject has side effects, they happen twice
 * 4. Even worse: cache.get(key) could return null if another
 *    thread removed the key between put() and get()
 *
 * FIX: Use computeIfAbsent() which is atomic:
 *   cache.computeIfAbsent(key, k -> new ExpensiveObject())
 */
```
```python
# VULNERABLE: Directory creation race

import os

def ensure_directory(path):
    if not os.path.exists(path):  # Check
        # VULNERABILITY WINDOW
        # Another process could create the directory OR
        # create a SYMLINK to somewhere else!
        os.mkdir(path)  # Act: may fail or create in wrong location

    # Assume directory now exists and we control it
    write_to_directory(path)  # DANGEROUS if it's not what we expected

# PROBLEMS:
# 1. FileExistsError if another process creates first
# 2. Security: attacker creates symlink, we write to wrong location
# 3. Race between existence check and our use of the directory

# SAFER APPROACH:
def safe_ensure_directory(path):
    try:
        os.mkdir(path)  # Atomic: creates or fails
    except FileExistsError:
        pass  # Already exists, that's fine
    # But still should verify it's a directory we own!
```

Whenever you see 'if not exists, create' or 'if exists, use', suspect a race condition. The safe pattern is usually 'try the operation, handle failure' (EAFP: Easier to Ask Forgiveness than Permission). Atomic operations eliminate the window between check and act.
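The same EAFP shape in Java, as a hedged sketch: Files.createDirectory is atomic at the filesystem level, collapsing check and act into one operation whose failure we handle explicitly.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;

void ensureDirectory(Path path) throws IOException {
    try {
        Files.createDirectory(path);  // Atomic: creates the directory or throws
    } catch (FileAlreadyExistsException e) {
        // EAFP: handle the collision instead of pre-checking. Still verify
        // that what exists is a real directory, not a planted symlink.
        if (!Files.isDirectory(path, LinkOption.NOFOLLOW_LINKS)) {
            throw new IOException("Expected a real directory at " + path);
        }
    }
}
```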
State machines with shared state are particularly vulnerable because state transitions involve both checking the current state and updating it.
```java
// VULNERABLE: Connection state machine race

public class Connection {
    private enum State { DISCONNECTED, CONNECTING, CONNECTED, DISCONNECTING }

    private State state = State.DISCONNECTED;

    public void connect() {
        if (state == State.DISCONNECTED) {
            state = State.CONNECTING;  // RACE: Two threads could both see DISCONNECTED
            performConnect();          // Network operation takes time
            state = State.CONNECTED;   // RACE: What if disconnect() called during connect?
        }
    }

    public void disconnect() {
        if (state == State.CONNECTED) {
            state = State.DISCONNECTING;  // RACE: connect() might be in progress
            performDisconnect();
            state = State.DISCONNECTED;   // RACE: Might overwrite CONNECTING state
        }
    }

    public void send(byte[] data) {
        if (state == State.CONNECTED) {
            // RACE: State could change after this check
            doSend(data);  // Send on disconnected socket!
        }
    }
}

/*
 * FAILURE MODES:
 * - Two simultaneous connects create two sockets (resource leak)
 * - Disconnect during connect leaves inconsistent state
 * - Send after disconnect check passes, socket already closed
 * - State ends up in impossible combination (CONNECTING + doDisconnect happening)
 */
```
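A hedged sketch of one fix: hold the state in an AtomicReference and make every transition a compare-and-set, so a transition succeeds only if the state is still the one the caller observed.

```java
import java.util.concurrent.atomic.AtomicReference;

class Connection {
    enum State { DISCONNECTED, CONNECTING, CONNECTED, DISCONNECTING }

    private final AtomicReference<State> state =
        new AtomicReference<>(State.DISCONNECTED);

    void connect() {
        // Exactly one thread can win the DISCONNECTED -> CONNECTING transition.
        if (!state.compareAndSet(State.DISCONNECTED, State.CONNECTING)) {
            return;  // Another thread is already connecting or connected
        }
        try {
            performConnect();
            state.set(State.CONNECTED);
        } catch (RuntimeException e) {
            state.set(State.DISCONNECTED);  // Roll back so others may retry
            throw e;
        }
    }

    void disconnect() {
        // Fails harmlessly if we are not CONNECTED (e.g., mid-connect).
        if (!state.compareAndSet(State.CONNECTED, State.DISCONNECTING)) {
            return;
        }
        performDisconnect();
        state.set(State.DISCONNECTED);
    }

    private void performConnect()    { /* network operation */ }
    private void performDisconnect() { /* network operation */ }
}
```

Note that CAS protects the transitions; send() would still need a lock, or a design where all I/O flows through a queue owned by the connection, since the state can change after any single read.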
```c
// VULNERABLE: Flag-based synchronization (broken)

// Shared between threads
int data;
int ready = 0;

// Thread 1: Producer
void producer() {
    data = 42;   // Write data
    ready = 1;   // Signal that data is ready
}

// Thread 2: Consumer
void consumer() {
    while (ready == 0) {
        // Spin, waiting for ready flag
    }
    printf("%d\n", data);  // Expect 42
}

/*
 * WHAT CAN GO WRONG:
 *
 * 1. COMPILER REORDERING:
 *    Compiler may reorder "data = 42" after "ready = 1"
 *    (They're independent writes to the compiler)
 *
 * 2. CPU REORDERING:
 *    Store to 'data' may be buffered, while store to 'ready'
 *    propagates to consumer first
 *
 * 3. CONSUMER SEES:
 *    ready = 1, data = 0 (old value)
 *    Prints 0 instead of 42
 *
 * 4. WORSE ON WEAK MEMORY MODELS:
 *    On ARM, this can RELIABLY fail without memory barriers
 *
 * FIX: Use atomics with proper memory ordering:
 *   atomic_store_explicit(&ready, 1, memory_order_release);
 *   atomic_load_explicit(&ready, memory_order_acquire);
 */
```

A common mistake is using simple boolean/integer flags to coordinate threads. Without memory barriers or atomics, flag updates may not be visible across threads in the expected order. Always use proper synchronization primitives: locks, condition variables, or atomic operations with explicit memory ordering.
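For contrast, the Java analogue is a one-keyword fix (a minimal sketch): a volatile write has release semantics and a volatile read has acquire semantics, so a consumer that sees ready == true is guaranteed by the Java memory model to also see data == 42.

```java
class Handoff {
    int data;                  // Plain field
    volatile boolean ready;    // volatile write/read = release/acquire

    void producer() {
        data = 42;     // Happens-before the volatile write below
        ready = true;  // Release: publishes data to consumers
    }

    void consumer() {
        while (!ready) { /* spin */ }  // Acquire on each volatile read
        System.out.println(data);      // Guaranteed to print 42
    }
}
```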
Managing the lifecycle of resources—allocation, use, and deallocation—across threads is error-prone. Use-after-free and double-free are the most dangerous outcomes.
```c
// VULNERABLE: Use-after-free race

struct Connection {
    int socket_fd;
    char buffer[1024];
    // ...
};

struct Connection* global_conn = NULL;

// Thread 1: Processing loop
void process_connection() {
    while (global_conn != NULL) {
        // Use the connection
        read(global_conn->socket_fd, global_conn->buffer, 1024);
        process_data(global_conn->buffer);
    }
}

// Thread 2: Cleanup on error
void cleanup_on_error() {
    if (global_conn != NULL) {
        struct Connection* temp = global_conn;
        global_conn = NULL;  // Signal to other thread

        // BUG: Thread 1 might be in middle of using global_conn!
        // The NULL check in Thread 1's while loop may have passed
        // BEFORE we set global_conn = NULL
        close(temp->socket_fd);  // Close while Thread 1 might be reading!
        free(temp);              // Free while Thread 1 holds a reference!
    }
}

/*
 * RACE TIMELINE:
 * T1: Check global_conn != NULL -> TRUE
 * T1: Enters loop body
 * T2: Sets global_conn = NULL
 * T2: Closes socket, frees memory
 * T1: read(global_conn->socket_fd, ...)  // CRASH: accessing freed memory
 *
 * Result: Undefined behavior, crash, or security vulnerability
 */
```
```cpp
// VULNERABLE: Naive reference counting

class RefCounted {
    int refcount = 1;  // Non-atomic!

public:
    void addRef() {
        refcount++;  // RACE: Not atomic
    }

    void release() {
        refcount--;  // RACE: Two threads may both decrement from 2 to 1
        if (refcount == 0) {
            delete this;  // RACE: Might double-free
        }
    }
};

/*
 * RACE SCENARIO:
 *
 * refcount = 2 (two threads hold references)
 *
 * Thread A: reads refcount (2)
 * Thread B: reads refcount (2)
 * Thread A: computes 2 - 1 = 1
 * Thread B: computes 2 - 1 = 1
 * Thread A: writes refcount = 1
 * Thread B: writes refcount = 1
 *
 * refcount = 1 (should be 0!)
 * Object never freed -> memory leak
 *
 * WORSE SCENARIO:
 *
 * Thread A: decrements refcount to 0
 * Thread B: about to decrement
 * Thread A: calls delete this
 * Thread B: decrements freed memory
 * Thread B: refcount appears to be -1 or anything
 * Thread B: may also call delete this -> double free!
 *
 * FIX: Use atomic operations for refcount
 */
```

Use-after-free and double-free races are not just crashes—they're exploitable security vulnerabilities. Attackers can control the freed memory contents and manipulate program execution. These race conditions regularly appear in CVEs for browsers, operating systems, and databases.
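The atomic version, transposed to Java to stay consistent with the other fix sketches (Java's GC manages memory, but the same pattern governs resources like sockets and file handles): decrementAndGet is a single atomic read-modify-write, so exactly one thread observes the transition to zero.

```java
import java.util.concurrent.atomic.AtomicInteger;

abstract class RefCounted {
    private final AtomicInteger refCount = new AtomicInteger(1);

    void addRef() {
        refCount.incrementAndGet();  // Atomic read-modify-write
    }

    void release() {
        // decrementAndGet returns the new value atomically; only one
        // thread can see it hit zero, so dispose() runs exactly once.
        if (refCount.decrementAndGet() == 0) {
            dispose();
        }
    }

    protected abstract void dispose();  // e.g., close a file or socket
}
```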
Distributed systems face race conditions at the network level, where message ordering and timing are fundamentally unpredictable.
```python
# VULNERABLE: Inventory oversell in distributed e-commerce

class InventoryService:
    def try_purchase(self, product_id, quantity):
        # Read current inventory from database
        current_stock = db.query(
            "SELECT quantity FROM inventory WHERE product_id = ?",
            product_id
        )

        if current_stock >= quantity:
            # RACE WINDOW: Many concurrent requests pass this check
            # All see the same stock level, all think they can proceed

            # Update inventory
            db.execute(
                "UPDATE inventory SET quantity = quantity - ? WHERE product_id = ?",
                quantity, product_id
            )

            # Create order
            create_order(product_id, quantity)
            return True

        return False

# SCENARIO:
# Product has 1 unit in stock
# 10 concurrent purchase requests arrive
# All 10 read current_stock = 1
# All 10 pass the if check
# All 10 decrement inventory (quantity goes to -9!)
# All 10 create orders
# Result: 10 orders fulfilled from 1 unit of inventory

# FIX: Use atomic conditional update:
#   UPDATE inventory SET quantity = quantity - ?
#   WHERE product_id = ? AND quantity >= ?
# Then check rows_affected to see if it succeeded
```
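The fix from the comments above, sketched with JDBC (table and column names carried over from the example; connection setup is assumed): the WHERE clause makes the stock check and the decrement a single atomic statement, and the affected-row count reports whether it succeeded.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

boolean tryPurchase(Connection db, long productId, int quantity) throws SQLException {
    String sql = "UPDATE inventory SET quantity = quantity - ? "
               + "WHERE product_id = ? AND quantity >= ?";
    try (PreparedStatement stmt = db.prepareStatement(sql)) {
        stmt.setInt(1, quantity);
        stmt.setLong(2, productId);
        stmt.setInt(3, quantity);
        // The database applies check and decrement atomically per row;
        // 0 rows affected means there was not enough stock.
        return stmt.executeUpdate() == 1;
    }
}
```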
```python
# VULNERABLE: Distributed like counter race

class LikeService:
    def add_like(self, post_id, user_id):
        # Check if user already liked (prevent double-like)
        already_liked = redis.sismember(f"likes:{post_id}", user_id)

        if not already_liked:
            # RACE: Two requests from same user could both reach here

            # Add to set of users who liked
            redis.sadd(f"likes:{post_id}", user_id)
            # Increment like count
            redis.incr(f"like_count:{post_id}")

        # RACE RESULT: Like count and set member count may differ!
        # The set correctly has user once (idempotent)
        # But we may have incremented the counter twice

    def remove_like(self, post_id, user_id):
        # Check if user actually liked
        did_like = redis.sismember(f"likes:{post_id}", user_id)

        if did_like:
            # RACE: Unlike and Like could interleave badly
            redis.srem(f"likes:{post_id}", user_id)
            redis.decr(f"like_count:{post_id}")

# FIX: Use a Lua script for atomic check-and-modify in Redis
# Or use transactions (MULTI/EXEC) when supported
```

Unlike local races, distributed races span network boundaries with variable latency. Testing may never trigger them because network timing is more consistent in dev/test than production. Load testing with artificial delays helps expose distributed races.
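A hedged sketch of the Lua-script fix using the Jedis client (the client choice is an assumption; key names follow the example): Redis runs a Lua script atomically, so the set update and counter update cannot interleave with another request. Here SADD's return value replaces the separate SISMEMBER check entirely.

```java
import java.util.List;
import redis.clients.jedis.Jedis;

class LikeService {
    // SADD returns 1 only when the member is newly added; increment the
    // counter only in that case, all inside one atomic script.
    private static final String ADD_LIKE_LUA =
        "if redis.call('SADD', KEYS[1], ARGV[1]) == 1 then " +
        "  redis.call('INCR', KEYS[2]) " +
        "  return 1 " +
        "end " +
        "return 0";

    boolean addLike(Jedis redis, String postId, String userId) {
        Object result = redis.eval(
            ADD_LIKE_LUA,
            List.of("likes:" + postId, "like_count:" + postId),
            List.of(userId));
        return Long.valueOf(1L).equals(result);  // 1 = newly liked
    }
}
```

Because the script is the atomic unit, the set and the counter can never drift apart; remove_like would get a symmetric script built around SREM's return value.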
Signal handlers in Unix-like systems introduce an often-overlooked form of concurrency. A signal can interrupt the main program at almost any point, creating race conditions even in single-threaded programs.
```c
// VULNERABLE: Calling non-async-signal-safe functions from handler

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

volatile int should_exit = 0;

void sigint_handler(int sig) {
    // RACE: These functions are NOT async-signal-safe!
    printf("Caught SIGINT, cleaning up...\n");  // Uses internal buffers

    // RACE: malloc/free use global heap state
    char* message = malloc(100);  // May deadlock or corrupt heap!
    free(message);

    exit(0);  // May not be safe depending on what main was doing
}

int main() {
    signal(SIGINT, sigint_handler);

    while (!should_exit) {
        // Imagine main is doing:
        printf("Processing...\n");  // RACE: Handler could interrupt mid-printf!
        char* data = malloc(1024);  // RACE: Handler could call malloc at same time
        process(data);
        free(data);
    }
}

/*
 * RACE SCENARIO:
 *
 * 1. main() is inside printf(), which holds an internal lock on stdout
 * 2. SIGINT arrives, handler runs
 * 3. Handler calls printf(), tries to acquire same lock
 * 4. DEADLOCK: Handler waits for lock held by interrupted main()
 *    But main() can never run to release the lock!
 *
 * Or with malloc:
 * 1. main() is modifying heap metadata in malloc()
 * 2. Signal arrives, handler calls malloc()
 * 3. Heap metadata is corrupted
 * 4. Future allocations fail or corrupt memory
 */
```
```c
// SECURE: Minimal signal handler

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

volatile sig_atomic_t got_signal = 0;

void sigint_handler(int sig) {
    // ONLY do async-signal-safe operations:
    // - Set a volatile sig_atomic_t flag
    // - Call other async-signal-safe functions (write(), _Exit(), signal())
    got_signal = 1;
    // NOT safe: printf, malloc, free, exit(), mutexes, etc.
}

int main() {
    struct sigaction sa;
    sa.sa_handler = sigint_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGINT, &sa, NULL);

    while (!got_signal) {
        // Do work; the handler does nothing but set the flag
        process_item();
    }

    // Back in the main context: NOW it's safe to do real cleanup
    printf("Caught signal, exiting cleanly\n");
    cleanup_resources();
    return 0;
}
```

POSIX specifies a limited set of async-signal-safe functions that can be called from signal handlers. The safest approach: set a flag in the handler, do all real work in the main program. Common unsafe functions: printf, malloc, free, most of stdio, mutex operations, and many more.
We've explored a comprehensive catalog of race condition examples. The key to preventing races is recognizing the patterns that create them.
| Pattern | Core Problem | Prevention |
|---|---|---|
| Counter/Accumulator | Read-modify-write not atomic | Use atomic operations or locks |
| Singleton/Lazy Init | Check-initialize not atomic, reordering | Use language lazy init patterns, volatile |
| Collection Modification | Internal invariants violated | Use concurrent collections |
| Check-Then-Act | Condition changes after check | Atomic conditional operations |
| State Machine | Transition not atomic | Lock entire state machine or use atomic CAS |
| Resource Lifecycle | Use after free/close | Reference counting (atomic), RAII |
| Distributed Operations | Network delays between operations | Atomic transactions, consensus |
| Signal Handlers | Interrupts at arbitrary points | Minimal handlers, async-safe only |
You now have a catalog of race condition patterns to recognize in code. The next page covers detection techniques—how to find race conditions through testing, static analysis, and dynamic analysis when prevention fails or to verify correctness.