You've carefully implemented a lock-free data structure using Compare-and-Swap (CAS). The logic seems bulletproof: check if the value is what you expect, swap atomically. Yet intermittently—perhaps once in a million operations, perhaps only under specific timing conditions—your data structure corrupts. The bug is nearly impossible to reproduce in testing, but devastating in production.
Welcome to the ABA problem, one of the most insidious issues in concurrent programming. The name comes from a simple scenario: a memory location starts with value A, changes to B, then changes back to A. A CAS operation sees the value is A (matching expectations) and succeeds—but the context has changed entirely. The algorithm's invariants are silently violated.
The ABA problem affects not just lock-free data structures but also certain implementations of locks and synchronization primitives. Understanding it is essential for anyone working with low-level concurrency, and knowing how to prevent it separates robust systems from ticking time bombs.
By the end of this page, you will understand: (1) The precise nature of the ABA problem and why it occurs, (2) Real-world scenarios where ABA causes failures, (3) Solutions including hazard pointers, tagged pointers, and memory reclamation schemes, (4) How ABA relates to lock implementations, and (5) Practical strategies for ABA-safe programming.
The ABA problem arises from a fundamental limitation of Compare-and-Swap (and related operations like Test-and-Set): they only compare values, not history. CAS doesn't know if a value changed and returned—it only sees the current state.
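This limitation can be made concrete in a few lines of C11 atomics. The snippet below is a minimal single-threaded sketch: plain stores stand in for another thread's modifications, and the CAS succeeds even though the location changed twice in between.

```c
#include <stdatomic.h>
#include <stdbool.h>

_Atomic int x = 0;  // the shared location, initially A (= 0)

bool aba_demo(void) {
    int expected = atomic_load(&x);  // this "thread" observes A

    // Meanwhile, other threads change the value A -> B -> A
    atomic_store(&x, 1);
    atomic_store(&x, 0);

    // CAS only compares the current value with `expected`:
    // 0 == 0, so it succeeds despite the intervening changes
    return atomic_compare_exchange_strong(&x, &expected, 2);
}
```

The CAS reports success because value equality is all it can check; nothing in the operation records that x was ever 1.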
Formal Definition:

The ABA problem occurs when:

1. Thread 1 reads value A from a shared location.
2. Thread 1 is delayed (preempted, stalled, or simply slow).
3. Other threads change the location to B, then back to A.
4. Thread 1's CAS compares against A, finds a match, and succeeds—even though the location was modified in between and the algorithm's assumptions no longer hold.
Why Is This a Problem?
CAS-based algorithms often use pointer values. The assumption is: "if the pointer hasn't changed, the data structure hasn't changed." But if a node is removed and a new node is allocated at the same address (reused memory), the pointer value is the same while the data is completely different.
```c
// Lock-free stack with ABA vulnerability
struct node {
    int data;
    struct node *next;
};

struct stack {
    struct node *head;  // Atomic pointer
};

// Push operation - ABA safe (creates new nodes)
void push(struct stack *s, int value) {
    struct node *new_node = malloc(sizeof(*new_node));
    new_node->data = value;
    do {
        new_node->next = s->head;
    } while (!CAS(&s->head, new_node->next, new_node));
}

// Pop operation - ABA VULNERABLE!
int pop(struct stack *s) {
    struct node *old_head;
    struct node *new_head;
    do {
        old_head = s->head;
        if (old_head == NULL)
            return -1;              // Stack empty
        new_head = old_head->next;  // Read next pointer
        // DANGER: If we're preempted here, old_head could be
        // freed and reallocated to a different node!
    } while (!CAS(&s->head, old_head, new_head));
    int data = old_head->data;
    free(old_head);  // This free makes ABA possible
    return data;
}
```

Thread 1 reads head pointing to node A (with next→B). Thread 1 stalls. Thread 2 pops A and B, pushes new node C. If C is allocated at A's old address, Thread 1's CAS succeeds, setting head to B—but B was already popped! The stack now points to freed memory.
Let's trace through the ABA problem in our lock-free stack implementation step by step. This detailed walkthrough reveals exactly how corruption occurs.
Initial State:
Stack contains: A → B → C (A is on top)
| Step | Thread 1 | Thread 2 | Stack State |
|---|---|---|---|
| 1 | read head → A | — | head→A→B→C |
| 2 | read A.next → B | — | head→A→B→C |
| 3 | (stalled) | pop() begins | head→A→B→C |
| 4 | (stalled) | CAS(A→B) succeeds, free(A) | head→B→C |
| 5 | (stalled) | pop() again | head→B→C |
| 6 | (stalled) | CAS(B→C) succeeds, free(B) | head→C |
| 7 | (stalled) | push(X) allocates at A's old address | head→X→C |
| 8 | resumes | — | head→X→C |
| 9 | CAS(head, A, B) | — | head→X→C |
| 10 | CAS SUCCEEDS! (A's address = X's address) | — | head→B (FREED!) |
The Critical Insight:
Thread 1's cached value of new_head (which pointed to B) is now a dangling pointer. B has been freed. When Thread 1's CAS succeeds—because it compares the address, not the identity—it sets head to point to freed memory.
Subsequent operations on the stack dereference freed memory: pops read garbage data and follow dangling next pointers, pushes may corrupt whatever the allocator hands out next, and the process can crash at an arbitrary later point.
```
Detailed Memory View During ABA:

STEP 1-2 (Thread 1 reads):
┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│ head → 0x100 ├───►│ A @ 0x100    ├───►│ B @ 0x200    ├───► ...
└──────────────┘    │ data: 10     │    │ data: 20     │
                    │ next: 0x200  │    │ next: 0x300  │
                    └──────────────┘    └──────────────┘
Thread 1 state: old_head = 0x100, new_head = 0x200

STEP 4-6 (Thread 2 pops A and B):
┌──────────────┐    ┌──────────────┐
│ head → 0x300 ├───►│ C @ 0x300    ├───► ...
└──────────────┘    │ data: 30     │
                    │ next: 0x400  │
                    └──────────────┘
Memory at 0x100 and 0x200 is FREE (heap)

STEP 7 (Thread 2 push reallocates at 0x100):
┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│ head → 0x100 ├───►│ X @ 0x100    ├───►│ C @ 0x300    ├───► ...
└──────────────┘    │ data: 99     │    │ data: 30     │
                    │ next: 0x300  │    │ next: 0x400  │
                    └──────────────┘    └──────────────┘
Note: 0x100 is reused! Different node, same address.
Thread 1 still has: old_head = 0x100, new_head = 0x200

STEP 10 (Thread 1 CAS succeeds):
CAS(&head, 0x100, 0x200) → compares head (0x100) with expected (0x100) → MATCH!
head := 0x200

┌──────────────┐
│ head → 0x200 │ ← POINTS TO FREED MEMORY!
└──────────────┘
┌──────────────┐
│ ??? @ 0x200  │ ← May be reallocated to anything
└──────────────┘
```

ABA bugs are notoriously difficult to debug because: (1) they require specific timing to manifest, (2) the corruption may not be detected until much later, (3) debugger probes change timing and hide the bug, (4) symptoms vary—crashes, wrong results, or nothing visible. Prevention is far easier than debugging.
While we've focused on lock-free data structures, the ABA problem can also affect lock implementations, particularly those using CAS-based state machines or owner tracking.
Fortunately, Simple Spinlocks Are Safe:
The basic TAS spinlock using 0/1 states does NOT suffer from ABA, because its entire state is a single bit with no external references: if we observe 0 (unlocked) and atomically set it to 1, we hold the lock correctly, no matter how many lock/unlock cycles happened while we weren't looking.
```c
// This spinlock is ABA-safe

void acquire(void) {
    while (TAS(&lock) == 1) {
        // spin
    }
}

void release(void) {
    lock = 0;
}

// Why ABA-safe:
// - State is only 0 or 1
// - A→B→A means: unlocked→locked→unlocked
// - If we see unlocked and successfully lock it, we're correct
//   regardless of what happened before
// - No dangling references, no external state dependency
```

When Locks CAN Have ABA Issues:
More complex lock implementations may be vulnerable:
```c
// A lock design with potential ABA vulnerability
// (educational example - do not use)

struct mutex {
    struct thread *owner;  // NULL = unlocked, else locked
};

void lock(struct mutex *m) {
    struct thread *self = current_thread();
    while (!CAS(&m->owner, NULL, self)) {
        // Spin waiting for owner to become NULL
        while (m->owner != NULL) {
            wait_hint();
        }
    }
}

void unlock(struct mutex *m) {
    m->owner = NULL;
}

// ABA scenario:
// 1. Thread A (at address 0x1000) locks, owner = 0x1000
// 2. Thread A unlocks, owner = NULL
// 3. Thread A terminates, thread struct freed
// 4. Thread B created, allocated at address 0x1000
// 5. Thread C is checking if lock is held, reads owner
// 6. Thread C has old cached owner = 0x1000
// 7. Thread C compares: "is thread at 0x1000 still valid?"
//    0x1000 is valid (it's Thread B!) but WRONG identity
//
// This is subtle but can cause issues with ownership checks
// Example: recursive lock checking if current thread is owner
```

Using a unique thread ID (monotonically increasing, never reused) instead of thread pointer addresses avoids this class of ABA. Linux's gettid() returns IDs that are not immediately reused, but OS thread IDs on both Linux and Windows can eventually be recycled; a process-local 64-bit counter that never wraps is the only hard guarantee of uniqueness.
Several techniques can prevent or mitigate the ABA problem. Each has different tradeoffs in terms of performance, complexity, and applicability.
Solution 1: Tagged Pointers (Double-Word CAS)
Pair each pointer with a version counter. Every modification increments the counter. ABA is detected because even if the pointer returns to A, the counter has changed.
```c
// Tagged pointer solution using double-width CAS

// Pack pointer and tag into a single atomic unit
// On x86-64: use 128-bit CMPXCHG16B
// On 32-bit: use 64-bit CMPXCHG8B

struct tagged_ptr {
    void *ptr;       // The actual pointer
    uintptr_t tag;   // Version counter
};

// Atomic operations on the combined value
// (assume struct stack now holds an _Atomic struct tagged_ptr head)
_Atomic struct tagged_ptr head;

int pop_with_tag(struct stack *s) {
    struct tagged_ptr old_head, new_head;
    do {
        old_head = atomic_load(&s->head);
        if (old_head.ptr == NULL)
            return -1;
        // Calculate new head
        new_head.ptr = ((struct node *)old_head.ptr)->next;
        new_head.tag = old_head.tag + 1;  // Increment version
    } while (!atomic_compare_exchange_weak(
                 &s->head, &old_head, new_head));

    // CAS now compares BOTH pointer AND tag
    // A→B→A would have different tags: (A,1) → (B,2) → (A,3)
    // CAS on (A,1) fails when head is (A,3)

    struct node *popped = old_head.ptr;
    int data = popped->data;
    free(popped);  // Tag prevents the ABA CAS; note that other threads
                   // may still read a freed node, so full safety also
                   // needs a reclamation scheme
    return data;
}

// On x86-64, this compiles to the CMPXCHG16B instruction
// On architectures without wide CAS, alternatives are needed
```

Solution 2: Hazard Pointers
Each thread announces which pointers it's currently accessing. Memory cannot be freed while any thread's hazard pointer references it. This prevents the reuse that causes ABA.
```c
// Hazard pointer scheme - simplified
// Each thread has a small number of hazard pointer slots

#define MAX_THREADS 64
#define HP_PER_THREAD 2

struct hazard_pointers {
    _Atomic(void *) hp[MAX_THREADS][HP_PER_THREAD];
    // Per-thread retired lists (to-be-freed nodes)
    struct retired_list retired[MAX_THREADS];
};

struct hazard_pointers g_hp;

int pop_with_hazard(struct stack *s, int thread_id) {
    struct node *old_head;
retry:
    old_head = atomic_load(&s->head);
    if (old_head == NULL)
        return -1;

    // Announce we're accessing this node
    atomic_store(&g_hp.hp[thread_id][0], old_head);

    // Double-check head hasn't changed while we were announcing
    if (atomic_load(&s->head) != old_head)
        goto retry;

    // Now safe to read old_head->next
    struct node *new_head = old_head->next;

    if (!atomic_compare_exchange_weak(&s->head, &old_head, new_head))
        goto retry;

    int data = old_head->data;  // Read data BEFORE retiring the node

    // Clear hazard pointer
    atomic_store(&g_hp.hp[thread_id][0], NULL);

    // Don't free immediately - retire instead
    retire_node(old_head, thread_id);
    return data;
}

void retire_node(struct node *node, int thread_id) {
    // Add to thread's retired list
    list_add(&node->retired_link, &g_hp.retired[thread_id]);

    // Periodically scan and free nodes not protected
    // (the scan below is pseudocode)
    if (retired_count(thread_id) > threshold) {
        for each node n in retired[thread_id] {
            bool protected = false;
            for each thread t, each hp slot s {
                if (g_hp.hp[t][s] == n) { protected = true; break; }
            }
            if (!protected)
                free(n);  // Safe: no thread holds a hazard pointer to it
        }
    }
}
```

Solution 3: Epoch-Based Reclamation (EBR)
Similar to hazard pointers but coarser-grained. Threads enter epochs, and memory is only freed when no thread could be in an epoch that accessed it.
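The bookkeeping behind EBR can be sketched in a few dozen lines. The code below is a hedged, single-threaded illustration assuming a fixed thread count; `enter_critical`, `retire`, and `try_advance` are illustrative names, not a real library API. Three "limbo" lists rotate so that a list is freed only once it is two epochs old and no active thread can still reference its nodes.

```c
#include <stdbool.h>
#include <stdlib.h>

#define N_THREADS 2
#define N_EPOCHS  3   // rotate three limbo lists: current, -1, -2

static unsigned global_epoch = 0;
static unsigned local_epoch[N_THREADS];   // epoch each thread announced on entry
static bool     in_critical[N_THREADS];   // is the thread in a read-side section?

// Per-epoch "limbo" lists of retired pointers (no overflow check in this sketch)
static void *limbo[N_EPOCHS][64];
static int   limbo_count[N_EPOCHS];

void enter_critical(int tid) {
    in_critical[tid] = true;
    local_epoch[tid] = global_epoch;      // announce the epoch we operate in
}

void exit_critical(int tid) {
    in_critical[tid] = false;
}

// After unlinking a node from the structure, park it instead of freeing it
void retire(void *p) {
    int e = global_epoch % N_EPOCHS;
    limbo[e][limbo_count[e]++] = p;
}

// Advance the epoch if no active thread lags behind; the list that is
// two epochs old can then no longer be referenced by anyone and is freed.
bool try_advance(void) {
    for (int t = 0; t < N_THREADS; t++)
        if (in_critical[t] && local_epoch[t] != global_epoch)
            return false;                 // a straggler still holds an old epoch
    global_epoch++;
    int e = global_epoch % N_EPOCHS;      // slot being reused for the new epoch
    for (int i = 0; i < limbo_count[e]; i++)
        free(limbo[e][i]);                // retired two epochs ago: safe
    limbo_count[e] = 0;
    return true;
}
```

A thread that stalls inside a critical section simply blocks epoch advancement (and thus reclamation) rather than risking a use-after-free; this coarse-grained blocking is EBR's main tradeoff versus hazard pointers.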
| Solution | Overhead | Complexity | Memory Reclaim | Platform |
|---|---|---|---|---|
| Tagged pointers | Low (wider CAS) | Low | Immediate | Needs 2× CAS |
| Hazard pointers | Medium | High | Delayed | Portable |
| Epoch-based | Low | Medium | Delayed (batched) | Portable |
| Reference counting | High | Medium | Immediate | Portable |
| GC (Java, C#) | Variable | Low (handled by runtime) | Automatic | Managed languages |
In languages with garbage collection (Java, Go, C#), memory is never reused while reachable references exist. This eliminates ABA for managed objects. However, interop with native code, off-heap allocations, or native atomics can still be vulnerable.
Tagged pointers are the most common ABA solution in systems programming because they have low overhead and integrate naturally with CAS. Let's explore practical implementation details.
Exploiting Pointer Alignment:
On most architectures, dynamically allocated memory is aligned (typically 8 or 16 bytes). This means the low bits of valid pointers are always zero. We can steal these bits for tags!
```c
// Using low bits for tag (pointer stealing)
// Works because malloc returns 16-byte aligned pointers

// With 16-byte alignment, low 4 bits are available
// We use them for a 4-bit counter (0-15)

#define TAG_BITS 4
#define TAG_MASK ((1UL << TAG_BITS) - 1)  // 0x0F
#define PTR_MASK (~TAG_MASK)              // 0xFFFFFFFFFFFFFFF0

// Pack pointer and tag
static inline uintptr_t pack(void *ptr, unsigned int tag) {
    return ((uintptr_t)ptr & PTR_MASK) | (tag & TAG_MASK);
}

// Unpack pointer
static inline void *unpack_ptr(uintptr_t packed) {
    return (void *)(packed & PTR_MASK);
}

// Unpack tag
static inline unsigned int unpack_tag(uintptr_t packed) {
    return packed & TAG_MASK;
}

// Atomic tagged stack
_Atomic uintptr_t head = 0;  // NULL with tag 0

int pop(void) {
    uintptr_t old_head, new_head;
    struct node *node;
    do {
        old_head = atomic_load(&head);
        node = unpack_ptr(old_head);
        if (node == NULL)
            return -1;
        // Increment tag, update pointer
        unsigned int new_tag = (unpack_tag(old_head) + 1) & TAG_MASK;
        new_head = pack(node->next, new_tag);
    } while (!atomic_compare_exchange_weak(&head, &old_head, new_head));

    int data = node->data;
    free(node);
    return data;
}

// Limitation: tag wraps after 16 increments
// ABA still possible if A→B→A with a multiple of 16 intervening pops
// For most workloads, extremely unlikely
```

Small tags (4-16 bits) can wrap around. If the number of tag increments during a preemption is a multiple of 2^n, where n is the tag width, ABA can recur. Wider tags (32-64 bits) make this astronomically unlikely: with a 32-bit tag and one operation per nanosecond, wraparound takes about 4 seconds—plenty of margin for most preemptions.
Using DWCAS (Double-Width Compare-and-Swap):
For full-width tags, use the platform's double-width CAS. On x86-64, this is CMPXCHG16B (128-bit CAS). On 32-bit x86, CMPXCHG8B (64-bit CAS).
```c
// Double-width CAS with full pointer + counter
// Using __int128 on x86-64 with GCC/Clang

#include <stdatomic.h>
#include <stdint.h>

typedef struct {
    void *ptr;
    uint64_t tag;
} tagged_ptr_t;

// GCC extension for 128-bit atomics
typedef _Atomic __int128 atomic_tagged_t;

static inline __int128 to_128(tagged_ptr_t tp) {
    __int128 result;
    __builtin_memcpy(&result, &tp, sizeof(result));
    return result;
}

static inline tagged_ptr_t from_128(__int128 val) {
    tagged_ptr_t result;
    __builtin_memcpy(&result, &val, sizeof(result));
    return result;
}

atomic_tagged_t head;

int pop(void) {
    __int128 old_val;
    tagged_ptr_t old_head, new_head;
    struct node *node;
    do {
        old_val = atomic_load(&head);
        old_head = from_128(old_val);
        node = old_head.ptr;
        if (node == NULL)
            return -1;
        new_head.ptr = node->next;
        new_head.tag = old_head.tag + 1;
        // 128-bit CAS: on failure, old_val is refreshed and we retry
    } while (!atomic_compare_exchange_weak(&head, &old_val,
                                           to_128(new_head)));

    int data = node->data;
    free(node);
    return data;
}

// Note: CMPXCHG16B requires a 16-byte aligned operand
// Use: _Alignas(16) atomic_tagged_t head;
```

| Platform | Instruction | Width | Notes |
|---|---|---|---|
| x86-64 | CMPXCHG16B | 128-bit | Requires CPUID check; post-2005 CPUs |
| x86-32 | CMPXCHG8B | 64-bit | Pentium+; 32-bit ptr + 32-bit tag |
| ARM64 | CASP | 128-bit | ARMv8.1+; pair of 64-bit registers |
| ARM64 LL/SC | LDXP/STXP | 128-bit | All ARMv8; exclusive pair operations |
| POWER | lqarx/stqcx | 128-bit | POWER8+; quadword atomics |
Beyond specific solutions, there are design patterns and practices that naturally avoid or minimize ABA vulnerability.
Strategy 1: Never Reuse Memory
The simplest approach: don't free memory that CAS might reference. Use per-thread freelists or memory pools with epoch-based cleanup.
```c
// Thread-local freelist pattern
// Nodes are recycled within same thread, never freed globally

__thread struct node *local_freelist = NULL;

struct node *alloc_node(void) {
    if (local_freelist) {
        struct node *n = local_freelist;
        local_freelist = n->next;
        return n;
    }
    return malloc(sizeof(struct node));
}

void retire_node(struct node *n) {
    // Don't free - add to thread-local freelist
    n->next = local_freelist;
    local_freelist = n;
}

// Why this helps:
// - Nodes are only reused by the same thread
// - Other threads' CAS can't see pointers to "recycled" nodes
// - Still need care if threads share freelists
```

Strategy 2: Unique Identifiers Instead of Pointers
Use indices into arrays or unique IDs instead of raw pointers. These can be designed never to repeat.
```c
// Array-based lock-free stack using indices
// Index never repeats if using 64-bit IDs

#define MAX_NODES 10000

struct node {
    int data;
    uint32_t next_idx;  // Index, not pointer
};

struct node nodes[MAX_NODES];

// Use 64-bit ID: 32-bit index + 32-bit version
typedef uint64_t node_id_t;

#define MAKE_ID(idx, ver) (((uint64_t)(ver) << 32) | (idx))
#define GET_IDX(id) ((uint32_t)(id))
#define GET_VER(id) ((uint32_t)((id) >> 32))

_Atomic node_id_t head = MAKE_ID(UINT32_MAX, 0);  // Empty

// Pop with built-in versioning
int pop(void) {
    node_id_t old_head, new_head;
    do {
        old_head = atomic_load(&head);
        uint32_t idx = GET_IDX(old_head);
        if (idx == UINT32_MAX)
            return -1;
        // Next index, incremented version
        new_head = MAKE_ID(nodes[idx].next_idx,
                           GET_VER(old_head) + 1);
    } while (!atomic_compare_exchange_weak(&head, &old_head, new_head));

    return nodes[GET_IDX(old_head)].data;
}

// Version number prevents ABA even when indices repeat
```

Lock-free structures shine when: (1) contention is very high, (2) lock holder might be preempted/killed, (3) priority inversion must be avoided, or (4) real-time guarantees are needed. For most cases, well-implemented locks perform equally well with less complexity and no ABA risk.
We have thoroughly explored the ABA problem—one of the most subtle and dangerous issues in concurrent programming. Understanding ABA is essential for anyone working with atomic operations and lock-free algorithms. Let's consolidate the key insights: (1) CAS compares values, not history, so A→B→A is invisible to it, (2) memory reuse turns value equality into a false claim of identity, (3) tagged pointers, hazard pointers, and epoch-based reclamation each close this gap with different tradeoffs, (4) simple 0/1 spinlocks are immune because their state carries no external references, and (5) designs that never free CAS-visible memory, or that use never-reused identifiers, sidestep ABA entirely.
Module Complete:
We have now completed our comprehensive exploration of Test-and-Set locks. From the fundamental atomic operation through hardware implementation, practical lock building, and subtle issues like ABA, you now possess deep understanding of this foundational synchronization primitive.
The knowledge from this module applies far beyond TSL itself—the patterns of atomic operations, cache coherence awareness, lock architecture, and correctness reasoning form the foundation for understanding all synchronization primitives: semaphores, mutexes, condition variables, and beyond.
Congratulations! You have mastered Test-and-Set locks at a deep level—understanding not just how to use them, but how they work at the hardware level, how to implement them efficiently, and how to avoid subtle bugs like ABA. This knowledge distinguishes you as a systems programmer capable of building and debugging correct concurrent systems.