You've carefully implemented a lock-free data structure using Compare-and-Swap (CAS). The logic seems bulletproof: check if the value is what you expect, swap atomically. Yet intermittently—perhaps once in a million operations, perhaps only under specific timing conditions—your data structure corrupts. The bug is nearly impossible to reproduce in testing, but devastating in production.
Welcome to the ABA problem, one of the most insidious issues in concurrent programming. The name comes from a simple scenario: a memory location starts with value A, changes to B, then changes back to A. A CAS operation sees the value is A (matching expectations) and succeeds—but the context has changed entirely. The algorithm's invariants are silently violated.
The ABA problem affects not just lock-free data structures but also certain implementations of locks and synchronization primitives. Understanding it is essential for anyone working with low-level concurrency, and knowing how to prevent it separates robust systems from ticking time bombs.
By the end of this page, you will understand: (1) The precise nature of the ABA problem and why it occurs, (2) Real-world scenarios where ABA causes failures, (3) Solutions including hazard pointers, tagged pointers, and memory reclamation schemes, (4) How ABA relates to lock implementations, and (5) Practical strategies for ABA-safe programming.
The ABA problem arises from a fundamental limitation of Compare-and-Swap (and related operations like Test-and-Set): they only compare values, not history. CAS doesn't know if a value changed and returned—it only sees the current state.
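This limitation can be made concrete in a few lines of C11 atomics. The snippet below is a minimal single-threaded sketch: plain stores stand in for another thread's modifications, and the CAS succeeds even though the location changed twice in between.

```c
#include <stdatomic.h>
#include <stdbool.h>

_Atomic int x = 0;  // the shared location, initially A (= 0)

bool aba_demo(void) {
    int expected = atomic_load(&x);  // this "thread" observes A

    // Meanwhile, other threads change the value A -> B -> A
    atomic_store(&x, 1);
    atomic_store(&x, 0);

    // CAS only compares the current value with `expected`:
    // 0 == 0, so it succeeds despite the intervening changes
    return atomic_compare_exchange_strong(&x, &expected, 2);
}
```

The CAS reports success because value equality is all it can check; nothing in the operation records that x was ever 1.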
Formal Definition:

The ABA problem occurs when:

1. Thread 1 reads value A from a shared location.
2. Thread 1 is delayed (preempted, stalled, or simply slow).
3. Other threads change the location to B, then back to A.
4. Thread 1's CAS compares against A, finds a match, and succeeds—even though the location was modified in between and the algorithm's assumptions no longer hold.
Why Is This a Problem?
CAS-based algorithms often use pointer values. The assumption is: "if the pointer hasn't changed, the data structure hasn't changed." But if a node is removed and a new node is allocated at the same address (reused memory), the pointer value is the same while the data is completely different.
```c
// Lock-free stack with ABA vulnerability
struct node {
    int data;
    struct node *next;
};

struct stack {
    struct node *head;  // Atomic pointer
};

// Push operation - ABA safe (creates new nodes)
void push(struct stack *s, int value) {
    struct node *new_node = malloc(sizeof(*new_node));
    new_node->data = value;
    do {
        new_node->next = s->head;
    } while (!CAS(&s->head, new_node->next, new_node));
}

// Pop operation - ABA VULNERABLE!
int pop(struct stack *s) {
    struct node *old_head;
    struct node *new_head;
    do {
        old_head = s->head;
        if (old_head == NULL)
            return -1;              // Stack empty
        new_head = old_head->next;  // Read next pointer
        // DANGER: If we're preempted here, old_head could be
        // freed and reallocated to a different node!
    } while (!CAS(&s->head, old_head, new_head));
    int data = old_head->data;
    free(old_head);  // This free makes ABA possible
    return data;
}
```

Thread 1 reads head pointing to node A (with next→B). Thread 1 stalls. Thread 2 pops A and B, pushes new node C. If C is allocated at A's old address, Thread 1's CAS succeeds, setting head to B—but B was already popped! The stack now points to freed memory.
Let's trace through the ABA problem in our lock-free stack implementation step by step. This detailed walkthrough reveals exactly how corruption occurs.
Initial State:
Stack contains: A → B → C (A is on top)
| Step | Thread 1 | Thread 2 | Stack State |
|---|---|---|---|
| 1 | read head → A | — | head→A→B→C |
| 2 | read A.next → B | — | head→A→B→C |
| 3 | (stalled) | pop() begins | head→A→B→C |
| 4 | (stalled) | CAS(A→B) succeeds, free(A) | head→B→C |
| 5 | (stalled) | pop() again | head→B→C |
| 6 | (stalled) | CAS(B→C) succeeds, free(B) | head→C |
| 7 | (stalled) | push(X) allocates at A's old address | head→X→C |
| 8 | resumes | — | head→X→C |
| 9 | CAS(head, A, B) | — | head→X→C |
| 10 | CAS SUCCEEDS! (A's address = X's address) | — | head→B (FREED!) |
The Critical Insight:
Thread 1's cached value of new_head (which pointed to B) is now a dangling pointer. B has been freed. When Thread 1's CAS succeeds—because it compares the address, not the identity—it sets head to point to freed memory.
Subsequent operations on the stack dereference freed memory: pops read garbage data and follow dangling next pointers, pushes may corrupt whatever the allocator hands out next, and the process can crash at an arbitrary later point.
```
Detailed Memory View During ABA:

STEP 1-2 (Thread 1 reads):
┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│ head → 0x100 ├───►│ A @ 0x100    ├───►│ B @ 0x200    ├───► ...
└──────────────┘    │ data: 10     │    │ data: 20     │
                    │ next: 0x200  │    │ next: 0x300  │
                    └──────────────┘    └──────────────┘
Thread 1 state: old_head = 0x100, new_head = 0x200

STEP 4-6 (Thread 2 pops A and B):
┌──────────────┐    ┌──────────────┐
│ head → 0x300 ├───►│ C @ 0x300    ├───► ...
└──────────────┘    │ data: 30     │
                    │ next: 0x400  │
                    └──────────────┘
Memory at 0x100 and 0x200 is FREE (heap)

STEP 7 (Thread 2 push reallocates at 0x100):
┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│ head → 0x100 ├───►│ X @ 0x100    ├───►│ C @ 0x300    ├───► ...
└──────────────┘    │ data: 99     │    │ data: 30     │
                    │ next: 0x300  │    │ next: 0x400  │
                    └──────────────┘    └──────────────┘
Note: 0x100 is reused! Different node, same address.
Thread 1 still has: old_head = 0x100, new_head = 0x200

STEP 10 (Thread 1 CAS succeeds):
CAS(&head, 0x100, 0x200) → compares head (0x100) with expected (0x100) → MATCH!
head := 0x200

┌──────────────┐
│ head → 0x200 │ ← POINTS TO FREED MEMORY!
└──────────────┘
┌──────────────┐
│ ??? @ 0x200  │ ← May be reallocated to anything
└──────────────┘
```

ABA bugs are notoriously difficult to debug because: (1) they require specific timing to manifest, (2) the corruption may not be detected until much later, (3) debugger probes change timing and hide the bug, (4) symptoms vary—crashes, wrong results, or nothing visible. Prevention is far easier than debugging.
While we've focused on lock-free data structures, the ABA problem can also affect lock implementations, particularly those using CAS-based state machines or owner tracking.
Fortunately, Simple Spinlocks Are Safe:
The basic TAS spinlock using 0/1 states does NOT suffer from ABA, because its entire state is a single bit with no external references: if we observe 0 (unlocked) and atomically set it to 1, we hold the lock correctly, no matter how many lock/unlock cycles happened while we weren't looking.
```c
// This spinlock is ABA-safe

void acquire(void) {
    while (TAS(&lock) == 1) {
        // spin
    }
}

void release(void) {
    lock = 0;
}

// Why ABA-safe:
// - State is only 0 or 1
// - A→B→A means: unlocked→locked→unlocked
// - If we see unlocked and successfully lock it, we're correct
//   regardless of what happened before
// - No dangling references, no external state dependency
```

When Locks CAN Have ABA Issues:
More complex lock implementations may be vulnerable:
```c
// A lock design with potential ABA vulnerability
// (educational example - do not use)

struct mutex {
    struct thread *owner;  // NULL = unlocked, else locked
};

void lock(struct mutex *m) {
    struct thread *self = current_thread();
    while (!CAS(&m->owner, NULL, self)) {
        // Spin waiting for owner to become NULL
        while (m->owner != NULL) {
            wait_hint();
        }
    }
}

void unlock(struct mutex *m) {
    m->owner = NULL;
}

// ABA scenario:
// 1. Thread A (at address 0x1000) locks, owner = 0x1000
// 2. Thread A unlocks, owner = NULL
// 3. Thread A terminates, thread struct freed
// 4. Thread B created, allocated at address 0x1000
// 5. Thread C is checking if lock is held, reads owner
// 6. Thread C has old cached owner = 0x1000
// 7. Thread C compares: "is thread at 0x1000 still valid?"
//    0x1000 is valid (it's Thread B!) but WRONG identity
//
// This is subtle but can cause issues with ownership checks
// Example: recursive lock checking if current thread is owner
```

Using a unique thread ID (monotonically increasing, never reused) instead of thread pointer addresses avoids this class of ABA. Linux's gettid() returns IDs that are not immediately reused, but OS thread IDs on both Linux and Windows can eventually be recycled; a process-local 64-bit counter that never wraps is the only hard guarantee of uniqueness.
Several techniques can prevent or mitigate the ABA problem. Each has different tradeoffs in terms of performance, complexity, and applicability.
Solution 1: Tagged Pointers (Double-Word CAS)
Pair each pointer with a version counter. Every modification increments the counter. ABA is detected because even if the pointer returns to A, the counter has changed.
```c
// Tagged pointer solution using double-width CAS

// Pack pointer and tag into a single atomic unit
// On x86-64: use 128-bit CMPXCHG16B
// On 32-bit: use 64-bit CMPXCHG8B

struct tagged_ptr {
    void *ptr;       // The actual pointer
    uintptr_t tag;   // Version counter
};

// Atomic operations on the combined value
// (assume struct stack now holds an _Atomic struct tagged_ptr head)
_Atomic struct tagged_ptr head;

int pop_with_tag(struct stack *s) {
    struct tagged_ptr old_head, new_head;
    do {
        old_head = atomic_load(&s->head);
        if (old_head.ptr == NULL)
            return -1;
        // Calculate new head
        new_head.ptr = ((struct node *)old_head.ptr)->next;
        new_head.tag = old_head.tag + 1;  // Increment version
    } while (!atomic_compare_exchange_weak(
                 &s->head, &old_head, new_head));

    // CAS now compares BOTH pointer AND tag
    // A→B→A would have different tags: (A,1) → (B,2) → (A,3)
    // CAS on (A,1) fails when head is (A,3)

    struct node *popped = old_head.ptr;
    int data = popped->data;
    free(popped);  // Tag prevents the ABA CAS; note that other threads
                   // may still read a freed node, so full safety also
                   // needs a reclamation scheme
    return data;
}

// On x86-64, this compiles to the CMPXCHG16B instruction
// On architectures without wide CAS, alternatives are needed
```

Solution 2: Hazard Pointers
Each thread announces which pointers it's currently accessing. Memory cannot be freed while any thread's hazard pointer references it. This prevents the reuse that causes ABA.
```c
// Hazard pointer scheme - simplified
// Each thread has a small number of hazard pointer slots

#define MAX_THREADS 64
#define HP_PER_THREAD 2

struct hazard_pointers {
    _Atomic(void *) hp[MAX_THREADS][HP_PER_THREAD];
    // Per-thread retired lists (to-be-freed nodes)
    struct retired_list retired[MAX_THREADS];
};

struct hazard_pointers g_hp;

int pop_with_hazard(struct stack *s, int thread_id) {
    struct node *old_head;
retry:
    old_head = atomic_load(&s->head);
    if (old_head == NULL)
        return -1;

    // Announce we're accessing this node
    atomic_store(&g_hp.hp[thread_id][0], old_head);

    // Double-check head hasn't changed while we were announcing
    if (atomic_load(&s->head) != old_head)
        goto retry;

    // Now safe to read old_head->next
    struct node *new_head = old_head->next;

    if (!atomic_compare_exchange_weak(&s->head, &old_head, new_head))
        goto retry;

    int data = old_head->data;  // Read data BEFORE retiring the node

    // Clear hazard pointer
    atomic_store(&g_hp.hp[thread_id][0], NULL);

    // Don't free immediately - retire instead
    retire_node(old_head, thread_id);
    return data;
}

void retire_node(struct node *node, int thread_id) {
    // Add to thread's retired list
    list_add(&node->retired_link, &g_hp.retired[thread_id]);

    // Periodically scan and free nodes not protected
    // (the scan below is pseudocode)
    if (retired_count(thread_id) > threshold) {
        for each node n in retired[thread_id] {
            bool protected = false;
            for each thread t, each hp slot s {
                if (g_hp.hp[t][s] == n) { protected = true; break; }
            }
            if (!protected)
                free(n);  // Safe: no thread holds a hazard pointer to it
        }
    }
}
```

Solution 3: Epoch-Based Reclamation (EBR)
Similar to hazard pointers but coarser-grained. Threads enter epochs, and memory is only freed when no thread could be in an epoch that accessed it.
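The bookkeeping behind EBR can be sketched in a few dozen lines. The code below is a hedged, single-threaded illustration assuming a fixed thread count; `enter_critical`, `retire`, and `try_advance` are illustrative names, not a real library API. Three "limbo" lists rotate so that a list is freed only once it is two epochs old and no active thread can still reference its nodes.

```c
#include <stdbool.h>
#include <stdlib.h>

#define N_THREADS 2
#define N_EPOCHS  3   // rotate three limbo lists: current, -1, -2

static unsigned global_epoch = 0;
static unsigned local_epoch[N_THREADS];   // epoch each thread announced on entry
static bool     in_critical[N_THREADS];   // is the thread in a read-side section?

// Per-epoch "limbo" lists of retired pointers (no overflow check in this sketch)
static void *limbo[N_EPOCHS][64];
static int   limbo_count[N_EPOCHS];

void enter_critical(int tid) {
    in_critical[tid] = true;
    local_epoch[tid] = global_epoch;      // announce the epoch we operate in
}

void exit_critical(int tid) {
    in_critical[tid] = false;
}

// After unlinking a node from the structure, park it instead of freeing it
void retire(void *p) {
    int e = global_epoch % N_EPOCHS;
    limbo[e][limbo_count[e]++] = p;
}

// Advance the epoch if no active thread lags behind; the list that is
// two epochs old can then no longer be referenced by anyone and is freed.
bool try_advance(void) {
    for (int t = 0; t < N_THREADS; t++)
        if (in_critical[t] && local_epoch[t] != global_epoch)
            return false;                 // a straggler still holds an old epoch
    global_epoch++;
    int e = global_epoch % N_EPOCHS;      // slot being reused for the new epoch
    for (int i = 0; i < limbo_count[e]; i++)
        free(limbo[e][i]);                // retired two epochs ago: safe
    limbo_count[e] = 0;
    return true;
}
```

A thread that stalls inside a critical section simply blocks epoch advancement (and thus reclamation) rather than risking a use-after-free; this coarse-grained blocking is EBR's main tradeoff versus hazard pointers.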
| Solution | Overhead | Complexity | Memory Reclaim | Platform |
|---|---|---|---|---|
| Tagged pointers | Low (wider CAS) | Low | Immediate | Needs 2× CAS |
| Hazard pointers | Medium | High | Delayed | Portable |
| Epoch-based | Low | Medium | Delayed (batched) | Portable |
| Reference counting | High | Medium | Immediate | Portable |
| GC (Java, C#) | Variable | Low (handled by runtime) | Automatic | Managed languages |
In languages with garbage collection (Java, Go, C#), memory is never reused while reachable references exist. This eliminates ABA for managed objects. However, interop with native code, off-heap allocations, or native atomics can still be vulnerable.
Tagged pointers are the most common ABA solution in systems programming because they have low overhead and integrate naturally with CAS. Let's explore practical implementation details.
Exploiting Pointer Alignment:
On most architectures, dynamically allocated memory is aligned (typically 8 or 16 bytes). This means the low bits of valid pointers are always zero. We can steal these bits for tags!
```c
// Using low bits for tag (pointer stealing)
// Works because malloc returns 16-byte aligned pointers

// With 16-byte alignment, low 4 bits are available
// We use them for a 4-bit counter (0-15)

#define TAG_BITS 4
#define TAG_MASK ((1UL << TAG_BITS) - 1)  // 0x0F
#define PTR_MASK (~TAG_MASK)              // 0xFFFFFFFFFFFFFFF0

// Pack pointer and tag
static inline uintptr_t pack(void *ptr, unsigned int tag) {
    return ((uintptr_t)ptr & PTR_MASK) | (tag & TAG_MASK);
}

// Unpack pointer
static inline void *unpack_ptr(uintptr_t packed) {
    return (void *)(packed & PTR_MASK);
}

// Unpack tag
static inline unsigned int unpack_tag(uintptr_t packed) {
    return packed & TAG_MASK;
}

// Atomic tagged stack
_Atomic uintptr_t head = 0;  // NULL with tag 0

int pop(void) {
    uintptr_t old_head, new_head;
    struct node *node;
    do {
        old_head = atomic_load(&head);
        node = unpack_ptr(old_head);
        if (node == NULL)
            return -1;
        // Increment tag, update pointer
        unsigned int new_tag = (unpack_tag(old_head) + 1) & TAG_MASK;
        new_head = pack(node->next, new_tag);
    } while (!atomic_compare_exchange_weak(&head, &old_head, new_head));

    int data = node->data;
    free(node);
    return data;
}

// Limitation: tag wraps after 16 increments
// ABA still possible if A→B→A with a multiple of 16 intervening pops
// For most workloads, extremely unlikely
```

Small tags (4-16 bits) can wrap around. If the number of tag increments during a preemption is a multiple of 2^n, where n is the tag width, ABA can recur. Wider tags (32-64 bits) make this astronomically unlikely: with a 32-bit tag and one operation per nanosecond, wraparound takes about 4 seconds—plenty of margin for most preemptions.
Using DWCAS (Double-Width Compare-and-Swap):
For full-width tags, use the platform's double-width CAS. On x86-64, this is CMPXCHG16B (128-bit CAS). On 32-bit x86, CMPXCHG8B (64-bit CAS).
```c
// Double-width CAS with full pointer + counter
// Using __int128 on x86-64 with GCC/Clang

#include <stdatomic.h>
#include <stdint.h>

typedef struct {
    void *ptr;
    uint64_t tag;
} tagged_ptr_t;

// GCC extension for 128-bit atomics
typedef _Atomic __int128 atomic_tagged_t;

static inline __int128 to_128(tagged_ptr_t tp) {
    __int128 result;
    __builtin_memcpy(&result, &tp, sizeof(result));
    return result;
}

static inline tagged_ptr_t from_128(__int128 val) {
    tagged_ptr_t result;
    __builtin_memcpy(&result, &val, sizeof(result));
    return result;
}

atomic_tagged_t head;

int pop(void) {
    __int128 old_val;
    tagged_ptr_t old_head, new_head;
    struct node *node;
    do {
        old_val = atomic_load(&head);
        old_head = from_128(old_val);
        node = old_head.ptr;
        if (node == NULL)
            return -1;
        new_head.ptr = node->next;
        new_head.tag = old_head.tag + 1;
        // 128-bit CAS: on failure, old_val is refreshed and we retry
    } while (!atomic_compare_exchange_weak(&head, &old_val,
                                           to_128(new_head)));

    int data = node->data;
    free(node);
    return data;
}

// Note: CMPXCHG16B requires a 16-byte aligned operand
// Use: _Alignas(16) atomic_tagged_t head;
```

| Platform | Instruction | Width | Notes |
|---|---|---|---|
| x86-64 | CMPXCHG16B | 128-bit | Requires CPUID check; post-2005 CPUs |
| x86-32 | CMPXCHG8B | 64-bit | Pentium+; 32-bit ptr + 32-bit tag |
| ARM64 | CASP | 128-bit | ARMv8.1+; pair of 64-bit registers |
| ARM64 LL/SC | LDXP/STXP | 128-bit | All ARMv8; exclusive pair operations |
| POWER | lqarx/stqcx | 128-bit | POWER8+; quadword atomics |
Beyond specific solutions, there are design patterns and practices that naturally avoid or minimize ABA vulnerability.
Strategy 1: Never Reuse Memory
The simplest approach: don't free memory that CAS might reference. Use per-thread freelists or memory pools with epoch-based cleanup.
```c
// Thread-local freelist pattern
// Nodes are recycled within same thread, never freed globally

__thread struct node *local_freelist = NULL;

struct node *alloc_node(void) {
    if (local_freelist) {
        struct node *n = local_freelist;
        local_freelist = n->next;
        return n;
    }
    return malloc(sizeof(struct node));
}

void retire_node(struct node *n) {
    // Don't free - add to thread-local freelist
    n->next = local_freelist;
    local_freelist = n;
}

// Why this helps:
// - Nodes are only reused by the same thread
// - Other threads' CAS can't see pointers to "recycled" nodes
// - Still need care if threads share freelists
```

Strategy 2: Unique Identifiers Instead of Pointers
Use indices into arrays or unique IDs instead of raw pointers. These can be designed never to repeat.
```c
// Array-based lock-free stack using indices
// Index never repeats if using 64-bit IDs

#define MAX_NODES 10000

struct node {
    int data;
    uint32_t next_idx;  // Index, not pointer
};

struct node nodes[MAX_NODES];

// Use 64-bit ID: 32-bit index + 32-bit version
typedef uint64_t node_id_t;

#define MAKE_ID(idx, ver) (((uint64_t)(ver) << 32) | (idx))
#define GET_IDX(id) ((uint32_t)(id))
#define GET_VER(id) ((uint32_t)((id) >> 32))

_Atomic node_id_t head = MAKE_ID(UINT32_MAX, 0);  // Empty

// Pop with built-in versioning
int pop(void) {
    node_id_t old_head, new_head;
    do {
        old_head = atomic_load(&head);
        uint32_t idx = GET_IDX(old_head);
        if (idx == UINT32_MAX)
            return -1;
        // Next index, incremented version
        new_head = MAKE_ID(nodes[idx].next_idx,
                           GET_VER(old_head) + 1);
    } while (!atomic_compare_exchange_weak(&head, &old_head, new_head));

    return nodes[GET_IDX(old_head)].data;
}

// Version number prevents ABA even when indices repeat
```

Lock-free structures shine when: (1) contention is very high, (2) lock holder might be preempted/killed, (3) priority inversion must be avoided, or (4) real-time guarantees are needed. For most cases, well-implemented locks perform equally well with less complexity and no ABA risk.
We have thoroughly explored the ABA problem—one of the most subtle and dangerous issues in concurrent programming. Understanding ABA is essential for anyone working with atomic operations and lock-free algorithms. Let's consolidate the key insights: (1) CAS compares values, not history, so A→B→A is invisible to it, (2) memory reuse turns value equality into a false claim of identity, (3) tagged pointers, hazard pointers, and epoch-based reclamation each close this gap with different tradeoffs, (4) simple 0/1 spinlocks are immune because their state carries no external references, and (5) designs that never free CAS-visible memory, or that use never-reused identifiers, sidestep ABA entirely.
Module Complete:
We have now completed our comprehensive exploration of Test-and-Set locks. From the fundamental atomic operation through hardware implementation, practical lock building, and subtle issues like ABA, you now possess deep understanding of this foundational synchronization primitive.
The knowledge from this module applies far beyond TSL itself—the patterns of atomic operations, cache coherence awareness, lock architecture, and correctness reasoning form the foundation for understanding all synchronization primitives: semaphores, mutexes, condition variables, and beyond.
Congratulations! You have mastered Test-and-Set locks at a deep level—understanding not just how to use them, but how they work at the hardware level, how to implement them efficiently, and how to avoid subtle bugs like ABA. This knowledge distinguishes you as a systems programmer capable of building and debugging correct concurrent systems.