
Thread Issues

Level: Intermediate

Duration: 90 mins

Topic: Thread Issues


Thread Cancellation

The Challenge of Stopping Threads

In the world of multithreaded programming, starting a thread is straightforward—but stopping one safely is remarkably complex. Unlike processes, which can be terminated abruptly with minimal consequences due to their isolated address spaces, threads share memory and resources with their parent process and sibling threads. Abruptly terminating a thread can leave shared data structures in inconsistent states, leak resources, hold locks indefinitely, and cause cascading failures throughout an application.

Thread cancellation is the mechanism by which one thread can request the termination of another. The operating system and threading libraries provide cancellation facilities precisely because the naive approach—simply killing a thread—creates more problems than it solves. A well-designed cancellation system allows threads to terminate gracefully, releasing resources and restoring invariants before exiting.

What You Will Learn

By the end of this page, you will understand: (1) why thread cancellation is fundamentally challenging, (2) the critical distinction between asynchronous and deferred cancellation, (3) how cancellation points provide safe termination opportunities, (4) cleanup handlers and their role in resource management, and (5) practical patterns for writing cancellation-safe code.

Why Thread Cancellation is Hard

To appreciate thread cancellation's complexity, imagine you're at a restaurant, and you decide to cancel your order after the kitchen has started cooking. The kitchen cannot simply stop mid-way and throw away ingredients—there's cleanup to do, other orders might depend on shared cooking resources, and the billing system needs to know the order was cancelled.

Threads face the same challenges:

When a thread is executing, it might be in the middle of any operation:

Modifying a shared data structure that's temporarily in an inconsistent state
Holding one or more locks (mutexes, reader-writer locks, etc.)
Allocating memory from the heap
Writing to a file or network socket
Updating multiple related variables that must remain consistent with each other

If the thread is terminated at an arbitrary point, all of these operations remain incomplete.

Consequences of Abrupt Thread Termination

•Orphaned Locks — A terminated thread may hold mutexes that other threads are waiting on, causing permanent deadlock. The mutex remains locked forever since no thread can release it.
•Memory Leaks — Dynamically allocated memory that the thread was managing goes unreferenced, causing memory leaks that accumulate over the application's lifetime.
•Corrupted Data Structures — Linked lists, trees, hash tables, and other structures may be left in inconsistent states where invariants are violated, causing crashes or incorrect behavior later.
•Resource Exhaustion — File handles, network sockets, database connections, and other resources held by the thread remain open, eventually exhausting system limits.
•Transaction Rollback Failures — Operations that are part of a logical transaction may be left partially complete, violating atomicity guarantees the application depends on.

The Shared Memory Problem

Processes can be killed with far fewer consequences because each process has its own address space: the OS reclaims its memory, file descriptors, and other kernel resources when the process terminates. Threads share their parent process's address space, so terminating a thread cannot reclaim resources automatically without potentially corrupting state other threads depend on. This fundamental difference is why thread cancellation requires cooperative mechanisms rather than forceful termination.

The core insight:

Safe thread cancellation is fundamentally a problem of finding safe points where a thread can be terminated without leaving inconsistencies. Rather than terminating threads at arbitrary points, we need mechanisms that let threads:

Receive a request to cancel (not a forced termination)
Continue executing until they reach a safe point
Perform cleanup operations to release resources and restore invariants
Then, and only then, actually terminate

This is exactly what POSIX thread cancellation and similar mechanisms provide.

Cancellation Types: Asynchronous vs. Deferred

POSIX threads (pthreads) define two fundamental modes of cancellation, each with dramatically different safety characteristics and use cases. Understanding this distinction is essential for writing correct multithreaded code.

The pthread cancellation model:

When thread A calls pthread_cancel(threadB), it sends a cancellation request to thread B. This request is not an immediate kill—it's a notification that thread B should terminate. What happens next depends entirely on thread B's cancellation state and type.

Asynchronous Cancellation

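•Thread can be cancelled at any point in execution
•Cancellation usually takes effect immediately, although POSIX does not guarantee it
•Cleanup handlers still run, but the thread may have been stopped mid-operation, so they cannot know what state to restore
•Extremely dangerous for most code
•Only safe for pure computation that holds no resources and calls only async-cancel-safe functions
•Enabled with PTHREAD_CANCEL_ASYNCHRONOUS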

Deferred Cancellation

•Thread cancelled only at cancellation points
•Request is pending until a cancellation point is reached
•Cleanup handlers are invoked before termination
•Safe for most concurrent code
•Allows thread to finish critical sections first
•Enabled with PTHREAD_CANCEL_DEFERRED (default)

cancellation_types.c
C
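#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

// Placeholder for real work. It calls no cancellation points, which is
// what makes the "cannot be cancelled here" claim below true.
static void process_data(char *buffer, size_t len) {
    memset(buffer, 0, len);
}

void *async_cancellable_thread(void *arg) {
    long long *result = arg;   // caller-provided slot for the result
    int old_type, ignored;

    // Set cancellation type to asynchronous
    // DANGER: This thread can be cancelled at ANY instruction
    pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &old_type);

    // Only safe because the loop is pure computation:
    // no locks, no allocations, no I/O, no function calls
    long long sum = 0;
    for (long long i = 0; i < 1000000000LL; i++) {
        sum += i;  // Could be cancelled right here, mid-computation
    }

    // Restore the previous (deferred) type before doing anything else.
    // POSIX does not promise that NULL is accepted for the old value.
    pthread_setcanceltype(old_type, &ignored);
    *result = sum;
    return NULL;
}

void *deferred_cancellable_thread(void *arg) {
    (void)arg;
    int old_type;

    // Deferred cancellation (this is already the default)
    pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, &old_type);

    while (1) {
        // Acquire resources, do work...
        char *buffer = malloc(1024);
        if (buffer != NULL) {
            // We hold a resource here, but there is no cancellation point
            // between malloc() and free(), so a deferred cancel cannot strike
            process_data(buffer, 1024);

            // Release resources BEFORE the next cancellation point
            free(buffer);
        }

        // sleep() is a cancellation point: the thread can be cancelled here,
        // and because nothing is held at this moment, that is safe
        sleep(1);
    }
    return NULL;  // never reached
}

int main(void) {
    pthread_t tid;
    void *status = NULL;

    if (pthread_create(&tid, NULL, deferred_cancellable_thread, NULL) != 0) {
        fprintf(stderr, "pthread_create failed\n");
        return 1;
    }

    sleep(5);  // Let the thread run for a while

    // Request cancellation: this does not terminate the thread immediately;
    // the thread terminates at its next cancellation point
    pthread_cancel(tid);

    // Wait for the thread to actually terminate
    pthread_join(tid, &status);
    printf("Thread %s\n", status == PTHREAD_CANCELED ? "was cancelled" : "exited normally");

    return 0;
}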

Asynchronous Cancellation: Almost Never Safe

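Asynchronous cancellation should be treated as essentially unusable for general-purpose code. Even a simple malloc() call followed by memory initialization cannot safely use asynchronous cancellation: if cancellation occurs between malloc() returning and the pointer being stored, the memory is leaked forever, and if it occurs inside malloc() itself, the allocator's internal state may be left inconsistent. POSIX requires only three functions to be async-cancel-safe: pthread_cancel(), pthread_setcancelstate(), and pthread_setcanceltype(). The ONLY safe use case is therefore a pure computation loop that calls no other functions, acquires no resources, and modifies no shared state.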

Cancellation State and Control

Beyond cancellation type (async vs. deferred), threads have a cancellation state that determines whether they can be cancelled at all. This provides a mechanism for threads to protect critical sections from cancellation entirely.

The Two States:

PTHREAD_CANCEL_ENABLE — Thread will honor cancellation requests (default)
PTHREAD_CANCEL_DISABLE — Thread ignores cancellation requests; they remain pending

By toggling cancellation state, a thread can create windows where it's protected from cancellation, perform critical operations, and then re-enable cancellation.

cancellation_state.c
C
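#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

typedef struct {
    pthread_mutex_t mutex;
    int *data;
    size_t size;
} SharedBuffer;

// Placeholder helpers standing in for real application work
static int compute_value(size_t i) { return (int)(i * i); }
static void finalize_buffer(SharedBuffer *buf) { (void)buf; }
static void do_critical_work(void) { }

void *worker_thread(void *arg) {
    SharedBuffer *buf = (SharedBuffer *)arg;
    int old_state, ignored;

    while (1) {
        // Disable cancellation during the critical section
        pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old_state);

        // === BEGIN CRITICAL SECTION ===
        // Thread CANNOT be cancelled here, even at cancellation points
        pthread_mutex_lock(&buf->mutex);

        // Multi-step update that must complete as a unit
        for (size_t i = 0; i < buf->size; i++) {
            buf->data[i] = compute_value(i);
        }
        finalize_buffer(buf);

        pthread_mutex_unlock(&buf->mutex);
        // === END CRITICAL SECTION ===

        // Restore the previous state (normally ENABLE)
        pthread_setcancelstate(old_state, &ignored);

        // Act on any cancel requested while disabled: if one is pending,
        // the thread terminates inside this call
        pthread_testcancel();

        // sleep() is also a cancellation point
        sleep(1);
    }

    return NULL;  // never reached
}

// Pattern: RAII-style cancellation guard (C++ idiom, C approximation)
typedef struct {
    int saved_state;
} CancellationGuard;

void cancel_guard_init(CancellationGuard *guard) {
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &guard->saved_state);
}

void cancel_guard_destroy(CancellationGuard *guard) {
    int ignored;  // POSIX does not promise that NULL is accepted here
    pthread_setcancelstate(guard->saved_state, &ignored);
}

// Usage: because the guard restores the *previous* state rather than
// forcing ENABLE, guards nest safely; an inner guard never re-enables
// cancellation that an outer caller disabled
void critical_operation(void) {
    CancellationGuard guard;
    cancel_guard_init(&guard);

    // Cancellation disabled here...
    do_critical_work();

    cancel_guard_destroy(&guard);
    // Previous state restored
}

int main(void) {
    int storage[16];
    SharedBuffer buf = { .data = storage, .size = 16 };
    pthread_t tid;

    pthread_mutex_init(&buf.mutex, NULL);
    critical_operation();

    pthread_create(&tid, NULL, worker_thread, &buf);
    sleep(2);
    pthread_cancel(tid);      // honoured at pthread_testcancel() or sleep()
    pthread_join(tid, NULL);

    pthread_mutex_destroy(&buf.mutex);
    printf("storage[3] = %d\n", storage[3]);
    return 0;
}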

Thread Cancellation State vs. Type Matrix
State	Type	Behavior When pthread_cancel() Called
ENABLED	ASYNCHRONOUS	Thread can be cancelled at any instruction, usually immediately; cleanup handlers still run
ENABLED	DEFERRED	Thread terminates at next cancellation point
DISABLED	ASYNCHRONOUS	Request pending; honored when state becomes ENABLED
DISABLED	DEFERRED	Request pending; honored at next cancellation point after ENABLED

Design Pattern: Minimize Disabled Regions

While disabling cancellation protects critical sections, keeping cancellation disabled for too long makes threads unresponsive to cancellation requests. The best practice is to disable cancellation only for the briefest possible windows—typically just around mutex lock/unlock pairs or resource acquisition/release. This balances safety with responsiveness.

Cancellation Points

Cancellation points are specific locations in code where deferred cancellation actually occurs. POSIX defines which functions contain cancellation points—these are generally functions that may block or perform significant I/O.

The rationale:

Forcing cancellation to occur only at defined points serves multiple purposes:

Predictability — Programmers know exactly where cancellation can occur
Resource Safety — Cancellation points are typically at function boundaries, not mid-operation
Cleanup Opportunity — Code between cancellation points runs atomically with respect to cancellation

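POSIX divides functions into two categories: those that must be cancellation points, and those that may be cancellation points (many stdio functions, such as printf(), are in the second group). Notably, pthread_mutex_lock() is not a cancellation point, so a thread blocked waiting for a mutex cannot be cancelled while it waits.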

POSIX-Mandated Cancellation Points (Partial List)
Category	Functions
Thread/Process	`pthread_join`, `pthread_cond_wait`, `pthread_cond_timedwait`, `pthread_testcancel`
I/O Operations	`read`, `write`, `open`, `close`, `recv`, `send`, `accept`, `connect`
File Operations	`fcntl` (with F_SETLKW), `fsync`, `fdatasync`
Process Control / Sleep	`wait`, `waitpid`, `sleep`, `nanosleep` (`usleep` was required in POSIX.1-2001, but the function was removed in POSIX.1-2008)
Terminal I/O	`tcdrain`
Signals	`sigwait`, `sigwaitinfo`, `sigsuspend`, `pause`
IPC	`msgrcv`, `msgsnd`, `mq_receive`, `mq_send`, `sem_wait`, `sem_timedwait`

cancellation_points_example.c
C
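#include <fcntl.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

// Placeholder work functions; none of them calls a cancellation point
static void process_input(const char *buf, ssize_t len) { (void)buf; (void)len; }
static void perform_heavy_calculation(void) { }
static void perform_critical_computation(void) { }
static void perform_more_computation(void) { }

// Cooperative stop flag used by "Solution 2" below
static atomic_bool should_terminate = false;

void *io_bound_thread(void *arg) {
    const char *path = arg;   // e.g. a FIFO or device path chosen by the caller
    char buffer[1024];
    int fd = open(path, O_RDONLY);   // open() is itself a cancellation point

    if (fd < 0) {
        perror("open");
        return NULL;
    }

    while (1) {
        // read() is a cancellation point:
        // the thread may be cancelled while waiting for input.
        // NOTE: if that happens, fd is leaked; the cleanup handlers
        // described below solve exactly this problem.
        ssize_t bytes = read(fd, buffer, sizeof(buffer));

        if (bytes <= 0) break;

        // As long as process_input() calls no cancellation points,
        // this section runs atomically w.r.t. cancellation
        process_input(buffer, bytes);
    }

    close(fd);  // close() is also a cancellation point
    return NULL;
}

void *compute_bound_thread(void *arg) {
    (void)arg;
    // This thread does pure computation:
    // it has NO natural cancellation points!

    while (1) {
        // Long computation with no I/O or blocking
        for (int i = 0; i < 1000000; i++) {
            perform_heavy_calculation();
        }

        // PROBLEM: without an explicit cancellation point,
        // pthread_cancel() would never take effect in this loop.

        // Solution 1: add an explicit cancellation point
        pthread_testcancel();

        // Solution 2: check a flag (cooperative cancellation);
        // in practice you would normally pick one of the two
        if (atomic_load(&should_terminate)) {
            break;
        }
    }
    return NULL;
}

// pthread_testcancel() creates an explicit cancellation point.
// It does nothing if no cancellation is pending.
// If cancellation IS pending and the state is ENABLED:
//   - cleanup handlers are invoked
//   - the thread terminates
void *explicit_cancellation_point_example(void *arg) {
    (void)arg;
    while (1) {
        // Phase 1: computation with no cancellation points inside,
        // so once started it always runs to completion
        perform_critical_computation();

        // Explicit cancellation point
        pthread_testcancel();

        // Phase 2: more computation
        perform_more_computation();

        // Another explicit cancellation point
        pthread_testcancel();
    }
    return NULL;
}

int main(void) {
    pthread_t tid;

    pthread_create(&tid, NULL, compute_bound_thread, NULL);
    sleep(1);
    pthread_cancel(tid);       // honoured at the pthread_testcancel() call
    pthread_join(tid, NULL);
    puts("compute-bound thread stopped");
    return 0;
}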

The pthread_testcancel() Function

pthread_testcancel() is the programmatic way to add cancellation points. It does nothing if no cancellation is pending, but if a cancel has been requested and cancellation is enabled, calling pthread_testcancel() causes the thread to terminate (after running cleanup handlers). This is essential for compute-bound threads that don't call blocking functions.

Cleanup Handlers

Even with deferred cancellation, threads often hold resources when they reach cancellation points. POSIX provides cleanup handlers—functions that are automatically invoked when a thread is cancelled, ensuring resources are released and invariants restored.

The cleanup handler stack:

Cleanup handlers work like a stack (LIFO order):

pthread_cleanup_push() registers a handler
pthread_cleanup_pop() removes a handler (optionally executing it)
When thread is cancelled, all registered handlers execute in reverse order

This mirrors the stack discipline of resource acquisition—handlers are invoked in the opposite order they were registered, naturally pairing acquire/release operations.

cleanup_handlers.c
C
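#include <fcntl.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

// Placeholder for real work (contains no cancellation points)
static void process_data(char *buffer, ssize_t len) { (void)buffer; (void)len; }

// Cleanup handler for mutex
void cleanup_mutex(void *arg) {
    pthread_mutex_t *mutex = (pthread_mutex_t *)arg;
    printf("Cleanup: Releasing mutex\n");
    pthread_mutex_unlock(mutex);
}

// Cleanup handler for dynamically allocated memory (idempotent: NULL-safe)
void cleanup_memory(void *arg) {
    char **ptr = (char **)arg;
    if (*ptr != NULL) {
        printf("Cleanup: Freeing memory at %p\n", (void *)*ptr);
        free(*ptr);
        *ptr = NULL;
    }
}

// Cleanup handler for file descriptors (idempotent: -1 means "not open")
void cleanup_file(void *arg) {
    int *fd = (int *)arg;
    if (*fd >= 0) {
        printf("Cleanup: Closing file descriptor %d\n", *fd);
        close(*fd);
        *fd = -1;
    }
}

void *worker_with_cleanup(void *arg) {
    pthread_mutex_t *mutex = (pthread_mutex_t *)arg;
    char *buffer = NULL;
    int fd = -1;

    // Push each handler as its resource is acquired; handlers run in
    // reverse (LIFO) order, so release mirrors acquisition. Because the
    // handlers are idempotent, one pop(1) per push at a single exit point
    // covers both the success and the error paths. Never `return` between
    // push and pop: the macros may open and close a block.

    // 1. Mutex: pushed FIRST, so its handler runs LAST.
    //    pthread_mutex_lock() is not a cancellation point, so under deferred
    //    cancellation this handler cannot run before the lock is held.
    pthread_cleanup_push(cleanup_mutex, mutex);
    pthread_mutex_lock(mutex);

    // 2. Memory
    pthread_cleanup_push(cleanup_memory, &buffer);
    buffer = malloc(4096);

    // 3. File: pushed LAST, so its handler runs FIRST
    pthread_cleanup_push(cleanup_file, &fd);
    if (buffer != NULL) {
        fd = open("/tmp/data.txt", O_RDWR | O_CREAT, 0644);  // cancellation point
    }

    // Main work loop: a cancellation at read() triggers all three handlers
    if (fd >= 0) {
        ssize_t bytes;
        while ((bytes = read(fd, buffer, 4096)) > 0) {
            process_data(buffer, bytes);
        }
    }

    // Single exit: pop and execute (argument 1) each handler
    pthread_cleanup_pop(1);  // Close file (no-op if fd == -1)
    pthread_cleanup_pop(1);  // Free memory (no-op if buffer == NULL)
    pthread_cleanup_pop(1);  // Unlock mutex

    return NULL;
}

// ---- Hypothetical database API, stubbed so the example compiles ----
typedef struct { int id; } DatabaseConn;
typedef struct { DatabaseConn *conn; } Transaction;
typedef struct {
    const char *db_url;
    atomic_bool shutdown;   // atomic: set by another thread
} SharedState;

static DatabaseConn *db_connect(const char *url) {
    (void)url;
    return calloc(1, sizeof(DatabaseConn));
}
static void db_disconnect(DatabaseConn *conn) { free(conn); }
static Transaction *db_begin_transaction(DatabaseConn *conn) {
    Transaction *txn = malloc(sizeof *txn);
    if (txn) txn->conn = conn;
    return txn;
}
static void db_execute(DatabaseConn *conn, const char *sql) {
    (void)conn; (void)sql;
    sleep(1);   // stands in for blocking I/O; sleep() is a cancellation point
}
static void db_commit(Transaction *txn) { free(txn); }
static void db_rollback(Transaction *txn) { free(txn); }

static void cleanup_connection(void *arg) {
    DatabaseConn **conn = arg;
    if (*conn) { db_disconnect(*conn); *conn = NULL; }
}
static void cleanup_transaction(void *arg) {
    Transaction **txn = arg;
    if (*txn) { db_rollback(*txn); *txn = NULL; }   // roll back only if still open
}

// More complex example: nested cleanup with conditionals
void *complex_worker(void *arg) {
    SharedState *state = (SharedState *)arg;
    DatabaseConn *conn = NULL;
    Transaction *txn = NULL;

    // Outer handler: disconnect (runs last)
    pthread_cleanup_push(cleanup_connection, &conn);
    conn = db_connect(state->db_url);

    // Inner handler: roll back an open transaction (runs first)
    pthread_cleanup_push(cleanup_transaction, &txn);

    while (conn != NULL && !atomic_load(&state->shutdown)) {
        txn = db_begin_transaction(conn);
        if (txn == NULL) break;

        // The cancellation point is inside db_execute() (here, its sleep()).
        // If cancelled there, both handlers run:
        // 1. cleanup_transaction (rollback)
        // 2. cleanup_connection (disconnect)
        db_execute(conn, "SELECT * FROM data");

        db_commit(txn);
        txn = NULL;  // Committed: nothing left to roll back
    }

    pthread_cleanup_pop(0);  // txn is NULL here: nothing to roll back
    pthread_cleanup_pop(1);  // Do disconnect

    return NULL;
}

int main(void) {
    static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    SharedState state = { .db_url = "db://example" };   // hypothetical URL
    pthread_t t1, t2;

    atomic_init(&state.shutdown, false);
    pthread_create(&t1, NULL, worker_with_cleanup, &mutex);
    pthread_create(&t2, NULL, complex_worker, &state);
    sleep(2);
    pthread_cancel(t1);   // harmless if t1 has already finished
    pthread_cancel(t2);   // t2 is cancelled inside db_execute()
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}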

Critical: Push/Pop Must Be Paired

pthread_cleanup_push() and pthread_cleanup_pop() are often implemented as macros that open and close a brace-delimited block. They MUST appear in matched pairs within the same lexical scope, and leaving the region between them early with return, break, continue, or goto is undefined behavior; route every exit through the matching pthread_cleanup_pop() calls instead. Mispaired calls typically fail to compile. Some implementations build these macros on the GCC __attribute__((cleanup)) extension.

When Cleanup Handlers Run:

Cleanup handlers are invoked in these situations:

The thread acts on a cancellation request (at a cancellation point under deferred cancellation, or at any moment under asynchronous cancellation)
Thread calls pthread_exit()
pthread_cleanup_pop() is called with a non-zero argument

Design Principle:

Write cleanup handlers as if the thread could be cancelled at any moment during the critical section. The handler should restore system state to a consistent configuration, even if operations were only partially completed.

Safe Cancellation Patterns

Writing cancellation-safe code requires disciplined patterns. Here are the key strategies used in production systems:

Cancellation Safety Patterns

•Bracket Resources with Cleanup Handlers — Every resource acquisition should have a corresponding cleanup handler registered before the resource is used. This ensures the resource is released regardless of how the thread exits.
•Keep Critical Sections Short — Minimize the code between disabling cancellation and re-enabling it. Long non-cancellable sections make threads unresponsive.
•Use Cooperative Cancellation Flags — Instead of pthread_cancel(), many production systems use atomic flags that threads check periodically. This gives threads complete control over their termination points.
•Design for Idempotent Cleanup — Cleanup handlers should be safe to call even if resources weren't fully acquired. Use NULL checks, invalid-value sentinels (fd = -1), and similar guards.
•Document Cancellation Properties — For each function, document whether it's async-cancel-safe, deferred-cancel-safe, or not cancel-safe at all. This helps callers compose safe systems.

cooperative_cancellation.c
C
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

// Production pattern: cooperative cancellation with atomic flags.
// Relies only on C11 atomics, and the worker decides exactly where it stops.

typedef struct {
    pthread_t thread;
    atomic_bool should_stop;
    atomic_bool is_running;
} ManagedThread;

// Placeholder work queue (hypothetical; stands in for real application code)
typedef struct { int id; } WorkItem;
static WorkItem *get_next_work_item(void) { return calloc(1, sizeof(WorkItem)); }
static void process_item(WorkItem *item) { (void)item; }
static void free_work_item(WorkItem *item) { free(item); }
static void cleanup_thread_resources(void) { }

void managed_thread_init(ManagedThread *mt) {
    // atomic_init is the C11 way to initialize an atomic object
    atomic_init(&mt->should_stop, false);
    atomic_init(&mt->is_running, false);
}

void managed_thread_request_stop(ManagedThread *mt) {
    atomic_store(&mt->should_stop, true);
}

bool managed_thread_should_stop(ManagedThread *mt) {
    return atomic_load(&mt->should_stop);
}

void *worker_cooperative(void *arg) {
    ManagedThread *self = (ManagedThread *)arg;
    atomic_store(&self->is_running, true);

    // Main work loop: the flag is checked once per iteration
    while (!managed_thread_should_stop(self)) {
        WorkItem *item = get_next_work_item();
        if (item) {
            process_item(item);
            free_work_item(item);
        }

        // Pause between polls of the flag; shutdown latency is therefore
        // about 100 ms plus the time needed to finish the current item
        struct timespec sleep_time = {0, 100000000};  // 100 ms
        nanosleep(&sleep_time, NULL);
    }

    // Thread has full control over cleanup
    printf("Thread: Performing controlled shutdown\n");
    cleanup_thread_resources();

    atomic_store(&self->is_running, false);
    return NULL;
}

// Usage
int main(void) {
    ManagedThread worker;
    managed_thread_init(&worker);

    if (pthread_create(&worker.thread, NULL, worker_cooperative, &worker) != 0) {
        fprintf(stderr, "pthread_create failed\n");
        return 1;
    }

    // Let thread run...
    sleep(5);

    // Request graceful shutdown
    printf("Main: Requesting thread shutdown\n");
    managed_thread_request_stop(&worker);

    // Wait for thread to finish cleanup
    pthread_join(worker.thread, NULL);
    printf("Main: Thread has exited cleanly\n");

    return 0;
}

Cooperative vs. Pthread Cancellation

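Many production systems prefer cooperative cancellation (checking flags) over pthread_cancel() because: (1) it relies only on portable primitives such as C11 atomics, so it behaves the same on every platform, (2) the cancelled thread has complete control over its termination point, (3) there are no surprises about which functions are cancellation points, and (4) cleanup logic is explicit and predictable. The trade-off is that a flag is only noticed when the thread checks it, so a thread blocked in read() or accept() must also be woken, for example with a timeout, a self-pipe, or by shutting down the socket. pthread_cancel() is more powerful but harder to use correctly.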

Platform Differences and Portability

Thread cancellation mechanisms vary significantly across operating systems and threading libraries. Understanding these differences is essential for writing portable code.

Thread Cancellation Across Platforms
Platform	Cancellation Mechanism	Notes
POSIX/Linux	`pthread_cancel()`, cleanup handlers	Full support for deferred/async modes and cleanup handlers
Windows	`TerminateThread()` (unsafe) or cooperative	No equivalent to deferred cancellation; cooperative patterns required
macOS	POSIX pthreads	Full POSIX support; Foundation framework uses cooperative patterns
Java	`Thread.interrupt()`, `InterruptedException`	Cooperative interruption; thread must check interrupted status
C++11+	No built-in cancellation	Must implement cooperative cancellation; std::jthread adds stop_token in C++20
Go	`context.Context` cancellation	Cooperative via context; goroutines check ctx.Done() channel
Rust	No forced cancellation	std::thread cannot be killed from outside; ownership and Drop release resources as the thread returns or unwinds; use flags or channels

C++20 std::jthread and stop_token

C++20 introduced std::jthread (joining thread) with integrated cooperative cancellation via std::stop_token. If the thread function takes a std::stop_token as its first parameter, jthread supplies one; request_stop() sets it, std::condition_variable_any can wait on it, and the jthread destructor calls request_stop() followed by join(), which fits the language's RAII model.

cpp20_jthread.cpp
C++
#include <chrono>
#include <iostream>
#include <stop_token>
#include <thread>

// C++20 std::jthread with stop_token
void worker(std::stop_token stoken) {
    while (!stoken.stop_requested()) {
        std::cout << "Working...\n";
        std::this_thread::sleep_for(std::chrono::milliseconds(500));
    }
    std::cout << "Received stop request, cleaning up\n";
    // Cleanup happens here; the thread has full control
}

int main() {
    {
        // jthread passes its own stop_token as the first argument
        std::jthread worker_thread(worker);

        std::this_thread::sleep_for(std::chrono::seconds(2));

        // Request stop: makes stoken.stop_requested() return true
        worker_thread.request_stop();

        // Leaving this scope runs the jthread destructor, which calls
        // request_stop() (already done) and then join() automatically
    }

    // Only now is the worker guaranteed to have finished
    std::cout << "Thread has exited\n";
    return 0;
}

Summary: Thread Cancellation Mastery

Thread cancellation is one of the most subtle aspects of concurrent programming. Let's consolidate the essential concepts:

Key Takeaways

•Threads cannot be safely killed arbitrarily — Unlike processes, threads share resources and state. Abrupt termination causes orphaned locks, memory leaks, and data corruption.
•Deferred cancellation is the safe default — Threads are cancelled only at defined cancellation points, allowing them to finish critical operations first.
•Asynchronous cancellation is almost never safe — Reserve it only for pure computation with no resource management whatsoever.
•Cleanup handlers ensure resource release — Register handlers for every acquired resource; they run automatically on cancellation or pthread_exit().
•Cancellation state provides critical section protection — Disable cancellation during operations that must complete atomically, then re-enable.
•pthread_testcancel() adds explicit cancellation points — Essential for compute-bound threads without natural cancellation points.
•Cooperative cancellation is often preferred — Using atomic flags gives threads complete control and improves portability across platforms.

Page Complete

You now understand thread cancellation at a deep, implementation level. The principles here—safe termination points, resource cleanup, cooperative shutdown—apply broadly across all concurrent programming, regardless of the specific language or framework. Next, we'll explore how signals interact with threads, adding another layer of complexity to multithreaded programs.

1 / 5

Loading learning content...

Operating SystemsThread Issues

Thread Issues

LevelIntermediate

Duration90 mins

TopicThread Issues

1 / 5

Thread Cancellation

The Challenge of Stopping Threads

What You Will Learn

Why Thread Cancellation is Hard

Threads face the same challenges:

When a thread is executing, it might be in the middle of any operation:

Modifying a shared data structure that's temporarily in an inconsistent state
Holding one or more locks (mutexes, reader-writer locks, etc.)
Allocating memory from the heap
Writing to a file or network socket
Updating multiple related variables that must remain consistent with each other

If the thread is terminated at an arbitrary point, all of these operations remain incomplete.

Consequences of Abrupt Thread Termination

•Orphaned Locks — A terminated thread may hold mutexes that other threads are waiting on, causing permanent deadlock. The mutex remains locked forever since no thread can release it.
•Memory Leaks — Dynamically allocated memory that the thread was managing goes unreferenced, causing memory leaks that accumulate over the application's lifetime.
•Corrupted Data Structures — Linked lists, trees, hash tables, and other structures may be left in inconsistent states where invariants are violated, causing crashes or incorrect behavior later.
•Resource Exhaustion — File handles, network sockets, database connections, and other resources held by the thread remain open, eventually exhausting system limits.
•Transaction Rollback Failures — Operations that are part of a logical transaction may be left partially complete, violating atomicity guarantees the application depends on.

The Shared Memory Problem

The core insight:

Receive a request to cancel (not a forced termination)
Continue executing until they reach a safe point
Perform cleanup operations to release resources and restore invariants
Then, and only then, actually terminate

This is exactly what POSIX thread cancellation and similar mechanisms provide.

Cancellation Types: Asynchronous vs. Deferred

The pthread cancellation model:

Asynchronous Cancellation

•Thread can be cancelled at any point in execution
•Cancellation occurs immediately when request is received
•No opportunity for cleanup before termination
•Extremely dangerous for most code
•Only safe for pure computation with no resources held
•Enabled with PTHREAD_CANCEL_ASYNCHRONOUS

Deferred Cancellation

•Thread cancelled only at cancellation points
•Request is pending until a cancellation point is reached
•Cleanup handlers are invoked before termination
•Safe for most concurrent code
•Allows thread to finish critical sections first
•Enabled with PTHREAD_CANCEL_DEFERRED (default)

cancellation_types.c
C
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
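#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
 
// Placeholder for application work. It must not call any cancellation
// point (read, write, printf, ...), or the thread could be cancelled
// while it still holds the buffer.
static void process_data(char *buffer) {
    memset(buffer, 0, 1024);
}
 
void *async_cancellable_thread(void *arg) {
    (void)arg;
    int old_type;
    // Set cancellation type to asynchronous
    // DANGER: This thread can be cancelled at ANY point
    pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &old_type);
    
    // This is only safe if the thread does purely computational work
    // with no locks, no allocations, no file I/O
    unsigned long sum = 0;  // unsigned: wraparound is defined if long is 32-bit
    for (long i = 0; i < 1000000000L; i++) {
        sum += (unsigned long)i;  // Could be cancelled right here, mid-computation
    }
    return (void *)(uintptr_t)sum;
}
 
void *deferred_cancellable_thread(void *arg) {
    (void)arg;
    int old_type;
    // Deferred cancellation (already the default; set here for clarity)
    pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, &old_type);
    
    while (1) {
        // Acquire resources, do work...
        char *buffer = malloc(1024);
        
        if (buffer != NULL) {
            // CRITICAL SECTION: We hold resources here. There is no
            // cancellation point between malloc() and free(), so a
            // pending request cannot take effect in this section
            process_data(buffer);
            
            // Release resources BEFORE the cancellation point
            free(buffer);
        }
        
        // This is a cancellation point - thread can be cancelled here
        // Resources have been released, so cancellation is safe
        sleep(1);
    }
    return NULL;  // Not reached
}
 
int main(void) {
    pthread_t tid;
    void *result;
    
    if (pthread_create(&tid, NULL, deferred_cancellable_thread, NULL) != 0) {
        fprintf(stderr, "pthread_create failed\n");
        return 1;
    }
    
    sleep(5);  // Let thread run for a while
    
    // Request cancellation - does not immediately terminate
    // Thread will terminate at next cancellation point
    pthread_cancel(tid);
    
    // Wait for thread to actually terminate; a cancelled thread's
    // exit status is the special value PTHREAD_CANCELED
    pthread_join(tid, &result);
    if (result == PTHREAD_CANCELED) {
        printf("Thread was cancelled\n");
    }
    
    return 0;
}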

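Asynchronous Cancellation: Almost Never Safe

POSIX requires only three functions to be async-cancel-safe: pthread_cancel(), pthread_setcancelstate(), and pthread_setcanceltype(). Calling anything else, including malloc() or printf(), while asynchronous cancellation is enabled can leave library-internal state, such as an allocator's locks, corrupted.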
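For the one case where asynchronous cancellation is defensible, a loop that touches only local variables and contains no cancellation points, a sketch could look like the following. It assumes a POSIX threads implementation such as glibc on Linux; the names are illustrative. After switching to asynchronous mode, the worker performs only plain memory operations. Without asynchronous mode this loop could never be cancelled, because it never reaches a cancellation point.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

static atomic_int spinning = 0;

static void *pure_compute(void *arg) {
    (void)arg;
    int old_type;
    volatile unsigned long counter = 0;  // local only: no locks, no heap, no I/O
    pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &old_type);
    atomic_store(&spinning, 1);          // lock-free store on mainstream platforms
    for (;;) {
        counter++;                       // no cancellation point anywhere in the loop
    }
    return NULL;                         // not reached
}

// Returns 1 if the spinning thread was cancelled asynchronously
int async_cancel_demo(void) {
    pthread_t tid;
    void *result = NULL;
    if (pthread_create(&tid, NULL, pure_compute, NULL) != 0) return 0;
    while (!atomic_load(&spinning)) {
        sched_yield();                   // wait until the loop is running
    }
    pthread_cancel(tid);
    pthread_join(tid, &result);
    return result == PTHREAD_CANCELED;
}
```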

Cancellation State and Control

The Two States:

PTHREAD_CANCEL_ENABLE — Thread will honor cancellation requests (default)
PTHREAD_CANCEL_DISABLE — Thread ignores cancellation requests; they remain pending

By toggling cancellation state, a thread can create windows where it's protected from cancellation, perform critical operations, and then re-enable cancellation.

cancellation_state.c
C
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
 
typedef struct {
    pthread_mutex_t mutex;
    int *data;
    size_t size;
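} SharedBuffer;
 
// Hypothetical application functions, assumed to be defined elsewhere
int compute_value(size_t i);
void finalize_buffer(SharedBuffer *buf);
void do_critical_work(void);
 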
void *worker_thread(void *arg) {
    SharedBuffer *buf = (SharedBuffer *)arg;
    int old_state;
    
    while (1) {
        // Disable cancellation during critical section
        pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old_state);
        
        // === BEGIN CRITICAL SECTION ===
        // Thread CANNOT be cancelled here, even at cancellation points
        
        pthread_mutex_lock(&buf->mutex);
        
        // Perform complex multi-step operation
        // that must complete atomically
        for (size_t i = 0; i < buf->size; i++) {
            buf->data[i] = compute_value(i);
        }
        finalize_buffer(buf);
        
        pthread_mutex_unlock(&buf->mutex);
        
        // === END CRITICAL SECTION ===
        
        // Restore the saved state (ENABLE here). Restoring rather than
        // forcing ENABLE stays correct if a caller had disabled it.
        pthread_setcancelstate(old_state, &old_state);
        
        // Check if cancellation was requested while disabled
        // This is a cancellation point - if cancel was pending,
        // thread terminates here
        pthread_testcancel();
        
        // Natural cancellation point - sleep is a cancellation point
        sleep(1);
    }
    
    return NULL;
}
 
// Pattern: RAII-style cancellation guard (C++ idiom, C approximation)
typedef struct {
    int saved_state;
} CancellationGuard;
 
void cancel_guard_init(CancellationGuard *guard) {
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &guard->saved_state);
}
 
void cancel_guard_destroy(CancellationGuard *guard) {
    int ignored;  // POSIX does not guarantee that NULL is accepted here
    pthread_setcancelstate(guard->saved_state, &ignored);
}
 
// Usage pattern: each guard restores the state it saved, so guards nest safely
void critical_operation(void) {
    CancellationGuard guard;
    cancel_guard_init(&guard);
    
    // Cancellation disabled here...
    do_critical_work();
    
    cancel_guard_destroy(&guard);
    // Previous state restored
}

Thread Cancellation State vs. Type Matrix
State	Type	Behavior When pthread_cancel() Called
ENABLED	ASYNCHRONOUS	Thread can be cancelled at any time, usually immediately (not guaranteed by POSIX)
ENABLED	DEFERRED	Thread terminates at next cancellation point
DISABLED	ASYNCHRONOUS	Request pending; honored when state becomes ENABLED
DISABLED	DEFERRED	Request pending; honored at next cancellation point after ENABLED
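The DISABLED/DEFERRED row can be checked directly. In this sketch (illustrative names, assuming POSIX threads), the main thread sends its request only after the worker has disabled cancellation. The first `pthread_testcancel()` has no effect because the request is held pending. Once the previous state is restored, the second `pthread_testcancel()` acts on it, so the store after it never runs.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

static atomic_int cancel_disabled = 0;
static atomic_int cancel_sent = 0;
static atomic_int survived_disabled = 0;
static atomic_int passed_testcancel = 0;

static void *disable_then_test(void *arg) {
    (void)arg;
    int old_state, ignored;
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old_state);
    atomic_store(&cancel_disabled, 1);
    while (!atomic_load(&cancel_sent)) {
        sched_yield();                        // the request arrives while DISABLED
    }
    pthread_testcancel();                     // no effect: request stays pending
    atomic_store(&survived_disabled, 1);
    pthread_setcancelstate(old_state, &ignored);  // back to ENABLE
    pthread_testcancel();                     // pending request is acted on here
    atomic_store(&passed_testcancel, 1);      // never reached
    return NULL;
}

// Returns 1 if the request was held while disabled and honored afterwards
int pending_cancel_demo(void) {
    pthread_t tid;
    void *result = NULL;
    if (pthread_create(&tid, NULL, disable_then_test, NULL) != 0) return 0;
    while (!atomic_load(&cancel_disabled)) {
        sched_yield();
    }
    pthread_cancel(tid);
    atomic_store(&cancel_sent, 1);
    pthread_join(tid, &result);
    return result == PTHREAD_CANCELED
        && atomic_load(&survived_disabled) == 1
        && atomic_load(&passed_testcancel) == 0;
}
```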

Design Pattern: Minimize Disabled Regions

Cancellation Points

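The rationale:

A cancellation point is a function at which a thread using deferred cancellation checks for a pending request and, if one is found, acts on it. Forcing cancellation to occur only at these defined points serves multiple purposes: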

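Predictability: Programmers know exactly where cancellation can occur
Resource Safety: Cancellation points are mostly blocking calls (I/O, waits, sleeps), not arbitrary instructions in the middle of an update
Cleanup Opportunity: Code between cancellation points is never interrupted by cancellation, so the thread can restore invariants and register cleanup handlers before its next cancellation point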

POSIX divides functions into two categories: those that must be cancellation points, and those that may be cancellation points.

POSIX-Mandated Cancellation Points (Partial List)
Category	Functions
Thread/Process	`pthread_join`, `pthread_cond_wait`, `pthread_cond_timedwait`, `pthread_testcancel`
I/O Operations	`read`, `write`, `open`, `close`, `recv`, `send`, `accept`, `connect`
File Operations	`fcntl` (with F_SETLKW), `fsync`, `fdatasync`
Process Control and Sleep	`wait`, `waitpid`, `sleep`, `nanosleep`, `clock_nanosleep`, `usleep` (POSIX.1-2001 only; removed in POSIX.1-2008)
Terminal I/O	`tcdrain`, `tcflow`, `tcflush`, `tcsendbreak`
Signals	`sigwait`, `sigwaitinfo`, `sigsuspend`, `pause`
IPC	`msgrcv`, `msgsnd`, `mq_receive`, `mq_send`, `sem_wait`, `sem_timedwait`
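`pthread_cond_wait()` deserves special care among these. POSIX specifies that when a thread is cancelled while waiting on a condition variable, the associated mutex is re-acquired before the first cleanup handler runs. A handler must therefore unlock it, or the mutex stays locked for good. A minimal sketch with illustrative names:

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  q_cond = PTHREAD_COND_INITIALIZER;
static int q_ready = 0;                  // predicate, protected by q_lock
static atomic_int q_waiting = 0;
static atomic_int q_handler_ran = 0;

// The mutex is held when this runs, so it must be released here
static void unlock_on_cancel(void *arg) {
    atomic_store(&q_handler_ran, 1);
    pthread_mutex_unlock((pthread_mutex_t *)arg);
}

static void *waiter(void *arg) {
    (void)arg;
    pthread_mutex_lock(&q_lock);
    pthread_cleanup_push(unlock_on_cancel, &q_lock);
    atomic_store(&q_waiting, 1);
    while (!q_ready) {
        pthread_cond_wait(&q_cond, &q_lock);  // cancellation point
    }
    pthread_cleanup_pop(1);              // normal path: unlock via the same handler
    return NULL;
}

// Returns 1 if the waiter was cancelled and the mutex was left unlocked
int cond_wait_cancel_demo(void) {
    pthread_t tid;
    void *result = NULL;
    if (pthread_create(&tid, NULL, waiter, NULL) != 0) return 0;
    while (!atomic_load(&q_waiting)) {
        sched_yield();
    }
    pthread_cancel(tid);
    pthread_join(tid, &result);
    // Without the handler's unlock, trylock would fail with EBUSY
    int unlocked = pthread_mutex_trylock(&q_lock) == 0;
    if (unlocked) pthread_mutex_unlock(&q_lock);
    return result == PTHREAD_CANCELED && atomic_load(&q_handler_ran) == 1 && unlocked;
}
```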

cancellation_points_example.c
C
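#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
 
// Hypothetical application functions, assumed to be defined elsewhere
void process_input(const char *buffer, ssize_t len);
void perform_heavy_calculation(void);
void perform_critical_computation(void);
void perform_more_computation(void);
 
// Set by another thread to request a cooperative stop (Solution 2 below)
static atomic_bool should_terminate = false;
 
void *io_bound_thread(void *arg) {
    // Path to read from (e.g. a FIFO), supplied by the thread's creator
    const char *path = (const char *)arg;
    char buffer[1024];
    int fd = open(path, O_RDONLY);
    
    if (fd < 0) {
        perror("open");
        return NULL;
    }
    
    while (1) {
        // read() is a cancellation point
        // Thread may be cancelled while waiting for input.
        // If that happens, close() below never runs and fd leaks;
        // a cleanup handler (see Cleanup Handlers) closes that gap.
        ssize_t bytes = read(fd, buffer, sizeof(buffer));
        
        if (bytes <= 0) break;
        
        // Process data...
        // Not interrupted by cancellation, provided process_input()
        // itself calls no cancellation points
        process_input(buffer, bytes);
    }
    
    close(fd);  // close() is also a cancellation point
    return NULL;
}
 
void *compute_bound_thread(void *arg) {
    (void)arg;
    // This thread does pure computation
    // It has NO natural cancellation points!
    
    while (1) {
        // Long computation with no I/O or blocking
        for (int i = 0; i < 1000000; i++) {
            perform_heavy_calculation();
        }
        
        // PROBLEM: Without explicit cancellation points,
        // this thread will never respond to cancellation requests!
        
        // Solution 1: Add explicit cancellation point
        pthread_testcancel();
        
        // Solution 2: Check a flag (cooperative cancellation)
        // Either solution works on its own; both are shown for comparison
        if (atomic_load(&should_terminate)) {
            break;
        }
    }
    return NULL;
}
 
// pthread_testcancel() - Creates an explicit cancellation point
// Does nothing if no cancellation is pending
// If cancellation IS pending and state is ENABLED:
//   - Cleanup handlers are invoked
//   - Thread terminates
void *explicit_cancellation_point_example(void *arg) {
    (void)arg;
    while (1) {
        // Phase 1: contains no cancellation points, so a pending
        // request cannot take effect in the middle of it
        perform_critical_computation();
        
        // Explicit cancellation point
        pthread_testcancel();
        
        // Phase 2: More computation
        perform_more_computation();
        
        // Another explicit cancellation point
        pthread_testcancel();
    }
    return NULL;
}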

The pthread_testcancel() Function
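For compute-bound loops, calling `pthread_testcancel()` on every iteration adds a function call per iteration; a common compromise is to call it every N iterations. The sketch below assumes POSIX threads. `CHECK_INTERVAL` and the other names are illustrative, and a suitable interval depends on how long one iteration takes and how quickly the thread must respond.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define CHECK_INTERVAL 4096UL  // illustrative: trades response latency for overhead

static atomic_int crunch_started = 0;

static void *crunch(void *arg) {
    (void)arg;
    unsigned long state = 1, i = 0;
    atomic_store(&crunch_started, 1);
    for (;;) {
        state = state * 31 + i;          // stand-in for one unit of real work
        if (++i % CHECK_INTERVAL == 0) {
            pthread_testcancel();        // explicit cancellation point
        }
    }
    return NULL;                         // not reached
}

// Returns 1 if the loop was cancelled at one of its periodic checks
int testcancel_interval_demo(void) {
    pthread_t tid;
    void *result = NULL;
    if (pthread_create(&tid, NULL, crunch, NULL) != 0) return 0;
    while (!atomic_load(&crunch_started)) {
        sched_yield();
    }
    pthread_cancel(tid);
    pthread_join(tid, &result);
    return result == PTHREAD_CANCELED;
}
```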

Cleanup Handlers

The cleanup handler stack:

Cleanup handlers work like a stack (LIFO order):

pthread_cleanup_push() registers a handler
pthread_cleanup_pop() removes a handler (optionally executing it)
When the thread is cancelled or calls pthread_exit(), all handlers still registered execute in reverse order

This mirrors the stack discipline of resource acquisition—handlers are invoked in the opposite order they were registered, naturally pairing acquire/release operations.
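The LIFO order can be observed by having each handler append a marker to a string. In this illustrative sketch, the worker registers handlers "1", "2" and "3" and then calls `pthread_exit()`, which runs all registered handlers; the recorded order is "321". The `pthread_cleanup_pop(0)` calls after `pthread_exit()` never execute, but they are still required so that every push has a matching pop in the same scope.

```c
#include <pthread.h>
#include <string.h>

static char order[8];  // records the order in which handlers run

static void record(void *arg) {
    strcat(order, (const char *)arg);
}

static void *lifo_worker(void *arg) {
    (void)arg;
    pthread_cleanup_push(record, "1");  // registered first: runs last
    pthread_cleanup_push(record, "2");
    pthread_cleanup_push(record, "3");  // registered last: runs first
    pthread_exit(NULL);                 // runs all three handlers
    pthread_cleanup_pop(0);             // not reached; pairs with the pushes
    pthread_cleanup_pop(0);
    pthread_cleanup_pop(0);
    return NULL;
}

// Returns the recorded handler order
const char *cleanup_lifo_demo(void) {
    pthread_t tid;
    order[0] = '\0';
    if (pthread_create(&tid, NULL, lifo_worker, NULL) != 0) return "";
    pthread_join(tid, NULL);
    return order;
}
```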

cleanup_handlers.c
C
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
 
// Hypothetical application function, assumed to be defined elsewhere
void process_data(char *buffer, ssize_t len);
 
// Cleanup handler for mutex
void cleanup_mutex(void *arg) {
    pthread_mutex_t *mutex = (pthread_mutex_t *)arg;
    printf("Cleanup: Releasing mutex\n");
    pthread_mutex_unlock(mutex);
}
 
// Cleanup handler for dynamically allocated memory
void cleanup_memory(void *arg) {
    void **ptr = (void **)arg;
    if (*ptr != NULL) {
        printf("Cleanup: Freeing memory at %p\n", *ptr);
        free(*ptr);
        *ptr = NULL;
    }
}
 
// Cleanup handler for file descriptors
void cleanup_file(void *arg) {
    int *fd = (int *)arg;
    if (*fd >= 0) {
        printf("Cleanup: Closing file descriptor %d\n", *fd);
        close(*fd);
        *fd = -1;
    }
}
 
void *worker_with_cleanup(void *arg) {
    pthread_mutex_t *mutex = (pthread_mutex_t *)arg;
    char *buffer = NULL;
    int fd = -1;
    
    // Push one handler per resource, in the order the resources are
    // acquired; they will be called in REVERSE order (LIFO)
    
    // 1. Mutex: lock, then register its handler (will be called LAST).
    //    pthread_mutex_lock() is not a cancellation point, so a deferred
    //    request cannot take effect between the lock and the push.
    pthread_mutex_lock(mutex);
    pthread_cleanup_push(cleanup_mutex, mutex);
    
    // 2. Memory: cleanup_memory() ignores a NULL pointer
    pthread_cleanup_push(cleanup_memory, &buffer);
    buffer = malloc(4096);
    
    // 3. File: registered LAST (will be called FIRST). open() is a
    //    cancellation point, so the handler is registered before it;
    //    cleanup_file() ignores fd == -1.
    pthread_cleanup_push(cleanup_file, &fd);
    if (buffer != NULL) {
        fd = open("/tmp/data.txt", O_RDWR | O_CREAT, 0644);
    }
    
    // Main work loop - any cancellation point here will
    // trigger all three cleanup handlers.
    // Failures skip the loop instead of returning early: leaving a
    // push/pop block with return is undefined behavior.
    if (fd >= 0) {
        while (1) {
            // This is a cancellation point
            ssize_t bytes = read(fd, buffer, 4096);
            
            if (bytes <= 0) break;
            
            process_data(buffer, bytes);
        }
    }
    
    // Normal exit (including the failure paths above): pop handlers
    // Argument 0 = don't execute, 1 = execute
    pthread_cleanup_pop(1);  // Close file (no-op if fd == -1)
    pthread_cleanup_pop(1);  // Free memory (no-op if buffer == NULL)
    pthread_cleanup_pop(1);  // Unlock mutex
    
    return NULL;
}
 
// Hypothetical database API (not a real library), assumed defined elsewhere
typedef struct DatabaseConn DatabaseConn;
typedef struct Transaction Transaction;
typedef struct {
    const char *db_url;
    atomic_bool shutdown;
} SharedState;
DatabaseConn *db_connect(const char *url);
Transaction *db_begin_transaction(DatabaseConn *conn);
void db_execute(DatabaseConn *conn, const char *sql);
void db_commit(Transaction *txn);
void cleanup_connection(void *arg);   // takes DatabaseConn **; disconnects if non-NULL
void cleanup_transaction(void *arg);  // takes Transaction **; rolls back if non-NULL
 
// More complex example: Nested cleanup with conditionals
void *complex_worker(void *arg) {
    SharedState *state = (SharedState *)arg;
    DatabaseConn *conn = NULL;
    Transaction *txn = NULL;
    
    // Push in order of acquisition; handlers run in reverse
    pthread_cleanup_push(cleanup_connection, &conn);
    conn = db_connect(state->db_url);
    
    pthread_cleanup_push(cleanup_transaction, &txn);
    
    while (conn != NULL && !atomic_load(&state->shutdown)) {
        txn = db_begin_transaction(conn);
        
        // The cancellation point is here, assuming db_execute() blocks
        // in a cancellation point such as read() or recv().
        // If cancelled, both handlers run:
        // 1. cleanup_transaction (rollback, skipped if txn == NULL)
        // 2. cleanup_connection (disconnect, skipped if conn == NULL)
        db_execute(conn, "SELECT * FROM data");  // May block
        
        db_commit(txn);
        txn = NULL;  // Mark as no longer needing rollback
    }
    
    pthread_cleanup_pop(0);  // Don't roll back: txn is NULL (committed or never begun)
    pthread_cleanup_pop(1);  // Do disconnect
    
    return NULL;
}

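Critical: Push/Pop Must Be Paired

pthread_cleanup_push() and pthread_cleanup_pop() may be implemented as macros that open and close a block, so POSIX requires them to appear as statements, in pairs, within the same lexical scope. Leaving that scope early with return, break, continue, or goto is undefined behavior.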

When Cleanup Handlers Run:

Cleanup handlers are invoked in these situations:

Thread acts on a cancellation request (with deferred cancellation, this happens at a cancellation point)
Thread calls pthread_exit()
pthread_cleanup_pop() is called with a non-zero argument
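The difference between `pthread_cleanup_pop(0)` and `pthread_cleanup_pop(1)` is easy to check without creating a thread at all, since push and pop also work in the main thread. In this illustrative sketch, the handler only counts its invocations; after one pop of each kind the count is 1.

```c
#include <pthread.h>

static int handler_runs = 0;  // touched only by the calling thread

static void count_run(void *arg) {
    (void)arg;
    handler_runs++;
}

// Returns the number of times the handler ran
int pop_semantics_demo(void) {
    handler_runs = 0;

    pthread_cleanup_push(count_run, NULL);
    pthread_cleanup_pop(0);   // removed without running

    pthread_cleanup_push(count_run, NULL);
    pthread_cleanup_pop(1);   // removed and run

    return handler_runs;
}
```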


Safe Cancellation Patterns

Writing cancellation-safe code requires disciplined patterns. Here are the key strategies used in production systems:

Cancellation Safety Patterns

•Bracket Resources with Cleanup Handlers — Every resource acquisition should have a corresponding cleanup handler registered before the resource is used. This ensures the resource is released regardless of how the thread exits.
•Keep Critical Sections Short — Minimize the code between disabling cancellation and re-enabling it. Long non-cancellable sections make threads unresponsive.
•Use Cooperative Cancellation Flags — Instead of pthread_cancel(), many production systems use atomic flags that threads check periodically. This gives threads complete control over their termination points.
•Design for Idempotent Cleanup — Cleanup handlers should be safe to call even if resources weren't fully acquired. Use NULL checks, invalid-value sentinels (fd = -1), and similar guards.
•Document Cancellation Properties — For each function, document whether it's async-cancel-safe, deferred-cancel-safe, or not cancel-safe at all. This helps callers compose safe systems.
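The "Bracket Resources" and "Idempotent Cleanup" patterns combine naturally: register one handler before acquiring anything, and have it check sentinels so it can run on a half-acquired bundle, or run twice, without harm. This is a sketch under assumptions: `Resources` and the function names are illustrative, `/dev/null` stands in for a real file, and `pause()` stands in for a blocking call.

```c
#include <fcntl.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <unistd.h>

// Illustrative resource bundle: -1 and NULL mean "not held"
typedef struct {
    int   fd;
    char *buf;
} Resources;

// Idempotent: safe on a partially acquired bundle and safe to call twice
static void release_resources(void *arg) {
    Resources *r = arg;
    if (r->buf != NULL) {
        free(r->buf);
        r->buf = NULL;
    }
    if (r->fd >= 0) {
        close(r->fd);
        r->fd = -1;
    }
}

static atomic_int acquired = 0;

static void *resource_worker(void *arg) {
    Resources *r = arg;
    pthread_cleanup_push(release_resources, r);  // registered before acquiring
    r->fd = open("/dev/null", O_RDONLY);         // open() is a cancellation point
    r->buf = malloc(256);
    atomic_store(&acquired, 1);
    for (;;) {
        pause();                                 // blocks until cancelled
    }
    pthread_cleanup_pop(1);                      // not reached; pairs with the push
    return NULL;
}

// Returns 1 if cancellation released everything and a second release was harmless
int idempotent_cleanup_demo(void) {
    Resources r = { .fd = -1, .buf = NULL };
    pthread_t tid;
    void *result = NULL;
    if (pthread_create(&tid, NULL, resource_worker, &r) != 0) return 0;
    while (!atomic_load(&acquired)) {
        sched_yield();
    }
    pthread_cancel(tid);
    pthread_join(tid, &result);
    release_resources(&r);                       // second call: nothing left to do
    return result == PTHREAD_CANCELED && r.fd == -1 && r.buf == NULL;
}
```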

cooperative_cancellation.c
C
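#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>
 
// Hypothetical application API, assumed to be defined elsewhere
typedef struct WorkItem WorkItem;
WorkItem *get_next_work_item(void);  // returns NULL when no work is queued
void process_item(WorkItem *item);
void free_work_item(WorkItem *item);
void cleanup_thread_resources(void);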
 
// Production pattern: Cooperative cancellation with atomic flags
// More portable and controllable than pthread_cancel
 
typedef struct {
    pthread_t thread;
    atomic_bool should_stop;
    atomic_bool is_running;
} ManagedThread;
 
void managed_thread_init(ManagedThread *mt) {
    // atomic_init (not atomic_store) initializes a fresh atomic object
    atomic_init(&mt->should_stop, false);
    atomic_init(&mt->is_running, false);
}
 
void managed_thread_request_stop(ManagedThread *mt) {
    atomic_store(&mt->should_stop, true);
}
 
bool managed_thread_should_stop(ManagedThread *mt) {
    return atomic_load(&mt->should_stop);
}
 
void *worker_cooperative(void *arg) {
    ManagedThread *self = (ManagedThread *)arg;
    atomic_store(&self->is_running, true);
    
    // Main work loop
    while (!managed_thread_should_stop(self)) {
        // Do work...
        WorkItem *item = get_next_work_item();
        if (item) {
            process_item(item);
            free_work_item(item);
        }
        
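        // Pace the loop. The flag is re-checked once per iteration, so a
        // stop request takes effect within about 100 ms plus the time to
        // finish the current item (assuming get_next_work_item() does not
        // block indefinitely, which would keep the flag from being seen)
        struct timespec sleep_time = {0, 100000000};  // 100ms
        nanosleep(&sleep_time, NULL);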
    }
    
    // Thread has full control over cleanup
    printf("Thread: Performing controlled shutdown\n");
    cleanup_thread_resources();
    
    atomic_store(&self->is_running, false);
    return NULL;
}
 
// Usage
int main() {
    ManagedThread worker;
    managed_thread_init(&worker);
    
    pthread_create(&worker.thread, NULL, worker_cooperative, &worker);
    
    // Let thread run...
    sleep(5);
    
    // Request graceful shutdown
    printf("Main: Requesting thread shutdown\n");
    managed_thread_request_stop(&worker);
    
    // Wait for thread to finish cleanup
    pthread_join(worker.thread, NULL);
    printf("Main: Thread has exited cleanly\n");
    
    return 0;
}

Cooperative vs. Pthread Cancellation
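One weakness of the flag-plus-sleep loop in cooperative_cancellation.c is latency: a stop request waits for the current sleep to finish. A common variant, sketched here with illustrative names, keeps the stop flag under a mutex and sleeps with `pthread_cond_timedwait()`, so the requester can signal the condition variable and wake the worker at once. The worker then exits through a normal `return`, which `pthread_join()` reports as `NULL` rather than `PTHREAD_CANCELED`.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

// Illustrative control block: the stop flag is protected by the mutex so
// that a stop request can wake a waiting worker through the condvar
typedef struct {
    pthread_mutex_t mu;
    pthread_cond_t  cv;
    bool            stop;
} StopCtl;

static void *ticking_worker(void *arg) {
    StopCtl *c = arg;
    pthread_mutex_lock(&c->mu);
    while (!c->stop) {
        // ... one unit of work would go here, ideally with c->mu released ...
        struct timespec deadline;
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_nsec += 100L * 1000 * 1000;  // wait at most 100 ms
        if (deadline.tv_nsec >= 1000000000L) {
            deadline.tv_sec += 1;
            deadline.tv_nsec -= 1000000000L;
        }
        // Returns early if stop_worker() signals; otherwise times out
        pthread_cond_timedwait(&c->cv, &c->mu, &deadline);
    }
    pthread_mutex_unlock(&c->mu);
    // Cleanup would run here, under the thread's own control
    return NULL;
}

static void stop_worker(StopCtl *c) {
    pthread_mutex_lock(&c->mu);
    c->stop = true;
    pthread_cond_signal(&c->cv);
    pthread_mutex_unlock(&c->mu);
}

// Returns 1 if the worker exited through a normal return
int cooperative_condvar_demo(void) {
    StopCtl c;
    pthread_t tid;
    void *result = PTHREAD_CANCELED;
    pthread_mutex_init(&c.mu, NULL);
    pthread_cond_init(&c.cv, NULL);
    c.stop = false;
    if (pthread_create(&tid, NULL, ticking_worker, &c) != 0) return 0;
    struct timespec let_it_run = { 0, 250L * 1000 * 1000 };
    nanosleep(&let_it_run, NULL);
    stop_worker(&c);
    pthread_join(tid, &result);
    pthread_cond_destroy(&c.cv);
    pthread_mutex_destroy(&c.mu);
    return result == NULL;
}
```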

Platform Differences and Portability

Thread cancellation mechanisms vary significantly across operating systems and threading libraries. Understanding these differences is essential for writing portable code.

Thread Cancellation Across Platforms
Platform	Cancellation Mechanism	Notes
POSIX/Linux	`pthread_cancel()`, cleanup handlers	Full support for deferred/async modes and cleanup handlers
Windows	`TerminateThread()` (unsafe) or cooperative	No equivalent to deferred cancellation; cooperative patterns required
macOS	POSIX pthreads	Full POSIX support; Foundation framework uses cooperative patterns
Java	`Thread.interrupt()`, `InterruptedException`	Cooperative interruption; thread must check interrupted status
C++11+	No built-in cancellation	Must implement cooperative cancellation; std::jthread adds stop_token in C++20
Go	`context.Context` cancellation	Cooperative via context; goroutines check ctx.Done() channel
Rust	No forced cancellation	No API to kill a thread; ownership and Drop release resources when a thread exits normally; use flags or channels

C++20 std::jthread and stop_token

cpp20_jthread.cpp
C++
#include <thread>
#include <stop_token>
#include <iostream>
#include <chrono>
 
// C++20 std::jthread with stop_token
void worker(std::stop_token stoken) {
    while (!stoken.stop_requested()) {
        std::cout << "Working...\n";
        std::this_thread::sleep_for(std::chrono::milliseconds(500));
    }
    std::cout << "Received stop request, cleaning up\n";
    // Cleanup happens here, thread has full control
}
 
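int main() {
    {
        // jthread passes its own stop_token as worker's first argument
        std::jthread worker_thread(worker);
        
        std::this_thread::sleep_for(std::chrono::seconds(2));
        
        // Request stop - sets the stop_token (the destructor would
        // also do this automatically)
        worker_thread.request_stop();
        
        // Leaving this scope runs the jthread destructor, which calls
        // request_stop() and then join(): no explicit join needed
    }
    
    // Only now is the worker guaranteed to have finished
    std::cout << "Thread has exited\n";
    return 0;
}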

Summary: Thread Cancellation Mastery

Thread cancellation is one of the most subtle aspects of concurrent programming. Let's consolidate the essential concepts:

Key Takeaways

•Threads cannot be safely killed arbitrarily — Unlike processes, threads share resources and state. Abrupt termination causes orphaned locks, memory leaks, and data corruption.
•Deferred cancellation is the safe default — Threads are cancelled only at defined cancellation points, allowing them to finish critical operations first.
•Asynchronous cancellation is almost never safe — Reserve it only for pure computation with no resource management whatsoever.
•Cleanup handlers ensure resource release — Register handlers for every acquired resource; they run automatically on cancellation or pthread_exit().
•Cancellation state provides critical section protection — Disable cancellation during operations that must complete atomically, then re-enable.
•pthread_testcancel() adds explicit cancellation points — Essential for compute-bound threads without natural cancellation points.
•Cooperative cancellation is often preferred — Using atomic flags gives threads complete control and improves portability across platforms.
