While threads share a remarkable amount of process state, they are not mere clones of each other. Each thread maintains its own private state: resources that belong to that thread alone and are not used by its siblings. This private state is what allows threads to execute independently, maintain their own execution context, and run without interfering with each other's core execution flow.
Understanding thread-specific resources is crucial for writing correct concurrent code and for debugging thread-related issues.
By the end of this page, you will know the categories of thread-private state: the register set, the stack, thread-local storage, and per-thread attributes such as the thread ID, scheduling parameters, signal mask, execution state, and exit status. You'll understand how each is managed, what problems can arise, and how to use them effectively.
The register set is the most fundamental thread-private resource. CPU registers are the fastest storage available to a processor—they hold the data currently being operated on and the execution state.
What's in the Register Set (x86-64 general-purpose registers, System V calling convention as used on Linux and macOS):
| Register | Purpose | Preserved Across Calls? |
|---|---|---|
| RAX | Return value, temporary | No (caller-saved) |
| RBX | Callee-saved general purpose | Yes (callee-saved) |
| RCX | 4th integer argument, counter | No |
| RDX | 3rd integer argument, 2nd return value | No |
| RSI | 2nd integer argument | No |
| RDI | 1st integer argument | No |
| RBP | Base pointer / frame pointer | Yes |
| RSP | Stack pointer | Yes (by definition) |
| R8-R9 | 5th-6th integer arguments | No |
| R10-R11 | Temporaries (scratch) | No |
| R12-R15 | Callee-saved general purpose | Yes |
Why Registers Must Be Private:
Consider two threads executing simultaneously:
Thread A: RAX = 5, computing 5 * 3
Thread B: RAX = 100, computing 100 + 7
If they shared registers, one would overwrite the other's computation. The result would be meaningless. Thus, each thread has its own complete set of register values.
Context Switch and Registers:
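When the scheduler switches from one thread to another, it saves the outgoing thread's register values (including the instruction pointer and stack pointer) into that thread's saved-context area in the kernel. It then loads the incoming thread's previously saved values back into the CPU.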
This save/restore operation is the core of context switching. The cost of saving and restoring all registers is a significant component of context switch overhead.
```c
/* Simplified representation of a saved register context (x86-64) */
#include <stdint.h>

struct cpu_context {
    /* General-purpose registers */
    uint64_t rax, rbx, rcx, rdx;
    uint64_t rsi, rdi, rbp, rsp;
    uint64_t r8, r9, r10, r11;
    uint64_t r12, r13, r14, r15;

    /* Instruction pointer */
    uint64_t rip;

    /* Flags register */
    uint64_t rflags;

    /* Segment registers (FS/GS matter for TLS) */
    uint16_t cs, ss, ds, es, fs, gs;
    uint64_t fs_base, gs_base;  /* Base addresses for FS/GS segments */

    /* Extended state (floating point, SIMD).
     * This is large: 512 bytes for x87/SSE (FXSAVE format),
     * 2 KB or more with AVX-512. */
    uint8_t fpu_state[512];
    uint8_t extended_state[];   /* Variable size based on CPU features */
};

/* Minimal thread descriptor for this sketch (real kernels store much more) */
struct thread {
    struct cpu_context *cpu_context;
};

/* Architecture-specific routines, normally written in assembly (declared only) */
void save_cpu_context(struct cpu_context *ctx);
void restore_cpu_context(const struct cpu_context *ctx);

/*
 * A context switch saves/restores this entire structure.
 */
void context_switch(struct thread *old, struct thread *new) {
    /* Save old thread's state */
    save_cpu_context(old->cpu_context);

    /* Critical: restoring the context switches the stack pointer (rsp).
     * After this point, we're on the new thread's stack. */
    restore_cpu_context(new->cpu_context);

    /* "Return" - but we return to wherever the new thread was suspended */
}
```

Saving and restoring floating-point and SIMD state is expensive: 512 bytes for x87/SSE and 2 KB or more with AVX-512. Older kernels therefore used "lazy" FPU switching. They tracked a per-thread "used FPU" flag and saved or restored FPU state only when a thread actually executed a floating-point instruction. Modern Linux kernels switch this state eagerly instead and rely on optimized instructions such as XSAVEOPT. The 2018 "LazyFP" vulnerability (CVE-2018-3665) showed that lazy switching could leak register contents between processes, which pushed other major OSes to eager switching as well.
Every thread has its own stack—a region of memory used for function call management, local variables, and temporary storage. The stack is perhaps the most operationally important thread-private resource.
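What Lives on the Stack: return addresses, saved registers, local (automatic) variables, and any function arguments that don't fit in registers.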
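Stack Allocation and Layout: each additional thread's stack is a fixed-size region set up when the thread is created. Its size can be chosen with pthread_attr_setstacksize(), or the memory can be supplied directly with pthread_attr_setstack(). On x86-64 the stack grows downward, from the top of that region toward its base.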
Stack Isolation:
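Because each thread has its own stack, automatic (non-static) local variables are private to each invocation and therefore inherently thread-safe. Static locals, by contrast, are shared by all threads: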
```c
void worker(int id) {
    int local_count = 0;  // Private to this thread's invocation
    char buffer[1024];    // Private
    // These can never conflict with another thread's local_count
    // unless you deliberately share their addresses
}
```
```c
#define _GNU_SOURCE  /* required for pthread_getattr_np() */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

void *stack_info_thread(void *arg) {
    (void)arg;
    int local_var; /* A variable on this thread's stack */

    /* Get the thread's stack info */
    pthread_attr_t attr;
    pthread_getattr_np(pthread_self(), &attr);

    void *stack_addr;
    size_t stack_size;
    pthread_attr_getstack(&attr, &stack_addr, &stack_size);

    printf("Thread stack:\n");
    printf("  Stack base (lowest address): %p\n", stack_addr);
    printf("  Stack size: %zu bytes (%.2f MB)\n",
           stack_size, stack_size / (1024.0 * 1024.0));
    printf("  Stack top: %p\n", (void *)((char *)stack_addr + stack_size));
    printf("  local_var at: %p\n", (void *)&local_var);
    /* The stack grows downward on x86-64, so usage is measured from the top */
    printf("  Stack used: %zu bytes\n",
           (size_t)((char *)stack_addr + stack_size - (char *)&local_var));

    pthread_attr_destroy(&attr);
    return NULL;
}

int main(void) {
    /* Create thread with custom stack size */
    pthread_attr_t attr;
    pthread_attr_init(&attr);

    /* Set stack size to 1 MB (must be at least PTHREAD_STACK_MIN) */
    size_t custom_size = 1 * 1024 * 1024;
    pthread_attr_setstacksize(&attr, custom_size);

    pthread_t thread;
    if (pthread_create(&thread, &attr, stack_info_thread, NULL) != 0) {
        fprintf(stderr, "pthread_create failed\n");
        return 1;
    }
    pthread_join(thread, NULL);

    pthread_attr_destroy(&attr);
    return 0;
}

/* Alternatively, allocate your own stack memory */
void create_thread_with_custom_stack(void) {
    pthread_attr_t attr;
    pthread_attr_init(&attr);

    /* Page-align the stack for portability (2 MB is a multiple of the page size) */
    size_t stack_size = 2 * 1024 * 1024; /* 2 MB */
    void *stack = aligned_alloc((size_t)sysconf(_SC_PAGESIZE), stack_size);
    if (stack == NULL) {
        pthread_attr_destroy(&attr);
        return;
    }

    /* Set the stack address (lowest byte) and size.
     * No guard page is added for a caller-supplied stack. */
    pthread_attr_setstack(&attr, stack, stack_size);

    pthread_t thread;
    if (pthread_create(&thread, &attr, stack_info_thread, NULL) == 0)
        pthread_join(thread, NULL);

    /* Free the stack only after the thread has exited (been joined) */
    free(stack);
    pthread_attr_destroy(&attr);
}
```

Never pass a pointer to a local variable to another thread unless you can guarantee that the function owning that variable won't return until the other thread is done with it. Stack memory is reused as functions return. What was a valid data structure becomes garbage or, worse, a valid-looking data structure belonging to a different function's frame.
Thread-Local Storage (TLS) provides a mechanism for variables that live as long as a global does but exist as a separate copy in every thread.
This solves a common problem: you need a variable that persists across function calls (like a global or static), but you don't want threads to share it.
Common TLS Use Cases:
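- errno: each thread has its own errno, so an error recorded by a failing call in one thread is not overwritten by calls in other threads.
- Tokenizer state: strtok keeps its position in hidden static state shared by all threads. strtok_r instead stores it in a caller-supplied pointer, which can be kept per thread.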
```c
#include <threads.h> /* C11 threads */
#include <stdio.h>

/* C11 standard way: _Thread_local keyword */
_Thread_local int thread_errno = 0;
_Thread_local char thread_name[64] = {0};

/* GNU C also accepts __thread keyword */
__thread int gnu_thread_var = 42;

void set_thread_identity(const char *name, int id) {
    snprintf(thread_name, sizeof(thread_name), "%s", name);
    thread_errno = id;
}

void print_thread_identity(void) {
    /* Each thread sees its OWN values */
    printf("Thread: %s, errno: %d\n", thread_name, thread_errno);
}

int thread_func(void *arg) {
    int id = *(int*)arg;

    /* Set this thread's local values */
    char name[32];
    snprintf(name, sizeof(name), "Worker-%d", id);
    set_thread_identity(name, id * 100);

    /* Print shows this thread's values */
    print_thread_identity();
    return 0;
}

int main(void) {
    thrd_t t1, t2;
    int id1 = 1, id2 = 2;

    set_thread_identity("Main", 0);

    thrd_create(&t1, thread_func, &id1);
    thrd_create(&t2, thread_func, &id2);

    thrd_join(t1, NULL);
    thrd_join(t2, NULL);

    /* Main thread still has its own values */
    print_thread_identity(); /* Prints: Thread: Main, errno: 0 */

    return 0;
}
```

How TLS Works Under the Hood:
Thread-local storage is typically implemented using a segment register whose base address differs per thread. On x86-64 Linux, glibc uses FS for user-space TLS, while the kernel uses GS for its own per-CPU data. 64-bit Windows uses GS. Each thread has a unique base address loaded into this register, pointing at (or next to) that thread's TLS block:
Thread 1: FS_BASE = 0x7f1234560000
→ TLS block near that address contains Thread 1's TLS variables
Thread 2: FS_BASE = 0x7f5678900000
→ TLS block near that address contains Thread 2's TLS variables
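Accessing a TLS variable is then a memory access at a fixed offset from that base address (an %fs-relative operand on x86-64 Linux). For variables in the main executable, this is extremely fast: at most one or two extra instructions compared to a global variable. TLS variables defined in dynamically loaded shared libraries may instead go through a call to __tls_get_addr(), which is slower.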
Use _Thread_local / __thread for simple, statically known TLS variables: it's cleaner and faster. Use pthread_key_create for dynamically allocated per-thread data, especially when you need destructors to clean up resources when threads exit. Libraries often use pthread keys to attach state to threads they did not create, because the destructor releases that state when each thread exits.
| Approach | Speed | Flexibility | Destructor Support | Use Case |
|---|---|---|---|---|
| _Thread_local / __thread | Fastest (compiled-in offsets) | Static only | No | Simple per-thread state |
| pthread_key_t | Fast (TLS + indirection) | Dynamic | Yes | Complex objects, libraries |
| Manual thread ID lookup | Slower (hash lookup) | Full control | Manual | When other options don't fit |
Each thread has a unique thread ID and individual scheduling attributes that affect how the scheduler treats it.
Thread Identification:
Every thread has a POSIX thread handle (a pthread_t, returned by pthread_self()) that identifies it within the process. On Linux, each thread also has a kernel thread ID (TID), returned by gettid().
```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

void *thread_with_custom_scheduling(void *arg) {
    (void)arg;

    /* Get this thread's kernel TID */
    pid_t tid = syscall(SYS_gettid);
    printf("Thread TID: %d\n", tid);

    /* Get current scheduling policy and priority */
    int policy;
    struct sched_param param;
    pthread_getschedparam(pthread_self(), &policy, &param);

    printf("Scheduling policy: %s\n",
           policy == SCHED_OTHER ? "SCHED_OTHER" :
           policy == SCHED_FIFO  ? "SCHED_FIFO"  :
           policy == SCHED_RR    ? "SCHED_RR"    : "Unknown");
    printf("Priority: %d\n", param.sched_priority);

    /* Get CPU affinity */
    cpu_set_t cpuset;
    pthread_getaffinity_np(pthread_self(), sizeof(cpuset), &cpuset);

    printf("Affinity: ");
    for (int i = 0; i < CPU_SETSIZE; i++) {
        if (CPU_ISSET(i, &cpuset)) {
            printf("CPU%d ", i);
        }
    }
    printf("\n");

    return NULL;
}

int main(void) {
    pthread_t thread;
    pthread_attr_t attr;
    pthread_attr_init(&attr);

    /* Set custom CPU affinity - restrict to CPUs 0 and 1 */
    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);
    CPU_SET(0, &cpuset);
    CPU_SET(1, &cpuset);
    pthread_attr_setaffinity_np(&attr, sizeof(cpuset), &cpuset);

    /* Set scheduling policy and priority (requires privileges for real-time).
     * Without PTHREAD_EXPLICIT_SCHED, the new thread inherits the creator's
     * policy and these two attributes are ignored. */
    /* pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED); */
    /* pthread_attr_setschedpolicy(&attr, SCHED_FIFO); */
    /* struct sched_param param = { .sched_priority = 50 }; */
    /* pthread_attr_setschedparam(&attr, &param); */

    if (pthread_create(&thread, &attr, thread_with_custom_scheduling, NULL) != 0) {
        fprintf(stderr, "pthread_create failed\n");
        return 1;
    }
    pthread_join(thread, NULL);

    pthread_attr_destroy(&attr);
    return 0;
}
```

SCHED_FIFO and SCHED_RR policies require root privileges (or CAP_SYS_NICE capability). A real-time thread with high priority can starve other threads and even make the system unresponsive if it doesn't block. Use with extreme caution.
Each thread has its own signal mask—a set of signals that are blocked (not delivered) to that specific thread. This is one of the few truly per-thread aspects of signal handling.
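Key Points:
- The signal mask is per-thread, but signal handlers (dispositions) are shared by the whole process.
- A new thread inherits the signal mask of the thread that creates it.
- A signal sent to the process is delivered to one thread that does not block it. pthread_kill() sends a signal to one specific thread.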
```c
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

void *worker_thread(void *arg) {
    int id = *(int*)arg;

    /* Get current signal mask for this thread (NULL set = query only) */
    sigset_t current_mask;
    pthread_sigmask(SIG_BLOCK, NULL, &current_mask);

    printf("Thread %d: SIGINT %s\n", id,
           sigismember(&current_mask, SIGINT) ? "BLOCKED" : "not blocked");
    printf("Thread %d: SIGTERM %s\n", id,
           sigismember(&current_mask, SIGTERM) ? "BLOCKED" : "not blocked");

    /* Sleep and see if we receive signals */
    for (int i = 0; i < 5; i++) {
        printf("Thread %d: iteration %d\n", id, i);
        sleep(1);
    }
    return NULL;
}

int main(void) {
    /* Thread 1: Block SIGINT (a new thread inherits its creator's mask) */
    sigset_t mask1;
    sigemptyset(&mask1);
    sigaddset(&mask1, SIGINT);
    pthread_sigmask(SIG_BLOCK, &mask1, NULL);

    int id1 = 1;
    pthread_t t1;
    pthread_create(&t1, NULL, worker_thread, &id1);

    /* Thread 2: Different mask - block SIGTERM instead */
    sigset_t mask2;
    sigemptyset(&mask2);
    sigaddset(&mask2, SIGTERM);
    pthread_sigmask(SIG_SETMASK, &mask2, NULL); /* Replace entire mask */

    int id2 = 2;
    pthread_t t2;
    pthread_create(&t2, NULL, worker_thread, &id2);

    /* A process-directed SIGINT is never delivered to Thread 1 (blocked);
     * it goes to a thread that does not block it (Thread 2 or main).
     * With the default disposition, SIGINT then terminates the whole process. */

    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```

Block all signals in main() before creating any threads; each new thread inherits this mask. Then create one dedicated signal-handling thread that calls sigwait() to receive signals. This avoids the complexity of async-signal-safety entirely: only the signal thread handles signals, and it does so synchronously.
Each thread has its own state (running, ready, blocked, terminated) tracked by the kernel, and when terminated, maintains an exit status until joined.
Thread States (private to each thread):
The kernel tracks each thread's state separately, so different threads of the same process can be in different states at the same time. One thread blocked on I/O doesn't prevent another from running.
Exit Status:
When a thread terminates (via return or pthread_exit), it produces an exit value. This value is held until another thread calls pthread_join to retrieve it:
```c
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Thread returning a simple value */
void *compute_sum(void *arg) {
    int *numbers = (int*)arg;
    int sum = 0;
    for (int i = 0; i < 5; i++) {
        sum += numbers[i];
    }
    /* Encode the integer in the void* return value */
    return (void*)(intptr_t)sum;
}

/* Thread returning a heap-allocated structure */
struct result {
    int sum;
    int max;
    int min;
};

void *compute_stats(void *arg) {
    int *numbers = (int*)arg;
    struct result *res = malloc(sizeof(struct result));
    if (res == NULL) return NULL;

    res->sum = 0;
    res->max = numbers[0];
    res->min = numbers[0];

    for (int i = 0; i < 5; i++) {
        res->sum += numbers[i];
        if (numbers[i] > res->max) res->max = numbers[i];
        if (numbers[i] < res->min) res->min = numbers[i];
    }

    /* Return pointer to heap-allocated result (never to a local variable) */
    return res; /* Caller must free */
}

int main(void) {
    int data[] = {10, 20, 5, 30, 15};
    pthread_t t1, t2;

    pthread_create(&t1, NULL, compute_sum, data);
    pthread_create(&t2, NULL, compute_stats, data);

    /* Retrieve exit status from t1 */
    void *ret1;
    pthread_join(t1, &ret1);
    printf("Sum (simple return): %ld\n", (long)(intptr_t)ret1);

    /* Retrieve exit status from t2 */
    void *ret2;
    pthread_join(t2, &ret2);
    struct result *stats = (struct result*)ret2;
    if (stats != NULL) {
        printf("Stats: sum=%d, max=%d, min=%d\n", stats->sum, stats->max, stats->min);
        free(stats); /* Free the heap-allocated result */
    }

    return 0;
}
```

By default, threads are "joinable": their resources persist after termination until they are joined. If you don't need the return value, call pthread_detach() or create the thread with the PTHREAD_CREATE_DETACHED attribute. A detached thread's resources are released automatically when it terminates, which prevents leaks from threads that are never joined.
The table below summarizes how each thread-specific resource fits within a process and whether access to it needs synchronization:
| Resource | Private/Shared | Synchronization Needed | Notes |
|---|---|---|---|
| Register set | Private | No (each thread has its own saved copy) | Saved/restored on context switch |
| Stack | Private | No (unless addresses shared) | Function calls, local variables |
| Thread-local storage | Private | No (per-thread copies) | Thread-specific globals |
| Thread ID | Private | No (immutable) | Unique identifier |
| Signal mask | Private | No (per-thread) | Controls signal delivery |
| Scheduling attributes | Private | No (kernel-managed) | Priority, affinity, policy |
| Thread state | Private | No (kernel-managed) | Running, blocked, etc. |
| Exit status | Private | Via pthread_join | Available until joined |
We've thoroughly examined the resources that each thread keeps private. Let's consolidate the essential insights:
- Thread-local storage: use _Thread_local for simple cases, and pthread_key_t for complex objects that need destructors.

What's Next:
With a complete understanding of what threads own privately and share with siblings, we're ready to explore why threading is valuable. The next page examines the Benefits of Threading—responsiveness, resource sharing, economy, and scalability—and when these benefits outweigh the complexity costs.
You now have comprehensive knowledge of thread-private resources—registers, stack, TLS, scheduling attributes, signal masks, and state. This understanding is crucial for writing correct concurrent code and debugging thread-related issues.