Windows Threads

Threading in the Windows Ecosystem

While POSIX threads dominate the Unix world, the Windows Threading API represents a fundamentally different approach to concurrent programming—one born from the unique architecture of Windows NT and evolved over three decades of enterprise computing. Understanding Windows threads is essential for any engineer working on Windows applications, cross-platform development, or system-level programming.

Windows threads are deeply integrated into the NT kernel's object model, presenting threads as first-class kernel objects with security descriptors, handles, and rich query interfaces. This design reflects Windows' heritage as an enterprise operating system where security, auditing, and manageability are paramount.

This page provides a comprehensive exploration of Windows threading—from the low-level CreateThread API through modern thread pools and the architectural patterns that distinguish Windows concurrency from Unix-style threading.

What You Will Master

By the end of this page, you will understand the complete Windows threading model: the Win32 thread API, thread handles and IDs, thread local storage, synchronization primitives, the thread pool API, and how Windows threads interact with the Windows security model. You will be equipped to write robust multithreaded Windows applications.

Windows Thread Architecture

The Windows threading model is built on the foundation of the NT Kernel, which treats threads as fundamental scheduling units within processes. Unlike early Unix systems that evolved threading as an afterthought, Windows NT was designed from its inception (1989-1993) with threads as core primitives.

Threads as Kernel Objects

In Windows, a thread is a kernel object: the executive represents it with an ETHREAD structure whose embedded KTHREAD holds the scheduling state. This means:

Threads have handles (like file handles) that can be duplicated into, or inherited by, other processes
Threads have security descriptors controlling who can access them
Threads can be waited upon using the unified wait API (WaitForSingleObject, etc.)
Threads appear in system management tools and can be enumerated

This object-oriented approach to threads enables rich functionality but implies more overhead than the minimal Pthreads model.

Windows Thread Object Hierarchy
Structure	Location	Key Contents
ETHREAD	Executive (kernel)	Thread ID, process link, IRP list, security info, timing
KTHREAD	Kernel core	Scheduling state, quantum, priority, stack, wait blocks
TEB	User space (per-thread)	TLS array, stack limits, last error, exception info
CSR_THREAD	Client/Server Runtime	Console subsystem state, shutdown info

Thread Environment Block (TEB)

Every Windows thread has a Thread Environment Block (TEB) mapped into user-space memory. The TEB provides:

Thread Local Storage (TLS) array — Fixed-size array for fast TLS access
Stack boundaries — Base and limit addresses for stack overflow detection
Last error code — The value returned by GetLastError()
Exception handling — Head of the structured exception handler chain
Thread ID — Cached copy of the kernel thread ID

The TEB is directly accessible through a segment register (FS on x86, GS on x64), enabling extremely fast access to thread-local data without function calls.

// Accessing TEB on x64 (GS segment)
// The GS segment base points to the TEB
// Offset 0x30 contains the pointer to TEB itself (self-reference)
// Offset 0x48 contains the thread ID

User Mode vs Kernel Mode Stacks

Each Windows thread has two stacks: a user-mode stack (1 MB reserved by default) and a kernel-mode stack (12KB on x86, 24KB on x64). When a thread makes a system call, it switches to its kernel stack. This separation prevents user code from corrupting kernel state and keeps deep user-mode recursion from consuming kernel stack space.

Creating Threads in Windows

Windows provides multiple APIs for creating threads, each with different capabilities and use cases. Understanding when to use each is crucial for correct Windows programming.

CreateThread: The Foundation

The CreateThread function is the core Win32 API for creating threads:

HANDLE CreateThread(
    LPSECURITY_ATTRIBUTES   lpThreadAttributes,  // Security descriptor
    SIZE_T                  dwStackSize,         // Stack size (0 = default)
    LPTHREAD_START_ROUTINE  lpStartAddress,      // Thread function
    LPVOID                  lpParameter,         // Argument to thread
    DWORD                   dwCreationFlags,     // Creation flags
    LPDWORD                 lpThreadId           // Output: thread ID
);

This returns a HANDLE to the thread object, which must be closed with CloseHandle() when no longer needed (even if the thread has exited).

windows_thread_creation.c
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>  // malloc, calloc, free
 
/*
 * Basic Thread Creation Pattern
 * -----------------------------
 * The fundamental Windows thread creation idiom
 */
 
// Thread function signature (WINAPI = __stdcall calling convention)
DWORD WINAPI WorkerThread(LPVOID lpParam) {
    int threadNum = (int)(INT_PTR)lpParam;
    
    printf("Thread %d starting\n", threadNum);
    
    // Simulate work
    Sleep(1000);
    
    printf("Thread %d complete\n", threadNum);
    
    // Return value becomes thread exit code
    return threadNum * 10;
}
 
void CreateBasicThread(void) {
    HANDLE hThread;
    DWORD threadId;
    
    hThread = CreateThread(
        NULL,               // Default security
        0,                  // Default stack size
        WorkerThread,       // Thread function
        (LPVOID)1,          // Thread parameter
        0,                  // Run immediately
        &threadId           // Receive thread ID
    );
    
    if (hThread == NULL) {
        printf("CreateThread failed: %lu\n", GetLastError());
        return;
    }
    
    printf("Created thread with ID: %lu\n", threadId);
    
    // Wait for thread to complete
    WaitForSingleObject(hThread, INFINITE);
    
    // Get thread exit code
    DWORD exitCode;
    GetExitCodeThread(hThread, &exitCode);
    printf("Thread exit code: %lu\n", exitCode);
    
    // CRITICAL: Close the handle
    CloseHandle(hThread);
}
 
/*
 * Creating Multiple Threads with Proper Argument Passing
 * -------------------------------------------------------
 * Using heap-allocated structures for thread arguments
 */
 
typedef struct {
    int threadId;
    int startValue;
    int endValue;
    int *resultArray;
    CRITICAL_SECTION *pCS;  // For synchronization
} ThreadContext;
 
DWORD WINAPI ComputeWorker(LPVOID lpParam) {
    ThreadContext *ctx = (ThreadContext *)lpParam;
    
    printf("Worker %d: processing range [%d, %d)\n",
           ctx->threadId, ctx->startValue, ctx->endValue);
    
    for (int i = ctx->startValue; i < ctx->endValue; i++) {
        // Compute something
        int result = i * i;
        
        // Store result with synchronization
        EnterCriticalSection(ctx->pCS);
        ctx->resultArray[i] = result;
        LeaveCriticalSection(ctx->pCS);
    }
    
    // Free our context (we own it)
    free(ctx);
    
    return 0;
}
 
void CreateWorkerTeam(int numThreads, int totalWork) {
    // WaitForMultipleObjects accepts at most MAXIMUM_WAIT_OBJECTS (64) handles
    if (numThreads <= 0 || numThreads > MAXIMUM_WAIT_OBJECTS || totalWork <= 0) {
        return;
    }
    
    HANDLE *threads = (HANDLE *)malloc(numThreads * sizeof(HANDLE));
    int *results = (int *)calloc(totalWork, sizeof(int));
    if (threads == NULL || results == NULL) {
        free(threads);
        free(results);
        return;
    }
    
    CRITICAL_SECTION cs;
    InitializeCriticalSection(&cs);
    
    int chunkSize = totalWork / numThreads;
    int created = 0;  // Threads actually started
    
    for (int i = 0; i < numThreads; i++) {
        // Allocate context on heap (thread will free)
        ThreadContext *ctx = (ThreadContext *)malloc(sizeof(ThreadContext));
        if (ctx == NULL) {
            break;
        }
        ctx->threadId = i;
        ctx->startValue = i * chunkSize;
        ctx->endValue = (i == numThreads - 1) ? totalWork : (i + 1) * chunkSize;
        ctx->resultArray = results;
        ctx->pCS = &cs;
        
        threads[created] = CreateThread(NULL, 0, ComputeWorker, ctx, 0, NULL);
        if (threads[created] == NULL) {
            printf("Failed to create thread %d\n", i);
            free(ctx);  // Thread never started, so we still own the context
            continue;
        }
        created++;
    }
    
    // Wait for every thread that started
    if (created > 0) {
        WaitForMultipleObjects((DWORD)created, threads, TRUE, INFINITE);
    }
    
    // Cleanup
    for (int i = 0; i < created; i++) {
        CloseHandle(threads[i]);
    }
    DeleteCriticalSection(&cs);
    free(threads);
    free(results);
}

CreateThread vs _beginthreadex

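When a thread calls C Runtime Library (CRT) functions, prefer _beginthreadex() over CreateThread(). The CRT keeps per-thread state (errno, strtok buffers, etc.), and _beginthreadex() wraps CreateThread() so that this state is set up when the thread starts and released by _endthreadex() when it exits. Current CRTs initialize the state lazily for threads started with CreateThread(), but Microsoft's documentation still recommends _beginthreadex(): with a statically linked CRT the per-thread block may leak when such a thread exits, and the CRT may terminate the process if it cannot allocate the block under low-memory conditions.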

beginthreadex_pattern.c
#include <windows.h>
#include <process.h>  // For _beginthreadex
#include <errno.h>    // For errno
#include <stdio.h>
#include <string.h>   // For strtok
 
/*
 * Using _beginthreadex for CRT-Safe Thread Creation
 * --------------------------------------------------
 * ALWAYS prefer this when using C Runtime Library functions
 */
 
// Note: Different signature than LPTHREAD_START_ROUTINE
unsigned __stdcall SafeWorker(void *arg) {
    int id = (int)(INT_PTR)arg;
    
    // Safe to use CRT functions like strtok, rand, etc.
    char buffer[100];
    sprintf(buffer, "Thread %d: using CRT safely\n", id);
    printf("%s", buffer);
    
    // strtok is safe - each thread has its own context
    char str[] = "hello,world,test";
    char *token = strtok(str, ",");
    while (token) {
        printf("Thread %d token: %s\n", id, token);
        token = strtok(NULL, ",");
    }
    
    // _endthreadex for proper cleanup (called automatically on return)
    return 0;
}
 
HANDLE CreateSafeThread(int id) {
    // Cast to HANDLE since _beginthreadex returns uintptr_t
    HANDLE hThread = (HANDLE)_beginthreadex(
        NULL,           // Security
        0,              // Stack size
        SafeWorker,     // Thread function
        (void *)(INT_PTR)id,  // Argument
        0,              // Creation flags
        NULL            // Thread ID (optional)
    );
    
    if (hThread == 0) {
        printf("_beginthreadex failed: %d\n", errno);
        return NULL;
    }
    
    return hThread;
}

Thread Handles and IDs

Windows distinguishes between thread handles and thread IDs—a distinction that causes confusion for developers from Unix backgrounds but provides important capabilities.

Thread IDs

A Thread ID (TID) is a system-wide unique identifier for a thread:

32-bit unsigned integer
Unique across all processes while the thread exists
Assigned by the kernel at creation
Can be reused after thread termination
Useful for inter-process thread identification

Thread Handles

A Thread Handle is a process-local reference to a thread object:

Opaque HANDLE type (pointer-sized)
Grants specific access rights (THREAD_TERMINATE, THREAD_SUSPEND_RESUME, etc.)
Must be closed with CloseHandle()
Multiple handles can reference the same thread
Subject to security checks

Thread ID Operations

•GetCurrentThreadId() — Get calling thread's ID
•GetThreadId(handle) — Get ID from handle
•Pass to other processes for identification
•Use in logging and debugging
•Key for thread-specific data structures

Thread Handle Operations

•GetCurrentThread() — Pseudo-handle to self
•OpenThread(access, id) — Get handle from ID
•DuplicateHandle() — Clone handle for sharing
•WaitForSingleObject() — Wait for termination
•TerminateThread() — Force termination (dangerous!)

handles_and_ids.c
#include <windows.h>
#include <stdio.h>
 
/*
 * Demonstrating Handle vs ID Distinctions
 * ----------------------------------------
 */
 
DWORD WINAPI DemoThread(LPVOID lpParam) {
    DWORD myId = GetCurrentThreadId();
    HANDLE myPseudoHandle = GetCurrentThread();
    
    printf("Inside thread:\n");
    printf("  Thread ID: %lu\n", myId);
    printf("  Pseudo handle: %p\n", myPseudoHandle);
    
    /*
     * IMPORTANT: GetCurrentThread() returns a PSEUDO-HANDLE
     * Value is always -2 (0xFFFFFFFE on 32-bit)
     * It's a special value that the kernel interprets as "current thread"
     * 
     * Pseudo-handles:
     * - Cannot be passed to other threads/processes
     * - Do not need to be closed
     * - Always valid within current thread context
     */
    
    // To get a real handle to current thread:
    HANDLE realHandle;
    BOOL success = DuplicateHandle(
        GetCurrentProcess(),    // Source process
        GetCurrentThread(),     // Source handle (pseudo)
        GetCurrentProcess(),    // Target process
        &realHandle,            // Output: real handle
        0,                      // Access (ignored with DUPLICATE_SAME_ACCESS)
        FALSE,                  // Inheritable
        DUPLICATE_SAME_ACCESS   // Options
    );
    
    if (success) {
        printf("  Real handle: %p\n", realHandle);
        // Must close real handles
        CloseHandle(realHandle);
    }
    
    Sleep(5000);  // Keep thread alive for demo
    return 0;
}
 
void DemonstrateHandleSharing(void) {
    HANDLE hThread;
    DWORD threadId;
    
    hThread = CreateThread(NULL, 0, DemoThread, NULL, 0, &threadId);
    if (hThread == NULL) {
        printf("CreateThread failed: %lu\n", GetLastError());
        return;
    }
    
    printf("Main thread:\n");
    printf("  Created thread ID: %lu\n", threadId);
    printf("  Handle value: %p\n", hThread);
    
    // We can query thread info using the handle
    DWORD exitCode;
    GetExitCodeThread(hThread, &exitCode);
    printf("  Exit code: %lu (%s)\n", exitCode,
           exitCode == STILL_ACTIVE ? "STILL_ACTIVE" : "terminated");
    
    // Get the ID back from the handle
    DWORD retrievedId = GetThreadId(hThread);
    printf("  Retrieved ID from handle: %lu\n", retrievedId);
    
    // We can also open another handle from the ID
    HANDLE hThread2 = OpenThread(
        THREAD_QUERY_INFORMATION,  // Desired access
        FALSE,                      // Inherit handle
        threadId                    // Thread ID
    );
    
    if (hThread2) {
        printf("  Second handle: %p\n", hThread2);
        CloseHandle(hThread2);
    }
    
    WaitForSingleObject(hThread, INFINITE);
    CloseHandle(hThread);
}

Handle Pseudo-Handles

GetCurrentThread() and GetCurrentProcess() return pseudo-handles, not real handles. These are special constants (-2 for the current thread, -1 for the current process) that the kernel interprets as 'current thread' and 'current process'. They're always valid in the calling context and don't need closing. However, if you need to pass a handle to another thread or process, you must use DuplicateHandle() to get a real handle.

Thread Synchronization Primitives

Windows provides a rich set of synchronization primitives, ranging from lightweight user-mode objects to heavyweight kernel objects. Understanding the performance characteristics and use cases of each is crucial for efficient concurrent programming.

User-Mode vs Kernel-Mode Primitives

User-mode primitives (Critical Sections, SRW Locks) operate entirely in user space when uncontended, never entering the kernel. They're extremely fast but cannot be shared across processes.

Kernel-mode primitives (Mutexes, Semaphores, Events) are kernel objects that can be named and shared across processes but require kernel transitions even in the uncontended case.

Windows Synchronization Primitives Comparison
Primitive	Mode	Cross-Process	Performance	Use Case
Critical Section	User + Kernel fallback	No	Fastest	General mutual exclusion
SRW Lock	User + Kernel fallback	No	Very fast	Reader-writer scenarios
Mutex	Kernel	Yes (named)	Slow	Cross-process synchronization
Semaphore	Kernel	Yes (named)	Slow	Counting/resource pools
Event	Kernel	Yes (named)	Slow	Signaling/notification
Condition Variable	User + Kernel fallback	No	Fast	Wait for condition

windows_synchronization.c
#include <windows.h>
#include <stdio.h>
 
/*
 * Critical Section: The Workhorse of Windows Synchronization
 * -----------------------------------------------------------
 * Fast, lightweight, but process-local only
 */
 
CRITICAL_SECTION g_cs;
int g_sharedData = 0;
 
void UseCriticalSection(void) {
    // Initialize (can also use InitializeCriticalSectionAndSpinCount)
    InitializeCriticalSection(&g_cs);
    
    // In worker threads:
    EnterCriticalSection(&g_cs);
    g_sharedData++;
    LeaveCriticalSection(&g_cs);
    
    // TryEnterCriticalSection for non-blocking attempts
    if (TryEnterCriticalSection(&g_cs)) {
        // Got the lock
        g_sharedData++;
        LeaveCriticalSection(&g_cs);
    } else {
        // Lock held by another thread
    }
    
    // Cleanup
    DeleteCriticalSection(&g_cs);
}
 
/*
 * SRW Lock: Modern Slim Reader/Writer Lock
 * -----------------------------------------
 * Introduced in Vista. Extremely efficient.
 */
 
SRWLOCK g_srwLock = SRWLOCK_INIT;  // Static initialization!
int g_data = 0;
 
DWORD WINAPI Reader(LPVOID lpParam) {
    int id = (int)(INT_PTR)lpParam;
    
    AcquireSRWLockShared(&g_srwLock);  // Multiple readers OK
    printf("Reader %d: value = %d\n", id, g_data);
    ReleaseSRWLockShared(&g_srwLock);
    
    return 0;
}
 
DWORD WINAPI Writer(LPVOID lpParam) {
    int id = (int)(INT_PTR)lpParam;
    
    AcquireSRWLockExclusive(&g_srwLock);  // Exclusive access
    g_data++;
    printf("Writer %d: set value = %d\n", id, g_data);
    ReleaseSRWLockExclusive(&g_srwLock);
    
    return 0;
}
 
/*
 * Condition Variable: Wait for Conditions
 * ----------------------------------------
 * Works with Critical Sections or SRW Locks
 */
 
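// Call InitializeCriticalSection(&g_queueCS) before starting these threads
CRITICAL_SECTION g_queueCS;
CONDITION_VARIABLE g_queueCV = CONDITION_VARIABLE_INIT;  // Static initialization
int g_queue[100];
int g_queueCount = 0;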
 
DWORD WINAPI Producer(LPVOID lpParam) {
    for (int i = 0; i < 10; i++) {
        EnterCriticalSection(&g_queueCS);
        
        g_queue[g_queueCount++] = i;
        printf("Produced: %d\n", i);
        
        // Wake one waiting consumer
        WakeConditionVariable(&g_queueCV);
        
        LeaveCriticalSection(&g_queueCS);
        Sleep(100);
    }
    return 0;
}
 
DWORD WINAPI Consumer(LPVOID lpParam) {
    for (int i = 0; i < 10; i++) {
        EnterCriticalSection(&g_queueCS);
        
        // Wait while queue is empty
        while (g_queueCount == 0) {
            // Atomically releases CS and waits
            SleepConditionVariableCS(&g_queueCV, &g_queueCS, INFINITE);
            // CS is reacquired when we wake
        }
        
        int value = g_queue[--g_queueCount];  // Pops the newest item (LIFO, for brevity)
        printf("Consumed: %d\n", value);
        
        LeaveCriticalSection(&g_queueCS);
    }
    return 0;
}
 
/*
 * Kernel Objects: For Cross-Process Synchronization
 * ---------------------------------------------------
 */
 
void UseKernelMutex(void) {
    // Create named mutex (can be opened by other processes)
    HANDLE hMutex = CreateMutex(
        NULL,                   // Security
        FALSE,                  // Initial owner
        TEXT("Global\\MyMutex")  // Name (Global\ = shared across all sessions)
    );
    
    if (hMutex == NULL) {
        printf("CreateMutex failed: %lu\n", GetLastError());
        return;
    }
    
    // Wait to acquire
    DWORD result = WaitForSingleObject(hMutex, INFINITE);
    if (result == WAIT_OBJECT_0) {
        printf("Acquired mutex\n");
        
        // Critical section...
        
        ReleaseMutex(hMutex);
    } else if (result == WAIT_ABANDONED) {
        // Previous owner terminated without releasing
        printf("Mutex was abandoned\n");
        ReleaseMutex(hMutex);
    }
    
    CloseHandle(hMutex);
}

InitializeCriticalSectionAndSpinCount

For locks held briefly on multiprocessor systems, use InitializeCriticalSectionAndSpinCount() with a spin count (e.g., 4000). This causes threads to spin in user mode before blocking, avoiding expensive kernel transitions for quick lock/unlock cycles. On single-processor systems the spin count is ignored. The heap manager uses a spin count of roughly 4000 for its per-heap critical sections.

Thread Local Storage (TLS)

Windows Thread Local Storage provides per-thread data that persists across function calls. Windows offers two mechanisms: Dynamic TLS (the API) and Static TLS (compiler-supported).

Dynamic TLS

Dynamic TLS uses the TlsAlloc, TlsSetValue, TlsGetValue, and TlsFree functions. Each process has a limited number of TLS slots: at least 64 (TLS_MINIMUM_AVAILABLE) are guaranteed, and modern Windows allows up to 1088 per process.

Static TLS

Static TLS uses the __declspec(thread) storage class specifier. The compiler and loader cooperate to give each thread its own copy of the module's TLS data, which the thread reaches through a pointer stored in its TEB. This is simpler but has some restrictions (can't be used in dynamically loaded DLLs on older systems).

windows_tls.c
#include <windows.h>
#include <stdio.h>
#include <string.h>  // strcpy_s
 
/*
 * Dynamic TLS Example
 * --------------------
 * Runtime allocation of thread-local slots
 */
 
// Global TLS index
DWORD g_tlsIndex = TLS_OUT_OF_INDEXES;
 
typedef struct {
    DWORD threadId;
    char name[64];
    int requestCount;
} ThreadData;
 
BOOL InitializeThreadData(const char *name) {
    ThreadData *data = (ThreadData *)LocalAlloc(LPTR, sizeof(ThreadData));
    if (!data) return FALSE;
    
    data->threadId = GetCurrentThreadId();
    strcpy_s(data->name, sizeof(data->name), name);
    data->requestCount = 0;
    
    if (!TlsSetValue(g_tlsIndex, data)) {
        LocalFree(data);  // Don't leak the block if the slot can't be set
        return FALSE;
    }
    return TRUE;
}
 
ThreadData *GetThreadData(void) {
    return (ThreadData *)TlsGetValue(g_tlsIndex);
}
 
void CleanupThreadData(void) {
    ThreadData *data = GetThreadData();
    if (data) {
        LocalFree(data);
        TlsSetValue(g_tlsIndex, NULL);
    }
}
 
DWORD WINAPI WorkerWithTLS(LPVOID lpParam) {
    int id = (int)(INT_PTR)lpParam;
    char name[64];
    sprintf_s(name, sizeof(name), "Worker-%d", id);
    
    // Initialize TLS for this thread
    if (!InitializeThreadData(name)) {
        printf("Failed to init TLS\n");
        return 1;
    }
    
    // Use TLS data throughout the thread
    for (int i = 0; i < 10; i++) {
        ThreadData *data = GetThreadData();
        data->requestCount++;
        printf("[%s] Request %d\n", data->name, data->requestCount);
        Sleep(100);
    }
    
    // Cleanup
    CleanupThreadData();
    return 0;
}
 
int MainWithDynamicTLS(void) {
    // Allocate TLS index at program start
    g_tlsIndex = TlsAlloc();
    if (g_tlsIndex == TLS_OUT_OF_INDEXES) {
        printf("TlsAlloc failed\n");
        return 1;
    }
    
    // Create threads
    HANDLE threads[4];
    for (int i = 0; i < 4; i++) {
        threads[i] = CreateThread(NULL, 0, WorkerWithTLS, 
                                  (LPVOID)(INT_PTR)i, 0, NULL);
    }
    
    WaitForMultipleObjects(4, threads, TRUE, INFINITE);
    
    for (int i = 0; i < 4; i++) {
        CloseHandle(threads[i]);
    }
    
    // Free TLS index
    TlsFree(g_tlsIndex);
    
    return 0;
}
 
/*
 * Static TLS with __declspec(thread)
 * -----------------------------------
 * Simpler but compiler-dependent
 */
 
// Each thread gets its own copy of these variables
__declspec(thread) int t_requestId = 0;
__declspec(thread) char t_lastError[256] = "";
 
DWORD WINAPI WorkerWithStaticTLS(LPVOID lpParam) {
    int id = (int)(INT_PTR)lpParam;
    
    // Each thread sees its own t_requestId
    for (int i = 0; i < 5; i++) {
        t_requestId++;
        sprintf_s(t_lastError, sizeof(t_lastError), 
                  "Thread %d, request %d", id, t_requestId);
        printf("%s\n", t_lastError);
        Sleep(50);
    }
    
    return 0;
}

DLL and Static TLS

On Windows XP/2003, using __declspec(thread) in a DLL that's loaded with LoadLibrary() causes crashes or incorrect behavior. Modern Windows (Vista+) handles this correctly. For maximum compatibility in DLLs, use dynamic TLS with TlsAlloc/TlsFree.

Thread Pool API

Modern Windows provides a sophisticated Thread Pool API that manages thread creation, destruction, and work distribution automatically. Using thread pools is strongly recommended over creating threads directly for most applications.

Why Use Thread Pools?

Reduced overhead — Threads are reused instead of created/destroyed per task
Automatic scaling — Pool grows/shrinks based on workload and CPU count
Work cancellation — Built-in support for canceling pending work
Wait operations — Efficiently wait on kernel objects and trigger callbacks
Timer callbacks — Schedule work to run at future times or periodically

The Windows thread pool uses an I/O completion port internally for maximum efficiency.

thread_pool_api.c
#include <windows.h>
#include <stdio.h>
 
/*
 * Simple Work Item Submission
 * ----------------------------
 * The easiest way to use the thread pool
 */
 
VOID CALLBACK SimpleWorkCallback(
    PTP_CALLBACK_INSTANCE Instance,
    PVOID Context,
    PTP_WORK Work
) {
    int taskId = (int)(INT_PTR)Context;
    printf("Task %d executing on thread %lu\n", 
           taskId, GetCurrentThreadId());
    
    // Simulate work
    Sleep(100);
    
    printf("Task %d complete\n", taskId);
}
 
void SubmitSimpleWork(void) {
    // Create work items
    PTP_WORK workItems[10];
    
    for (int i = 0; i < 10; i++) {
        workItems[i] = CreateThreadpoolWork(
            SimpleWorkCallback,
            (PVOID)(INT_PTR)i,  // Context
            NULL                // Environment (NULL = default pool)
        );
        
        if (workItems[i] == NULL) {
            printf("CreateThreadpoolWork failed\n");
            continue;
        }
        
        // Submit to thread pool
        SubmitThreadpoolWork(workItems[i]);
    }
    
    // Wait for all work to complete
    for (int i = 0; i < 10; i++) {
        if (workItems[i]) {
            WaitForThreadpoolWorkCallbacks(workItems[i], FALSE);
            CloseThreadpoolWork(workItems[i]);
        }
    }
}
 
/*
 * Callback Environment for Custom Pool Behavior
 * -----------------------------------------------
 * Control pool size, cleanup group, etc.
 */
 
void UseCustomEnvironment(void) {
    // Create custom thread pool
    PTP_POOL pool = CreateThreadpool(NULL);
    if (!pool) {
        printf("CreateThreadpool failed\n");
        return;
    }
    
    // Set thread counts
    SetThreadpoolThreadMinimum(pool, 2);
    SetThreadpoolThreadMaximum(pool, 8);
    
    // Create cleanup group (for automatic cleanup)
    PTP_CLEANUP_GROUP cleanupGroup = CreateThreadpoolCleanupGroup();
    
    // Initialize callback environment
    TP_CALLBACK_ENVIRON env;
    InitializeThreadpoolEnvironment(&env);
    SetThreadpoolCallbackPool(&env, pool);
    SetThreadpoolCallbackCleanupGroup(&env, cleanupGroup, NULL);
    
    // Create work items using custom environment
    for (int i = 0; i < 5; i++) {
        PTP_WORK work = CreateThreadpoolWork(
            SimpleWorkCallback,
            (PVOID)(INT_PTR)i,
            &env  // Use our custom environment
        );
        
        if (work) {
            SubmitThreadpoolWork(work);
        }
    }
    
    // Cleanup: wait for all and close
    CloseThreadpoolCleanupGroupMembers(cleanupGroup, FALSE, NULL);
    CloseThreadpoolCleanupGroup(cleanupGroup);
    DestroyThreadpoolEnvironment(&env);
    CloseThreadpool(pool);
}
 
/*
 * Wait Callbacks: Efficient Object Waiting
 * ------------------------------------------
 * Wait on kernel objects without blocking a thread
 */
 
VOID CALLBACK WaitCallback(
    PTP_CALLBACK_INSTANCE Instance,
    PVOID Context,
    PTP_WAIT Wait,
    TP_WAIT_RESULT WaitResult
) {
    const char *name = (const char *)Context;
    
    if (WaitResult == WAIT_OBJECT_0) {
        printf("Wait triggered for: %s\n", name);
    } else if (WaitResult == WAIT_TIMEOUT) {
        printf("Wait timed out for: %s\n", name);
    }
}
 
void UseWaitCallback(HANDLE someEvent) {
    PTP_WAIT wait = CreateThreadpoolWait(
        WaitCallback,
        (PVOID)"MyEvent",
        NULL
    );
    
    if (wait) {
        // Start waiting (NULL timeout = infinite)
        SetThreadpoolWait(wait, someEvent, NULL);
        
        // ... event gets signaled elsewhere ...
        
        // Cleanup: stop waiting, drain any callbacks, then close
        SetThreadpoolWait(wait, NULL, NULL);
        WaitForThreadpoolWaitCallbacks(wait, TRUE);
        CloseThreadpoolWait(wait);
    }
}
 
/*
 * Timer Callbacks: Scheduled Execution
 * --------------------------------------
 */
 
VOID CALLBACK TimerCallback(
    PTP_CALLBACK_INSTANCE Instance,
    PVOID Context,
    PTP_TIMER Timer
) {
    printf("Timer fired at tick %lu\n", GetTickCount());
}
 
void UseTimerCallback(void) {
    PTP_TIMER timer = CreateThreadpoolTimer(
        TimerCallback,
        NULL,
        NULL
    );
    
    if (timer) {
        // Due in 1 second, repeat every 500ms
        FILETIME dueTime;
        ULARGE_INTEGER ulDueTime;
        ulDueTime.QuadPart = (ULONGLONG)-(1 * 10000000LL); // -1 second
        dueTime.dwHighDateTime = ulDueTime.HighPart;
        dueTime.dwLowDateTime = ulDueTime.LowPart;
        
        SetThreadpoolTimer(timer, &dueTime, 500, 0);
        
        // Let it run for 3 seconds
        Sleep(3000);
        
        // Stop and cleanup
        SetThreadpoolTimer(timer, NULL, 0, 0);  // Disable
        WaitForThreadpoolTimerCallbacks(timer, TRUE);  // Cancel pending
        CloseThreadpoolTimer(timer);
    }
}

Default vs Custom Pools

The default thread pool (pass NULL for environment) is shared process-wide and is appropriate for most applications. Create custom pools only when you need isolation (preventing one subsystem's long-running tasks from starving another) or specific thread count limits. Don't create many custom pools—that defeats the purpose of pooling.

Comparison with POSIX Threads

Understanding the philosophical and practical differences between Windows threads and POSIX threads is essential for cross-platform development and for appreciating the design trade-offs each system makes.

Windows Threads vs POSIX Threads
Aspect	Windows Threads	POSIX Threads
Thread Identity	Handle (object) + ID	pthread_t (opaque type)
Error Reporting	GetLastError() or HRESULT	Return value (0 = success)
Security	Full ACL support on handles	Minimal (thread credentials)
Cross-Process	Named objects, handle sharing	Requires shared memory
Wait Operations	Unified wait (any object)	pthread_join, pthread_cond_wait
Cancellation	TerminateThread (dangerous)	pthread_cancel (cooperative)
TLS	TlsAlloc or __declspec(thread)	pthread_key_create or __thread
Thread Pools	Rich built-in API	Not standardized (libraries)
Reader/Writer	SRW locks (Vista+)	pthread_rwlock
Philosophy	Heavy objects, rich features	Minimal primitives, composable

Key Differences in Practice

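Handle Management: Windows requires explicit handle cleanup (CloseHandle), even after the thread has exited. A pthread_t has no handle to close, but a joinable thread must still be joined or detached so that its resources can be reclaimed. Failing to close handles leaks kernel resources.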
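The cleanup rules on each side can be sketched side by side. This is a minimal illustration rather than a recommended pattern: `run_and_release` and `worker` are hypothetical names, and the `#ifdef _WIN32` split is assumed only so the same file builds on both platforms.

```c
#ifdef _WIN32
#include <windows.h>

static DWORD WINAPI worker(LPVOID arg) {
    (void)arg;
    return 42;                      /* becomes the thread exit code */
}

/* Returns the worker's exit code, or -1 on failure. */
int run_and_release(void) {
    HANDLE h = CreateThread(NULL, 0, worker, NULL, 0, NULL);
    if (h == NULL) return -1;

    WaitForSingleObject(h, INFINITE);

    /* The thread has exited, but the kernel object stays alive while
       this handle is open, so the exit code is still readable. */
    DWORD code = 0;
    BOOL ok = GetExitCodeThread(h, &code);

    CloseHandle(h);                 /* drop our reference to the object */
    return ok ? (int)code : -1;
}
#else
#include <pthread.h>
#include <stdint.h>

static void *worker(void *arg) {
    (void)arg;
    return (void *)(intptr_t)42;    /* becomes the value seen by pthread_join */
}

/* Returns the worker's result, or -1 on failure. */
int run_and_release(void) {
    pthread_t t;
    if (pthread_create(&t, NULL, worker, NULL) != 0) return -1;

    /* There is no handle to close, but a joinable thread must be
       joined (or detached) so its resources can be reclaimed. */
    void *ret = NULL;
    if (pthread_join(t, &ret) != 0) return -1;
    return (int)(intptr_t)ret;
}
#endif
```

Calling `run_and_release()` from `main` should yield the worker's value on either platform. The Windows branch needs two steps (wait, then close), while the POSIX branch folds waiting and reclamation into `pthread_join`.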

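Unified Waiting: Windows' WaitForMultipleObjects can wait on threads, mutexes, semaphores, events, processes, and more with a single call (up to MAXIMUM_WAIT_OBJECTS, which is 64 handles). POSIX uses a different wait function for each kind of object, such as pthread_join for threads and pthread_cond_wait for condition variables.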
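The difference is easiest to see when one piece of code must wait for both a "ready" signal and a thread's exit. The sketch below assumes a hypothetical helper, `wait_for_both`, and a trivial `worker` that only raises the signal. On Windows both objects go into one `WaitForMultipleObjects` call. The POSIX branch needs a condition-variable wait followed by a separate join.

```c
#ifdef _WIN32
#include <windows.h>

static HANDLE g_ready;                     /* manual-reset event */

static DWORD WINAPI worker(LPVOID arg) {
    (void)arg;
    SetEvent(g_ready);                     /* signal "data is ready" */
    return 0;
}

/* Returns 1 once the event is signaled and the thread has exited. */
int wait_for_both(void) {
    g_ready = CreateEvent(NULL, TRUE, FALSE, NULL);
    if (g_ready == NULL) return 0;

    HANDLE hThread = CreateThread(NULL, 0, worker, NULL, 0, NULL);
    if (hThread == NULL) { CloseHandle(g_ready); return 0; }

    /* One call covers two different object types. */
    HANDLE objects[2] = { hThread, g_ready };
    DWORD rc = WaitForMultipleObjects(2, objects, TRUE, INFINITE);

    CloseHandle(hThread);
    CloseHandle(g_ready);
    return rc < WAIT_OBJECT_0 + 2;         /* success range for two objects */
}
#else
#include <pthread.h>

static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  g_cond = PTHREAD_COND_INITIALIZER;
static int g_ready;

static void *worker(void *arg) {
    (void)arg;
    pthread_mutex_lock(&g_lock);
    g_ready = 1;                           /* signal "data is ready" */
    pthread_cond_signal(&g_cond);
    pthread_mutex_unlock(&g_lock);
    return NULL;
}

/* Returns 1 once the flag is set and the thread has been joined. */
int wait_for_both(void) {
    pthread_t t;
    g_ready = 0;
    if (pthread_create(&t, NULL, worker, NULL) != 0) return 0;

    /* Waiting for the condition and waiting for the thread are
       two separate calls using two separate APIs. */
    pthread_mutex_lock(&g_lock);
    while (!g_ready)
        pthread_cond_wait(&g_cond, &g_lock);
    pthread_mutex_unlock(&g_lock);

    return pthread_join(t, NULL) == 0;
}
#endif
```

Passing `TRUE` for the wait-all argument makes the Windows call return only when every handle is signaled. Passing `FALSE` would instead return as soon as any one of them is.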

Security Model: Windows threads are full kernel objects with security descriptors. You can grant or deny specific thread operations (suspend, terminate, query) to specific users. POSIX has no equivalent.

Cancellation Approach: POSIX provides cooperative cancellation with cancellation points and cleanup handlers. Windows' TerminateThread is a blunt instrument that can't safely release resources. The Windows approach is to use signaling (events) for cooperative termination.
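As a rough sketch of the two styles, the code below stops a looping worker on each platform. The names `stop_worker_cleanly`, `worker`, `on_cancel`, and `g_cleaned` are hypothetical, and the 10 ms and 50 ms delays are arbitrary values chosen for the demonstration. The Windows branch polls a manual-reset event. The POSIX branch relies on deferred cancellation, with `nanosleep` acting as the cancellation point.

```c
#ifdef _WIN32
#include <windows.h>

static HANDLE g_stop;                      /* manual-reset shutdown event */
static volatile LONG g_cleaned;            /* set by the worker on its way out */

static DWORD WINAPI worker(LPVOID arg) {
    (void)arg;
    /* Poll the stop event between units of work. */
    while (WaitForSingleObject(g_stop, 10) == WAIT_TIMEOUT) {
        /* ... one unit of work ... */
    }
    InterlockedExchange(&g_cleaned, 1);    /* worker releases its own resources */
    return 0;
}

/* Returns 1 if the worker stopped and ran its cleanup. */
int stop_worker_cleanly(void) {
    g_cleaned = 0;
    g_stop = CreateEvent(NULL, TRUE, FALSE, NULL);
    if (g_stop == NULL) return 0;

    HANDLE h = CreateThread(NULL, 0, worker, NULL, 0, NULL);
    if (h == NULL) { CloseHandle(g_stop); return 0; }

    Sleep(50);                             /* let the worker run briefly */
    SetEvent(g_stop);                      /* request shutdown */
    WaitForSingleObject(h, INFINITE);      /* wait for the worker to comply */

    CloseHandle(h);
    CloseHandle(g_stop);
    return g_cleaned == 1;
}
#else
#include <pthread.h>
#include <time.h>

static int g_cleaned;                      /* read by the caller after the join */

static void on_cancel(void *arg) {
    (void)arg;
    g_cleaned = 1;                         /* cleanup handler runs on cancellation */
}

static void *worker(void *arg) {
    (void)arg;
    struct timespec slice = { 0, 10 * 1000 * 1000 };   /* 10 ms */
    pthread_cleanup_push(on_cancel, NULL);
    for (;;) {
        nanosleep(&slice, NULL);           /* nanosleep is a cancellation point */
    }
    pthread_cleanup_pop(0);                /* never reached; pairs with the push */
    return NULL;
}

/* Returns 1 if the worker was canceled and its handler ran. */
int stop_worker_cleanly(void) {
    pthread_t t;
    void *ret = NULL;
    struct timespec brief = { 0, 50 * 1000 * 1000 };   /* 50 ms */

    g_cleaned = 0;
    if (pthread_create(&t, NULL, worker, NULL) != 0) return 0;

    nanosleep(&brief, NULL);               /* let the worker run briefly */
    pthread_cancel(t);                     /* request cancellation */
    if (pthread_join(t, &ret) != 0) return 0;
    return ret == PTHREAD_CANCELED && g_cleaned == 1;
}
#endif
```

In both branches the worker decides where it can stop safely, and the caller only requests shutdown and waits. That is the property TerminateThread lacks.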

Cross-Platform Development

For cross-platform code, consider using abstraction libraries like C++11 std::thread, Boost.Thread, or SDL threads. These provide a common interface over both Windows and POSIX threads. Even simple wrappers that map Windows handles to Pthreads-style interfaces can greatly simplify cross-platform threading.
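To give a sense of how thin such a wrapper can be, here is a minimal sketch. Everything prefixed `xthread_` is a hypothetical name invented for this illustration, not part of any library. It uses `_beginthreadex` on Windows because the entry point calls `free`, and it follows the Pthreads convention of returning 0 on success.

```c
#include <stdlib.h>

#ifdef _WIN32
#include <windows.h>
#include <process.h>                /* _beginthreadex */
typedef HANDLE xthread_t;
#else
#include <pthread.h>
typedef pthread_t xthread_t;
#endif

typedef void (*xthread_fn)(void *);

/* Heap-allocated start record handed to the native entry point. */
typedef struct {
    xthread_fn fn;
    void *arg;
} xthread_start;

/* Native entry point: adapts each platform's signature to xthread_fn. */
#ifdef _WIN32
static unsigned __stdcall xthread_entry(void *p)
#else
static void *xthread_entry(void *p)
#endif
{
    xthread_start start = *(xthread_start *)p;
    free(p);                        /* the new thread owns the start record */
    start.fn(start.arg);
    return 0;
}

/* Returns 0 on success, -1 on failure. */
int xthread_create(xthread_t *out, xthread_fn fn, void *arg) {
    xthread_start *start = malloc(sizeof *start);
    if (start == NULL) return -1;
    start->fn = fn;
    start->arg = arg;
#ifdef _WIN32
    *out = (HANDLE)_beginthreadex(NULL, 0, xthread_entry, start, 0, NULL);
    if (*out == NULL) { free(start); return -1; }
#else
    if (pthread_create(out, NULL, xthread_entry, start) != 0) {
        free(start);
        return -1;
    }
#endif
    return 0;
}

/* Waits for the thread and releases it. Returns 0 on success. */
int xthread_join(xthread_t t) {
#ifdef _WIN32
    DWORD rc = WaitForSingleObject(t, INFINITE);
    CloseHandle(t);                 /* join also closes the handle */
    return rc == WAIT_OBJECT_0 ? 0 : -1;
#else
    return pthread_join(t, NULL) == 0 ? 0 : -1;
#endif
}

/* Example task and driver: the task writes 7 through its argument. */
static void xthread_demo_task(void *arg) {
    *(int *)arg = 7;
}

int xthread_demo(void) {
    int value = 0;
    xthread_t t;
    if (xthread_create(&t, xthread_demo_task, &value) != 0) return -1;
    if (xthread_join(t) != 0) return -1;
    return value;
}
```

Folding `CloseHandle` into `xthread_join` is a deliberate simplification: callers get Pthreads-like "join releases the thread" semantics and cannot forget the handle. A fuller wrapper would also need a detach operation and a way to return a result.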

Best Practices and Summary

Windows threading is a comprehensive system with many options. Following established best practices ensures robust, efficient applications.

Windows Threading Best Practices

•Use _beginthreadex for CRT safety: Prefer _beginthreadex over CreateThread for any thread that calls C Runtime functions. With older or statically linked CRTs, CreateThread can leak per-thread CRT data, and the CRT may terminate the process under low-memory conditions.
•Always close handles: Every handle from CreateThread, CreateMutex, etc. must be closed with CloseHandle() to avoid resource leaks.
•Prefer thread pools: For task-based workloads, use the thread pool API instead of creating threads directly. It reuses threads and scales their number with the workload.
•Use SRW locks for reader/writer scenarios: They're faster than mutexes and critical sections when reads dominate.
•Use Critical Sections for simple mutual exclusion: They're much faster than kernel mutexes for process-local synchronization.
•Never use TerminateThread: It doesn't run destructors or release locks, and it can leave shared state corrupted. Signal shutdown with an event instead.
•Use InitializeCriticalSectionAndSpinCount: A spin count (e.g., 4000) can avoid a kernel wait for short, contended critical sections on multiprocessor systems.
•Check GetLastError immediately: The last-error value is per-thread, but a later API call on the same thread can overwrite it.
•Be aware of pseudo-handles: GetCurrentThread and GetCurrentProcess return pseudo-handles that can't be shared. Use DuplicateHandle to obtain a real handle.
•Consider NUMA on large systems: Use SetThreadAffinityMask and VirtualAllocExNuma for NUMA-aware allocation on multi-socket servers.

Summary

Windows provides a rich, object-oriented threading model built on kernel objects with security, waiting, and management capabilities that exceed POSIX in some dimensions. The cost is additional complexity and ceremony compared to the minimal Pthreads model.

Key takeaways:

Threads are kernel objects with handles and security descriptors
User-mode primitives (Critical Section, SRW Lock) are fast; kernel primitives are flexible
The thread pool API should be your default for task-based concurrency
Handle management is critical—leaks accumulate silently
Cross-platform code should use abstraction layers

Page Complete

You now have a thorough understanding of Windows threading—the architecture, APIs, synchronization primitives, thread pools, and how Windows differs from POSIX. Next, we'll explore Java threads to see how a high-level, platform-independent language approaches threading with managed execution and garbage collection.
