In the previous page, we explored processes as independent execution environments. We saw how the operating system provides each process with its own isolated address space, file descriptors, and security context. This isolation is powerful—but it comes at a cost.
The Process Problem:
Consider building a web server that handles 10,000 concurrent connections. With one process per connection, every connection pays the full process price: megabytes of memory, milliseconds of creation time, and expensive context switches between connections.
For many concurrent workloads, this overhead is simply too high. We need a lighter-weight abstraction that enables concurrency within a process, sharing resources while maintaining independent execution flows.
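To make that cost concrete, here is a back-of-the-envelope sketch. The per-unit figures are the approximate order-of-magnitude ranges quoted in the comparison table later on this page, not measurements from any particular system:

```typescript
// Back-of-the-envelope: memory cost of 10,000 concurrent connections.
// Per-unit figures are rough estimates, not measurements.
const CONNECTIONS = 10_000;

const PROCESS_MEMORY_MB = 5; // ~4-10 MB per process (low end of the range)
const THREAD_MEMORY_KB = 64; // ~8 KB - 1 MB per thread (modest stack reservation)

const processTotalGB = (CONNECTIONS * PROCESS_MEMORY_MB) / 1024;
const threadTotalGB = (CONNECTIONS * THREAD_MEMORY_KB) / 1024 / 1024;

console.log(`One process per connection: ~${processTotalGB.toFixed(1)} GB`);
console.log(`One thread per connection:  ~${threadTotalGB.toFixed(2)} GB`);
```

Even with generous thread stacks, the thread-per-connection design fits comfortably in RAM; the process-per-connection design needs tens of gigabytes before it serves a single request.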
By the end of this page, you will understand threads as the fundamental unit of CPU scheduling—how they share process resources, why they're more efficient for certain workloads, and what new challenges they introduce. This knowledge is essential for designing concurrent systems that are both performant and correct.
A thread is an independent sequence of execution within a process. While a process defines the environment for execution (address space, resources, security context), a thread defines a path of execution through that environment.
The Core Insight:
Threads are sometimes called "lightweight processes," but this undersells the key architectural difference: a process owns an execution environment, while a thread merely executes within one.
Multiple threads within the same process share the process's resources but execute independently. Each thread has its own stack, CPU registers, program counter, and thread state. But threads share the code (text) segment, global and static data, the heap, open files, sockets, and signal handlers.
┌─────────────────────────────── PROCESS ───────────────────────────────┐
│                                                                       │
│  ┌─────────────────── SHARED RESOURCES ───────────────────┐           │
│  │                                                        │           │
│  │  Code Segment (TEXT)        │  Data Segment            │           │
│  │  ─────────────────────      │  ────────────            │           │
│  │  Executable instructions    │  Global variables        │           │
│  │  (read-only, shared)        │  Static variables        │           │
│  │                             │                          │           │
│  ├─────────────────────────────┴──────────────────────────┤           │
│  │                                                        │           │
│  │                       HEAP                             │           │
│  │          (Dynamically allocated memory)                │           │
│  │        Shared among all threads - CAUTION!             │           │
│  │                                                        │           │
│  ├────────────────────────────────────────────────────────┤           │
│  │                                                        │           │
│  │  Open Files  │  Sockets  │  Signal Handlers            │           │
│  │                                                        │           │
│  └────────────────────────────────────────────────────────┘           │
│                                                                       │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐               │
│  │   THREAD 1   │   │   THREAD 2   │   │   THREAD 3   │               │
│  │              │   │              │   │              │               │
│  │ ┌──────────┐ │   │ ┌──────────┐ │   │ ┌──────────┐ │               │
│  │ │  Stack   │ │   │ │  Stack   │ │   │ │  Stack   │ │               │
│  │ │ (Private)│ │   │ │ (Private)│ │   │ │ (Private)│ │               │
│  │ └──────────┘ │   │ └──────────┘ │   │ └──────────┘ │               │
│  │              │   │              │   │              │               │
│  │  Registers   │   │  Registers   │   │  Registers   │               │
│  │  Program Ctr │   │  Program Ctr │   │  Program Ctr │               │
│  │ Thread State │   │ Thread State │   │ Thread State │               │
│  │              │   │              │   │              │               │
│  └──────────────┘   └──────────────┘   └──────────────┘               │
│         │                  │                  │                       │
│         └──────────────────┴──────────────────┘                       │
│           All threads execute concurrently                            │
│             in the same address space                                 │
│                                                                       │
└───────────────────────────────────────────────────────────────────────┘

Think of a process as a highway system (the infrastructure), and threads as individual cars traveling on those highways. All cars share the same roads, bridges, and tunnels (shared resources), but each car has its own driver, dashboard instruments, and current position (private thread state). Cars can travel independently, but they must follow traffic rules to avoid crashes (synchronization).
Just as the Process Control Block (PCB) tracks process-level information, the operating system maintains a Thread Control Block (TCB) for each thread. The TCB is significantly smaller than the PCB because threads share most process-level resources.
```typescript
// Conceptual representation of a Thread Control Block
interface ThreadControlBlock {
  // === THREAD IDENTIFICATION ===
  threadId: number;                // Unique thread ID within process
  processId: number;               // Parent process ID (shared)

  // === THREAD STATE ===
  state: ThreadState;              // Current execution state
  priority: number;                // Thread scheduling priority

  // === CPU CONTEXT (saved during context switches) ===
  programCounter: number;          // Address of next instruction
  stackPointer: number;            // Top of thread's stack
  basePointer: number;             // Base of current stack frame
  generalRegisters: RegisterSet;   // CPU register values
  floatingPointRegisters: FPRegisterSet; // FPU state
  statusFlags: CPUFlags;           // Condition codes

  // === STACK INFORMATION ===
  stackBase: number;               // Bottom of stack allocation
  stackSize: number;               // Stack size limit
  stackGuardPage: number;          // Guard page for overflow detection

  // === THREAD-SPECIFIC DATA ===
  threadLocalStorage: Map<string, any>; // TLS variables
  errorNumber: number;             // Thread-local errno

  // === SYNCHRONIZATION STATE ===
  waitingOn: SyncPrimitive | null; // What is thread blocked on?
  ownedMutexes: Mutex[];           // Mutexes held by this thread

  // === SIGNAL HANDLING ===
  signalMask: SignalMask;          // Thread's signal mask
  pendingSignals: Signal[];        // Signals for this thread

  // === SCHEDULING METADATA ===
  cpuAffinity: number[];           // Preferred CPU cores
  lastRunTime: number;             // When thread last ran
  totalCpuTime: number;            // Accumulated CPU time
}

enum ThreadState {
  RUNNABLE = 'RUNNABLE',     // Ready to run
  RUNNING = 'RUNNING',       // Currently executing
  BLOCKED = 'BLOCKED',       // Waiting on synchronization
  WAITING = 'WAITING',       // Waiting on I/O or timer
  TERMINATED = 'TERMINATED'  // Thread has exited
}
```

| Information | In PCB (Process) | In TCB (Thread) |
|---|---|---|
| Process ID | ✓ | Reference to parent |
| Thread ID | — | ✓ |
| Address space (page table) | ✓ | Shared from PCB |
| Open file descriptors | ✓ | Shared from PCB |
| CPU registers | — | ✓ (per thread) |
| Program counter | — | ✓ (per thread) |
| Stack pointer/frame | — | ✓ (per thread) |
| User/Group ID | ✓ | Shared from PCB |
| Signal handlers | ✓ | Shared |
| Signal mask | — | ✓ (per thread) |
| Thread-local storage | — | ✓ (per thread) |
| Scheduling priority | Process-wide default | ✓ (per thread) |
A PCB might consume 1-4 KB of kernel memory. A TCB typically needs only 200-500 bytes because it references the parent process's resources rather than duplicating them. This is why systems can support many more threads than processes—the overhead per unit of concurrency is dramatically lower.
The choice between threads and processes is one of the most fundamental architectural decisions in concurrent system design. Each approach has distinct advantages and costs. Understanding these trade-offs enables informed decisions rather than default choices.
| Metric | Threads | Processes | Difference |
|---|---|---|---|
| Creation time | ~10-50 μs | ~1-10 ms | 100-1000x faster |
| Context switch | ~1-5 μs | ~10-50 μs | 5-50x faster |
| Memory overhead | ~8 KB - 1 MB | ~4-10 MB+ | 10-100x smaller |
| Communication latency | ~10-100 ns | ~1-10 μs | 10-1000x faster |
| Isolation | None | Full | Qualitative |
| Failure blast radius | Whole process | Single process | Contained |
Use threads when: you need low-latency communication, share large amounts of data, or require many concurrent activities with moderate isolation needs. Use processes when: you need failure isolation, run untrusted code, require different security contexts, or are aggregating independent services. Many modern systems use both—processes for service boundaries, threads within each service for parallelism.
Not all threads are created equal. There are fundamental differences in how threads can be implemented, and these differences have profound implications for performance, capabilities, and portability.
Kernel threads (also called native threads or OS threads) are managed directly by the operating system kernel. Each user-level thread maps to exactly one kernel scheduling entity.
Characteristics: every user-visible thread is backed by its own kernel scheduling entity (a 1:1 mapping), so the kernel can schedule a process's threads onto different CPU cores in true parallel; the trade-off is that creating, destroying, and switching threads all require kernel involvement.
Used by: Linux (NPTL), Windows, macOS, most modern operating systems
┌────────────────────────── User Space ──────────────────────────┐
│ │
│ Thread A Thread B Thread C Thread D │
│ │ │ │ │ │
└──────┼───────────────┼───────────────┼───────────────┼─────────┘
│ │ │ │
▼ ▼ ▼ ▼
┌──────────────────────── Kernel Space ──────────────────────────┐
│ │
│ Kernel Kernel Kernel Kernel │
│ Thread A Thread B Thread C Thread D │
│ │ │ │ │ │
└──────┼───────────────┼───────────────┼───────────────┼─────────┘
│ │ │ │
▼ ▼ ▼ ▼
┌───────────────────────── Hardware ─────────────────────────────┐
│ │
│ CPU Core 0 CPU Core 1 │
│ (runs A or B) (runs C or D) │
│ │
└─────────────────────────────────────────────────────────────────┘
Modern languages increasingly adopt M:N models. Go's goroutines, Kotlin's coroutines, and Rust's async/await all provide millions of lightweight concurrent units multiplexed onto a smaller pool of OS threads. This gives the scalability of user-level threads with the parallelism of kernel threads.
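The M:N idea can be sketched in plain TypeScript: a fixed pool of N "carrier" workers drains a queue of M lightweight tasks. This is a toy single-process model of the multiplexing, not how Go or Kotlin actually implement their runtimes; here the "workers" are async loops standing in for OS threads:

```typescript
// Toy M:N multiplexer: M lightweight tasks, N "carrier" workers.
type Task = () => Promise<void>;

async function runOnPool(tasks: Task[], poolSize: number): Promise<number> {
  const queue = [...tasks];
  let active = 0;
  let peak = 0;

  // Each "carrier" pulls tasks until the queue is empty,
  // like an OS thread picking up runnable goroutines.
  const carrier = async (): Promise<void> => {
    while (queue.length > 0) {
      const task = queue.shift()!;
      active++;
      peak = Math.max(peak, active);
      await task();
      active--;
    }
  };

  await Promise.all(Array.from({ length: poolSize }, () => carrier()));
  console.log(`Ran ${tasks.length} tasks on ${poolSize} carriers (peak in-flight: ${peak})`);
  return peak;
}

// 100 lightweight tasks multiplexed onto 4 carriers.
const tasks: Task[] = Array.from({ length: 100 }, () =>
  async () => { await new Promise(resolve => setTimeout(resolve, 1)); });
const peakPromise = runOnPool(tasks, 4);
```

No matter how many tasks you enqueue, at most `poolSize` are ever in flight, which is exactly the resource bound that makes M:N models scale to millions of concurrent units.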
Understanding exactly what threads share—and don't share—is crucial for writing correct concurrent code. This memory model determines where data races can occur and where they cannot.
| Memory Region | Shared? | Implications |
|---|---|---|
| Code (Text) Segment | ✓ Shared | All threads execute the same instructions. Read-only, so no races. |
| Global Variables | ✓ Shared | ⚠️ Can cause data races! Must synchronize access. |
| Static Variables | ✓ Shared | ⚠️ Same as globals—potential race condition source. |
| Heap Memory | ✓ Shared | ⚠️ Objects allocated with new/malloc are accessible to all threads. |
| Stack (each thread) | ✗ Private | Local variables are safe from other threads by default. |
| Registers/PC | ✗ Private | Each thread has its own CPU context—no sharing. |
| Thread-Local Storage | ✗ Private | Explicitly thread-private variables. No synchronization needed. |
| Open Files | ✓ Shared | File descriptors are shared, and so is the file offset; concurrent reads/writes on the same descriptor need coordination. |
```typescript
/**
 * Demonstrating what threads share and don't share
 */

// SHARED: Module-level (global) variables - all threads see the same value
let sharedCounter = 0; // ⚠️ DANGER: Race condition if multiple threads modify

// SHARED: Objects allocated in heap memory
interface SharedState {
  items: string[];
  metrics: {
    requestCount: number;
    errorCount: number;
  };
}

const sharedState: SharedState = {
  items: [],         // ⚠️ Needs synchronization for safe concurrent access
  metrics: {
    requestCount: 0, // ⚠️ Race condition without atomic operations
    errorCount: 0
  }
};

// PRIVATE: Function parameters and local variables live on thread's stack
function processRequest(requestId: string): void {
  // These are PRIVATE to each thread's call
  const startTime = Date.now();     // Local variable - stack allocated
  const localBuffer: number[] = []; // Local array reference on stack
                                    // (but if passed elsewhere, could be shared)

  // This modifies SHARED state - needs synchronization!
  sharedState.metrics.requestCount++; // ⚠️ Not thread-safe!

  // Local processing - completely thread-safe
  for (let i = 0; i < 100; i++) {
    localBuffer.push(i * requestId.length);
  }

  const duration = Date.now() - startTime; // Stack - private
  console.log(`Request ${requestId} took ${duration}ms`);
}

// THREAD-LOCAL: Each thread gets its own instance
// In Node.js, AsyncLocalStorage provides thread-local-like semantics
import { AsyncLocalStorage } from 'async_hooks';

const requestContext = new AsyncLocalStorage<{
  traceId: string;
  userId: string;
}>();

// Each async context has its own isolated "thread-local" data
async function handleRequest(traceId: string, userId: string) {
  return requestContext.run({ traceId, userId }, async () => {
    // Get current context - private to this async context
    const ctx = requestContext.getStore()!;
    console.log(`[Trace: ${ctx.traceId}] Processing for user ${ctx.userId}`);

    // Nested calls see the same context
    await performDatabaseQuery();
  });
}

async function performDatabaseQuery() {
  const ctx = requestContext.getStore()!;
  // Access trace context without passing it explicitly
  console.log(`[Trace: ${ctx.traceId}] Executing query`);
}
```

While stack-allocated local variables are private to a thread, if you take a pointer/reference to a stack variable and pass it to another thread, that data becomes shared. This is a common source of subtle bugs: the variable's lifetime is tied to the function's stack frame, but another thread may try to access it after the function returns.
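The escape hazard has a direct analog in JavaScript: a variable is only as private as its references. A minimal sketch (single-threaded here, but the aliasing problem is the same one the warning above describes):

```typescript
// "Private" local data stops being private the moment a reference escapes.
const pending: Array<() => void> = [];

function scheduleMutation(buf: number[]): void {
  // Another execution context now holds an alias to "local" data.
  pending.push(() => { buf[0] = 42; });
}

function makeBuffer(): number[] {
  const localBuffer: number[] = [0, 0, 0]; // looks private to this call...
  scheduleMutation(localBuffer);           // ...but the reference escapes here
  return localBuffer;
}

const buf = makeBuffer();
console.log(buf[0]); // 0 - no mutation has run yet
pending.forEach(fn => fn());
console.log(buf[0]); // 42 - the "local" buffer was mutated from elsewhere
```

Garbage collection saves JavaScript from the use-after-return crash that C or C++ would suffer, but the logical bug, data you believed was private being mutated by another execution context, is identical.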
Multithreading is not merely about performance—it enables entire categories of application behaviors that would be impossible or impractical with single-threaded execution.
```typescript
import { Worker, isMainThread, parentPort, workerData } from 'worker_threads';
import { cpus } from 'os';

/**
 * Example: Image processing with worker threads
 * Demonstrates parallelism benefits on multi-core systems
 */

interface ImageTask {
  imageId: string;
  imageData: Buffer;
}

interface ProcessResult {
  imageId: string;
  processedData: Buffer;
  processingTimeMs: number;
}

// CPU-intensive image processing function
function processImage(imageData: Buffer): Buffer {
  // Simulate complex image processing (resize, filter, compress)
  // In reality: sharp, jimp, or native bindings
  const result = Buffer.alloc(imageData.length);
  for (let i = 0; i < imageData.length; i++) {
    result[i] = Math.floor(imageData[i] * 0.8); // Simplified transform
  }
  return result;
}

if (isMainThread) {
  // MAIN THREAD: Orchestrates parallel processing
  async function processImagesBatch(images: ImageTask[]): Promise<ProcessResult[]> {
    const numWorkers = cpus().length;
    const chunkSize = Math.ceil(images.length / numWorkers);

    console.log(`Processing ${images.length} images with ${numWorkers} workers`);
    const startTime = Date.now();

    const workerPromises: Promise<ProcessResult[]>[] = [];
    for (let i = 0; i < numWorkers; i++) {
      const chunk = images.slice(i * chunkSize, (i + 1) * chunkSize);
      if (chunk.length === 0) continue;

      const promise = new Promise<ProcessResult[]>((resolve, reject) => {
        const worker = new Worker(__filename, { workerData: chunk });
        worker.on('message', resolve);
        worker.on('error', reject);
      });
      workerPromises.push(promise);
    }

    const workerResults = await Promise.all(workerPromises);
    const allResults = workerResults.flat();

    const totalTime = Date.now() - startTime;
    console.log(`Processed ${allResults.length} images in ${totalTime}ms`);
    console.log(`Average: ${(totalTime / allResults.length).toFixed(2)}ms per image`);
    console.log(`Speedup vs single-threaded: ~${numWorkers}x`);

    return allResults;
  }
} else {
  // WORKER THREAD: Processes assigned images
  const tasks: ImageTask[] = workerData;
  const results: ProcessResult[] = [];

  for (const task of tasks) {
    const start = Date.now();
    const processedData = processImage(task.imageData);
    const processingTimeMs = Date.now() - start;

    results.push({ imageId: task.imageId, processedData, processingTimeMs });
  }

  // Send results back to main thread
  parentPort?.postMessage(results);
}
```

Parallelism has limits. Amdahl's Law states that speedup is constrained by the sequential portion of your program. If 20% of work must be sequential, maximum speedup is 5x regardless of thread count. Identify the serial bottlenecks before throwing threads at a problem; sometimes the better investment is reducing the sequential fraction.
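Amdahl's Law from the note above can be written down directly. This is the standard formula, with `p` the parallelizable fraction of the work and `n` the number of threads:

```typescript
// Amdahl's Law: speedup S(n) = 1 / ((1 - p) + p / n)
// p = fraction of the work that can run in parallel, n = number of threads.
function amdahlSpeedup(p: number, n: number): number {
  return 1 / ((1 - p) + p / n);
}

// 20% of the work is sequential (p = 0.8):
console.log(amdahlSpeedup(0.8, 4).toFixed(2));   // 2.50 with 4 threads
console.log(amdahlSpeedup(0.8, 64).toFixed(2));  // 4.71 with 64 threads
console.log(amdahlSpeedup(0.8, 1e9).toFixed(2)); // 5.00 - the 5x ceiling
```

Notice how quickly returns diminish: going from 4 to 64 threads buys less than a 2x improvement, because the 20% sequential fraction dominates as `n` grows.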
The benefits of threads come with significant engineering challenges. These problems are not just inconvenient—they're fundamentally difficult because they stem from non-determinism and the limits of human reasoning about concurrent execution.
```typescript
/**
 * Classic race condition: Lost Update Problem
 *
 * Two threads incrementing a counter "simultaneously"
 * Expected: counter = 2000 after 1000 increments each
 * Actual: counter < 2000 due to lost updates
 */

// Shared mutable state - the source of our problems
let counter = 0;

// What we think happens:
// Thread A reads counter (0)
// Thread A increments (1)
// Thread A writes counter (1)
// Thread B reads counter (1)
// Thread B increments (2)
// Thread B writes counter (2)
// Result: counter = 2

// What can actually happen (interleaved execution):
// Thread A reads counter (0)
// Thread B reads counter (0)  ← Both read BEFORE either writes
// Thread A increments (1)
// Thread B increments (1)     ← Both compute 0 + 1 = 1
// Thread A writes counter (1)
// Thread B writes counter (1) ← Thread B overwrites A's update
// Result: counter = 1 (lost one increment!)

// Visual timeline of the race condition:
/*
Time  │ Thread A            │ Thread B            │ counter value
──────┼─────────────────────┼─────────────────────┼──────────────
 t0   │ READ counter=0      │                     │ 0
 t1   │                     │ READ counter=0      │ 0
 t2   │ compute 0+1=1       │                     │ 0
 t3   │                     │ compute 0+1=1       │ 0
 t4   │ WRITE counter=1     │                     │ 1
 t5   │                     │ WRITE counter=1     │ 1 ← Should be 2!
*/

// The fix: Use atomic operations or locks

// SOLUTION 1: Use atomic operations (best for simple counters)
const sharedBuffer = new SharedArrayBuffer(4);
const atomicCounter = new Int32Array(sharedBuffer);

function safeIncrement(): void {
  // Atomics.add is guaranteed to be atomic - no lost updates
  Atomics.add(atomicCounter, 0, 1);
}

// SOLUTION 2: Use mutex/lock (for complex multi-step operations)
import { Mutex } from 'async-mutex';

const mutex = new Mutex();
let protectedCounter = 0;

async function safeIncrementWithLock(): Promise<void> {
  const release = await mutex.acquire();
  try {
    // Only one thread can be in this section at a time
    const current = protectedCounter;
    protectedCounter = current + 1;
  } finally {
    release();
  }
}
```

Even experts get concurrent code wrong. Studies show that experienced developers regularly introduce race conditions, even when specifically trying to avoid them. This isn't a failure of skill; it's a fundamental limitation of human cognition. We think sequentially, but concurrent systems are not sequential. This is why modern best practices favor message-passing, immutability, and higher-level abstractions over raw shared-state threading.
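One of those higher-level abstractions, message passing, can be sketched as a simple mailbox: instead of many threads mutating a shared counter, every mutation is a message sent to a single owner that processes them one at a time. This is a toy single-process model of the pattern, not a real cross-thread channel:

```typescript
// Toy message-passing sketch: one owner serializes all mutations,
// so there is no shared mutable state for callers to race on.
type Message =
  | { kind: 'increment' }
  | { kind: 'read'; reply: (value: number) => void };

class CounterOwner {
  private value = 0;
  private queue: Message[] = [];

  send(msg: Message): void {
    this.queue.push(msg);
    this.drain();
  }

  // Messages are processed one at a time, in arrival order.
  private drain(): void {
    while (this.queue.length > 0) {
      const msg = this.queue.shift()!;
      if (msg.kind === 'increment') this.value++;
      else msg.reply(this.value);
    }
  }
}

const owner = new CounterOwner();
for (let i = 0; i < 2000; i++) owner.send({ kind: 'increment' });
owner.send({ kind: 'read', reply: v => console.log(`count = ${v}`) }); // count = 2000
```

Because only the owner ever touches `value`, the lost-update interleaving from the timeline above simply cannot occur; this is the same reasoning behind actor systems and Go's "share memory by communicating" advice.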
We've developed a comprehensive understanding of threads as lightweight execution units within processes. The essential concepts: threads share their process's address space, files, and signal handlers while keeping a private stack, register set, and program counter; the TCB is far smaller than the PCB, which is why threads are dramatically cheaper to create and switch; and the very sharing that makes inter-thread communication fast is also what introduces data races and demands synchronization.
What's Next:
Understanding what threads are is just the beginning. In the next page, we'll explore the thread lifecycle—how threads are created, how they move through execution states, and how they communicate their completion. This lifecycle understanding is essential for correctly managing thread resources and coordinating concurrent operations.
You now understand threads as lightweight execution units—their structure, how they relate to processes, what they share, their benefits, and their inherent challenges. This mental model is the foundation for understanding thread lifecycle management and synchronization patterns that follow.