In the previous page, we explored processes as independent execution environments. We saw how the operating system provides each process with its own isolated address space, file descriptors, and security context. This isolation is powerful—but it comes at a cost.
The Process Problem:
Consider building a web server that handles 10,000 concurrent connections. With one process per connection, every connection pays the full process price: megabytes of memory, milliseconds of creation time, and expensive context switches between connections.
For many concurrent workloads, this overhead is simply too high. We need a lighter-weight abstraction that enables concurrency within a process, sharing resources while maintaining independent execution flows.
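To make that cost concrete, here is a back-of-the-envelope sketch. The per-unit figures are the approximate order-of-magnitude ranges quoted in the comparison table later on this page, not measurements from any particular system:

```typescript
// Back-of-the-envelope: memory cost of 10,000 concurrent connections.
// Per-unit figures are rough estimates, not measurements.
const CONNECTIONS = 10_000;

const PROCESS_MEMORY_MB = 5; // ~4-10 MB per process (low end of the range)
const THREAD_MEMORY_KB = 64; // ~8 KB - 1 MB per thread (modest stack reservation)

const processTotalGB = (CONNECTIONS * PROCESS_MEMORY_MB) / 1024;
const threadTotalGB = (CONNECTIONS * THREAD_MEMORY_KB) / 1024 / 1024;

console.log(`One process per connection: ~${processTotalGB.toFixed(1)} GB`);
console.log(`One thread per connection:  ~${threadTotalGB.toFixed(2)} GB`);
```

Even with generous thread stacks, the thread-per-connection design fits comfortably in RAM; the process-per-connection design needs tens of gigabytes before it serves a single request.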
By the end of this page, you will understand threads as the fundamental unit of CPU scheduling—how they share process resources, why they're more efficient for certain workloads, and what new challenges they introduce. This knowledge is essential for designing concurrent systems that are both performant and correct.
A thread is an independent sequence of execution within a process. While a process defines the environment for execution (address space, resources, security context), a thread defines a path of execution through that environment.
The Core Insight:
Threads are sometimes called "lightweight processes," but this undersells the key architectural difference: a process owns an execution environment, while a thread merely executes within one.
Multiple threads within the same process share the process's resources but execute independently. Each thread has its own stack, CPU registers, program counter, and thread state. But threads share the code (text) segment, global and static data, the heap, open files, sockets, and signal handlers.
┌─────────────────────────────── PROCESS ───────────────────────────────┐
│                                                                       │
│  ┌─────────────────── SHARED RESOURCES ───────────────────┐           │
│  │                                                        │           │
│  │  Code Segment (TEXT)        │  Data Segment            │           │
│  │  ─────────────────────      │  ────────────            │           │
│  │  Executable instructions    │  Global variables        │           │
│  │  (read-only, shared)        │  Static variables        │           │
│  │                             │                          │           │
│  ├─────────────────────────────┴──────────────────────────┤           │
│  │                                                        │           │
│  │                       HEAP                             │           │
│  │          (Dynamically allocated memory)                │           │
│  │        Shared among all threads - CAUTION!             │           │
│  │                                                        │           │
│  ├────────────────────────────────────────────────────────┤           │
│  │                                                        │           │
│  │  Open Files  │  Sockets  │  Signal Handlers            │           │
│  │                                                        │           │
│  └────────────────────────────────────────────────────────┘           │
│                                                                       │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐               │
│  │   THREAD 1   │   │   THREAD 2   │   │   THREAD 3   │               │
│  │              │   │              │   │              │               │
│  │ ┌──────────┐ │   │ ┌──────────┐ │   │ ┌──────────┐ │               │
│  │ │  Stack   │ │   │ │  Stack   │ │   │ │  Stack   │ │               │
│  │ │ (Private)│ │   │ │ (Private)│ │   │ │ (Private)│ │               │
│  │ └──────────┘ │   │ └──────────┘ │   │ └──────────┘ │               │
│  │              │   │              │   │              │               │
│  │  Registers   │   │  Registers   │   │  Registers   │               │
│  │  Program Ctr │   │  Program Ctr │   │  Program Ctr │               │
│  │ Thread State │   │ Thread State │   │ Thread State │               │
│  │              │   │              │   │              │               │
│  └──────────────┘   └──────────────┘   └──────────────┘               │
│         │                  │                  │                       │
│         └──────────────────┴──────────────────┘                       │
│           All threads execute concurrently                            │
│             in the same address space                                 │
│                                                                       │
└───────────────────────────────────────────────────────────────────────┘

Think of a process as a highway system (the infrastructure), and threads as individual cars traveling on those highways. All cars share the same roads, bridges, and tunnels (shared resources), but each car has its own driver, dashboard instruments, and current position (private thread state). Cars can travel independently, but they must follow traffic rules to avoid crashes (synchronization).
Just as the Process Control Block (PCB) tracks process-level information, the operating system maintains a Thread Control Block (TCB) for each thread. The TCB is significantly smaller than the PCB because threads share most process-level resources.
```typescript
// Conceptual representation of a Thread Control Block
interface ThreadControlBlock {
  // === THREAD IDENTIFICATION ===
  threadId: number;                // Unique thread ID within process
  processId: number;               // Parent process ID (shared)

  // === THREAD STATE ===
  state: ThreadState;              // Current execution state
  priority: number;                // Thread scheduling priority

  // === CPU CONTEXT (saved during context switches) ===
  programCounter: number;          // Address of next instruction
  stackPointer: number;            // Top of thread's stack
  basePointer: number;             // Base of current stack frame
  generalRegisters: RegisterSet;   // CPU register values
  floatingPointRegisters: FPRegisterSet; // FPU state
  statusFlags: CPUFlags;           // Condition codes

  // === STACK INFORMATION ===
  stackBase: number;               // Bottom of stack allocation
  stackSize: number;               // Stack size limit
  stackGuardPage: number;          // Guard page for overflow detection

  // === THREAD-SPECIFIC DATA ===
  threadLocalStorage: Map<string, any>; // TLS variables
  errorNumber: number;             // Thread-local errno

  // === SYNCHRONIZATION STATE ===
  waitingOn: SyncPrimitive | null; // What is thread blocked on?
  ownedMutexes: Mutex[];           // Mutexes held by this thread

  // === SIGNAL HANDLING ===
  signalMask: SignalMask;          // Thread's signal mask
  pendingSignals: Signal[];        // Signals for this thread

  // === SCHEDULING METADATA ===
  cpuAffinity: number[];           // Preferred CPU cores
  lastRunTime: number;             // When thread last ran
  totalCpuTime: number;            // Accumulated CPU time
}

enum ThreadState {
  RUNNABLE = 'RUNNABLE',     // Ready to run
  RUNNING = 'RUNNING',       // Currently executing
  BLOCKED = 'BLOCKED',       // Waiting on synchronization
  WAITING = 'WAITING',       // Waiting on I/O or timer
  TERMINATED = 'TERMINATED'  // Thread has exited
}
```

| Information | In PCB (Process) | In TCB (Thread) |
|---|---|---|
| Process ID | ✓ | Reference to parent |
| Thread ID | — | ✓ |
| Address space (page table) | ✓ | Shared from PCB |
| Open file descriptors | ✓ | Shared from PCB |
| CPU registers | — | ✓ (per thread) |
| Program counter | — | ✓ (per thread) |
| Stack pointer/frame | — | ✓ (per thread) |
| User/Group ID | ✓ | Shared from PCB |
| Signal handlers | ✓ | Shared |
| Signal mask | — | ✓ (per thread) |
| Thread-local storage | — | ✓ (per thread) |
| Scheduling priority | Process-wide default | ✓ (per thread) |
A PCB might consume 1-4 KB of kernel memory. A TCB typically needs only 200-500 bytes because it references the parent process's resources rather than duplicating them. This is why systems can support many more threads than processes—the overhead per unit of concurrency is dramatically lower.
The choice between threads and processes is one of the most fundamental architectural decisions in concurrent system design. Each approach has distinct advantages and costs. Understanding these trade-offs enables informed decisions rather than default choices.
| Metric | Threads | Processes | Difference |
|---|---|---|---|
| Creation time | ~10-50 μs | ~1-10 ms | 100-1000x faster |
| Context switch | ~1-5 μs | ~10-50 μs | 5-50x faster |
| Memory overhead | ~8 KB - 1 MB | ~4-10 MB+ | 10-100x smaller |
| Communication latency | ~10-100 ns | ~1-10 μs | 10-1000x faster |
| Isolation | None | Full | Qualitative |
| Failure blast radius | Whole process | Single process | Contained |
Use threads when: you need low-latency communication, share large amounts of data, or require many concurrent activities with moderate isolation needs. Use processes when: you need failure isolation, run untrusted code, require different security contexts, or are aggregating independent services. Many modern systems use both—processes for service boundaries, threads within each service for parallelism.
Not all threads are created equal. There are fundamental differences in how threads can be implemented, and these differences have profound implications for performance, capabilities, and portability.
Kernel threads (also called native threads or OS threads) are managed directly by the operating system kernel. Each user-level thread maps to exactly one kernel scheduling entity.
Characteristics: every user-visible thread is backed by its own kernel scheduling entity (a 1:1 mapping), so the kernel can schedule a process's threads onto different CPU cores in true parallel; the trade-off is that creating, destroying, and switching threads all require kernel involvement.
Used by: Linux (NPTL), Windows, macOS, most modern operating systems
┌────────────────────────── User Space ──────────────────────────┐
│ │
│ Thread A Thread B Thread C Thread D │
│ │ │ │ │ │
└──────┼───────────────┼───────────────┼───────────────┼─────────┘
│ │ │ │
▼ ▼ ▼ ▼
┌──────────────────────── Kernel Space ──────────────────────────┐
│ │
│ Kernel Kernel Kernel Kernel │
│ Thread A Thread B Thread C Thread D │
│ │ │ │ │ │
└──────┼───────────────┼───────────────┼───────────────┼─────────┘
│ │ │ │
▼ ▼ ▼ ▼
┌───────────────────────── Hardware ─────────────────────────────┐
│ │
│ CPU Core 0 CPU Core 1 │
│ (runs A or B) (runs C or D) │
│ │
└─────────────────────────────────────────────────────────────────┘
Modern languages increasingly adopt M:N models. Go's goroutines, Kotlin's coroutines, and Rust's async/await all provide millions of lightweight concurrent units multiplexed onto a smaller pool of OS threads. This gives the scalability of user-level threads with the parallelism of kernel threads.
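The M:N idea can be sketched in plain TypeScript: a fixed pool of N "carrier" workers drains a queue of M lightweight tasks. This is a toy single-process model of the multiplexing, not how Go or Kotlin actually implement their runtimes; here the "workers" are async loops standing in for OS threads:

```typescript
// Toy M:N multiplexer: M lightweight tasks, N "carrier" workers.
type Task = () => Promise<void>;

async function runOnPool(tasks: Task[], poolSize: number): Promise<number> {
  const queue = [...tasks];
  let active = 0;
  let peak = 0;

  // Each "carrier" pulls tasks until the queue is empty,
  // like an OS thread picking up runnable goroutines.
  const carrier = async (): Promise<void> => {
    while (queue.length > 0) {
      const task = queue.shift()!;
      active++;
      peak = Math.max(peak, active);
      await task();
      active--;
    }
  };

  await Promise.all(Array.from({ length: poolSize }, () => carrier()));
  console.log(`Ran ${tasks.length} tasks on ${poolSize} carriers (peak in-flight: ${peak})`);
  return peak;
}

// 100 lightweight tasks multiplexed onto 4 carriers.
const tasks: Task[] = Array.from({ length: 100 }, () =>
  async () => { await new Promise(resolve => setTimeout(resolve, 1)); });
const peakPromise = runOnPool(tasks, 4);
```

No matter how many tasks you enqueue, at most `poolSize` are ever in flight, which is exactly the resource bound that makes M:N models scale to millions of concurrent units.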
Understanding exactly what threads share—and don't share—is crucial for writing correct concurrent code. This memory model determines where data races can occur and where they cannot.
| Memory Region | Shared? | Implications |
|---|---|---|
| Code (Text) Segment | ✓ Shared | All threads execute the same instructions. Read-only, so no races. |
| Global Variables | ✓ Shared | ⚠️ Can cause data races! Must synchronize access. |
| Static Variables | ✓ Shared | ⚠️ Same as globals—potential race condition source. |
| Heap Memory | ✓ Shared | ⚠️ Objects allocated with new/malloc are accessible to all threads. |
| Stack (each thread) | ✗ Private | Local variables are safe from other threads by default. |
| Registers/PC | ✗ Private | Each thread has its own CPU context—no sharing. |
| Thread-Local Storage | ✗ Private | Explicitly thread-private variables. No synchronization needed. |
| Open Files | ✓ Shared | File descriptors are shared, and so is the file offset; concurrent reads/writes on the same descriptor need coordination. |
```typescript
/**
 * Demonstrating what threads share and don't share
 */

// SHARED: Module-level (global) variables - all threads see the same value
let sharedCounter = 0; // ⚠️ DANGER: Race condition if multiple threads modify

// SHARED: Objects allocated in heap memory
interface SharedState {
  items: string[];
  metrics: {
    requestCount: number;
    errorCount: number;
  };
}

const sharedState: SharedState = {
  items: [],         // ⚠️ Needs synchronization for safe concurrent access
  metrics: {
    requestCount: 0, // ⚠️ Race condition without atomic operations
    errorCount: 0
  }
};

// PRIVATE: Function parameters and local variables live on thread's stack
function processRequest(requestId: string): void {
  // These are PRIVATE to each thread's call
  const startTime = Date.now();     // Local variable - stack allocated
  const localBuffer: number[] = []; // Local array reference on stack
                                    // (but if passed elsewhere, could be shared)

  // This modifies SHARED state - needs synchronization!
  sharedState.metrics.requestCount++; // ⚠️ Not thread-safe!

  // Local processing - completely thread-safe
  for (let i = 0; i < 100; i++) {
    localBuffer.push(i * requestId.length);
  }

  const duration = Date.now() - startTime; // Stack - private
  console.log(`Request ${requestId} took ${duration}ms`);
}

// THREAD-LOCAL: Each thread gets its own instance
// In Node.js, AsyncLocalStorage provides thread-local-like semantics
import { AsyncLocalStorage } from 'async_hooks';

const requestContext = new AsyncLocalStorage<{
  traceId: string;
  userId: string;
}>();

// Each async context has its own isolated "thread-local" data
async function handleRequest(traceId: string, userId: string) {
  return requestContext.run({ traceId, userId }, async () => {
    // Get current context - private to this async context
    const ctx = requestContext.getStore()!;
    console.log(`[Trace: ${ctx.traceId}] Processing for user ${ctx.userId}`);

    // Nested calls see the same context
    await performDatabaseQuery();
  });
}

async function performDatabaseQuery() {
  const ctx = requestContext.getStore()!;
  // Access trace context without passing it explicitly
  console.log(`[Trace: ${ctx.traceId}] Executing query`);
}
```

While stack-allocated local variables are private to a thread, if you take a pointer/reference to a stack variable and pass it to another thread, that data becomes shared. This is a common source of subtle bugs: the variable's lifetime is tied to the function's stack frame, but another thread may try to access it after the function returns.
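The escape hazard has a direct analog in JavaScript: a variable is only as private as its references. A minimal sketch (single-threaded here, but the aliasing problem is the same one the warning above describes):

```typescript
// "Private" local data stops being private the moment a reference escapes.
const pending: Array<() => void> = [];

function scheduleMutation(buf: number[]): void {
  // Another execution context now holds an alias to "local" data.
  pending.push(() => { buf[0] = 42; });
}

function makeBuffer(): number[] {
  const localBuffer: number[] = [0, 0, 0]; // looks private to this call...
  scheduleMutation(localBuffer);           // ...but the reference escapes here
  return localBuffer;
}

const buf = makeBuffer();
console.log(buf[0]); // 0 - no mutation has run yet
pending.forEach(fn => fn());
console.log(buf[0]); // 42 - the "local" buffer was mutated from elsewhere
```

Garbage collection saves JavaScript from the use-after-return crash that C or C++ would suffer, but the logical bug, data you believed was private being mutated by another execution context, is identical.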
Multithreading is not merely about performance—it enables entire categories of application behaviors that would be impossible or impractical with single-threaded execution.
```typescript
import { Worker, isMainThread, parentPort, workerData } from 'worker_threads';
import { cpus } from 'os';

/**
 * Example: Image processing with worker threads
 * Demonstrates parallelism benefits on multi-core systems
 */

interface ImageTask {
  imageId: string;
  imageData: Buffer;
}

interface ProcessResult {
  imageId: string;
  processedData: Buffer;
  processingTimeMs: number;
}

// CPU-intensive image processing function
function processImage(imageData: Buffer): Buffer {
  // Simulate complex image processing (resize, filter, compress)
  // In reality: sharp, jimp, or native bindings
  const result = Buffer.alloc(imageData.length);
  for (let i = 0; i < imageData.length; i++) {
    result[i] = Math.floor(imageData[i] * 0.8); // Simplified transform
  }
  return result;
}

if (isMainThread) {
  // MAIN THREAD: Orchestrates parallel processing
  async function processImagesBatch(images: ImageTask[]): Promise<ProcessResult[]> {
    const numWorkers = cpus().length;
    const chunkSize = Math.ceil(images.length / numWorkers);

    console.log(`Processing ${images.length} images with ${numWorkers} workers`);
    const startTime = Date.now();

    const workerPromises: Promise<ProcessResult[]>[] = [];
    for (let i = 0; i < numWorkers; i++) {
      const chunk = images.slice(i * chunkSize, (i + 1) * chunkSize);
      if (chunk.length === 0) continue;

      const promise = new Promise<ProcessResult[]>((resolve, reject) => {
        const worker = new Worker(__filename, { workerData: chunk });
        worker.on('message', resolve);
        worker.on('error', reject);
      });
      workerPromises.push(promise);
    }

    const workerResults = await Promise.all(workerPromises);
    const allResults = workerResults.flat();

    const totalTime = Date.now() - startTime;
    console.log(`Processed ${allResults.length} images in ${totalTime}ms`);
    console.log(`Average: ${(totalTime / allResults.length).toFixed(2)}ms per image`);
    console.log(`Speedup vs single-threaded: ~${numWorkers}x`);

    return allResults;
  }
} else {
  // WORKER THREAD: Processes assigned images
  const tasks: ImageTask[] = workerData;
  const results: ProcessResult[] = [];

  for (const task of tasks) {
    const start = Date.now();
    const processedData = processImage(task.imageData);
    const processingTimeMs = Date.now() - start;

    results.push({ imageId: task.imageId, processedData, processingTimeMs });
  }

  // Send results back to main thread
  parentPort?.postMessage(results);
}
```

Parallelism has limits. Amdahl's Law states that speedup is constrained by the sequential portion of your program. If 20% of work must be sequential, maximum speedup is 5x regardless of thread count. Identify the serial bottlenecks before throwing threads at a problem; sometimes the better investment is reducing the sequential fraction.
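Amdahl's Law from the note above can be written down directly. This is the standard formula, with `p` the parallelizable fraction of the work and `n` the number of threads:

```typescript
// Amdahl's Law: speedup S(n) = 1 / ((1 - p) + p / n)
// p = fraction of the work that can run in parallel, n = number of threads.
function amdahlSpeedup(p: number, n: number): number {
  return 1 / ((1 - p) + p / n);
}

// 20% of the work is sequential (p = 0.8):
console.log(amdahlSpeedup(0.8, 4).toFixed(2));   // 2.50 with 4 threads
console.log(amdahlSpeedup(0.8, 64).toFixed(2));  // 4.71 with 64 threads
console.log(amdahlSpeedup(0.8, 1e9).toFixed(2)); // 5.00 - the 5x ceiling
```

Notice how quickly returns diminish: going from 4 to 64 threads buys less than a 2x improvement, because the 20% sequential fraction dominates as `n` grows.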
The benefits of threads come with significant engineering challenges. These problems are not just inconvenient—they're fundamentally difficult because they stem from non-determinism and the limits of human reasoning about concurrent execution.
```typescript
/**
 * Classic race condition: Lost Update Problem
 *
 * Two threads incrementing a counter "simultaneously"
 * Expected: counter = 2000 after 1000 increments each
 * Actual: counter < 2000 due to lost updates
 */

// Shared mutable state - the source of our problems
let counter = 0;

// What we think happens:
// Thread A reads counter (0)
// Thread A increments (1)
// Thread A writes counter (1)
// Thread B reads counter (1)
// Thread B increments (2)
// Thread B writes counter (2)
// Result: counter = 2

// What can actually happen (interleaved execution):
// Thread A reads counter (0)
// Thread B reads counter (0)  ← Both read BEFORE either writes
// Thread A increments (1)
// Thread B increments (1)     ← Both compute 0 + 1 = 1
// Thread A writes counter (1)
// Thread B writes counter (1) ← Thread B overwrites A's update
// Result: counter = 1 (lost one increment!)

// Visual timeline of the race condition:
/*
Time  │ Thread A            │ Thread B            │ counter value
──────┼─────────────────────┼─────────────────────┼──────────────
 t0   │ READ counter=0      │                     │ 0
 t1   │                     │ READ counter=0      │ 0
 t2   │ compute 0+1=1       │                     │ 0
 t3   │                     │ compute 0+1=1       │ 0
 t4   │ WRITE counter=1     │                     │ 1
 t5   │                     │ WRITE counter=1     │ 1 ← Should be 2!
*/

// The fix: Use atomic operations or locks

// SOLUTION 1: Use atomic operations (best for simple counters)
const sharedBuffer = new SharedArrayBuffer(4);
const atomicCounter = new Int32Array(sharedBuffer);

function safeIncrement(): void {
  // Atomics.add is guaranteed to be atomic - no lost updates
  Atomics.add(atomicCounter, 0, 1);
}

// SOLUTION 2: Use mutex/lock (for complex multi-step operations)
import { Mutex } from 'async-mutex';

const mutex = new Mutex();
let protectedCounter = 0;

async function safeIncrementWithLock(): Promise<void> {
  const release = await mutex.acquire();
  try {
    // Only one thread can be in this section at a time
    const current = protectedCounter;
    protectedCounter = current + 1;
  } finally {
    release();
  }
}
```

Even experts get concurrent code wrong. Studies show that experienced developers regularly introduce race conditions, even when specifically trying to avoid them. This isn't a failure of skill; it's a fundamental limitation of human cognition. We think sequentially, but concurrent systems are not sequential. This is why modern best practices favor message-passing, immutability, and higher-level abstractions over raw shared-state threading.
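One of those higher-level abstractions, message passing, can be sketched as a simple mailbox: instead of many threads mutating a shared counter, every mutation is a message sent to a single owner that processes them one at a time. This is a toy single-process model of the pattern, not a real cross-thread channel:

```typescript
// Toy message-passing sketch: one owner serializes all mutations,
// so there is no shared mutable state for callers to race on.
type Message =
  | { kind: 'increment' }
  | { kind: 'read'; reply: (value: number) => void };

class CounterOwner {
  private value = 0;
  private queue: Message[] = [];

  send(msg: Message): void {
    this.queue.push(msg);
    this.drain();
  }

  // Messages are processed one at a time, in arrival order.
  private drain(): void {
    while (this.queue.length > 0) {
      const msg = this.queue.shift()!;
      if (msg.kind === 'increment') this.value++;
      else msg.reply(this.value);
    }
  }
}

const owner = new CounterOwner();
for (let i = 0; i < 2000; i++) owner.send({ kind: 'increment' });
owner.send({ kind: 'read', reply: v => console.log(`count = ${v}`) }); // count = 2000
```

Because only the owner ever touches `value`, the lost-update interleaving from the timeline above simply cannot occur; this is the same reasoning behind actor systems and Go's "share memory by communicating" advice.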
We've developed a comprehensive understanding of threads as lightweight execution units within processes. The essential concepts: threads share their process's address space, files, and signal handlers while keeping a private stack, register set, and program counter; the TCB is far smaller than the PCB, which is why threads are dramatically cheaper to create and switch; and the very sharing that makes inter-thread communication fast is also what introduces data races and demands synchronization.
What's Next:
Understanding what threads are is just the beginning. In the next page, we'll explore the thread lifecycle—how threads are created, how they move through execution states, and how they communicate their completion. This lifecycle understanding is essential for correctly managing thread resources and coordinating concurrent operations.
You now understand threads as lightweight execution units—their structure, how they relate to processes, what they share, their benefits, and their inherent challenges. This mental model is the foundation for understanding thread lifecycle management and synchronization patterns that follow.