The Waiting state (also called Blocked or Sleeping state) might seem like a process standing still—and in terms of CPU consumption, it is. But paradoxically, this pause is essential for system efficiency and proper program behavior.
Without the Waiting state, a process needing data from a slow disk would have to continuously poll, wasting CPU cycles checking "is data ready yet?" thousands of times per second. With the Waiting state, the process politely steps aside: "Wake me when data arrives." The CPU is freed for productive work, and the process resumes precisely when its needs are met.
This elegant mechanism—suspending processes until their requirements are satisfied—is fundamental to how modern operating systems achieve high throughput and responsiveness despite the vast speed disparity between CPUs and I/O devices.
By the end of this page, you will understand the Waiting state comprehensively—what causes processes to block, how the kernel tracks waiting processes, the different types of wait conditions, how and when processes are awakened, and the performance implications of blocking operations.
The Waiting state (also called Blocked or Sleeping state) represents a process that cannot proceed until some external condition is satisfied. Unlike Ready processes—which could run if given a CPU—a Waiting process cannot make progress regardless of CPU availability.
A process is in the Waiting state when it has requested something the kernel cannot provide immediately—for example:

- It issued an I/O request (disk read, network receive) and the data has not yet arrived
- It is waiting for user input at a terminal
- It tried to acquire a lock (mutex, semaphore) that another thread holds
- It called wait() for a child process that has not yet exited
- It called sleep() and the timer has not yet expired
The Waiting state solves the problem of CPU-I/O speed mismatch:
| Operation | Typical Latency | CPU Cycles at 3GHz |
|---|---|---|
| L1 Cache Access | 1ns | 3 |
| L3 Cache Access | 10ns | 30 |
| Main Memory | 100ns | 300 |
| SSD Read | 100μs | 300,000 |
| HDD Read | 10ms | 30,000,000 |
| Network Packet | 1-100ms | 3M-300M |
During a single 10ms disk read, a 3GHz CPU could execute roughly 30 million instructions. Having a process spin-wait would squander that capacity; the Waiting state instead frees the CPU to spend those 30 million instructions on other processes.
Don't think of 'blocked' or 'waiting' as bad. A process in Waiting state consumes zero CPU—it's the most efficient possible state. Systems often run best with many waiting processes, as it indicates I/O is being overlapped with CPU work. High CPU utilization with few waiting processes may indicate CPU-bound bottlenecks.
Processes enter the Waiting state when they request something the operating system cannot immediately provide. These blocking conditions fall into several categories:
Whenever a process needs data from or sends data to external devices, it may block:

- read() on a file whose data must come from disk
- recv() on a socket waiting for a packet to arrive
- read() on a terminal waiting for a keystroke
Multi-threaded and multi-process programs also block on locks and semaphores:

- pthread_mutex_lock() on a mutex held by another thread
- Semaphore wait operations when the count is zero
- Condition variable waits until another thread signals
| Category | Example Syscall | Wait For | Typical Duration |
|---|---|---|---|
| Disk I/O | read() on file | Data transfer from disk | 1-100ms |
| Network I/O | recv() | Packet arrival | 1ms-30s+ (timeout) |
| User Input | read() on terminal | Keystroke | Seconds to hours |
| Mutex | pthread_mutex_lock() | Lock release | Microseconds to seconds |
| Child Process | wait() | Child exit | Seconds to hours |
| Timer | sleep() | Time passage | As specified |
Most blocking operations have non-blocking alternatives. You can open files with O_NONBLOCK, use poll/select/epoll to check readiness before I/O, or use trylock variants for mutexes. The trade-off is complexity: non-blocking code must handle 'not ready' cases explicitly, while blocking code has simpler linear flow.
When a running process invokes a blocking operation, a precise sequence of kernel operations occurs to suspend it properly:
```
User Process                     Kernel
═══════════════════════════════════════════════════════════════
read(fd, buf, 1024)
        │
        ▼
SYSCALL instruction ──────────►  Enter kernel mode
                                        │
                                        ▼
                                 Lookup file descriptor
                                        │
                                        ▼
                                 Check: Data available in buffer?
                                        │
                            ┌───────────┴───────────┐
                            │                       │
                         Data YES                Data NO
                            │                       │
                            ▼                       ▼
                   Copy to user buf         Cannot satisfy now!
                   Return byte count                │
                            │                       ▼
                            ▼               Add process to wait queue
                    Resume user code        (specific to this I/O)
                                                    │
                                                    ▼
                                            Set process state = WAITING
                                                    │
                                                    ▼
                                            Remove from ready queue
                                                    │
                                                    ▼
                                            call schedule()
                                            ─────────────────────────
                                            Select next ready process
                                            Context switch to it
                                            (Process suspended)
```

1. Wait Queue Registration
The kernel maintains wait queues for each resource that can cause blocking. When a process blocks:

- A wait queue entry referring to the process is created
- The entry is linked onto the queue for the specific resource or event
2. State Update
The PCB's state field changes from RUNNING to WAITING. Some systems distinguish:

- Interruptible sleep: signals can wake the process early (Linux 'S' state)
- Uninterruptible sleep: the process wakes only when the event completes (Linux 'D' state)
3. Scheduler Invocation
Since the current process can't continue, the kernel calls schedule():

- The scheduler selects the next Ready process
- A context switch transfers the CPU to it (or to the idle task if nothing is ready)
On Linux, processes in uninterruptible sleep show as 'D' in ps/top. These processes cannot be killed (even with SIGKILL) until the I/O completes. 'D' state processes stuck waiting on unresponsive NFS mounts or failing disks are notoriously difficult to clear—sometimes requiring reboot. This is why the 'D' state should be used only for critical, short-duration I/O.
The kernel doesn't maintain a single "waiting" list. Instead, it uses multiple wait queues, each associated with a specific resource or event. This allows efficient wake-up—only processes waiting for a particular event need to be examined.
```c
// Each wait queue has a head
struct wait_queue_head {
    spinlock_t       lock;  // Protects the list
    struct list_head head;  // List of waiting entries
};

// Each waiting process has an entry
struct wait_queue_entry {
    unsigned int      flags;    // Exclusive wake, etc.
    void             *private;  // Usually points to task_struct
    wait_queue_func_t func;     // Wake-up callback
    struct list_head  entry;    // Links in the queue
};

// Common wait queue instances:
// - Each socket has a wait queue (for recv blocking)
// - Each pipe has wait queues (read and write ends)
// - Each mutex has a wait queue (for lock contention)
// - Each file inode may have wait queues
// - The scheduler has wait queues for sleep/timers

// Example: Two processes waiting on same pipe read
//
// pipe_inode->wait_queue:
// ┌────────────────┐    ┌────────────────┐
// │   Process A    │───►│   Process B    │───► NULL
// │ (waiting read) │    │ (waiting read) │
// └────────────────┘    └────────────────┘
```

Device Wait Queues: Each device driver maintains queues for processes waiting on I/O from that device.
Filesystem Wait Queues: Files, directories, and filesystem structures have associated queues for blocking operations.
Socket Wait Queues: Each socket has separate queues for receive (waiting for data) and possibly send (waiting for buffer space).
Lock Wait Queues: Each mutex, semaphore, or other lock has a queue of processes waiting to acquire it.
Condition Variable Queues: Each condition variable has a queue of processes waiting for the condition.
When an event occurs, who should wake up?
Non-exclusive wake (wake_up): All waiters are awakened. Use when all waiters might be able to proceed.
Exclusive wake: Only one waiter is awakened. Use when only one can succeed (e.g., lock acquisition). In Linux, exclusivity is a property of the wait entry, not the wake call—waiters enqueued with WQ_FLAG_EXCLUSIVE (e.g., via prepare_to_wait_exclusive()) are woken one at a time, while wake_up_interruptible() merely restricts the wake-up to interruptible sleepers.
Exclusive wakeup prevents the thundering herd problem—where all waiters wake, only one succeeds, and the rest re-block, wasting CPU time.
| Queue Type | Associated With | Wake Policy | Example |
|---|---|---|---|
| Device I/O | Driver/device | Non-exclusive or exclusive | Disk completing multiple reads |
| Pipe/FIFO | Pipe endpoints | Often exclusive (one reader) | Writer awakens one reader |
| Socket | Socket buffer | Non-exclusive (edge-triggered) | Network packet arrival |
| Mutex | Lock object | Exclusive (one winner) | Contended pthread_mutex |
| Semaphore | Semaphore | Exclusive or counted | Producer-consumer |
| Condition Variable | Condition+mutex | User-controlled | Thread signaling |
Good kernel design minimizes wait queue operations. Hashing (one queue per hash bucket) distributes wait queues to reduce contention. Futexes use address-based hashing so millions of mutexes don't need kernel structures until contention actually occurs. This lazy approach scales to applications with thousands of locks.
When the event a process is waiting for occurs, the kernel must wake up the sleeping process, transitioning it from Waiting to Ready state. This wake-up is typically triggered by interrupt handlers or other processes.
```
Timeline:

T=0   Process P calls read(fd) for data not in cache
      P goes to WAITING state on disk's wait queue
      Scheduler runs Process Q instead

T=1   (Q is running, P is waiting)

T=2   Disk completes DMA transfer, raises interrupt

      ┌─────────────────────────────────────────────────┐
      │ INTERRUPT HANDLER                               │
      │                                                 │
      │ 1. Acknowledge disk interrupt                   │
      │ 2. Mark I/O buffer as complete                  │
      │ 3. Find wait queue for this I/O                 │
      │ 4. For each waiter on queue:                    │
      │    a. Remove from wait queue                    │
      │    b. Set state = READY                         │
      │    c. Add to scheduler ready queue              │
      │ 5. If awakened process has higher priority:     │
      │    - Set need_resched flag                      │
      │ 6. Return from interrupt                        │
      └─────────────────────────────────────────────────┘

T=3   Interrupt returns to Q's context
      But need_resched is set, so:
      - Q's state → READY (preempted)
      - P's state → RUNNING (P had higher priority)

      OR: Q continues, P runs when Q blocks/exhausts slice
```

1. Locate Wait Queue
The interrupt handler knows which device/resource triggered and can find its associated wait queue.
2. Process Each Waiter
For each process on the queue:

- Remove it from the wait queue
- Set its state to READY
- Add it to the scheduler's ready queue
3. Check for Preemption
If the awakened process has a higher priority than the currently running one, the kernel sets the need_resched flag so that a reschedule happens on return from the interrupt.
4. Actual Preemption (if needed)
If the awakened process is higher priority and preemption is enabled, the scheduler runs on interrupt exit: the current process moves back to Ready, and the awakened process is dispatched immediately.
This ensures interactive processes get quick response after I/O.
Being awakened transitions a process to Ready state, not Running. The awakened process must still wait for the scheduler to dispatch it. On a busy system, this could take additional milliseconds. However, priority-based schedulers typically boost I/O-completing processes to reduce this delay for interactive workloads.
Not all waiting is equal. The kernel distinguishes between processes that can be awakened by signals and those that cannot.
Most blocking operations use interruptible sleep: if a signal arrives, the process is awakened early and the system call returns -EINTR (or is transparently restarted).
Examples: read() on terminal, sleep(), wait(), network recv()
Some critical operations cannot be interrupted: signals are queued but do not wake the process, and even SIGKILL has no effect until the operation completes.
Examples: Certain disk I/O, NFS operations, page fault handling
| Characteristic | Interruptible (S) | Uninterruptible (D) |
|---|---|---|
| Signal delivery | Immediate, may wake | Queued until event completes |
| SIGKILL effect | Process terminated | Ignored until wake-up |
| Typical use | Most I/O, user input | Critical I/O, page faults |
| ps/top display | S (sleeping) | D (disk sleep) |
| Normal duration | Any length | Should be very brief |
| Stuck process risk | Low (can kill) | High (immortal until I/O) |
```
# View process states on Linux
$ ps aux | head -1
USER   PID %CPU %MEM    VSZ  RSS TTY STAT START TIME COMMAND

$ ps aux | grep -E '^USER|[SD]'
USER   PID %CPU %MEM    VSZ  RSS TTY STAT START TIME COMMAND
root   234  0.0  0.0      0    0 ?   D    Jan15 0:00 [kworker/0:1H]
mysql 1234  0.1  5.2 890123 5432 ?   S    Jan15 1:23 /usr/bin/mysqld
nginx 5678  0.0  0.2  12345  234 ?   S    Jan15 0:45 nginx: worker

# STAT column meanings:
# S = Interruptible sleep (waiting for event)
# D = Uninterruptible sleep (usually I/O)
# R = Running or runnable
# Z = Zombie (terminated but not reaped)
# T = Stopped (by signal or debugger)

# Processes stuck in 'D' state for long periods indicate I/O problems
# Common causes: failing disk, unresponsive NFS, kernel driver bugs
```

Linux 2.6.25 introduced TASK_KILLABLE: an uninterruptible sleep that still responds to fatal signals, so a process stuck in it can at least be terminated with SIGKILL.
This addresses the historical complaint that 'D' state processes were immortal.
A system with many processes in 'D' state typically indicates I/O subsystem problems: overloaded or failed disk, network filesystem timeout, or driver issues. Investigate with iostat, iotop, and storage health tools. These processes consume no CPU but represent stuck work—potentially blocked user requests or application hangs.
The relationship between blocking and system performance is nuanced. Blocking itself isn't bad—but the patterns of blocking significantly affect throughput and latency.
In a well-functioning system:
```
Time ─────────────────────────────────────────────────────────────►

Process A: [RUN]━━━━[WAIT disk]━━━━━[RUN]━━━[WAIT net]━━━━[RUN]
Process B: [RUN]━━[WAIT net]━━━━━[RUN]━━━━━━━━━━━━━━[WAIT disk]
Process C: [RUN]━━━━━[WAIT lock]━━[RUN]━━━━━━━━━━━━[RUN]

CPU Usage:   ████████████████████████████████████████████████████
             (Always at least one process ready to run)

Disk I/O:        ████               ████          ████████
             (Requests in flight while CPU works)

Network I/O:          ████████             ████████
             (Network I/O overlapped with disk and CPU)

Result: High CPU utilization AND high I/O throughput
        Blocking enables OTHER work while waiting
```

1. Serialized I/O
A single-threaded program that does:
read -> process -> write -> read -> process -> write
wastes time during each I/O operation. The disk is idle while CPU processes; CPU is idle while disk reads.
Fix: Use async I/O, multiple threads, or pipelining.
2. Lock Contention
Multiple threads blocked waiting for the same mutex:
Thread 1: Holding lock for 100ms
Thread 2-10: WAITING for lock, doing nothing
Nine threads are effectively idle.
Fix: Reduce critical section size, use finer-grained locking, use lock-free structures.
3. I/O Bottleneck
All work depends on a slow I/O resource:
All 100 web workers: WAITING for database query
Database: Overloaded, queuing requests
CPU is idle while I/O is saturated.
Fix: Add caching, optimize queries, scale I/O capacity.
| Symptom | Likely Cause | Investigation Tool |
|---|---|---|
| Low CPU, high iowait | Disk bottleneck | iostat, iotop |
| Low CPU, many 'S' processes | Lock contention | perf lock, mutrace |
| CPU spikes then idle | Serialized I/O | strace, ltrace |
| Many 'D' state processes | Storage/NFS problems | dmesg, mount, iostat -x |
| High latency, low throughput | Sequential blocking | Application profiling |
The 'iowait' metric in top/vmstat shows CPU idle time where at least one process is waiting for I/O. It's NOT time where the CPU is 'waiting' on I/O—the CPU is idle and could run other processes. High iowait with no ready processes indicates I/O is the bottleneck. High iowait WITH ready processes shouldn't happen—those ready processes would run, consuming the idle time.
For high-performance applications, traditional blocking I/O can limit scalability. Several patterns allow processes to handle I/O without blocking:
File descriptors can be set non-blocking:
```c
fcntl(fd, F_SETFL, O_NONBLOCK);
result = read(fd, buf, size);  // Returns immediately with EAGAIN if no data
```
Problem: Must poll repeatedly, wasting CPU.
Readiness-based multiplexing with epoll lets one thread wait on many descriptors at once:

```c
// Handle thousands of connections with one thread
int epfd = epoll_create1(0);

// Add sockets to epoll set
struct epoll_event ev;
ev.events = EPOLLIN;          // Interested in read readiness
ev.data.fd = client_socket;
epoll_ctl(epfd, EPOLL_CTL_ADD, client_socket, &ev);

// Event loop - process blocks here if no I/O ready
while (1) {
    struct epoll_event events[MAX_EVENTS];

    // This DOES block - but only until ANY socket is ready
    int nfds = epoll_wait(epfd, events, MAX_EVENTS, timeout);

    // Now we know WHICH sockets are ready - no guessing
    for (int i = 0; i < nfds; i++) {
        int fd = events[i].data.fd;

        // This read won't block - we know data is available
        read(fd, buf, size);
        process_data(buf);
    }
}

// Result: One thread handles thousands of connections
// Blocks only when ALL connections are idle
// No per-connection thread overhead
```

True async I/O submits requests and continues:
```c
// Submit I/O request
io_uring_prep_read(sqe, fd, buf, size, offset);
io_uring_submit(ring);

// Do other work while I/O proceeds...
process_other_stuff();

// Later, check for completions
io_uring_wait_cqe(ring, &cqe);  // Get completed I/O
```
Delegate blocking operations to worker threads:
Main thread: Accept requests, dispatch to pool
Worker threads: Block on I/O as needed (one thread per request)
Simpler to code but uses more memory per concurrent request.
| Pattern | Best For | Complexity | Concurrency |
|---|---|---|---|
| Blocking I/O | Simple apps, low concurrency | Low | 1 per thread |
| Thread Pool | Moderate concurrency, simple code | Medium | 1000s of threads |
| select/poll | Cross-platform, few connections | Medium | ~1000 connections |
| epoll/kqueue | High concurrency servers | High | 100K+ connections |
| io_uring | Maximum performance Linux | Very High | Millions ops/sec |
Don't avoid blocking I/O out of cargo-cult optimization. For many applications, simple blocking code with thread-per-request is perfectly adequate and easier to maintain. Switch to async patterns when you've measured that blocking limits your specific workload—not before. Premature optimization toward complexity is itself a cost.
We've completed a comprehensive exploration of the Waiting (Blocked) state—where processes pause for external events.
What's next:
The final process state we'll explore is Terminated—the end of a process's lifecycle. We'll examine what happens when processes exit, how resources are cleaned up, the role of parent processes in reaping children, and the infamous zombie state.
You now understand the Waiting state comprehensively—from blocking causes through wait queue mechanics, wake-up mechanisms, interruptible vs uninterruptible sleep, performance implications, and async I/O alternatives. This knowledge is essential for debugging performance issues and designing responsive systems.