At any given instant, every process in the system is in exactly one state. The state tells the kernel—and by extension, system administrators and debuggers—what the process is doing and what it's capable of doing. Is it actively executing instructions? Is it waiting for keyboard input? Is it finished but still lingering in memory?
The process state is arguably the most important field in the PCB because it determines how the scheduler treats the process. A process in the RUNNING state owns a CPU. A process in the BLOCKED state won't be considered for scheduling at all. A process in the ZOMBIE state has finished execution but can't release its PCB until its parent collects its exit status.
Understanding process states is essential for understanding everything from system responsiveness to debugging hung applications to preventing resource leaks. When you run ps and see cryptic letters like S, R, D, or Z, you're looking directly at process state information.
By the end of this page, you will master the process state model: the five fundamental states, the transitions between them, extended states used in real operating systems, and how state changes are implemented in the kernel. You'll understand why processes get stuck, how the scheduler uses state to make decisions, and how to diagnose process behavior using state information.
The classic process lifecycle is described by a five-state model. While real operating systems have additional states for specific scenarios, these five states capture the essential phases every process goes through:
Detailed State Descriptions:
| State | Description | PCB Location | Scheduler Treatment |
|---|---|---|---|
| New | Process is being created. Memory is being allocated, PCB initialized, executable loaded. | Just allocated, not yet in any queue | Not visible to scheduler yet |
| Ready | Process has everything it needs to run except CPU time. It's runnable but not running. | In the ready queue (or one of multiple priority queues) | Candidate for selection; will be dispatched when chosen |
| Running | Process is actively executing instructions on a CPU core. | Associated with a specific CPU | Currently executing; scheduler monitors time slice |
| Blocked/Waiting | Process cannot proceed until some event occurs (I/O completion, timer, signal, etc.) | In a wait queue specific to the awaited event | Not considered for scheduling until event occurs |
| Terminated | Process has finished (exit called or killed). Resources released except PCB with exit status. | In zombie list or marked for cleanup | Will never run again; awaiting parent's wait() |
Processes don't remain in one state forever. They transition between states as they execute, wait for resources, get scheduled, and eventually terminate. Understanding these transitions is crucial for understanding how the operating system manages processes.
Not all transitions are valid. A process CANNOT go directly from Blocked to Running—it must go through Ready first. A Terminated process can never become Ready again. Understanding which transitions are impossible helps debug scheduling anomalies.
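The legal arcs of the five-state model can be written down as a small lookup table. The following is a minimal user-space sketch, not kernel code: the names `proc_state`, `allowed`, and `is_valid_transition` are hypothetical, and the table encodes only the basic arcs (some textbooks also allow a Ready or Blocked process to be terminated directly).

```c
#include <stdbool.h>

/* The five classic states, in the order the table above lists them. */
enum proc_state { ST_NEW, ST_READY, ST_RUNNING, ST_BLOCKED, ST_TERMINATED, ST_COUNT };

/* allowed[from][to] is true when this sketch permits the move.
   Entries that are not listed default to false. */
static const bool allowed[ST_COUNT][ST_COUNT] = {
    [ST_NEW]     = { [ST_READY] = true },        /* admit */
    [ST_READY]   = { [ST_RUNNING] = true },      /* dispatch */
    [ST_RUNNING] = { [ST_READY] = true,          /* preempt */
                     [ST_BLOCKED] = true,        /* wait for an event */
                     [ST_TERMINATED] = true },   /* exit */
    [ST_BLOCKED] = { [ST_READY] = true },        /* event occurred */
    /* ST_TERMINATED has no outgoing transitions. */
};

/* Returns true only for transitions listed in the table above. */
bool is_valid_transition(enum proc_state from, enum proc_state to)
{
    if ((unsigned)from >= (unsigned)ST_COUNT || (unsigned)to >= (unsigned)ST_COUNT)
        return false;
    return allowed[from][to];
}
```

With this table, `is_valid_transition(ST_BLOCKED, ST_RUNNING)` is false while `is_valid_transition(ST_BLOCKED, ST_READY)` is true, which mirrors the rule that a woken process must pass through Ready.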
Why can't Blocked → Running happen directly?
Consider the scenario: Process A is blocked waiting for disk I/O. The disk interrupt fires—the data is ready. Why not just let A continue immediately?
The answer involves fairness and scheduler integrity:
Another process might be running: The CPU isn't just waiting for A. Process B is using it. Interrupting B mid-instruction to run A would violate scheduling decisions already made.
Scheduler invariants: The scheduler expects to choose from the ready queue. Bypassing this breaks priority systems, fair-share accounting, and affinity rules.
Multiple wakeups: If 10 processes are waiting for the same event, they can't all go to Running simultaneously (on a single CPU).
Instead, the event moves A to Ready. When the scheduler runs next (possibly immediately after the interrupt), A will compete for CPU time through normal scheduling mechanisms.
```c
// Kernel pseudocode for state transitions

// Transition: Ready → Running (Dispatch)
void dispatch_process(struct task_struct *next) {
    struct task_struct *prev = current;

    // Save current process state
    if (prev->state == TASK_RUNNING) {
        prev->state = TASK_READY;  // Back to ready queue
    }

    // Switch to next process
    next->state = TASK_RUNNING;
    context_switch(prev, next);
}

// Transition: Running → Blocked (Block/Sleep)
void sleep_on_event(wait_queue_head_t *queue) {
    struct task_struct *current_task = current;

    // Atomically: set state and add to wait queue
    current_task->state = TASK_INTERRUPTIBLE;  // Blocked state
    add_wait_queue(queue, &current_task->wait_entry);

    // Give up CPU - scheduler will pick another process
    schedule();  // Returns when we're woken up

    // We're back - remove from wait queue
    remove_wait_queue(queue, &current_task->wait_entry);
}

// Transition: Blocked → Ready (Wake)
void wake_up_process(struct task_struct *task) {
    unsigned long flags;

    spin_lock_irqsave(&task->state_lock, flags);

    if (task->state == TASK_INTERRUPTIBLE ||
        task->state == TASK_UNINTERRUPTIBLE) {
        // Move to ready state
        task->state = TASK_READY;

        // Add to scheduler's ready queue
        enqueue_task(task);

        // Potentially preempt current if woken task has higher priority
        check_preempt_curr(task);
    }

    spin_unlock_irqrestore(&task->state_lock, flags);
}

// Transition: Running → Terminated (Exit)
void do_exit(int exit_code) {
    struct task_struct *current_task = current;

    // Set exit code in PCB
    current_task->exit_code = exit_code;

    // Release resources (files, memory, etc.)
    exit_mm(current_task);
    exit_files(current_task);
    exit_fs(current_task);

    // Reparent children to init
    forget_original_parent(current_task);

    // Enter zombie state
    current_task->state = TASK_ZOMBIE;

    // Notify parent
    do_notify_parent(current_task, current_task->exit_signal);

    // Give up CPU forever
    schedule();
    // Never returns
}
```

The five-state model is a pedagogical simplification. Real operating systems have additional states to handle specific scenarios more precisely. Let's examine the extended state models used in practice.
Linux Process States
Linux defines the state field in task_struct with these values:
| State Code | Name | Meaning |
|---|---|---|
| R | TASK_RUNNING | Running or on run queue (Ready) |
| S | TASK_INTERRUPTIBLE | Sleeping (interruptible wait) |
| D | TASK_UNINTERRUPTIBLE | Sleeping (uninterruptible wait, typically disk I/O) |
| T | TASK_STOPPED | Stopped (by signal or debugger) |
| t | TASK_TRACED | Being traced (debugger has control) |
| Z | EXIT_ZOMBIE | Zombie (terminated, waiting for parent; kept in exit_state) |
| X | EXIT_DEAD | Dead (should never be seen in ps; kept in exit_state) |
| I | TASK_IDLE | Idle kernel thread (not waiting on events) |
| P | TASK_PARKED | Parked (kernel thread is stopped) |
Key Distinctions:
TASK_INTERRUPTIBLE vs TASK_UNINTERRUPTIBLE: Both are 'blocked' states, but with different signal handling:
S (Interruptible): The process can be woken by signals. If you send SIGTERM, it wakes up and handles the signal.
D (Uninterruptible): Signals are not delivered until the awaited event completes; they stay pending in the meantime. Used for critical kernel operations like disk I/O where interrupted operations could corrupt data.
The D state is infamous because you cannot kill a process in this state: even kill -9 won't take effect until the kernel operation completes.
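Interruptible sleep can be observed from user space. The sketch below is an illustration under stated assumptions, not kernel code: it assumes a POSIX system, and the names `on_alarm` and `sleep_is_interruptible` are hypothetical. It starts a 5-second `nanosleep()` (during which the process sits in state S) and arranges for SIGALRM to arrive after about 50 ms; the sleep then returns early with `EINTR`.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

/* Empty handler: delivering the signal is enough to interrupt the sleep. */
static void on_alarm(int signo) { (void)signo; }

/* Returns 1 if the sleep was cut short by the signal (EINTR),
   0 if it was not, and -1 if setup failed. */
int sleep_is_interruptible(void)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, &old) != 0)
        return -1;

    /* One-shot timer: SIGALRM fires after roughly 50 ms. */
    struct itimerval timer = { .it_value = { .tv_sec = 0, .tv_usec = 50000 } };
    if (setitimer(ITIMER_REAL, &timer, NULL) != 0)
        return -1;

    /* The process blocks here in interruptible sleep. */
    struct timespec req = { .tv_sec = 5, .tv_nsec = 0 };
    int rc = nanosleep(&req, NULL);
    int interrupted = (rc == -1 && errno == EINTR);

    sigaction(SIGALRM, &old, NULL);   /* restore the previous disposition */
    return interrupted;
}
```

A process in uninterruptible sleep offers no equivalent experiment from user space: the signal would simply stay pending until the kernel operation finished.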
```c
// From include/linux/sched.h

/* Used in task->state field: */
#define TASK_RUNNING            0x0000
#define TASK_INTERRUPTIBLE      0x0001
#define TASK_UNINTERRUPTIBLE    0x0002
#define __TASK_STOPPED          0x0004
#define __TASK_TRACED           0x0008

/* Used in task->exit_state: */
#define EXIT_DEAD               0x0010
#define EXIT_ZOMBIE             0x0020
#define EXIT_TRACE              (EXIT_ZOMBIE | EXIT_DEAD)

/* Used in task->state again: */
#define TASK_PARKED             0x0040
#define TASK_DEAD               0x0080
#define TASK_WAKEKILL           0x0100
#define TASK_WAKING             0x0200
#define TASK_NOLOAD             0x0400
#define TASK_NEW                0x0800
#define TASK_STATE_MAX          0x1000

// Note: States are bitmasks, allowing combinations
// TASK_KILLABLE = TASK_WAKEKILL | TASK_UNINTERRUPTIBLE
```

The process state isn't just an abstract concept: it's a concrete field in the PCB that the kernel reads and modifies frequently. Let's examine how this field is implemented and accessed.
```c
// Linux: How the state field is used

// From include/linux/sched.h
struct task_struct {
    // ... other fields

    /* -1 unrunnable, 0 runnable, >0 stopped */
    volatile long state;

    unsigned int flags;  // PF_EXITING, PF_KTHREAD, etc.

    // ... other fields
};

// Setting state (with WRITE_ONCE for safety)
static inline void set_current_state(long state)
{
    WRITE_ONCE(current->state, state);
    // Memory barrier to ensure state is visible to other CPUs
    smp_mb();
}

// Checking state safely
static inline int task_is_running(struct task_struct *p)
{
    return READ_ONCE(p->state) == TASK_RUNNING;
}

// Common pattern: Sleep until condition is met.
// Returns 0 once the condition holds, or -EINTR if a signal arrived first.
int wait_for_condition(void) {
    // Set state BEFORE checking condition (critical ordering!)
    set_current_state(TASK_INTERRUPTIBLE);

    while (!condition_met()) {
        // Check for pending signals
        if (signal_pending(current)) {
            set_current_state(TASK_RUNNING);
            return -EINTR;
        }

        // Give up CPU - state is already set
        schedule();

        // Back from schedule - set state again for next iteration
        set_current_state(TASK_INTERRUPTIBLE);
    }

    // Done waiting - back to running
    set_current_state(TASK_RUNNING);
    return 0;
}
```

Setting state before checking the condition is critical. If you check the condition first and then set the state, a wakeup could occur between those two steps and be lost. This race condition is a classic kernel bug pattern. When the state is set BEFORE the check, a wakeup that arrives between the set and the check puts the task back into TASK_RUNNING, so the following schedule() call returns instead of sleeping forever.
State as a Bitmask:
In Linux, process states are bitmasks rather than simple integers. This allows combinations:
```c
// TASK_KILLABLE: Uninterruptible, but can be killed
#define TASK_KILLABLE (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)

// Check if task is stopped or being traced
#define task_is_stopped_or_traced(task) \
    ((task->state & (__TASK_STOPPED | __TASK_TRACED)) != 0)
```
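The bitmask idea can be tried out in ordinary user-space C. In the sketch below the numeric values are copied from the sched.h excerpt quoted earlier in this section; the helper names `state_is_sleeping` and `state_wakes_on_fatal_signal` are hypothetical and exist only for illustration. Note that TASK_RUNNING is zero, so it has to be tested with `==` rather than with a bitwise AND.

```c
#include <stdbool.h>

/* Values copied from the sched.h excerpt quoted in this section. */
#define TASK_RUNNING          0x0000
#define TASK_INTERRUPTIBLE    0x0001
#define TASK_UNINTERRUPTIBLE  0x0002
#define TASK_WAKEKILL         0x0100
#define TASK_KILLABLE         (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)

/* Hypothetical helper: true for any state that has a sleep bit set. */
bool state_is_sleeping(unsigned int state)
{
    return (state & (TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE)) != 0;
}

/* Hypothetical helper: true if a fatal signal is allowed to end the sleep.
   Plain TASK_UNINTERRUPTIBLE has neither bit, so it returns false. */
bool state_wakes_on_fatal_signal(unsigned int state)
{
    return (state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)) != 0;
}
```

Because TASK_KILLABLE carries both the uninterruptible bit and the wake-kill bit, both helpers return true for it, which is the combination a simple enumerated state could not express.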
Volatile and Memory Barriers:
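The state field is marked volatile because it can change asynchronously: an interrupt handler or code running on another CPU may wake the task at any moment, so the compiler must not cache the value in a register or optimize away repeated reads.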
Memory barriers (smp_mb()) ensure that state changes are visible across CPUs in the correct order. Without barriers, out-of-order execution could cause a wakeup to be lost or a process to run when it shouldn't.
When a process enters the BLOCKED state, it needs to be tracked so it can be woken when the awaited event occurs. This is where wait queues come in—data structures that associate sleeping processes with specific events.
What Is a Wait Queue?
A wait queue is a list of processes waiting for a particular event. Each type of event has its own wait queue, for example one per socket for arriving data or one per device for I/O completion.
When the event occurs (interrupt, kernel action, or another process), the wait queue is processed and waiting processes move to READY.
```c
// Linux Wait Queue Implementation (simplified)

// Wait queue head - represents the event
typedef struct wait_queue_head {
    spinlock_t lock;
    struct list_head task_list;
} wait_queue_head_t;

// Wait queue entry - represents one waiting task
typedef struct wait_queue_entry {
    unsigned int flags;
    struct task_struct *private;  // The waiting task
    wait_queue_func_t func;       // Wake function
    struct list_head entry;       // Linkage in queue
} wait_queue_entry_t;

// Initialize a wait queue (usually at compile time)
#define DECLARE_WAIT_QUEUE_HEAD(name) \
    wait_queue_head_t name = { \
        .lock = __SPIN_LOCK_UNLOCKED(name.lock), \
        .task_list = LIST_HEAD_INIT(name.task_list) \
    }

// The sleep macro - blocks until condition is true
#define wait_event(wq_head, condition) \
do { \
    might_sleep(); \
    if (!(condition)) \
        __wait_event(wq_head, condition); \
} while (0)

// Internal implementation
#define __wait_event(wq_head, condition) \
    (void)___wait_event(wq_head, condition, TASK_UNINTERRUPTIBLE, \
                        0, schedule())

// The wakeup function - walks the queue and wakes the waiters
void wake_up(wait_queue_head_t *wq_head) {
    unsigned long flags;
    wait_queue_entry_t *entry;

    spin_lock_irqsave(&wq_head->lock, flags);

    // Walk through all waiting entries
    list_for_each_entry(entry, &wq_head->task_list, entry) {
        // Call the wake function for each entry
        if (entry->func(entry, TASK_NORMAL, 0, NULL)) {
            // Task was successfully woken
        }
    }

    spin_unlock_irqrestore(&wq_head->lock, flags);
}

// Example: Network socket receive wait queue
int tcp_recvmsg(...) {
    struct sk_buff *skb;
    // ...

    // Wait for data to arrive
    while ((skb = skb_peek(&sk->receive_queue)) == NULL) {
        if (signal_pending(current))
            return -EINTR;

        // Sleep on socket's wait queue
        wait_event_interruptible(sk->sk_wq,
            skb_peek(&sk->receive_queue) != NULL);
    }

    // Data available - process it
    return 0;
}

// Corresponding wakeup when data arrives
void tcp_data_ready(struct sock *sk) {
    // Wake up anyone waiting for data on this socket
    wake_up_interruptible_sync_poll(&sk->sk_wq, EPOLLIN | EPOLLPRI);
}
```

When many processes wait for the same event, waking all of them (thundering herd) wastes CPU. Linux supports exclusive wakeups where only one waiter is woken. This is critical for resources like mutexes where only one process can proceed anyway.
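The difference between waking every waiter and waking just one can be shown with a toy, single-threaded model. This is a sketch under loud assumptions: it is plain user-space C with no locking and no real scheduler, and every name in it (`toy_task`, `toy_wait_queue`, `toy_sleep_on`, `toy_wake_up`) is hypothetical rather than part of any kernel API.

```c
#include <stddef.h>

enum toy_state { TOY_READY, TOY_RUNNING, TOY_BLOCKED };

struct toy_task {
    int id;
    enum toy_state state;
    struct toy_task *next;          /* linkage while on a wait queue */
};

struct toy_wait_queue {
    struct toy_task *head, *tail;   /* FIFO of blocked tasks */
};

/* Running -> Blocked: mark the task blocked and append it to the queue. */
void toy_sleep_on(struct toy_wait_queue *q, struct toy_task *t)
{
    t->state = TOY_BLOCKED;
    t->next = NULL;
    if (q->tail)
        q->tail->next = t;
    else
        q->head = t;
    q->tail = t;
}

/* Blocked -> Ready for up to max_wake waiters, oldest first.
   max_wake == 0 means "wake everyone". Returns the number woken. */
int toy_wake_up(struct toy_wait_queue *q, int max_wake)
{
    int woken = 0;
    while (q->head && (max_wake == 0 || woken < max_wake)) {
        struct toy_task *t = q->head;
        q->head = t->next;
        if (!q->head)
            q->tail = NULL;
        t->next = NULL;
        t->state = TOY_READY;   /* never TOY_RUNNING: dispatch is a separate step */
        woken++;
    }
    return woken;
}
```

Calling `toy_wake_up(&q, 1)` models an exclusive wakeup: one task becomes Ready and the rest stay Blocked on the queue, so they do not all compete for a resource that only one of them can obtain.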
A zombie process is a process that has finished execution but still has an entry in the process table. It's "dead" in that it can never run again—its resources (memory, files, etc.) have been released—but it's not gone because its PCB persists.
The zombie state exists because Unix needs to inform parent processes how their children terminated. The exit status—whether the child succeeded, failed, or was killed by a signal—must be preserved until the parent asks for it via wait(). The PCB is the only safe place to store this information after the process's other resources are freed.
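That the exit status really does outlive the process can be checked with the POSIX `waitid()` call and its `WNOWAIT` flag, which reports a terminated child without reaping it. The following is a minimal sketch assuming a POSIX system; the function name `peek_then_reap` is hypothetical, and `code` is assumed to be in the range 0 to 255.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that exits with `code`, peeks at the zombie without
   reaping it, then reaps it. Returns the exit code seen by the final
   waitpid(), or -1 if any step behaves unexpectedly. */
int peek_then_reap(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);                 /* child: terminate immediately */

    /* WNOWAIT: block until the child has exited, but leave it a zombie. */
    siginfo_t info;
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) != 0)
        return -1;
    if (info.si_code != CLD_EXITED || info.si_status != code)
        return -1;

    /* The status is still stored: a normal wait collects it and
       lets the kernel free the process table entry. */
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;

    /* After reaping, there is nothing left to wait for. */
    if (waitpid(pid, NULL, 0) != -1 || errno != ECHILD)
        return -1;

    return WEXITSTATUS(status);
}
```

Between the `waitid()` and `waitpid()` calls the child is a zombie: it can no longer run, yet its exit status is still available to the parent.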
The Zombie Problem:
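Normally, zombies are temporary. The parent calls wait(), reads the exit status, and the kernel releases the PCB. But if the parent never calls wait(), the zombie's PCB and PID stay allocated for as long as the parent lives. Only when the parent itself exits is the zombie re-parented to init, which reaps it.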
Zombie Accumulation:
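A poorly written server that forks children but never waits can accumulate thousands of zombies. Eventually the PID space or process table fills up (or the user hits a per-user process limit), and fork() starts failing even though almost nothing is actually running.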
Detecting Zombies:
```shell
# Detecting zombie processes

# List all zombie processes
ps aux | awk '$8 ~ /^Z/ {print}'

# Count zombies (count+0 prints 0 when there are none)
ps aux | awk '$8 ~ /^Z/ {count++} END {print "Zombie count: " count+0}'

# Find parent of zombies (to fix the real problem)
ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/ {print "Zombie PID:", $1, "Parent:", $2}'

# Typical output showing zombies:
#   PID TTY      STAT   TIME COMMAND
# 12345 ?        Z      0:00 [defunct]
# 12346 ?        Z      0:00 [defunct]

# In /proc/PID/status, zombie shows as:
# State:  Z (zombie)

# On Linux, you can see more detail:
cat /proc/12345/status | grep -E "^(State|PPid):"
# State:  Z (zombie)
# PPid:   1000
```

Preventing Zombies:
Proper process management prevents zombie accumulation:
Call wait() or waitpid(): The parent must collect exit status
Use SIGCHLD handler: Handle child death asynchronously:

```c
// Option 1: ignore SIGCHLD so the kernel reaps children automatically
signal(SIGCHLD, SIG_IGN);

// Option 2: inside a SIGCHLD handler, reap every exited child without blocking
while (waitpid(-1, NULL, WNOHANG) > 0);
```

Double fork: Let child become orphan adopted by init:

```c
if (fork() == 0) {
    if (fork() == 0) {
        // Grandchild: do actual work
    }
    _exit(0);  // Child exits immediately
}
wait(NULL);  // Parent reaps child quickly
// Grandchild is now adopted by init, which always waits
```
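For the SIGCHLD approach, a fuller sketch of a handler that reaps children as they exit might look like the following. It assumes a POSIX system; the names `reaped`, `on_sigchld`, and `spawn_and_auto_reap` are hypothetical, and the children here do nothing but exit so that the example stays short.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t reaped;   /* children collected so far */

static void on_sigchld(int signo)
{
    (void)signo;
    int saved_errno = errno;           /* waitpid() may overwrite errno */
    /* One SIGCHLD can stand for several exits, so loop until none remain. */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped++;
    errno = saved_errno;
}

/* Forks n short-lived children and returns how many the handler
   reaped, or -1 if the handler could not be installed. */
int spawn_and_auto_reap(int n)
{
    struct sigaction sa, old_sa;
    sigset_t block, old_mask, wait_mask;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    if (sigaction(SIGCHLD, &sa, &old_sa) != 0)
        return -1;

    /* Keep SIGCHLD blocked except inside sigsuspend(), so the counter
       check and the wait cannot race with the handler. */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old_mask);

    reaped = 0;
    int started = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);                  /* child: exit right away */
        if (pid > 0)
            started++;
    }

    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGCHLD);
    while (reaped < started)
        sigsuspend(&wait_mask);        /* atomically unblock and wait */

    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGCHLD, &old_sa, NULL);
    return (int)reaped;
}
```

The loop inside the handler matters: standard signals do not queue, so several children exiting close together may produce a single SIGCHLD, and a handler that calls `waitpid()` only once would leave zombies behind.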
Orphans vs Zombies:
| | Orphan | Zombie |
|---|---|---|
| Parent Status | Parent died | Parent alive (not waiting) |
| Process Status | Still running | Terminated |
| Solution | Adopted by init | Parent must wait() |
| Resource Usage | Full process resources | PCB only |
Understanding how to view and interpret process state is essential for system administration and debugging. Let's examine the tools and techniques across different operating systems.
```shell
# Linux: Viewing Process State

# ps with state column
ps aux | head -5
# USER  PID %CPU %MEM    VSZ   RSS TTY  STAT START  TIME COMMAND
# root    1  0.0  0.1 167916 11268 ?    Ss   Jan01  0:15 /sbin/init
# root    2  0.0  0.0      0     0 ?    S    Jan01  0:00 [kthreadd]
# root    3  0.0  0.0      0     0 ?    I<   Jan01  0:00 [rcu_gp]
# root    4  0.0  0.0      0     0 ?    I<   Jan01  0:00 [rcu_par_gp]

# STAT column meanings:
# First character: Main state (R, S, D, T, Z, etc.)
# Additional characters:
#   s = session leader
#   < = high priority (nice < 0)
#   N = low priority (nice > 0)
#   L = has pages locked in memory
#   + = foreground process group
#   l = multi-threaded (CLONE_THREAD)

# Detailed state from /proc
cat /proc/1/status | grep -E "^(State|Threads|SigQ):"
# State:   S (sleeping)
# Threads: 1
# SigQ:    0/63636

# Watch the signals delivered to a process with strace
strace -e trace=none -e signal=all -p 1234
# --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, ...} ---

# Using top for a snapshot of state counts (-n 1 prints one iteration and exits)
top -b -n 1 | grep -E "^(Tasks|%Cpu)"
# Tasks: 256 total,   1 running, 255 sleeping,   0 stopped,   0 zombie
# %Cpu(s):  2.3 us,  0.7 sy,  0.0 ni, 96.8 id,  0.2 wa,  0.0 hi,  0.0 si

# Breakdown of all process states
ps -eo stat | sort | uniq -c | sort -rn
#   180 S
#    45 Ss
#    20 Sl
#     5 R
#     2 D
#     1 Z
```

| STAT | Meaning | Typical Cause |
|---|---|---|
| R | Running or runnable | Actively computing or in run queue |
| S | Interruptible sleep | Waiting for input, network, timer; common idle state |
| Ss | Interruptible sleep, session leader | Shell, daemon process |
| Sl | Interruptible sleep, multi-threaded | Java/Python process, database |
| S+ | Foreground sleep | Interactive program waiting for input |
| D | Uninterruptible sleep | Usually disk I/O; might indicate I/O problem if persistent |
| T | Stopped | Ctrl+Z, or SIGSTOP, or debugger |
| Z | Zombie | Exited but parent hasn't called wait() |
| I | Idle kernel thread | Kernel housekeeping thread with no work |
A process stuck in D (uninterruptible sleep) for extended periods indicates a problem—usually I/O that isn't completing. This could be a failed disk, NFS server down, or kernel bug. Because D-state processes can't be killed, they require resolving the underlying I/O issue.
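The same state letter that ps prints can be read programmatically, which is handy for a watchdog that flags processes lingering in D. The sketch below is Linux-specific and assumes the `/proc/<pid>/stat` layout of `pid (comm) state ...`; the function names `proc_state` and `proc_state_self` are hypothetical.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns the one-letter state of `pid` (R, S, D, T, Z, ...),
   or '\0' if the process does not exist or the file cannot be parsed. */
char proc_state(pid_t pid)
{
    char path[64], buf[512];
    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);

    FILE *f = fopen(path, "r");
    if (!f)
        return '\0';
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* The command name sits in parentheses and may itself contain ')'
       or spaces, so find the LAST ')' and read the field after it. */
    char *close_paren = strrchr(buf, ')');
    if (!close_paren || close_paren[1] != ' ' || close_paren[2] == '\0')
        return '\0';
    return close_paren[2];
}

/* Convenience wrapper: the state of the calling process. */
char proc_state_self(void)
{
    return proc_state(getpid());
}
```

A monitoring loop could call `proc_state()` periodically and raise an alert when the same PID reports `'D'` across many consecutive samples, since a brief D state is normal and only a persistent one suggests stuck I/O.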
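We've explored process state in depth, from the theoretical five-state model to practical implementation details. The essentials: every process is in exactly one state at a time; the five-state model (New, Ready, Running, Blocked, Terminated) captures the basic lifecycle; a blocked process always passes through Ready before it runs again; Linux refines the model with interruptible (S) and uninterruptible (D) sleep plus stopped, traced, and zombie states; wait queues tie sleeping processes to the events they await; zombies persist until the parent calls wait(); and tools such as ps, top, and /proc expose all of this for diagnosis.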
What's Next:
We've seen that the PCB's state field tells us where the process is in its lifecycle. But how does the process know where to resume execution? The next page examines the Program Counter—the field that stores the instruction address, enabling the kernel to pause and resume processes at exactly the right point in their code.
You now understand process states comprehensively: the theoretical model, the real-world implementations, the transition mechanisms, wait queues, zombies, and diagnostic techniques. This knowledge is fundamental to understanding scheduling, debugging hung processes, and designing robust process management.