Process Control Block - Learning Module

Process State: The Lifecycle Status Indicator

Where Is Your Process Right Now?

At any given instant, every process in the system is in exactly one state. The state tells the kernel—and by extension, system administrators and debuggers—what the process is doing and what it's capable of doing. Is it actively executing instructions? Is it waiting for keyboard input? Is it finished but still lingering in memory?

The process state is arguably the most important field in the PCB because it determines how the scheduler treats the process. A process in the RUNNING state owns a CPU. A process in the BLOCKED state won't be considered for scheduling at all. A process in the ZOMBIE state has finished execution but can't release its PCB until its parent collects its exit status.

Understanding process states is essential for understanding everything from system responsiveness to debugging hung applications to preventing resource leaks. When you run ps and see cryptic letters like S, R, D, or Z, you're looking directly at process state information.

What You Will Learn

By the end of this page, you will master the process state model: the five fundamental states, the transitions between them, extended states used in real operating systems, and how state changes are implemented in the kernel. You'll understand why processes get stuck, how the scheduler uses state to make decisions, and how to diagnose process behavior using state information.

The Five-State Process Model

The classic process lifecycle is described by a five-state model. While real operating systems have additional states for specific scenarios, these five states capture the essential phases every process goes through:

New (Created): The process is being created
Ready: The process can run but is waiting for CPU time
Running: The process is actively executing on a CPU
Blocked (Waiting): The process cannot proceed until an external event occurs
Terminated (Exit): The process has finished execution

Detailed State Descriptions:

The Five Fundamental Process States
State	Description	PCB Location	Scheduler Treatment
New	Process is being created. Memory is being allocated, PCB initialized, executable loaded.	Just allocated, not yet in any queue	Not visible to scheduler yet
Ready	Process has everything it needs to run except CPU time. It's runnable but not running.	In the ready queue (or one of multiple priority queues)	Candidate for selection; will be dispatched when chosen
Running	Process is actively executing instructions on a CPU core.	Associated with a specific CPU	Currently executing; scheduler monitors time slice
Blocked/Waiting	Process cannot proceed until some event occurs (I/O completion, timer, signal, etc.)	In a wait queue specific to the awaited event	Not considered for scheduling until event occurs
Terminated	Process has finished (exit called or killed). Resources released except PCB with exit status.	In zombie list or marked for cleanup	Will never run again; awaiting parent's wait()

State Transitions in Detail

Processes don't remain in one state forever. They transition between states as they execute, wait for resources, get scheduled, and eventually terminate. Understanding these transitions is crucial for understanding how the operating system manages processes.

State Transitions Explained

•New → Ready (Admit): Creation complete. The process has valid memory, code is loaded, file descriptors inherited, and the PCB is initialized. The kernel puts the process on the ready queue, making it eligible for scheduling.
•Ready → Running (Dispatch): The scheduler selects this process to run. The dispatcher performs a context switch: loads the process's saved registers, switches to its address space, and jumps to its saved program counter.
•Running → Ready (Preempt): The process's time slice expires, or a higher-priority process becomes ready, or the process voluntarily yields. The kernel saves the process state and moves it back to the ready queue.
•Running → Blocked (Block): The process needs something unavailable: a disk read, network data, a lock held by another process, or a timer to expire. The kernel puts it in the appropriate wait queue.
•Blocked → Ready (Wake): The awaited event occurs. An interrupt handler or kernel thread moves the process from the wait queue to the ready queue. It will run again when the scheduler selects it.
•Running → Terminated (Exit): The process calls exit(), returns from main(), or is killed by a signal. The kernel releases most resources but keeps the PCB with the exit status until the parent calls wait().

Impossible Transitions

Not all transitions are valid. A process CANNOT go directly from Blocked to Running—it must go through Ready first. A Terminated process can never become Ready again. Understanding which transitions are impossible helps debug scheduling anomalies.
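The six legal transitions listed above form a small state machine. As a minimal sketch (the enum and function names are hypothetical, not taken from any kernel), the rules can be encoded as a lookup, which makes the impossible transitions, such as Blocked → Running or Terminated → Ready, explicit:

```c
#include <stdbool.h>

/* The five states of the textbook model (illustrative names only). */
enum proc_state { P_NEW, P_READY, P_RUNNING, P_BLOCKED, P_TERMINATED };

/* Return true only for the six transitions the five-state model allows. */
bool transition_allowed(enum proc_state from, enum proc_state to)
{
    switch (from) {
    case P_NEW:
        return to == P_READY;                 /* admit */
    case P_READY:
        return to == P_RUNNING;               /* dispatch */
    case P_RUNNING:
        return to == P_READY ||               /* preempt */
               to == P_BLOCKED ||             /* block */
               to == P_TERMINATED;            /* exit */
    case P_BLOCKED:
        return to == P_READY;                 /* wake: never straight to Running */
    case P_TERMINATED:
        return false;                         /* final state */
    }
    return false;
}
```

Some textbook variants also allow Ready → Terminated or Blocked → Terminated when a process is killed; this sketch follows only the six transitions described here.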

Why can't Blocked → Running happen directly?

Consider the scenario: Process A is blocked waiting for disk I/O. The disk interrupt fires—the data is ready. Why not just let A continue immediately?

The answer involves fairness and scheduler integrity:

Another process might be running: The CPU isn't idle waiting for A; Process B is using it. Switching to A the instant the interrupt fires would override scheduling decisions already made.
Scheduler invariants: The scheduler expects to choose from the ready queue. Bypassing this breaks priority systems, fair-share accounting, and affinity rules.
Multiple wakeups: If 10 processes are waiting for the same event, they can't all go to Running simultaneously (on a single CPU).

Instead, the event moves A to Ready. When the scheduler runs next (possibly immediately after the interrupt), A will compete for CPU time through normal scheduling mechanisms.
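To make the Blocked → Ready → Running path concrete, here is a toy, single-threaded simulation. It assumes a plain FIFO ready queue (the `struct ready_queue`, `rq_push()`, `wake()` and `dispatch()` names are hypothetical); real schedulers such as Linux's use per-CPU, priority-ordered structures and may let a freshly woken task preempt the current one. What it shows is that a wakeup only enqueues the task, while the scheduler still decides when it runs.

```c
#include <stdbool.h>

#define MAX_TASKS 8

/* Toy FIFO ready queue of task IDs (hypothetical). */
struct ready_queue {
    int ids[MAX_TASKS];
    int head;    /* index of the oldest entry */
    int count;   /* number of queued tasks */
};

/* Append a task to the tail; fails if the queue is full. */
bool rq_push(struct ready_queue *rq, int id)
{
    if (rq->count == MAX_TASKS)
        return false;
    rq->ids[(rq->head + rq->count) % MAX_TASKS] = id;
    rq->count++;
    return true;
}

/* Wake (Blocked -> Ready): the task joins the ready queue; it does not run. */
bool wake(struct ready_queue *rq, int task_id)
{
    return rq_push(rq, task_id);
}

/* Dispatch (Ready -> Running): the scheduler only ever picks from the ready
 * queue. Returns the chosen task ID, or -1 if nothing is runnable. */
int dispatch(struct ready_queue *rq)
{
    if (rq->count == 0)
        return -1;
    int id = rq->ids[rq->head];
    rq->head = (rq->head + 1) % MAX_TASKS;
    rq->count--;
    return id;
}
```

If tasks B (ID 2) and C (ID 3) are already ready when A (ID 1) is woken, this FIFO policy dispatches them in the order 2, 3, 1.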

state_transition_kernel_code.c
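// Kernel pseudocode for state transitions (simplified).
// TASK_READY is illustrative: Linux keeps ready and running tasks alike in
// TASK_RUNNING and tracks run-queue membership separately.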
 
// Transition: Ready → Running (Dispatch)
void dispatch_process(struct task_struct *next) {
    struct task_struct *prev = current;
    
    // Save current process state
    if (prev->state == TASK_RUNNING) {
        prev->state = TASK_READY;  // Back to ready queue
    }
    
    // Switch to next process
    next->state = TASK_RUNNING;
    context_switch(prev, next);
}
 
// Transition: Running → Blocked (Block/Sleep)
void sleep_on_event(wait_queue_head_t *queue) {
    struct task_struct *current_task = current;
    
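    // Simplified: real code (prepare_to_wait()) adds the entry and sets the
    // state under the wait-queue lock, and callers re-check the wait
    // condition before schedule() so a wakeup cannot be lost
    current_task->state = TASK_INTERRUPTIBLE;  // Blocked state
    add_wait_queue(queue, &current_task->wait_entry);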
    
    // Give up CPU - scheduler will pick another process
    schedule();  // Returns when we're woken up
    
    // We're back - remove from wait queue
    remove_wait_queue(queue, &current_task->wait_entry);
}
 
// Transition: Blocked → Ready (Wake)
void wake_up_process(struct task_struct *task) {
    unsigned long flags;
    
    spin_lock_irqsave(&task->state_lock, flags);
    
    if (task->state == TASK_INTERRUPTIBLE || 
        task->state == TASK_UNINTERRUPTIBLE) {
        
        // Move to ready state
        task->state = TASK_READY;
        
        // Add to scheduler's ready queue
        enqueue_task(task);
        
        // Potentially preempt current if woken task has higher priority
        check_preempt_curr(task);
    }
    
    spin_unlock_irqrestore(&task->state_lock, flags);
}
 
// Transition: Running → Terminated (Exit)
void do_exit(int exit_code) {
    struct task_struct *current_task = current;
    
    // Set exit code in PCB
    current_task->exit_code = exit_code;
    
    // Release resources (files, memory, etc.)
    exit_mm(current_task);
    exit_files(current_task);
    exit_fs(current_task);
    
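    // Reparent children to a subreaper or to init
    forget_original_parent(current_task);
    
    // Enter zombie state (modern kernels record this in exit_state;
    // older kernels used state = TASK_ZOMBIE)
    current_task->exit_state = EXIT_ZOMBIE;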
    
    // Notify parent
    do_notify_parent(current_task, current_task->exit_signal);
    
    // Give up CPU forever
    schedule();  // Never returns
}
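`wake_up_process()` above only acts on a task that is actually asleep. The following user-space sketch of that "wake only if sleeping" rule assumes C11 atomics and illustrative state constants rather than the kernel's, and replaces the per-task lock with a compare-and-swap (`struct toy_task` and `toy_try_to_wake_up()` are hypothetical). Linux's own `try_to_wake_up()` similarly reports whether it actually performed a wakeup.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative state values (not the kernel's). */
enum { T_RUNNING = 0, T_INTERRUPTIBLE = 1, T_UNINTERRUPTIBLE = 2 };

struct toy_task {
    _Atomic int state;
};

/* Move a sleeping task back to runnable. Returns true if this call performed
 * the wakeup, false if the task was not asleep (already running, or another
 * waker won the race). */
bool toy_try_to_wake_up(struct toy_task *t)
{
    int s = atomic_load(&t->state);
    while (s == T_INTERRUPTIBLE || s == T_UNINTERRUPTIBLE) {
        /* On failure, s is reloaded with the current value and re-checked. */
        if (atomic_compare_exchange_weak(&t->state, &s, T_RUNNING))
            return true;   /* a real kernel would now enqueue the task */
    }
    return false;
}
```

Because the check and the state change happen in one atomic step, two concurrent wakers can never both "win" and enqueue the same task twice.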

Extended State Models in Real Operating Systems

The five-state model is a pedagogical simplification. Real operating systems have additional states to handle specific scenarios more precisely. Let's examine the extended state models used in practice.

Linux Process States

Linux defines the state field in task_struct with these values:

Linux Process States (from /proc/PID/status)
State Code	Name	Meaning
`R`	TASK_RUNNING	Running or on run queue (Ready)
`S`	TASK_INTERRUPTIBLE	Sleeping (interruptible wait)
`D`	TASK_UNINTERRUPTIBLE	Sleeping (uninterruptible wait, typically disk I/O)
`T`	TASK_STOPPED	Stopped (by signal or debugger)
`t`	TASK_TRACED	Being traced (debugger has control)
`Z`	EXIT_ZOMBIE (exit_state)	Zombie (terminated, waiting for parent to reap it)
`X`	EXIT_DEAD (exit_state)	Dead (being released; should never be seen in ps)
`I`	TASK_IDLE	Idle kernel thread (uninterruptible sleep that does not count toward load average)
`P`	TASK_PARKED	Parked (kernel thread is stopped)

Key Distinctions:

TASK_INTERRUPTIBLE vs TASK_UNINTERRUPTIBLE: Both are 'blocked' states, but with different signal handling:

S (Interruptible): Process can be woken by signals. If you send SIGTERM, it will handle the signal after waking.
D (Uninterruptible): Signals are not acted on; they stay pending until the wait completes. Used for critical kernel operations such as disk I/O, where abandoning the operation midway could corrupt data.

The D state is infamous because you cannot kill a process in this state: even kill -9 (SIGKILL) has no effect until the kernel operation completes. Linux softens this with TASK_KILLABLE, an uninterruptible sleep that still wakes for fatal signals.

linux_state_values.h
// From include/linux/sched.h (exact values vary between kernel versions)
 
/* Used in task->state field: */
#define TASK_RUNNING            0x0000
#define TASK_INTERRUPTIBLE      0x0001
#define TASK_UNINTERRUPTIBLE    0x0002
#define __TASK_STOPPED          0x0004
#define __TASK_TRACED           0x0008
 
/* Used in task->exit_state: */
#define EXIT_DEAD               0x0010
#define EXIT_ZOMBIE             0x0020
#define EXIT_TRACE              (EXIT_ZOMBIE | EXIT_DEAD)
 
/* Used in task->state again: */
#define TASK_PARKED             0x0040
#define TASK_DEAD               0x0080
#define TASK_WAKEKILL           0x0100
#define TASK_WAKING             0x0200
#define TASK_NOLOAD             0x0400
#define TASK_NEW                0x0800
#define TASK_STATE_MAX          0x1000
 
// Note: States are bitmasks, allowing combinations
// TASK_KILLABLE = TASK_WAKEKILL | TASK_UNINTERRUPTIBLE
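Because the states are bit flags, composite values can be decoded by testing each bit. The sketch below copies the values from the listing above (they are version-specific) and renames the double-underscore constants, since such identifiers are reserved in user code; `decode_state()` is a hypothetical helper. The kernel defines `TASK_IDLE` as `TASK_UNINTERRUPTIBLE | TASK_NOLOAD`, which is how idle kernel threads can sleep uninterruptibly without inflating the load average.

```c
#include <stdio.h>
#include <string.h>

/* Bit values copied from the listing above. __TASK_STOPPED/__TASK_TRACED are
 * renamed because identifiers starting with "__" are reserved in user code. */
#define TASK_RUNNING         0x0000
#define TASK_INTERRUPTIBLE   0x0001
#define TASK_UNINTERRUPTIBLE 0x0002
#define TASK_STOPPED_BIT     0x0004
#define TASK_TRACED_BIT      0x0008
#define TASK_PARKED          0x0040
#define TASK_DEAD            0x0080
#define TASK_WAKEKILL        0x0100
#define TASK_WAKING          0x0200
#define TASK_NOLOAD          0x0400
#define TASK_NEW             0x0800

/* Composite states are ORs of the bits above. */
#define TASK_KILLABLE (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
#define TASK_IDLE     (TASK_UNINTERRUPTIBLE | TASK_NOLOAD)

/* Write the names of the bits set in `state` as "A|B|..." into out. */
void decode_state(unsigned state, char *out, size_t len)
{
    static const struct { unsigned bit; const char *name; } bits[] = {
        { TASK_INTERRUPTIBLE,   "INTERRUPTIBLE" },
        { TASK_UNINTERRUPTIBLE, "UNINTERRUPTIBLE" },
        { TASK_STOPPED_BIT,     "STOPPED" },
        { TASK_TRACED_BIT,      "TRACED" },
        { TASK_PARKED,          "PARKED" },
        { TASK_DEAD,            "DEAD" },
        { TASK_WAKEKILL,        "WAKEKILL" },
        { TASK_WAKING,          "WAKING" },
        { TASK_NOLOAD,          "NOLOAD" },
        { TASK_NEW,             "NEW" },
    };

    if (len == 0)
        return;
    if (state == TASK_RUNNING) {          /* no bits set: runnable */
        snprintf(out, len, "RUNNING");
        return;
    }
    out[0] = '\0';
    for (size_t i = 0; i < sizeof bits / sizeof bits[0]; i++) {
        if (!(state & bits[i].bit))
            continue;
        if (out[0] != '\0')
            strncat(out, "|", len - strlen(out) - 1);
        strncat(out, bits[i].name, len - strlen(out) - 1);
    }
}
```

For example, `TASK_KILLABLE` evaluates to `0x0102` and decodes as `UNINTERRUPTIBLE|WAKEKILL`.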

Implementation: The State Field in the PCB

The process state isn't just an abstract concept—it's a concrete field in the PCB that the kernel reads and modifies frequently. Let's examine how this field is implemented and accessed.

linux_task_state.c
// Linux: How the state field is used
 
// From include/linux/sched.h (before Linux 5.14, which renamed state to __state)
struct task_struct {
    // ... other fields
    
    /* -1 unrunnable, 0 runnable, >0 stopped */
    volatile long state;
    
    unsigned int flags;  // PF_EXITING, PF_KTHREAD, etc.
    
    // ... other fields
};
 
// Setting state (with WRITE_ONCE for safety)
static inline void set_current_state(long state)
{
    WRITE_ONCE(current->state, state);
    // Memory barrier to ensure state is visible to other CPUs
    smp_mb();
}
 
// Checking state safely
static inline int task_is_running(struct task_struct *p)
{
    return READ_ONCE(p->state) == TASK_RUNNING;
}
 
// Common pattern: Sleep until condition is met
// (condition_met() is a placeholder for whatever the caller waits on)
int wait_for_condition(void) {
    // Set state BEFORE checking condition (critical ordering!)
    set_current_state(TASK_INTERRUPTIBLE);
    
    while (!condition_met()) {
        // A pending signal ends an interruptible sleep early
        if (signal_pending(current)) {
            set_current_state(TASK_RUNNING);
            return -EINTR;
        }
        
        // Give up CPU - state is already set
        schedule();
        
        // Back from schedule - set state again for next iteration
        set_current_state(TASK_INTERRUPTIBLE);
    }
    
    // Done waiting - back to running
    set_current_state(TASK_RUNNING);
    return 0;
}

The Lost Wakeup Problem

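Setting the state before checking the condition is critical. If you check the condition first and then set the state, a wakeup can arrive between those two steps: the waker sees a task that is still TASK_RUNNING, does nothing, and the task then goes to sleep with no one left to wake it. This race condition is a classic kernel bug pattern. When the state is set first, a wakeup that lands between the set and the check puts the task back into TASK_RUNNING, so the following schedule() does not leave it asleep.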
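User-space code faces the same race. With POSIX threads, the usual cure is to check the condition while holding a mutex and then call `pthread_cond_wait()`, which releases the mutex and sleeps as one atomic step, so a signal sent between the check and the sleep cannot slip through. A minimal sketch (the `struct event` type and the helper names are hypothetical):

```c
#include <pthread.h>
#include <stdbool.h>

/* A user-space "event": a flag guarded by a mutex, plus a condition variable
 * that plays the role of the wait queue. */
struct event {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            ready;
};

#define EVENT_INIT { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false }

/* Sleep until the event fires. The flag is checked with the lock held, and
 * pthread_cond_wait() atomically unlocks and sleeps, so no wakeup is lost.
 * The loop also absorbs spurious wakeups. */
void event_wait(struct event *ev)
{
    pthread_mutex_lock(&ev->lock);
    while (!ev->ready)
        pthread_cond_wait(&ev->cond, &ev->lock);
    pthread_mutex_unlock(&ev->lock);
}

/* Fire the event and wake every waiter. */
void event_signal(struct event *ev)
{
    pthread_mutex_lock(&ev->lock);
    ev->ready = true;
    pthread_cond_broadcast(&ev->cond);
    pthread_mutex_unlock(&ev->lock);
}

/* pthread entry point: wait for the event passed as arg, then return it. */
void *event_wait_thread(void *arg)
{
    event_wait(arg);
    return arg;
}
```

Because `event_wait()` re-checks `ready` under the lock, it returns correctly even if `event_signal()` runs before the waiting thread ever reaches `pthread_cond_wait()`.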

State as a Bitmask:

In Linux, process states are bitmasks rather than simple integers. This allows combinations:

// TASK_KILLABLE: Uninterruptible, but can be killed
#define TASK_KILLABLE (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)

// Check whether the task is stopped or traced (either bit set)
#define task_is_stopped_or_traced(task) \
    ((task->state & (__TASK_STOPPED | __TASK_TRACED)) != 0)

Volatile and Memory Barriers:

The state field is marked volatile because:

Another CPU might modify it (via wakeup)
The scheduler reads it frequently
Interrupt handlers change it

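volatile only stops the compiler from caching the field in a register; it does not order memory accesses between CPUs. Memory barriers (smp_mb()) supply that ordering, ensuring state changes become visible across CPUs in the intended order. Without barriers, a CPU could effectively perform the condition check before its state store is visible to a waker, so a wakeup could be lost or a process could run when it shouldn't.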
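The ordering argument can be demonstrated in user space with C11 atomics. In this sketch (all names are hypothetical), the sleeper stores its "about to sleep" flag and then reads the condition, while the waker stores the condition and then reads the flag. With a sequentially consistent fence between each store and load, playing the role of `smp_mb()`, at least one side must see the other's store, so the outcome "sleeper missed the data and waker missed the sleeper", a lost wakeup, cannot occur. Without the fences that outcome becomes legal on real hardware, though it may show up only rarely.

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_int task_sleeping;   /* the "state": 1 = about to sleep */
static atomic_int data_ready;      /* the condition being waited for */
static int sleeper_saw_data;       /* results, read only after join */
static int waker_saw_sleeper;

/* Sleeper: set state, full fence (like smp_mb()), then check condition. */
static void *sleeper(void *arg)
{
    (void)arg;
    atomic_store_explicit(&task_sleeping, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    sleeper_saw_data = atomic_load_explicit(&data_ready, memory_order_relaxed);
    return NULL;
}

/* Waker: publish condition, full fence, then check whether anyone sleeps. */
static void *waker(void *arg)
{
    (void)arg;
    atomic_store_explicit(&data_ready, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    waker_saw_sleeper = atomic_load_explicit(&task_sleeping, memory_order_relaxed);
    return NULL;
}

/* Race the two threads `rounds` times and count lost wakeups: rounds in which
 * the sleeper saw no data AND the waker saw no sleeper. The fences forbid
 * that outcome, so the result must be 0 (or -1 if a thread can't start). */
int count_lost_wakeups(int rounds)
{
    int lost = 0;
    for (int i = 0; i < rounds; i++) {
        pthread_t a, b;
        atomic_store(&task_sleeping, 0);
        atomic_store(&data_ready, 0);
        if (pthread_create(&a, NULL, sleeper, NULL) != 0)
            return -1;
        if (pthread_create(&b, NULL, waker, NULL) != 0) {
            pthread_join(a, NULL);
            return -1;
        }
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        if (!sleeper_saw_data && !waker_saw_sleeper)
            lost++;
    }
    return lost;
}
```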

Wait Queues and Event-Based Wakeup

When a process enters the BLOCKED state, it needs to be tracked so it can be woken when the awaited event occurs. This is where wait queues come in—data structures that associate sleeping processes with specific events.

What Is a Wait Queue?

A wait queue is a list of processes waiting for a particular event. Each type of event has its own wait queue:

Disk read completion: Processes waiting for specific blocks
Network socket: Processes waiting for data on that socket
Timer expiration: Processes sleeping for a duration
Lock acquisition: Processes waiting for a mutex or semaphore
Child termination: Parent processes waiting for children

When the event occurs (interrupt, kernel action, or another process), the wait queue is processed and waiting processes move to READY.

linux_wait_queue.c
// Linux Wait Queue Implementation (simplified)
 
// Wait queue head - represents the event
typedef struct wait_queue_head {
    spinlock_t lock;
    struct list_head task_list;
} wait_queue_head_t;
 
// Wait queue entry - represents one waiting task
typedef struct wait_queue_entry {
    unsigned int flags;
    struct task_struct *private;  // The waiting task
    wait_queue_func_t func;       // Wake function
    struct list_head entry;       // Linkage in queue
} wait_queue_entry_t;
 
// Initialize a wait queue (usually at compile time)
#define DECLARE_WAIT_QUEUE_HEAD(name) \
    wait_queue_head_t name = { \
        .lock = __SPIN_LOCK_UNLOCKED(name.lock), \
        .task_list = LIST_HEAD_INIT(name.task_list) \
    }
 
// The sleep macro - blocks until condition is true
#define wait_event(wq_head, condition) \
do { \
    might_sleep(); \
    if (!(condition)) \
        __wait_event(wq_head, condition); \
} while (0)
 
// Internal implementation
#define __wait_event(wq_head, condition) \
    (void)___wait_event(wq_head, condition, TASK_UNINTERRUPTIBLE, \
                        0, schedule())
 
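// The wakeup function (simplified from __wake_up_common()): wakes every
// non-exclusive waiter, but at most one exclusive waiter
void wake_up(wait_queue_head_t *wq_head) {
    unsigned long flags;
    wait_queue_entry_t *entry, *next;
    int nr_exclusive = 1;
    
    spin_lock_irqsave(&wq_head->lock, flags);
    
    // _safe variant: a wake function may unlink its own entry
    list_for_each_entry_safe(entry, next, &wq_head->task_list, entry) {
        int woken = entry->func(entry, TASK_NORMAL, 0, NULL);
        
        // Stop once the allowed number of exclusive waiters has been woken
        if (woken && (entry->flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
            break;
    }
    
    spin_unlock_irqrestore(&wq_head->lock, flags);
}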
 
// Example: socket receive path (heavily simplified; real TCP code waits
// via sk_wait_data() on the wait queue inside struct socket_wq)
int tcp_recvmsg(struct sock *sk /* , ... */) {
    // Sleep on the socket's wait queue until data is queued.
    // wait_event_interruptible() returns nonzero if a signal ends the wait.
    if (wait_event_interruptible(sk->sk_wq,
            skb_peek(&sk->sk_receive_queue) != NULL))
        return -EINTR;
    
    // Data available - process it
    // ...
    return 0;
}
 
// Corresponding wakeup when data arrives
void tcp_data_ready(struct sock *sk) {
    // Wake up anyone waiting for data on this socket
    wake_up_interruptible_sync_poll(&sk->sk_wq, EPOLLIN | EPOLLPRI);
}

Exclusive vs Non-Exclusive Wakeups

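When many processes wait for the same event, waking all of them (the thundering herd problem) wastes CPU time: all but one find the resource already taken and go back to sleep. Linux therefore supports exclusive wakeups. Waiters that register with WQ_FLAG_EXCLUSIVE (for example through prepare_to_wait_exclusive()) are queued at the tail, and a normal wakeup wakes every non-exclusive waiter but only one exclusive waiter. This is critical for resources like mutexes, where only one process can proceed anyway.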
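The exclusive-wakeup rule can be simulated without threads. In this sketch, `struct waiter` and `toy_wake_up()` are hypothetical, and `WQ_FLAG_EXCLUSIVE` only borrows the Linux flag's name. Exclusive waiters are placed after non-exclusive ones, mirroring Linux's tail insertion, and `nr_exclusive == 0` means "wake everyone".

```c
#include <stdbool.h>
#include <stddef.h>

#define WQ_FLAG_EXCLUSIVE 0x01   /* name borrowed from Linux for illustration */

/* Toy wait-queue entry (hypothetical). */
struct waiter {
    unsigned flags;
    bool     woken;
};

/* Walk the waiters in queue order. Every non-exclusive waiter is woken; the
 * walk stops after `nr_exclusive` exclusive waiters (0 = no limit).
 * Returns the number of waiters woken by this call. */
int toy_wake_up(struct waiter *q, size_t n, int nr_exclusive)
{
    int woken = 0;
    for (size_t i = 0; i < n; i++) {
        if (q[i].woken)
            continue;                    /* already left the queue */
        q[i].woken = true;
        woken++;
        if ((q[i].flags & WQ_FLAG_EXCLUSIVE) && --nr_exclusive == 0)
            break;
    }
    return woken;
}
```

With two non-exclusive and three exclusive waiters, one wakeup with `nr_exclusive = 1` wakes three tasks (both non-exclusive waiters plus the first exclusive one); the next wakeup wakes only the second exclusive waiter.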

Zombie Processes: The Undead State

A zombie process is a process that has finished execution but still has an entry in the process table. It's "dead" in that it can never run again—its resources (memory, files, etc.) have been released—but it's not gone because its PCB persists.

Why Do Zombies Exist?

The zombie state exists because Unix needs to inform parent processes how their children terminated. The exit status—whether the child succeeded, failed, or was killed by a signal—must be preserved until the parent asks for it via wait(). The PCB is the only safe place to store this information after the process's other resources are freed.
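The zombie state is easy to observe directly. This Linux-specific sketch (the `zombie_demo()` and `proc_state()` helpers are hypothetical) forks a child that exits immediately, reads the child's state letter from `/proc/<pid>/stat` before reaping it, and then recovers the exit status with `waitpid()`. It assumes the parent has not set SIGCHLD to SIG_IGN, which would make the kernel reap the child automatically.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* State letter of `pid` from /proc/<pid>/stat, or '?' on error. */
static char proc_state(pid_t pid)
{
    char path[64], buf[512];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return '?';
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    /* The command name is in parentheses and may itself contain ')' */
    char *rp = strrchr(buf, ')');
    return (rp && rp[1] == ' ' && rp[2] != '\0') ? rp[2] : '?';
}

/* Fork a child that exits with `code`, record the state seen before reaping
 * in *seen, then reap it. Returns the exit status, or -1 on error. */
int zombie_demo(int code, char *seen)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);                          /* child: terminate at once */

    /* Parent: until waitpid() runs, the child's PCB and exit status remain,
     * so /proc reports 'Z'. Poll briefly in case it is still exiting. */
    struct timespec pause = { 0, 1000 * 1000 };   /* 1 ms */
    *seen = '?';
    for (int i = 0; i < 1000 && *seen != 'Z'; i++) {
        *seen = proc_state(pid);
        if (*seen != 'Z')
            nanosleep(&pause, NULL);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)      /* reap: PCB is released */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

After `waitpid()` returns, `/proc/<pid>` disappears and the PID becomes available for reuse.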

The Zombie Problem:

Normally, zombies are temporary. The parent calls wait(), reads the exit status, and the kernel releases the PCB. But if the parent never calls wait():

The zombie PCB remains indefinitely
The PID cannot be reused
The system accumulates zombie entries in the process table

Zombie Accumulation:

A poorly written server that forks children but never waits can accumulate thousands of zombies. Eventually:

PID space is exhausted (no new processes can be created)
Process table entries are wasted
System monitoring tools show misleading process counts

Detecting Zombies:

detecting_zombies.sh
# Detecting zombie processes
 
# List all zombie processes
ps aux | awk '$8 ~ /^Z/ {print}'
 
# Count zombies (count+0 prints 0 rather than an empty string when there are none)
ps aux | awk '$8 ~ /^Z/ {count++} END {print "Zombie count: " count+0}'
 
# Find parent of zombies (to fix the real problem)
ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/ {print "Zombie PID:", $1, "Parent:", $2}'
 
# Typical `ps ax` output for zombies (the name is kept and marked <defunct>):
#   PID TTY      STAT   TIME COMMAND
# 12345 ?        Z      0:00 [worker] <defunct>
# 12346 ?        Z      0:00 [worker] <defunct>
 
# In /proc/PID/status, zombie shows as:
# State:  Z (zombie)
 
# On Linux, you can see more detail:
cat /proc/12345/status | grep -E "^(State|PPid):"
# State:  Z (zombie)
# PPid:   1000

Preventing Zombies:

Proper process management prevents zombie accumulation:

Call wait() or waitpid(): The parent must collect exit status

Ignore or handle SIGCHLD: Deal with child death asynchronously:

signal(SIGCHLD, SIG_IGN);  // Kernel auto-reaps children (but wait() can no longer collect their status)
// Or, inside a SIGCHLD handler:
while (waitpid(-1, NULL, WNOHANG) > 0);  // Reap all exited children

Double fork: Let child become orphan adopted by init:

if (fork() == 0) {
    if (fork() == 0) {
        // Grandchild: do actual work
    }
    _exit(0);  // Child exits immediately
}
wait(NULL);  // Parent reaps child quickly
// Grandchild is now adopted by init, which always waits
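A fuller version of the SIGCHLD-handler approach is sketched below using `sigaction()` rather than `signal()`, because its semantics are portable. Several child exits can be merged into a single pending SIGCHLD, which is why the handler loops with `WNOHANG` until no exited children remain. `install_reaper()` and `spawn_and_reap()` are hypothetical helpers; the second exists only to exercise the handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t reaped_count;   /* children reaped so far */

/* SIGCHLD handler: keep reaping until no exited child remains. errno is
 * saved and restored because waitpid() can change it underneath the
 * interrupted code. */
static void reap_children(int sig)
{
    (void)sig;
    int saved_errno = errno;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped_count++;
    errno = saved_errno;
}

/* Install the handler. SA_RESTART restarts slow system calls the signal
 * interrupts; SA_NOCLDSTOP skips SIGCHLD for children that merely stop. */
int install_reaper(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = reap_children;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Test helper: fork n children that exit immediately, then wait up to about
 * one second for the handler to reap them. Returns the number reaped. */
int spawn_and_reap(int n)
{
    reaped_count = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);             /* child: terminate at once */
        if (pid < 0)
            break;                /* fork failed: stop spawning */
    }
    struct timespec pause = { 0, 10 * 1000 * 1000 };   /* 10 ms */
    for (int i = 0; i < 100 && reaped_count < n; i++)
        nanosleep(&pause, NULL);  /* may return early when SIGCHLD arrives */
    return reaped_count;
}
```

Once the handler has run, no zombies remain, so a further `waitpid(-1, NULL, WNOHANG)` fails with ECHILD.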

Orphans vs Zombies:

	Orphan	Zombie
Parent Status	Parent died	Parent alive (not waiting)
Process Status	Still running	Terminated
Solution	Adopted by init	Parent must wait()
Resource Usage	Full process resources	PCB only

Viewing Process State in Practice

Understanding how to view and interpret process state is essential for system administration and debugging. Let's examine the tools and techniques across different operating systems.

viewing_process_state.sh
# Linux: Viewing Process State
 
# ps with state column
ps aux | head -5
# USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
# root         1  0.0  0.1 167916 11268 ?        Ss   Jan01   0:15 /sbin/init
# root         2  0.0  0.0      0     0 ?        S    Jan01   0:00 [kthreadd]
# root         3  0.0  0.0      0     0 ?        I<   Jan01   0:00 [rcu_gp]
# root         4  0.0  0.0      0     0 ?        I<   Jan01   0:00 [rcu_par_gp]
 
# STAT column meanings:
# First character: Main state (R, S, D, T, Z, etc.)
# Additional characters:
#   s = session leader
#   < = high priority (nice < 0)
#   N = low priority (nice > 0)
#   L = has pages locked in memory
#   + = foreground process group
#   l = multi-threaded (CLONE_THREAD)
 
# Detailed state from /proc
cat /proc/1/status | grep -E "^(State|Threads|SigQ):"
# State:  S (sleeping)
# Threads:        1
# SigQ:   0/63636
 
# Watch signals delivered to a process with strace (it does not report
# scheduler state, but a SIGCHLD marks a child's exit)
strace -e trace=none -e signal=all -p 1234
# --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, ...} ---
 
# Snapshot of task-state totals from top (-n 1: one iteration, then exit)
top -b -n 1 | grep -E "^(Tasks|%Cpu)"
# Tasks: 256 total,   1 running, 255 sleeping,   0 stopped,   0 zombie
# %Cpu(s):  2.3 us,  0.7 sy,  0.0 ni, 96.8 id,  0.2 wa,  0.0 hi,  0.0 si
 
# Breakdown of all process states
ps -eo stat | sort | uniq -c | sort -rn
#     180 S
#      45 Ss
#      20 Sl
#       5 R
#       2 D
#       1 Z
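The `ps -eo stat | sort | uniq -c` tally above can also be computed directly from `/proc`. This Linux-specific sketch (`count_states()` is a hypothetical helper) reads the third field of each `/proc/<pid>/stat` file, locating the state letter after the last `)` because the command name in parentheses may itself contain spaces or parentheses. Unlike `ps`, it tallies only the first STAT character.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Tally processes by state letter; counts[] is indexed by the letter's byte
 * value. Processes that exit mid-scan are skipped. Returns the number of
 * processes counted, or -1 if /proc cannot be opened. */
int count_states(int counts[256])
{
    DIR *d = opendir("/proc");
    if (!d)
        return -1;
    memset(counts, 0, 256 * sizeof counts[0]);
    int total = 0;
    struct dirent *de;
    while ((de = readdir(d)) != NULL) {
        if (!isdigit((unsigned char)de->d_name[0]))
            continue;                     /* not a PID directory */
        char path[300], buf[512];
        snprintf(path, sizeof path, "/proc/%s/stat", de->d_name);
        FILE *f = fopen(path, "r");
        if (!f)
            continue;                     /* process already exited */
        size_t n = fread(buf, 1, sizeof buf - 1, f);
        fclose(f);
        buf[n] = '\0';
        char *rp = strrchr(buf, ')');     /* end of the (comm) field */
        if (!rp || rp[1] != ' ' || rp[2] == '\0')
            continue;
        counts[(unsigned char)rp[2]]++;
        total++;
    }
    closedir(d);
    return total;
}
```

For example, `counts['D']` gives the number of processes currently in uninterruptible sleep.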

Interpreting Common STAT Output
STAT	Meaning	Typical Cause
`R`	Running or runnable	Actively computing or in run queue
`S`	Interruptible sleep	Waiting for input, network, timer; common idle state
`Ss`	Interruptible sleep, session leader	Shell, daemon process
`Sl`	Interruptible sleep, multi-threaded	Java/Python process, database
`S+`	Foreground sleep	Interactive program waiting for input
`D`	Uninterruptible sleep	Usually disk I/O; might indicate I/O problem if persistent
`T`	Stopped	Ctrl+Z, or SIGSTOP, or debugger
`Z`	Zombie	Exited but parent hasn't called wait()
`I`	Idle kernel thread	Kernel housekeeping thread with no work

Red Flag: Stuck in D State

A process stuck in D (uninterruptible sleep) for extended periods indicates a problem—usually I/O that isn't completing. This could be a failed disk, NFS server down, or kernel bug. Because D-state processes can't be killed, they require resolving the underlying I/O issue.

Summary: Process State Management

We've explored process state in depth, from the theoretical five-state model to practical implementation details. Let's consolidate the essential knowledge:

Key Takeaways

•Every process is in exactly one state — The state field in the PCB tells the kernel whether the process can run, is running, is waiting, or is finished.
•The five-state model captures the essentials — New, Ready, Running, Blocked, Terminated. Real OSes add more states for specific scenarios.
•State transitions follow strict rules — Blocked must go through Ready before Running. Terminated never returns to Ready. Understanding valid transitions helps debug scheduling issues.
•Wait queues link blocked processes to events — Each type of wait has its own queue. When the event occurs, the kernel wakes processes from the appropriate queue.
•Zombies exist to preserve exit status — A process can't fully disappear until its parent collects its exit code. Zombie accumulation indicates parent process bugs.
•State is visible through tools — ps, top, /proc, and debuggers expose state information. The STAT column reveals exactly what each process is doing.

What's Next:

We've seen that the PCB's state field tells us where the process is in its lifecycle. But how does the process know where to resume execution? The next page examines the Program Counter—the field that stores the instruction address, enabling the kernel to pause and resume processes at exactly the right point in their code.

You now understand process states comprehensively: the theoretical model, the real-world implementations, the transition mechanisms, wait queues, zombies, and diagnostic techniques. This knowledge is fundamental to understanding scheduling, debugging hung processes, and designing robust process management.

Process State: The Lifecycle Status Indicator

Where Is Your Process Right Now?

What You Will Learn

The Five-State Process Model

New (Created): The process is being created
Ready: The process can run but is waiting for CPU time
Running: The process is actively executing on a CPU
Blocked (Waiting): The process cannot proceed until an external event occurs
Terminated (Exit): The process has finished execution

Converting Mermaid diagram...

Detailed State Descriptions:

The Five Fundamental Process States
State	Description	PCB Location	Scheduler Treatment
New	Process is being created. Memory is being allocated, PCB initialized, executable loaded.	Just allocated, not yet in any queue	Not visible to scheduler yet
Ready	Process has everything it needs to run except CPU time. It's runnable but not running.	In the ready queue (or one of multiple priority queues)	Candidate for selection; will be dispatched when chosen
Running	Process is actively executing instructions on a CPU core.	Associated with a specific CPU	Currently executing; scheduler monitors time slice
Blocked/Waiting	Process cannot proceed until some event occurs (I/O completion, timer, signal, etc.)	In a wait queue specific to the awaited event	Not considered for scheduling until event occurs
Terminated	Process has finished (exit called or killed). Resources released except PCB with exit status.	In zombie list or marked for cleanup	Will never run again; awaiting parent's wait()

State Transitions in Detail

State Transitions Explained

•New → Ready (Admit): Creation complete. The process has valid memory, code is loaded, file descriptors inherited, and the PCB is initialized. The kernel puts the process on the ready queue, making it eligible for scheduling.
•Ready → Running (Dispatch): The scheduler selects this process to run. The dispatcher performs a context switch: loads the process's saved registers, switches to its address space, and jumps to its saved program counter.
•Running → Ready (Preempt): The process's time slice expires, or a higher-priority process becomes ready, or the process voluntarily yields. The kernel saves the process state and moves it back to the ready queue.
•Running → Blocked (Block): The process needs something unavailable: a disk read, network data, a lock held by another process, or a timer to expire. The kernel puts it in the appropriate wait queue.
•Blocked → Ready (Wake): The awaited event occurs. An interrupt handler or kernel thread moves the process from the wait queue to the ready queue. It will run again when the scheduler selects it.
•Running → Terminated (Exit): The process calls exit(), returns from main(), or is killed by a signal. The kernel releases most resources but keeps the PCB with the exit status until the parent calls wait().

Impossible Transitions

Why can't Blocked → Running happen directly?

Consider the scenario: Process A is blocked waiting for disk I/O. The disk interrupt fires—the data is ready. Why not just let A continue immediately?

The answer involves fairness and scheduler integrity:

Another process might be running: The CPU isn't just waiting for A. Process B is using it. Interrupting B mid-instruction to run A would violate scheduling decisions already made.
Scheduler invariants: The scheduler expects to choose from the ready queue. Bypassing this breaks priority systems, fair-share accounting, and affinity rules.
Multiple wakeups: If 10 processes are waiting for the same event, they can't all go to Running simultaneously (on a single CPU).

Instead, the event moves A to Ready. When the scheduler runs next (possibly immediately after the interrupt), A will compete for CPU time through normal scheduling mechanisms.

state_transition_kernel_code.c
C
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
// Kernel pseudocode for state transitions
 
// Transition: Ready → Running (Dispatch)
void dispatch_process(struct task_struct *next) {
    struct task_struct *prev = current;
    
    // Save current process state
    if (prev->state == TASK_RUNNING) {
        prev->state = TASK_READY;  // Back to ready queue
    }
    
    // Switch to next process
    next->state = TASK_RUNNING;
    context_switch(prev, next);
}
 
// Transition: Running → Blocked (Block/Sleep)
void sleep_on_event(wait_queue_head_t *queue) {
    struct task_struct *current_task = current;
    
    // Atomically: set state and add to wait queue
    current_task->state = TASK_INTERRUPTIBLE;  // Blocked state
    add_wait_queue(queue, &current_task->wait_entry);
    
    // Give up CPU - scheduler will pick another process
    schedule();  // Returns when we're woken up
    
    // We're back - remove from wait queue
    remove_wait_queue(queue, &current_task->wait_entry);
}
 
// Transition: Blocked → Ready (Wake)
void wake_up_process(struct task_struct *task) {
    unsigned long flags;
    
    spin_lock_irqsave(&task->state_lock, flags);
    
    if (task->state == TASK_INTERRUPTIBLE || 
        task->state == TASK_UNINTERRUPTIBLE) {
        
        // Move to ready state
        task->state = TASK_READY;
        
        // Add to scheduler's ready queue
        enqueue_task(task);
        
        // Potentially preempt current if woken task has higher priority
        check_preempt_curr(task);
    }
    
    spin_unlock_irqrestore(&task->state_lock, flags);
}
 
// Transition: Running → Terminated (Exit)
void do_exit(int exit_code) {
    struct task_struct *current_task = current;
    
    // Set exit code in PCB
    current_task->exit_code = exit_code;
    
    // Release resources (files, memory, etc.)
    exit_mm(current_task);
    exit_files(current_task);
    exit_fs(current_task);
    
    // Reparent children to init
    forget_original_parent(current_task);
    
    // Enter zombie state
    current_task->state = TASK_ZOMBIE;
    
    // Notify parent
    do_notify_parent(current_task, current_task->exit_signal);
    
    // Give up CPU forever
    schedule();  // Never returns
}

Extended State Models in Real Operating Systems

Linux Process States

Linux defines the state field in task_struct with these values:

Linux Process States (from /proc/PID/status)
State Code	Name	Meaning
`R`	TASK_RUNNING	Running or on run queue (Ready)
`S`	TASK_INTERRUPTIBLE	Sleeping (interruptible wait)
`D`	TASK_UNINTERRUPTIBLE	Sleeping (uninterruptible wait, typically disk I/O)
`T`	TASK_STOPPED	Stopped (by signal or debugger)
`t`	TASK_TRACED	Being traced (debugger has control)
`Z`	TASK_ZOMBIE	Zombie (terminated, waiting for parent)
`X`	TASK_DEAD	Dead (should never be seen in ps)
`I`	TASK_IDLE	Idle kernel thread (not waiting on events)
`P`	TASK_PARKED	Parked (kernel thread is stopped)

Key Distinctions:

TASK_INTERRUPTIBLE vs TASK_UNINTERRUPTIBLE: Both are 'blocked' states, but with different signal handling:

S (Interruptible): Process can be woken by signals. If you send SIGTERM, it will handle the signal after waking.
D (Uninterruptible): Process ignores signals until the event completes. Used for critical kernel operations like disk I/O where interrupted operations could corrupt data.

The D state is infamous because you cannot kill a process in this state—even kill -9 won't work. It must wait for the kernel operation to complete.

linux_state_values.h
C
// From include/linux/sched.h
 
/* Used in task->state field: */
#define TASK_RUNNING            0x0000
#define TASK_INTERRUPTIBLE      0x0001
#define TASK_UNINTERRUPTIBLE    0x0002
#define __TASK_STOPPED          0x0004
#define __TASK_TRACED           0x0008
 
/* Used in task->exit_state: */
#define EXIT_DEAD               0x0010
#define EXIT_ZOMBIE             0x0020
#define EXIT_TRACE              (EXIT_ZOMBIE | EXIT_DEAD)
 
/* Used in task->state again: */
#define TASK_PARKED             0x0040
#define TASK_DEAD               0x0080
#define TASK_WAKEKILL           0x0100
#define TASK_WAKING             0x0200
#define TASK_NOLOAD             0x0400
#define TASK_NEW                0x0800
#define TASK_STATE_MAX          0x1000
 
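// Note: States are bitmasks, allowing combinations:
// TASK_KILLABLE = TASK_WAKEKILL | TASK_UNINTERRUPTIBLE
// TASK_STOPPED  = TASK_WAKEKILL | __TASK_STOPPED
// TASK_IDLE     = TASK_UNINTERRUPTIBLE | TASK_NOLOAD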

Implementation: The State Field in the PCB

The process state isn't just an abstract concept—it's a concrete field in the PCB that the kernel reads and modifies frequently. Let's examine how this field is implemented and accessed.

linux_task_state.c
C
// Linux: How the state field is used
 
// From include/linux/sched.h
struct task_struct {
    // ... other fields
    
    /* TASK_* bitmask; 0 (TASK_RUNNING) means runnable. Linux 5.14
     * renamed this field to __state and made it unsigned int. */
    volatile long state;
    
    unsigned int flags;  // PF_EXITING, PF_KTHREAD, etc.
    
    // ... other fields
};
 
// Setting state (with WRITE_ONCE for safety)
static inline void set_current_state(long state)
{
    WRITE_ONCE(current->state, state);
    // Memory barrier to ensure state is visible to other CPUs
    smp_mb();
}
 
// Checking state safely
static inline int task_is_running(struct task_struct *p)
{
    return READ_ONCE(p->state) == TASK_RUNNING;
}
 
// Common pattern: sleep until a condition is met.
// condition_met() is a placeholder; the waker must make the condition
// true and then call wake_up_process() (or wake_up() on a wait queue).
int wait_for_condition(void) {
    // Set state BEFORE checking the condition (critical ordering!)
    set_current_state(TASK_INTERRUPTIBLE);

    while (!condition_met()) {
        // A pending signal aborts the wait
        if (signal_pending(current)) {
            set_current_state(TASK_RUNNING);
            return -EINTR;
        }

        // Give up the CPU; the state is already set
        schedule();

        // Back from schedule(): set the state again for the next check
        set_current_state(TASK_INTERRUPTIBLE);
    }

    // Done waiting: back to running
    set_current_state(TASK_RUNNING);
    return 0;
}

The Lost Wakeup Problem
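
The ordering in wait_for_condition() exists to prevent a lost wakeup. Suppose a task checked the condition first and only then set TASK_INTERRUPTIBLE. An event arriving between those two steps would find the task still TASK_RUNNING, so the wakeup would do nothing. The task would then go to sleep with nobody left to wake it. The toy model below replays that unlucky interleaving deterministically in plain C. toy_task, event_fires() and the other names are hypothetical and only imitate the kernel's behavior.

```c
#include <stdbool.h>

/* Toy model (hypothetical, not kernel APIs): the waker only acts on a task
 * that has already declared itself asleep, which is why waking a task that
 * is still TASK_RUNNING is a no-op. */
enum toy_state { TOY_RUNNING, TOY_INTERRUPTIBLE };

struct toy_task {
    enum toy_state state;
};

static bool condition;                 /* the event the task waits for */

/* Waker: publish the event, then wake the task if it is sleeping. */
static void event_fires(struct toy_task *t) {
    condition = true;
    if (t->state == TOY_INTERRUPTIBLE)
        t->state = TOY_RUNNING;        /* wakeup: back to runnable */
}

/* True if the task is asleep although its event already fired:
 * nothing will ever wake it again. */
static bool wakeup_lost(const struct toy_task *t) {
    return t->state == TOY_INTERRUPTIBLE && condition;
}

/* Buggy order: check the condition, THEN set the sleeping state. */
static bool check_then_sleep(void) {
    struct toy_task t = { TOY_RUNNING };
    condition = false;
    bool ready = condition;            /* 1. check: not ready yet              */
    event_fires(&t);                   /* 2. event lands while task is RUNNING, */
                                       /*    so the wakeup does nothing         */
    if (!ready)
        t.state = TOY_INTERRUPTIBLE;   /* 3. task goes to sleep for good        */
    return wakeup_lost(&t);
}

/* Correct order (as in wait_for_condition()): set the state, THEN check. */
static bool sleep_then_check(void) {
    struct toy_task t = { TOY_RUNNING };
    condition = false;
    t.state = TOY_INTERRUPTIBLE;       /* 1. announce the intent to sleep       */
    event_fires(&t);                   /* 2. waker sees the sleeping state and  */
                                       /*    sets the task back to RUNNING      */
    if (condition)
        t.state = TOY_RUNNING;         /* 3. re-check finds the event: done     */
    /* otherwise schedule() would sleep only if the state were still INTERRUPTIBLE */
    return wakeup_lost(&t);
}
```

check_then_sleep() reports a lost wakeup and sleep_then_check() does not, even though the event arrives at the same point in both.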

State as a Bitmask:

In Linux, process states are bitmasks rather than simple integers. This allows combinations:

// TASK_KILLABLE: uninterruptible, but woken by fatal signals
#define TASK_KILLABLE (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)

// Check whether a task is stopped or traced (either bit set)
#define task_is_stopped_or_traced(task) \
    (((task)->state & (__TASK_STOPPED | __TASK_TRACED)) != 0)
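
The sketch below shows how these bits combine into the single letter ps prints. It is a user-space version modeled on the kernel's task_state_index(). It masks off modifier bits such as TASK_WAKEKILL and reports the highest remaining bit. The constants are copied from the linux_state_values.h listing above; newer kernels may define more bits. state_letter() and highest_bit() are illustrative helpers, not kernel functions. The model also explains why a TASK_KILLABLE sleeper still appears as D.

```c
/* Constants copied from the linux_state_values.h listing above. */
#define TASK_RUNNING          0x0000
#define TASK_INTERRUPTIBLE    0x0001
#define TASK_UNINTERRUPTIBLE  0x0002
#define __TASK_STOPPED        0x0004
#define __TASK_TRACED         0x0008
#define EXIT_DEAD             0x0010
#define EXIT_ZOMBIE           0x0020
#define TASK_PARKED           0x0040
#define TASK_DEAD             0x0080
#define TASK_WAKEKILL         0x0100
#define TASK_NOLOAD           0x0400

#define TASK_KILLABLE  (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
#define TASK_STOPPED   (TASK_WAKEKILL | __TASK_STOPPED)
#define TASK_IDLE      (TASK_UNINTERRUPTIBLE | TASK_NOLOAD)

/* Bits ps can report; others (e.g. TASK_WAKEKILL, TASK_NOLOAD) are modifiers. */
#define TASK_REPORT (TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE | __TASK_STOPPED | \
                     __TASK_TRACED | EXIT_DEAD | EXIT_ZOMBIE | TASK_PARKED)
#define TASK_REPORT_IDLE (TASK_PARKED << 1)   /* pseudo-bit for the 'I' letter */

/* 1-based index of the highest set bit, 0 if none (like the kernel's fls()). */
static int highest_bit(unsigned int x) {
    int n = 0;
    while (x) {
        n++;
        x >>= 1;
    }
    return n;
}

/* Map a state word plus exit_state to the letter ps shows: keep only the
 * reportable bits and let the highest one win. */
static char state_letter(unsigned int state, unsigned int exit_state) {
    static const char letters[] = "RSDTtXZPI";
    unsigned int report = (state | exit_state) & TASK_REPORT;
    if (state == TASK_IDLE)
        report = TASK_REPORT_IDLE;
    return letters[highest_bit(report)];
}
```

For instance, state_letter(TASK_KILLABLE, 0) yields 'D', and a task with state TASK_DEAD and exit_state EXIT_ZOMBIE yields 'Z'.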

Volatile and Memory Barriers:

The state field is marked volatile because:

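Another CPU might modify it (via a wakeup)
The scheduler reads it frequently
Interrupt handlers change it (by waking tasks)

volatile only stops the compiler from caching or eliding accesses to the field; it does not order memory accesses between CPUs. That is why set_current_state() follows the store with smp_mb(). The store to state must be ordered before the later load of the wait condition. Then either the waker sees the task asleep, or the task sees the condition already set.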
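
The WRITE_ONCE()/smp_mb() pairing in set_current_state() has a rough user-space analogue in C11 atomics, sketched below. A relaxed atomic store stands in for WRITE_ONCE(), and atomic_thread_fence(memory_order_seq_cst) stands in for smp_mb(). The Linux kernel memory model is defined separately from C11, so treat this as an analogy. In both worlds, the fence only helps if the waker side has a matching barrier. toy_state and the helpers are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for one task's state word. */
static _Atomic long toy_state;

/* Rough analogue of set_current_state(): a store the compiler may not tear,
 * cache, or drop (the job of WRITE_ONCE()), followed by a full fence (the job
 * of smp_mb()) that orders it before later loads such as the condition check. */
static void toy_set_state(long s) {
    atomic_store_explicit(&toy_state, s, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
}

/* Rough analogue of READ_ONCE(p->state). */
static long toy_read_state(void) {
    return atomic_load_explicit(&toy_state, memory_order_relaxed);
}
```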


Wait Queues and Event-Based Wakeup

What Is a Wait Queue?

A wait queue is a list of processes waiting for a particular event. Each type of event has its own wait queue:

Disk read completion: Processes waiting for specific blocks
Network socket: Processes waiting for data on that socket
Timer expiration: Processes sleeping for a duration
Lock acquisition: Processes waiting for a mutex or semaphore
Child termination: Parent processes waiting for children

When the event occurs (signaled by an interrupt handler, other kernel code, or another process), the kernel walks the wait queue and moves the waiting processes to READY.

linux_wait_queue.c
C
// Linux Wait Queue Implementation (simplified)
 
// Wait queue head - represents the event
typedef struct wait_queue_head {
    spinlock_t lock;
    struct list_head task_list;
} wait_queue_head_t;
 
// Wait queue entry - represents one waiting task
typedef struct wait_queue_entry {
    unsigned int flags;
    struct task_struct *private;  // The waiting task
    wait_queue_func_t func;       // Wake function
    struct list_head entry;       // Linkage in queue
} wait_queue_entry_t;
 
// Initialize a wait queue (usually at compile time)
#define DECLARE_WAIT_QUEUE_HEAD(name) \
    wait_queue_head_t name = { \
        .lock = __SPIN_LOCK_UNLOCKED(name.lock), \
        .task_list = LIST_HEAD_INIT(name.task_list) \
    }
 
// The sleep macro - blocks until condition is true
#define wait_event(wq_head, condition) \
do { \
    might_sleep(); \
    if (!(condition)) \
        __wait_event(wq_head, condition); \
} while (0)
 
// Internal implementation: after the sleep state come
// exclusive = 0, ret = 0, and the command that actually sleeps
#define __wait_event(wq_head, condition) \
    (void)___wait_event(wq_head, condition, TASK_UNINTERRUPTIBLE, \
                        0, 0, schedule())
 
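// The wakeup function (simplified). In the kernel, wake_up(x) is a macro
// for __wake_up(x, TASK_NORMAL, 1, NULL): it wakes every non-exclusive
// waiter but at most one exclusive waiter. This loop ignores that limit.
void wake_up(wait_queue_head_t *wq_head) {
    wait_queue_entry_t *curr;
    unsigned long flags;

    spin_lock_irqsave(&wq_head->lock, flags);

    // Walk through all waiting entries
    list_for_each_entry(curr, &wq_head->task_list, entry) {
        // Call each entry's wake function (e.g. default_wake_function(),
        // which uses try_to_wake_up() to make the task TASK_RUNNING
        // and put it back on a run queue)
        curr->func(curr, TASK_NORMAL, 0, NULL);
    }

    spin_unlock_irqrestore(&wq_head->lock, flags);
}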
 
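// Example: waiting for data on a socket (simplified; the real TCP
// receive path sleeps via sk_wait_data())
int tcp_recvmsg(struct sock *sk, ...) {   // simplified signature
    int err;
    // ...

    // Sleep on the socket's wait queue until data arrives or a signal
    // is pending; wait_event_interruptible() re-checks the condition itself
    err = wait_event_interruptible(*sk_sleep(sk),
            !skb_queue_empty(&sk->sk_receive_queue));
    if (err)
        return err;   // -ERESTARTSYS: interrupted by a signal

    // Data available - process it
    return 0;
}

// Corresponding wakeup when data arrives (simplified; the real path goes
// through sk->sk_data_ready(), normally sock_def_readable())
void tcp_data_ready(struct sock *sk) {
    // Wake up anyone waiting for data on this socket
    wake_up_interruptible_sync_poll(sk_sleep(sk), EPOLLIN | EPOLLPRI);
}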

Exclusive vs Non-Exclusive Wakeups
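
By default, wake_up() wakes every non-exclusive waiter on the queue but at most one exclusive waiter. Exclusive waiters are added with functions such as prepare_to_wait_exclusive() or add_wait_queue_exclusive() and are kept at the tail of the queue. This avoids a thundering herd when many tasks wait for something only one of them can take, such as a new connection on a listening socket. The sketch below models only the walk-and-stop rule, mirroring the loop in the kernel's __wake_up_common(). toy_waiter and toy_wake_up() are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy wait-queue entry (hypothetical): just enough to show the wakeup rule. */
struct toy_waiter {
    const char *name;
    bool exclusive;   /* plays the role of WQ_FLAG_EXCLUSIVE */
    bool woken;
};

/* Walk the queue in order, waking every non-exclusive waiter and stopping
 * once `nr_exclusive` exclusive waiters have been woken. As in the kernel,
 * nr_exclusive == 0 means "no limit" because the decrement never hits 0. */
static size_t toy_wake_up(struct toy_waiter *q, size_t n, int nr_exclusive) {
    size_t woken = 0;
    for (size_t i = 0; i < n; i++) {
        q[i].woken = true;
        woken++;
        if (q[i].exclusive && --nr_exclusive == 0)
            break;
    }
    return woken;
}
```

With a queue of two non-exclusive waiters followed by two exclusive ones, toy_wake_up(q, 4, 1) wakes three tasks and leaves the last exclusive waiter asleep, while toy_wake_up(q, 4, 0) wakes all four (the wake_up_all() behavior).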


Zombie Processes: The Undead State

Why Do Zombies Exist?
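
A zombie exists so the exit status has somewhere to live between the moment the child terminates and the moment its parent asks for it. The sketch below, assuming a POSIX system, makes that window visible. waitid() with WNOWAIT waits for the child to exit without reaping it, so the child stays a zombie until the later waitpid() collects the status. zombie_demo() is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `code`, leave it as a zombie for a moment,
 * then reap it. Returns the exit status the parent collects, or -1. */
static int zombie_demo(int code) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);                   /* child: terminate right away */

    /* Wait until the child has exited, but WNOWAIT leaves it un-reaped:
     * from here until waitpid() below, ps would show it as Z <defunct>. */
    siginfo_t info;
    memset(&info, 0, sizeof info);
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) != 0)
        return -1;
    if (info.si_code != CLD_EXITED || info.si_status != code)
        return -1;

    /* The terminated child still holds its exit status; collecting it
     * lets the kernel release the PCB and the PID. */
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```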

The Zombie Problem:

Normally, zombies are temporary. The parent calls wait(), reads the exit status, and the kernel releases the PCB. But if the parent never calls wait():

The zombie PCB remains indefinitely
The PID cannot be reused
The system accumulates zombie entries in the process table

Zombie Accumulation:

A poorly written server that forks children but never waits can accumulate thousands of zombies. Eventually:

PID space is exhausted (no new processes can be created)
Process table entries are wasted
System monitoring tools show misleading process counts

Detecting Zombies:

detecting_zombies.sh
Bash
# Detecting zombie processes
 
# List all zombie processes
ps aux | awk '$8 ~ /^Z/ {print}'
 
# Count zombies (count + 0 prints 0 instead of an empty string when there are none)
ps aux | awk '$8 ~ /^Z/ {count++} END {print "Zombie count: " (count + 0)}'

# Find parent of zombies (to fix the real problem)
ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/ {print "Zombie PID:", $1, "Parent:", $2}'

# Typical ps ax output for zombies (the command is marked <defunct>):
#   PID TTY      STAT   TIME COMMAND
# 12345 ?        Z      0:00 [name] <defunct>
# 12346 ?        Z      0:00 [name] <defunct>

# On Linux, /proc/PID/status shows the state and the parent to investigate:
grep -E "^(State|PPid):" /proc/12345/status
# State:  Z (zombie)
# PPid:   1000

Preventing Zombies:

Proper process management prevents zombie accumulation:

Call wait() or waitpid(): The parent must collect each child's exit status.

Handle SIGCHLD: Either ignore it so the kernel reaps children automatically, or reap from a handler:

signal(SIGCHLD, SIG_IGN);  // Kernel auto-reaps children (their exit status is discarded)
// Or, inside a SIGCHLD handler:
while (waitpid(-1, NULL, WNOHANG) > 0);  // Reap all exited children

Double fork: Let the grandchild become an orphan adopted by init:

if (fork() == 0) {
    if (fork() == 0) {
        // Grandchild: do actual work, then exit
        _exit(0);
    }
    _exit(0);  // Child exits immediately
}
wait(NULL);  // Parent reaps child quickly
// Grandchild is now adopted by init (or, on Linux, a subreaper), which reaps it
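
A fuller version of the SIGCHLD handler option is sketched below, assuming a POSIX system. The loop matters because standard signals do not queue. Several children can exit while only one SIGCHLD is pending, so the handler keeps reaping until waitpid() reports nothing left. Saving and restoring errno keeps the handler from disturbing the code it interrupted. on_sigchld(), install_reaper() and the reaped counter are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Number of children reaped so far (written only by the handler). */
static volatile sig_atomic_t reaped = 0;

/* SIGCHLD handler: reap every child that has already exited. */
static void on_sigchld(int sig) {
    (void)sig;
    int saved_errno = errno;           /* waitpid() may change errno */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped++;
    errno = saved_errno;
}

/* Install on_sigchld(). SA_RESTART restarts slow system calls interrupted by
 * the signal; SA_NOCLDSTOP skips notifications for children that merely stop. */
static int install_reaper(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}
```

Using sigaction() rather than signal() makes these flags explicit, which matters in a long-running server that forks many children.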

Orphans vs Zombies:

	Orphan	Zombie
Parent Status	Parent died	Parent alive (not waiting)
Process Status	Still running	Terminated
Solution	Adopted by init (or a subreaper on Linux)	Parent must wait()
Resource Usage	Full process resources	PCB only
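
The orphan column can be observed directly. In the sketch below (POSIX, hypothetical helper names), a child forks a grandchild and exits immediately. The grandchild polls getppid() until its original parent is gone, then reports both PIDs through a pipe. On Linux the new parent is init or the nearest subreaper (see prctl(PR_SET_CHILD_SUBREAPER)). For that reason, the sketch only checks that the parent changed, not that it became PID 1.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Read exactly len bytes; returns 0 on success, -1 on EOF or error. */
static int read_full(int fd, void *buf, size_t len) {
    char *p = buf;
    while (len > 0) {
        ssize_t r = read(fd, p, len);
        if (r <= 0)
            return -1;
        p += r;
        len -= (size_t)r;
    }
    return 0;
}

/* Orphan a grandchild and report its parent PID before (*before) and after
 * (*after) the original parent exits. Returns 0 on success, -1 on error. */
static int orphan_demo(pid_t *before, pid_t *after) {
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t child = fork();
    if (child < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (child == 0) {
        pid_t me = getpid();                 /* grandchild's original parent */
        if (fork() == 0) {
            /* Grandchild: poll (up to ~5 s) until the original parent exits. */
            struct timespec tick = { .tv_sec = 0, .tv_nsec = 10000000 };
            for (int i = 0; i < 500 && getppid() == me; i++)
                nanosleep(&tick, NULL);
            pid_t pids[2] = { me, getppid() };
            if (write(fds[1], pids, sizeof pids) != (ssize_t)sizeof pids)
                _exit(1);
            _exit(0);
        }
        _exit(0);                            /* child exits: grandchild is orphaned */
    }

    close(fds[1]);                           /* parent keeps only the read end */
    waitpid(child, NULL, 0);                 /* reap the child so it is no zombie */
    pid_t pids[2];
    int rc = read_full(fds[0], pids, sizeof pids);
    close(fds[0]);
    if (rc != 0)
        return -1;
    *before = pids[0];
    *after = pids[1];
    return 0;
}
```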

Viewing Process State in Practice

Understanding how to view and interpret process state is essential for system administration and debugging. Let's examine the tools and techniques available on Linux.

viewing_process_state.sh
Bash
# Linux: Viewing Process State
 
# ps with state column
ps aux | head -5
# USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
# root         1  0.0  0.1 167916 11268 ?        Ss   Jan01   0:15 /sbin/init
# root         2  0.0  0.0      0     0 ?        S    Jan01   0:00 [kthreadd]
# root         3  0.0  0.0      0     0 ?        I<   Jan01   0:00 [rcu_gp]
# root         4  0.0  0.0      0     0 ?        I<   Jan01   0:00 [rcu_par_gp]
 
# STAT column meanings:
# First character: Main state (R, S, D, T, Z, etc.)
# Additional characters:
#   s = session leader
#   < = high priority (nice < 0)
#   N = low priority (nice > 0)
#   L = has pages locked in memory
#   + = foreground process group
#   l = multi-threaded (CLONE_THREAD)
 
# Detailed state from /proc
cat /proc/1/status | grep -E "^(State|Threads|SigQ):"
# State:  S (sleeping)
# Threads:        1
# SigQ:   0/63636
 
# Watch signals delivered to a process with strace (strace reports
# signals and syscalls, not scheduler state changes)
strace -e trace=none -e signal=all -p 1234
# --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, ...} ---
 
# One batch-mode snapshot of task and CPU summaries (-n 1 exits after one iteration)
top -b -n 1 | grep -E "^(Tasks|%Cpu)"
# Tasks: 256 total,   1 running, 255 sleeping,   0 stopped,   0 zombie
# %Cpu(s):  2.3 us,  0.7 sy,  0.0 ni, 96.8 id,  0.2 wa,  0.0 hi,  0.0 si
 
# Breakdown of all process states (stat= suppresses the STAT header line)
ps -eo stat= | sort | uniq -c | sort -rn
#     180 S
#      45 Ss
#      20 Sl
#       5 R
#       2 D
#       1 Z

Interpreting Common STAT Output
STAT	Meaning	Typical Cause
`R`	Running or runnable	Actively computing or in run queue
`S`	Interruptible sleep	Waiting for input, network, timer; common idle state
`Ss`	Interruptible sleep, session leader	Shell, daemon process
`Sl`	Interruptible sleep, multi-threaded	Java/Python process, database
`S+`	Foreground sleep	Interactive program waiting for input
`D`	Uninterruptible sleep	Usually disk I/O; might indicate I/O problem if persistent
`T`	Stopped	Ctrl+Z, or SIGSTOP, or debugger
`Z`	Zombie	Exited but parent hasn't called wait()
`I`	Idle kernel thread	Kernel housekeeping thread with no work

Red Flag: Stuck in D State
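
A process that stays in D for a long time is usually waiting on I/O that is not completing, such as a failing disk or an unreachable network filesystem. As a first diagnostic step, the sketch below (Linux-only, hypothetical helper names) scans /proc and prints every process whose state letter matches. It reads the letter from /proc/PID/stat, whose format begins "pid (comm) state".

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Read the one-letter state of a process from /proc/<pid>/stat.
 * Returns '?' if the process vanished or the line could not be parsed. */
static char read_state(const char *pid) {
    char path[300], buf[512];
    snprintf(path, sizeof path, "/proc/%s/stat", pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return '?';
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    /* comm may itself contain spaces or ')', so use the LAST ')' */
    char *rparen = strrchr(buf, ')');
    return (rparen && rparen[1] == ' ' && rparen[2]) ? rparen[2] : '?';
}

/* Print and count every process whose state letter equals `want`.
 * Returns the count, or -1 if /proc cannot be opened. */
static int list_in_state(char want) {
    DIR *proc = opendir("/proc");
    if (!proc)
        return -1;
    int count = 0;
    struct dirent *de;
    while ((de = readdir(proc)) != NULL) {
        if (!isdigit((unsigned char)de->d_name[0]))
            continue;                  /* skip non-PID entries such as "self" */
        if (read_state(de->d_name) == want) {
            printf("PID %s is in state %c\n", de->d_name, want);
            count++;
        }
    }
    closedir(proc);
    return count;
}
```

On a healthy system, list_in_state('D') typically prints nothing or only short-lived entries. A PID that shows up on every run is the one to investigate, for example by reading /proc/PID/stack as root where the kernel provides it.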

Summary: Process State Management

We've explored process state in depth, from the theoretical five-state model to practical implementation details. Let's consolidate the essential knowledge:

Key Takeaways

•Every process is in exactly one state — The state field in the PCB tells the kernel whether the process can run, is running, is waiting, or is finished.
•The five-state model captures the essentials — New, Ready, Running, Blocked, Terminated. Real OSes add more states for specific scenarios.
•State transitions follow strict rules — Blocked must go through Ready before Running. Terminated never returns to Ready. Understanding valid transitions helps debug scheduling issues.
•Wait queues link blocked processes to events — Each type of wait has its own queue. When the event occurs, the kernel wakes processes from the appropriate queue.
•Zombies exist to preserve exit status — A process can't fully disappear until its parent collects its exit code. Zombie accumulation indicates parent process bugs.
•State is visible through tools — ps, top, /proc, and debuggers expose state information. The STAT column reveals exactly what each process is doing.
