Every process on a Linux system exists in one of several well-defined states. These states determine whether a process is eligible to run, waiting for something, stopped for debugging, or in the process of exiting. Understanding process states is essential for debugging hung processes, analyzing system performance, and reasoning about how the scheduler manages the CPU.
When you run ps aux or top and see processes marked 'R', 'S', 'D', 'T', or 'Z', you're observing the visible manifestation of the kernel's internal state machine.
By the end of this page, you will understand all Linux process states, the transitions between them, how processes enter and exit each state, and how to diagnose common state-related problems like 'D' state processes and zombies.
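The kernel tracks each task's state in its task_struct. Scheduling states live in task_struct->state (renamed __state in Linux 5.14), while the exit states EXIT_ZOMBIE and EXIT_DEAD are kept separately in task_struct->exit_state. Linux defines several primary states that form the core of the state machine: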
| State | ps STAT | Value | Description |
|---|---|---|---|
| TASK_RUNNING | R | 0 | Runnable: running on a CPU or waiting on a runqueue |
| TASK_INTERRUPTIBLE | S | 1 | Sleeping; can be woken by signals |
| TASK_UNINTERRUPTIBLE | D | 2 | Sleeping; not woken by signals |
| __TASK_STOPPED | T | 4 | Stopped by a job-control signal (SIGSTOP, SIGTSTP, ...) |
| __TASK_TRACED | t | 8 | Stopped under a debugger (ptrace) |
| EXIT_ZOMBIE | Z | 32 | Terminated but not yet reaped by parent (stored in exit_state) |
| EXIT_DEAD | X | 16 | Being reaped; final cleanup (rarely visible) |
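Tools such as ps report state as a single letter, so it helps to keep the letter-to-state mapping at hand. The sketch below is a hypothetical user-space lookup table (state_name and state_letter are illustrative names, not a library API). The letters follow the kernel's state table in fs/proc/array.c, which additionally reports P for parked and I for idle kernel threads.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup table: one row per state letter reported in
 * /proc/<pid>/stat and in the ps STAT column. */
static const struct {
    char letter;
    const char *kernel_name;
} state_table[] = {
    { 'R', "TASK_RUNNING" },
    { 'S', "TASK_INTERRUPTIBLE" },
    { 'D', "TASK_UNINTERRUPTIBLE" },
    { 'T', "__TASK_STOPPED" },
    { 't', "__TASK_TRACED" },
    { 'X', "EXIT_DEAD" },
    { 'Z', "EXIT_ZOMBIE" },
    { 'P', "TASK_PARKED" },
    { 'I', "TASK_IDLE" },
};

#define STATE_COUNT (sizeof state_table / sizeof state_table[0])

/* Returns the kernel state name for a state letter, or NULL if unknown. */
const char *state_name(char letter)
{
    for (size_t i = 0; i < STATE_COUNT; i++)
        if (state_table[i].letter == letter)
            return state_table[i].kernel_name;
    return NULL;
}

/* Returns the state letter for a kernel state name, or '\0' if unknown. */
char state_letter(const char *kernel_name)
{
    for (size_t i = 0; i < STATE_COUNT; i++)
        if (strcmp(state_table[i].kernel_name, kernel_name) == 0)
            return state_table[i].letter;
    return '\0';
}
```

Note that 'T' and 't' are different states, so any comparison on these letters must be case-sensitive.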
```c
/* From include/linux/sched.h (values as of recent kernels) */

/* Primary states, stored in task_struct->__state */
#define TASK_RUNNING            0x00000000
#define TASK_INTERRUPTIBLE      0x00000001
#define TASK_UNINTERRUPTIBLE    0x00000002
#define __TASK_STOPPED          0x00000004
#define __TASK_TRACED           0x00000008

/* Exit states, stored in task_struct->exit_state */
#define EXIT_DEAD               0x00000010
#define EXIT_ZOMBIE             0x00000020

/* Further states and modifier flags, stored in task_struct->__state */
#define TASK_PARKED             0x00000040
#define TASK_DEAD               0x00000080
#define TASK_WAKEKILL           0x00000100
#define TASK_WAKING             0x00000200
#define TASK_NOLOAD             0x00000400
#define TASK_NEW                0x00000800
#define TASK_RTLOCK_WAIT        0x00001000

/* Compound states for convenience */
#define TASK_KILLABLE           (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
#define TASK_IDLE               (TASK_UNINTERRUPTIBLE | TASK_NOLOAD)

/*
 * Note: the state can be changed at any time by another CPU or by an
 * interrupt handler performing a wakeup. Older kernels declared it as
 * 'volatile long state'; since Linux 5.14 it is 'unsigned int __state',
 * accessed with READ_ONCE()/WRITE_ONCE().
 */
```

TASK_RUNNING is somewhat misleadingly named: it means the task is runnable, not necessarily running. A TASK_RUNNING process is either executing on a CPU or waiting in a runqueue for its turn.
```c
/*
 * TASK_RUNNING means:
 *   1. The task is on a CPU's runqueue (waiting to run), OR
 *   2. The task is currently executing on a CPU.
 *
 * There is no separate "RUNNING" state for on-CPU tasks.
 * The kernel knows which task is running by checking rq->curr.
 */

/* Waking a task sets it to TASK_RUNNING (simplified; the real
 * signature varies across kernel versions) */
static void ttwu_do_wakeup(struct rq *rq, struct task_struct *p)
{
    WRITE_ONCE(p->__state, TASK_RUNNING);  /* p->state before Linux 5.14 */
    /* ... */
}

/* A new task is TASK_NEW until its first wakeup makes it runnable (simplified) */
void wake_up_new_task(struct task_struct *p)
{
    struct rq *rq = task_rq(p);            /* runqueue of p's CPU; locking omitted */

    WRITE_ONCE(p->__state, TASK_RUNNING);
    activate_task(rq, p, ENQUEUE_NOCLOCK); /* Add to runqueue (flags vary by version) */
}

/*
 * In 'ps' output:
 *   R  = TASK_RUNNING (runnable)
 *   R+ = TASK_RUNNING, in the foreground process group
 */
```

Why no separate "running" state? Distinguishing runnable from running would complicate state transitions without benefit, because the scheduler already knows which task occupies each CPU from rq->curr. A single state also means preemption requires no state change: a preempted task simply stays TASK_RUNNING on its runqueue, so there is no extra transition to race with.
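Because the task that is currently executing is, by definition, TASK_RUNNING, a process that inspects its own entry in /proc always sees R. A minimal sketch, assuming Linux with procfs mounted at /proc; proc_state is a hypothetical helper, not a libc function. It reads the third field of /proc/<pid>/stat, searching for the last ')' because the command name in the second field may itself contain spaces or parentheses.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns the state letter of process `pid`
 * from /proc/<pid>/stat, or '?' if it cannot be read. */
char proc_state(pid_t pid)
{
    char path[64];
    char buf[1024];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return '?';                 /* no such process, or no procfs */
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* Format: "pid (comm) state ...". comm can contain ')' itself,
     * so search from the end for the closing parenthesis. */
    char *end_of_comm = strrchr(buf, ')');
    if (end_of_comm == NULL || end_of_comm[1] != ' ' || end_of_comm[2] == '\0')
        return '?';
    return end_of_comm[2];
}
```

Calling proc_state(getpid()) returns 'R': the kernel generates the file while this very task is on the CPU.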
Sleeping states represent tasks waiting for some event: I/O completion, timer expiration, lock acquisition, or an explicit sleep call. The key difference between them is how they respond to signals:
```c
/* Common pattern: interruptible sleep.
 * Set the state BEFORE testing the condition, so a wakeup that arrives
 * between the test and schedule() resets us to TASK_RUNNING and is not lost. */
for (;;) {
    set_current_state(TASK_INTERRUPTIBLE);
    if (condition)
        break;
    if (signal_pending(current)) {
        __set_current_state(TASK_RUNNING);
        return -ERESTARTSYS;     /* Let userspace handle the signal */
    }
    schedule();                  /* Sleep until woken */
}
__set_current_state(TASK_RUNNING);

/* Uninterruptible sleep: used when waking early would leave state inconsistent */
for (;;) {
    set_current_state(TASK_UNINTERRUPTIBLE);
    if (condition)
        break;
    schedule();                  /* Signals do not wake us; only an explicit wakeup does */
}
__set_current_state(TASK_RUNNING);

/* TASK_KILLABLE: middle ground, woken only by fatal signals */
for (;;) {
    set_current_state(TASK_KILLABLE);
    if (condition)
        break;
    if (fatal_signal_pending(current)) {
        __set_current_state(TASK_RUNNING);
        return -ERESTARTSYS;     /* The task is dying (e.g. SIGKILL): bail out */
    }
    schedule();                  /* Signals that would run a handler stay pending */
}
__set_current_state(TASK_RUNNING);
```

TASK_UNINTERRUPTIBLE processes cannot be killed by any signal, which produces "unkillable" processes. This commonly occurs with hung NFS mounts or failing disk I/O: the process appears stuck and only recovers when the I/O completes or the system reboots. TASK_KILLABLE was introduced in Linux 2.6.25 so that fatal signals such as SIGKILL can terminate such waits where it is safe to do so.
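From user space, TASK_INTERRUPTIBLE is easy to observe: a process blocked in pause() sleeps until a signal arrives. The sketch below assumes Linux with procfs; observe_sleeping_child and proc_state are hypothetical names. It forks a child that calls pause(), polls the child's state letter until the kernel reports S, then wakes it with SIGKILL and reaps it.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: state letter from /proc/<pid>/stat, or '?'. */
static char proc_state(pid_t pid)
{
    char path[64], buf[1024];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return '?';
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    char *end_of_comm = strrchr(buf, ')');   /* comm may contain ')' */
    return (end_of_comm && end_of_comm[1] == ' ' && end_of_comm[2]) ? end_of_comm[2] : '?';
}

/* Forks a child that sleeps in pause(), waits until its state is 'S',
 * then kills and reaps it. Returns the last state letter observed. */
char observe_sleeping_child(void)
{
    pid_t child = fork();
    if (child < 0)
        return '?';
    if (child == 0) {
        pause();                 /* TASK_INTERRUPTIBLE until a signal arrives */
        _exit(0);
    }

    struct timespec one_ms = { 0, 1000000 };
    char state = '?';
    for (int tries = 0; tries < 2000; tries++) {   /* give up after ~2 s */
        state = proc_state(child);
        if (state == 'S')
            break;
        nanosleep(&one_ms, NULL);   /* child may still be 'R' on its way to pause() */
    }

    kill(child, SIGKILL);        /* a signal wakes an interruptible sleeper */
    waitpid(child, NULL, 0);     /* reap it so no zombie is left behind */
    return state;
}
```

Polling is needed because, right after fork(), the child may still be runnable (R) on its way into pause().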
Stopped states represent tasks that have been explicitly halted, either by job control signals or by a debugger.
```c
/* __TASK_STOPPED: job control */
/*
 * Entered via: SIGSTOP, SIGTSTP, SIGTTIN, SIGTTOU
 * Exited via:  SIGCONT
 *
 * Example: user presses Ctrl+Z in a terminal
 *   1. The terminal driver sends SIGTSTP to the foreground process group
 *   2. The signal's default action sets the state to __TASK_STOPPED
 *   3. The process stays stopped until SIGCONT (the shell's fg/bg commands)
 */

/* Shell side of Ctrl+Z: WUNTRACED makes waitpid() report stopped children */
pid = waitpid(-1, &status, WUNTRACED);
if (pid > 0 && WIFSTOPPED(status)) {
    /* Child stopped: move it to the background job list */
    printf("[%d]+  Stopped\n", job_number);
}

/* __TASK_TRACED: debugger control */
/*
 * Entered via: ptrace(PTRACE_ATTACH), breakpoint hit, signal delivery while traced
 * Exited via:  ptrace(PTRACE_CONT), ptrace(PTRACE_DETACH)
 *
 * GDB uses this to control the debuggee's execution.
 */

/* How a GDB breakpoint works: */
/* 1. GDB attaches with PTRACE_ATTACH; the task stops and becomes TRACED */
/* 2. GDB writes INT3 (the x86 breakpoint instruction) at the target address */
/* 3. GDB calls PTRACE_CONT; the task resumes as RUNNING */
/* 4. The task hits INT3; the kernel sets TRACED and notifies GDB */
/* 5. GDB examines state, then calls PTRACE_CONT to resume */

/*
 * Note: TRACED takes precedence over STOPPED.
 * A stopped task under a debugger shows as 't', not 'T'.
 */
```

| Aspect | __TASK_STOPPED (T) | __TASK_TRACED (t) |
|---|---|---|
| Trigger | SIGSTOP, Ctrl+Z | ptrace(), breakpoint |
| Resume method | SIGCONT signal | PTRACE_CONT |
| Controller | Any process with permission | Only the tracing process |
| Common use | Job control (bg/fg) | Debugging (gdb, strace) |
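The job-control path can be exercised with plain POSIX calls. SIGSTOP moves a child into __TASK_STOPPED, and waitpid() reports that stop only when called with WUNTRACED, the same flag a shell relies on for Ctrl+Z; WCONTINUED reports the later SIGCONT. A minimal sketch; observe_stop_and_continue is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Walks a child through R -> T -> R -> killed and checks what waitpid() saw.
 * Returns 1 if both the stop and the continue were reported, else 0. */
int observe_stop_and_continue(void)
{
    int status;
    pid_t child = fork();
    if (child < 0)
        return 0;
    if (child == 0) {
        for (;;)
            pause();             /* stay alive until killed */
    }

    /* SIGSTOP cannot be caught or ignored: the child enters __TASK_STOPPED. */
    kill(child, SIGSTOP);
    int stopped = waitpid(child, &status, WUNTRACED) == child
                  && WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP;

    /* SIGCONT makes it runnable again; WCONTINUED reports the resumption. */
    kill(child, SIGCONT);
    int continued = waitpid(child, &status, WCONTINUED) == child
                    && WIFCONTINUED(status);

    kill(child, SIGKILL);
    waitpid(child, &status, 0);  /* reap the child */
    return stopped && continued;
}
```

Without WUNTRACED, the first waitpid() would block until the child exited and the stop would go unnoticed, which is why job-control shells always pass it.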
When a process terminates, it doesn't immediately disappear. It enters an exit state where it waits for its parent to collect its termination status.
```c
/* Process exit sequence (simplified from kernel/exit.c; helper
 * signatures vary between kernel versions) */
void do_exit(long code)
{
    struct task_struct *tsk = current;

    /* Phase 1: Mark the task as exiting and release resources */
    exit_signals(tsk);      /* Set PF_EXITING, hand off pending group signals */
    tsk->exit_code = code;  /* Saved for the parent's wait() */
    exit_mm(tsk);           /* Release address space */
    exit_files(tsk);        /* Close file descriptors */
    exit_fs(tsk);           /* Release fs_struct (cwd, root) */
    /* Credentials are dropped later, when the task_struct itself is freed */

    /* Phase 2: Reparent children, send SIGCHLD to the parent, and set
     * tsk->exit_state = EXIT_ZOMBIE (or EXIT_DEAD if the parent ignores
     * SIGCHLD, in which case the task is reaped immediately) */
    exit_notify(tsk);

    /* Task stays a zombie until the parent calls wait() */

    /* Phase 3: Set __state = TASK_DEAD and schedule away; never returns */
    do_task_dead();
}

/*
 * EXIT_ZOMBIE: the task has exited but its parent hasn't waited yet.
 *   - Most resources are freed (mm, files, etc.)
 *   - task_struct remains so the parent can retrieve exit_code
 *   - Shows as 'Z' in ps output
 *
 * EXIT_DEAD: the parent has called wait(); final cleanup.
 *   - Very brief state, rarely observed
 *   - task_struct is freed and the PID becomes available for reuse
 */
```

Zombies are harmless individually: each holds little more than its task_struct (a few kilobytes, depending on kernel configuration) and its PID. But if a parent never calls wait(), zombies accumulate. Thousands of zombies can exhaust the PID space (kernel.pid_max) or the owner's process limit (RLIMIT_NPROC), after which new processes fail to fork. This usually indicates a bug in the parent process. To list zombies together with their parents, use ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/'.
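The zombie window can be observed directly. With waitid() and the WNOWAIT flag, a parent can wait for a child's exit without reaping it, which leaves the child in EXIT_ZOMBIE; its PID still exists, so kill(pid, 0) succeeds. Only the following waitpid() collects the exit status and frees the task. A minimal sketch; observe_zombie is a hypothetical name, and the check after reaping assumes the PID is not reused within that short window.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Lets a child become a zombie, checks that it still exists, then reaps it.
 * Returns the child's exit status if every step behaved as described, else -1. */
int observe_zombie(void)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0)
        _exit(42);

    /* WNOWAIT: wait for the exit but leave the child in EXIT_ZOMBIE,
     * so a later wait call can still collect it. */
    siginfo_t info;
    if (waitid(P_PID, (id_t)child, &info, WEXITED | WNOWAIT) != 0)
        return -1;

    /* A zombie still owns its PID; signal 0 only checks that it exists. */
    if (kill(child, 0) != 0)
        return -1;

    /* Reaping: collect the exit status; the task is then released. */
    int status;
    if (waitpid(child, &status, 0) != child || !WIFEXITED(status))
        return -1;

    /* After reaping, the PID no longer refers to any task. */
    if (kill(child, 0) == 0 || errno != ESRCH)
        return -1;

    return WEXITSTATUS(status);
}
```

Between the two wait calls, ps would list the child with STAT Z.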
```c
/*
 * What happens if a parent dies before waiting on its children?
 * The children become "orphans" and are reparented to:
 *
 *   1. Another live thread in the dying parent's thread group, if any
 *   2. The nearest ancestor marked as a "subreaper"
 *      - Set via prctl(PR_SET_CHILD_SUBREAPER)
 *      - Used by container shims and service managers
 *        (e.g., containerd-shim, systemd --user)
 *   3. Otherwise, PID 1 (init/systemd) of the PID namespace
 *      - init reaps adopted children when it receives SIGCHLD
 *
 * This prevents permanent zombies when parents crash.
 */

/* A process that manages descendants can mark itself as a subreaper */
prctl(PR_SET_CHILD_SUBREAPER, 1);
/* Now this process will adopt orphaned descendants */

/* How to get rid of zombies? */
/*
 * You can't kill a zombie: it's already dead!
 * Solutions:
 *   1. Fix the parent to call wait()
 *   2. Kill the parent (the zombie is reparented to init and reaped)
 *   3. Wait for the parent to exit naturally
 */
```

Process states form a directed graph where transitions are triggered by specific events. Understanding valid transitions helps debug unexpected process behavior:
| From | Event | To |
|---|---|---|
| TASK_NEW | fork() completes; new task is woken for the first time | TASK_RUNNING (R) |
| TASK_RUNNING (R) | Sleeps waiting for an event | TASK_INTERRUPTIBLE (S) |
| TASK_INTERRUPTIBLE (S) | Wakeup or signal | TASK_RUNNING (R) |
| TASK_RUNNING (R) | Sleeps in a wait that must not be interrupted | TASK_UNINTERRUPTIBLE (D) |
| TASK_UNINTERRUPTIBLE (D) | Awaited event completes | TASK_RUNNING (R) |
| TASK_RUNNING (R) | SIGSTOP, SIGTSTP, SIGTTIN, SIGTTOU | __TASK_STOPPED (T) |
| __TASK_STOPPED (T) | SIGCONT | TASK_RUNNING (R) |
| TASK_RUNNING (R) | exit() | EXIT_ZOMBIE (Z) |
| EXIT_ZOMBIE (Z) | Parent calls wait() | EXIT_DEAD (X), then task_struct is freed |

Process states are visible through standard tools. Here's how to diagnose common state-related problems:
```bash
# View process states
ps aux | head -1
# USER PID %CPU %MEM ... STAT ...
# The STAT column shows the state: R, S, D, T, Z, etc.

# Find all 'D' state processes (often indicates I/O problems)
ps aux | awk '$8 ~ /^D/'

# Find zombies
ps aux | awk '$8 ~ /^Z/'

# Detailed state info via /proc
grep State /proc/[pid]/status
# State:  S (sleeping)

# See what a sleeping process is waiting for
cat /proc/[pid]/wchan
# wait_woken

# Full kernel stack of the wait (requires root)
cat /proc/[pid]/stack
# [<0>] do_wait+0x...
# [<0>] __x64_sys_wait4+0x...

# Count processes by state (state= suppresses the header line)
ps -eo state= | sort | uniq -c
#      45 D
#       8 R
#     312 S
#       2 Z
```

| Problem | Symptoms | Diagnosis | Solution |
|---|---|---|---|
| D state accumulation | Processes stuck, unkillable | Check dmesg for I/O errors and hung-task warnings ("blocked for more than 120 seconds") | Fix underlying I/O (disk, NFS) |
| Zombie accumulation | Many Z processes | Identify parent (PPID) | Fix parent's wait() logic |
| CPU-bound R state | 100% CPU, unresponsive | perf top, strace | Fix the hot loop or algorithm; replace busy-waiting with blocking waits |
| Tasks sleeping (S) when they should be working | Low throughput, idle CPUs | Profile wakeup reasons (wchan, /proc/[pid]/stack) | Reduce blocking calls |
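The ps -eo state= | sort | uniq -c pipeline can also be reproduced in C by scanning /proc, which is handy inside monitoring tools. A sketch, assuming Linux with procfs; count_states and status_state are hypothetical names. It reads the State: line of each /proc/<pid>/status (the same line the grep example above prints) and skips processes that exit mid-scan.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: state letter from the "State:" line of
 * /proc/<pid>/status (e.g. "State:\tS (sleeping)"), or '?' if unreadable. */
static char status_state(const char *pid_dir)
{
    char path[300], line[256];
    snprintf(path, sizeof path, "/proc/%s/status", pid_dir);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return '?';                      /* process exited during the scan */
    char state = '?';
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "State:", 6) == 0) {
            const char *p = line + 6;
            while (*p == ' ' || *p == '\t')
                p++;
            if (*p != '\0' && *p != '\n')
                state = *p;
            break;
        }
    }
    fclose(f);
    return state;
}

/* Fills counts[c] with the number of processes in state letter c.
 * Returns the number of processes counted, or -1 if /proc is unavailable. */
int count_states(int counts[256])
{
    DIR *proc = opendir("/proc");
    if (proc == NULL)
        return -1;
    memset(counts, 0, 256 * sizeof counts[0]);

    int total = 0;
    struct dirent *entry;
    while ((entry = readdir(proc)) != NULL) {
        if (!isdigit((unsigned char)entry->d_name[0]))
            continue;                    /* skip non-PID entries such as "self" */
        char state = status_state(entry->d_name);
        if (state == '?')
            continue;
        counts[(unsigned char)state]++;
        total++;
    }
    closedir(proc);
    return total;
}
```

Printing the non-zero entries of counts gives the same histogram as the ps pipeline; the scanning process itself always appears under R.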
You have completed the Linux Process Management module! You now understand task_struct, process creation, scheduling classes, the CFS implementation, and process states: the foundational knowledge for Linux kernel development and advanced system debugging.