Wait And Waitpid - Learning Module

Blocking vs Non-blocking

The Responsiveness Problem

By default, wait() and waitpid() are blocking calls—the parent process is suspended until a child terminates. This simple behavior is correct for many use cases: a shell waiting for a foreground command, a build system waiting for a compiler, or a test harness waiting for test processes.

But what happens when:

A server needs to handle new requests while children run?
A GUI application must remain responsive while background tasks execute?
A daemon manages dozens of children with unpredictable lifespans?
An event loop must multiplex waiting on children, sockets, and timers?

In these scenarios, blocking is catastrophic. A web server that calls wait() stops accepting new connections. A GUI that blocks becomes frozen. This page explores the non-blocking wait mechanism and the architectural patterns that maintain responsiveness.

What You Will Learn

By the end of this page, you will understand the difference between blocking and non-blocking waits, master the WNOHANG flag and its semantics, learn polling patterns and their tradeoffs, understand signal-driven child reaping, and know how to integrate process waiting with event loops.

Understanding Blocking Waits

When a parent calls wait() or waitpid() without the WNOHANG flag, and no children have terminated, the kernel performs a blocking operation:

The parent's state changes from Running to Blocked/Waiting
The parent is removed from the CPU's run queue
The kernel records that this process is waiting for child termination
The scheduler dispatches a different process
When a child it is waiting for terminates (or a signal interrupts the wait), the kernel wakes the parent
The parent returns to the Ready queue and eventually resumes

What "Blocked" Really Means:

A blocked process:

Consumes zero CPU time while waiting
Cannot respond to events such as user input or network traffic (only a signal can interrupt the wait)
Cannot execute any code
Is efficiently managed by the kernel (no busy-waiting)

This is both a strength and a weakness. Efficiency is high, but responsiveness is zero.
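The "zero CPU time" claim can be checked directly. The sketch below is illustrative only: `measure_blocking_wait` is a hypothetical helper, not part of any library, and it assumes a POSIX system that provides `CLOCK_PROCESS_CPUTIME_ID`. It times a blocking `waitpid()` with both a wall clock and the parent's CPU clock.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime() */
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Seconds elapsed between two timestamps. */
static double seconds_between(struct timespec a, struct timespec b) {
    return (double)(b.tv_sec - a.tv_sec) + (double)(b.tv_nsec - a.tv_nsec) / 1e9;
}

/* Hypothetical helper: fork a child that sleeps, block in waitpid(),
 * and report how much wall-clock time and how much of the parent's
 * own CPU time passed while it was blocked. Returns 0 on success. */
int measure_blocking_wait(unsigned child_sleep_secs,
                          double *cpu_secs, double *wall_secs) {
    struct timespec wall0, wall1, cpu0, cpu1;
    int status;

    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        sleep(child_sleep_secs);   /* child: pretend to work */
        _exit(0);
    }

    clock_gettime(CLOCK_MONOTONIC, &wall0);
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &cpu0);

    if (waitpid(pid, &status, 0) < 0) return -1;   /* parent blocks here */

    clock_gettime(CLOCK_MONOTONIC, &wall1);
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &cpu1);

    *wall_secs = seconds_between(wall0, wall1);
    *cpu_secs  = seconds_between(cpu0, cpu1);
    return 0;
}
```

Calling `measure_blocking_wait(1, &cpu, &wall)` should report roughly one second of wall-clock time and only a tiny fraction of that as CPU time, since the parent was off the run queue for the whole wait.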

When Blocking is Appropriate:

Blocking waits are the right choice when the parent has nothing else to do until the child completes. Examples:

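// Shell: wait for foreground command
// (argv is assumed to hold the parsed command line)
pid_t pid = fork();
if (pid == 0) {
    execvp(argv[0], argv);  // Child runs command
    _exit(127);             // Reached only if exec fails
}
int status;
waitpid(pid, &status, 0);   // Shell waits: nothing else to do
show_prompt();              // Only after child completes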

// Build step: sequential compilation
for (int i = 0; i < num_files; i++) {
    compile_file(files[i]);  // forks and waits internally
}
// All files compiled in order

When Blocking is Problematic:

Blocking fails when the parent must remain responsive:

// WRONG: Server becomes unresponsive
while (running) {
    int conn = accept(listen_fd, NULL, NULL);  // Accept new connection
    pid_t pid = fork();                        // Fork handler
    if (pid == 0) {
        handle_client(conn);                   // Child serves the request
        _exit(0);
    }
    close(conn);
    wait(&status);                             // BUG: Blocks until child done!
    // Server cannot accept new connections while handling this one
}

This server handles only one connection at a time—completely defeating the purpose of forking.

Blocking in the Wrong Place

A blocking wait() inside an event loop or request handler is almost always a bug. If you need to wait for children while also handling other events, you must use non-blocking waits, signal handlers, or event loop integration.

The WNOHANG Flag

The WNOHANG flag ("Wait, No Hang") transforms waitpid() from a blocking to a non-blocking call:

pid_t waitpid(pid_t pid, int *status, int options);

// Blocking (default):
waitpid(pid, &status, 0);        // Blocks until child terminates

// Non-blocking:
waitpid(pid, &status, WNOHANG);  // Returns immediately

Return Value Semantics with WNOHANG:

waitpid() Return Values with WNOHANG
Return Value	Meaning	Action to Take
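`> 0` (PID)	The child with this PID has changed state (terminated, unless WUNTRACED or WCONTINUED is also set)	Inspect the status; a terminated child is now reaped
`0`	Matching children exist, but none has terminated yet	Do other work, try again later
`-1` with `ECHILD`	No matching unwaited-for children exist	Stop waiting: everything is already reaped, or the PID is not your child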
`-1` with other errno	An error occurred	Handle the error
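The four outcomes in the table can be folded into one small helper. This is a sketch under assumptions: the enum and the function name `poll_child` are hypothetical and chosen only for illustration.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical names for the four outcomes of waitpid(..., WNOHANG). */
enum child_poll {
    CHILD_REAPED,   /* returned the PID: status is valid, child is reaped */
    CHILD_RUNNING,  /* returned 0: child exists but has not terminated */
    CHILD_NONE,     /* returned -1 with ECHILD: nothing to wait for */
    CHILD_ERROR     /* returned -1 with some other errno */
};

/* Check one child without blocking and classify the result. */
enum child_poll poll_child(pid_t pid, int *status) {
    pid_t r = waitpid(pid, status, WNOHANG);
    if (r > 0)  return CHILD_REAPED;
    if (r == 0) return CHILD_RUNNING;
    return (errno == ECHILD) ? CHILD_NONE : CHILD_ERROR;
}
```

A caller would typically switch on the result: process `*status` on `CHILD_REAPED`, carry on with other work on `CHILD_RUNNING`, and stop tracking the PID on `CHILD_NONE`.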

wnohang_basic.c
C
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
#include <errno.h>
 
/**
 * Demonstrates WNOHANG for non-blocking wait
 */
int main(void) {
    pid_t child = fork();
    if (child < 0) {
        perror("fork");
        return 1;
    }
    
    if (child == 0) {
        // Child: simulate some work
        printf("Child: Working for 3 seconds...\n");
        sleep(3);
        printf("Child: Done, exiting\n");
        exit(42);
    }
    
    // Parent: poll for child completion without blocking
    printf("Parent: Child started (PID %d)\n", child);
    
    int status;
    pid_t result;
    int polls = 0;
    
    while (1) {
        result = waitpid(child, &status, WNOHANG);
        
        if (result > 0) {
            // Child has terminated
            printf("Parent: Child terminated!\n");
            if (WIFEXITED(status)) {
                printf("Parent: Exit code = %d\n", WEXITSTATUS(status));
            }
            break;
        } else if (result == 0) {
            // Child still running
            polls++;
            printf("Parent: Child still running (poll #%d)\n", polls);
            printf("Parent: Doing other work...\n");
            usleep(500000);  // 0.5 second - could do real work here
        } else {
            // Error (-1)
            if (errno == ECHILD) {
                printf("Parent: No child to wait for\n");
            } else {
                perror("waitpid");
            }
            break;
        }
    }
    
    printf("Parent: Exiting after %d polls\n", polls);
    return 0;
}
 
/*
 * Typical output (the interleaving of parent and child lines and the
 * final poll count depend on scheduling):
 * Parent: Child started (PID 12345)
 * Child: Working for 3 seconds...
 * Parent: Child still running (poll #1)
 * Parent: Doing other work...
 * Parent: Child still running (poll #2)
 * Parent: Doing other work...
 * Parent: Child still running (poll #3)
 * Parent: Doing other work...
 * Parent: Child still running (poll #4)
 * Parent: Doing other work...
 * Parent: Child still running (poll #5)
 * Parent: Doing other work...
 * Parent: Child still running (poll #6)
 * Parent: Doing other work...
 * Child: Done, exiting
 * Parent: Child terminated!
 * Parent: Exit code = 42
 * Parent: Exiting after 6 polls
 */

The Critical Insight:

With WNOHANG, the parent can check for child termination without suspending execution. If no child has terminated, the call returns 0 immediately, and the parent can continue doing useful work.

This enables patterns like:

Polling: Periodically check if children have finished
Event-driven: Check in response to SIGCHLD
Integrated loops: Check as part of larger event multiplexing

Combining WNOHANG with Other Flags:

You can OR multiple flags together:

// Check for termination OR stopping
waitpid(pid, &status, WNOHANG | WUNTRACED);

// Check for termination, stopping, OR continuing
waitpid(pid, &status, WNOHANG | WUNTRACED | WCONTINUED);
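When these flags are combined, a positive return value no longer implies termination, so the status must be decoded with the matching macros. The sketch below is illustrative: `describe_status`, `poll_for_change`, and `observe_state_changes` are hypothetical helpers, and it assumes a system where `WCONTINUED` is available. It stops, resumes, and then kills a child, polling with `WNOHANG` after each signal.

```c
#define _POSIX_C_SOURCE 200809L  /* for nanosleep() and WCONTINUED */
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: turn a waitpid() status into a short label. */
const char *describe_status(int status) {
    if (WIFEXITED(status))    return "exited";
    if (WIFSIGNALED(status))  return "killed by signal";
    if (WIFSTOPPED(status))   return "stopped";
    if (WIFCONTINUED(status)) return "continued";
    return "unknown";
}

/* Poll without blocking until `pid` reports any state change. */
static int poll_for_change(pid_t pid, int *status) {
    const struct timespec nap = {0, 10 * 1000 * 1000};  /* 10 ms */
    for (;;) {
        pid_t r = waitpid(pid, status, WNOHANG | WUNTRACED | WCONTINUED);
        if (r == pid) return 0;
        if (r < 0)    return -1;
        nanosleep(&nap, NULL);  /* r == 0: nothing has changed yet */
    }
}

/* Send SIGSTOP, SIGCONT, SIGKILL to a child in turn and record what
 * waitpid() reports after each one. Returns 0 on success. */
int observe_state_changes(const char **labels) {
    static const int signals[3] = {SIGSTOP, SIGCONT, SIGKILL};

    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        for (;;) pause();  /* child idles until it is killed */
    }

    for (int i = 0; i < 3; i++) {
        int status;
        if (kill(pid, signals[i]) < 0 || poll_for_change(pid, &status) < 0) {
            kill(pid, SIGKILL);     /* best-effort cleanup on error */
            waitpid(pid, NULL, 0);
            return -1;
        }
        labels[i] = describe_status(status);
    }
    return 0;
}
```

Only the last of the three reports means the child is gone; the first two leave it alive and still in need of a final reap.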

WNOHANG Changes the Return Value Semantics

Remember: with WNOHANG, a return value of 0 is NOT an error—it means 'no child has terminated yet.' Without WNOHANG, waitpid() never returns 0 (it either returns a PID or -1). Don't confuse these two behaviors.

Polling Patterns and Tradeoffs

Pattern 1: Simple Polling Loop

Check periodically with a sleep between checks:

simple_polling.c
C
#include <errno.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

/**
 * Simple polling: check child status at regular intervals
 * 
 * Tradeoffs:
 * - Simple to implement
 * - Wastes CPU if interval too short
 * - High latency if interval too long
 * - Cannot respond immediately to termination
 */
void monitor_children_polling(pid_t *children, int count) {
    int remaining = count;
    
    while (remaining > 0) {
        for (int i = 0; i < count; i++) {
            if (children[i] == 0) continue;  // Already reaped
            
            int status;
            pid_t result = waitpid(children[i], &status, WNOHANG);
            
            if (result > 0) {
                printf("Child %d terminated\n", (int)children[i]);
                children[i] = 0;  // Mark as reaped
                remaining--;
            } else if (result < 0 && errno != EINTR) {
                // e.g. ECHILD: not our child, or reaped elsewhere.
                // Stop tracking it, otherwise this loop never ends.
                perror("waitpid");
                children[i] = 0;
                remaining--;
            }
        }
        
        if (remaining > 0) {
            // Do other work here, or just sleep
            usleep(100000);  // 100ms polling interval
        }
    }
}

Pattern 2: Work-Interleaved Polling

Check for child completion between units of work:

work_interleaved.c
C
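#include <signal.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/**
 * Work-interleaved polling: check between work units
 * 
 * Better than pure sleep-based polling because:
 * - Actually does useful work
 * - Check frequency tied to work pace
 * - No explicit sleep/timer management
 *
 * Caveat: accept() blocks until a client connects, so on an idle
 * server zombies linger until the next connection arrives.
 */

// Assumed to be defined elsewhere in the server
extern volatile sig_atomic_t running;
void handle_client(int client_fd);
void log_child_completion(pid_t pid, int status);

void reap_terminated_children(void);

void server_with_child_monitoring(int listen_fd) {
    while (running) {
        // Check for terminated children before handling request
        reap_terminated_children();  // Uses WNOHANG
        
        // Handle one client request (blocks until a client connects)
        int client_fd = accept(listen_fd, NULL, NULL);
        if (client_fd < 0) continue;
        
        pid_t handler = fork();
        if (handler == 0) {
            close(listen_fd);          // Child does not accept connections
            handle_client(client_fd);  // Child handles request
            _exit(0);
        } else if (handler < 0) {
            perror("fork");            // Could not start a handler
        }
        close(client_fd);  // Parent closes client socket
        
        // Could also check here for fairness
    }
}
 
void reap_terminated_children(void) {
    int status;
    pid_t pid;
    
    // Reap ALL terminated children (not just one)
    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        log_child_completion(pid, status);
    }
}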

Polling Advantages

•Simple to implement and understand
•Predictable behavior
•No signal complexity
•Works in any thread/context
•Easy to combine with other work
•No special handler registration

Polling Disadvantages

•Latency: child completion not immediate
•CPU waste if polling too frequently
•Miss events if polling too slowly
•Doesn't scale to many children
•Interval tuning is application-specific
•Not truly event-driven

Choosing a Polling Interval

The polling interval is a tradeoff between latency and CPU usage. For short-lived children (< 1 second), 10-50ms is reasonable. For longer tasks, 100-500ms is often sufficient. Very long-running children (minutes+) might only need checks every few seconds.
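One way to make the interval explicit is a bounded polling wait. The following is a minimal sketch, not a standard API: `wait_with_timeout` and its parameters are hypothetical, and the millisecond values passed to it are whatever the application decides. It polls one child at a fixed interval and gives up after a deadline.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime() and nanosleep() */
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Milliseconds elapsed since `start` on the monotonic clock. */
static long elapsed_ms(const struct timespec *start) {
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (long)(now.tv_sec - start->tv_sec) * 1000L
         + (now.tv_nsec - start->tv_nsec) / 1000000L;
}

/* Hypothetical helper: poll `pid` every interval_ms until it terminates
 * or timeout_ms have passed.
 * Returns 1 if the child was reaped (*status is valid), 0 on timeout,
 * -1 on error. If polls is non-NULL it receives the number of
 * waitpid() calls made, which shows the cost of a short interval. */
int wait_with_timeout(pid_t pid, long timeout_ms, long interval_ms,
                      int *status, int *polls) {
    struct timespec start;
    clock_gettime(CLOCK_MONOTONIC, &start);

    int calls = 0;
    for (;;) {
        pid_t r = waitpid(pid, status, WNOHANG);
        calls++;
        if (polls) *polls = calls;

        if (r == pid) return 1;   /* terminated and reaped */
        if (r < 0)    return -1;  /* e.g. ECHILD */
        if (elapsed_ms(&start) >= timeout_ms) return 0;

        struct timespec nap = { interval_ms / 1000,
                                (interval_ms % 1000) * 1000000L };
        nanosleep(&nap, NULL);    /* real code would do useful work here */
    }
}
```

With this shape, the worst-case detection latency is about one interval, and the number of wasted calls is roughly the child's lifetime divided by the interval.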

Signal-Driven Child Reaping

Polling is simple but imprecise. The kernel already knows exactly when children terminate and can tell us immediately via the SIGCHLD signal. Using signals for child reaping combines the responsiveness of blocking waits with the non-blocking nature we need.

How SIGCHLD Works:

Child process terminates (calls exit() or is killed)
Child enters zombie state
Kernel sends SIGCHLD to the parent
Parent's signal handler runs (if registered)
Handler calls waitpid() with WNOHANG to reap the child
Parent continues normal execution

Why WNOHANG in the Signal Handler?

Even in a signal handler, we use WNOHANG because:

Multiple children might terminate before the handler runs (signals can coalesce)
The handler must reap ALL terminated children, not just one
We don't want the handler to block for any reason

sigchld_handler.c
C
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <sys/wait.h>
#include <errno.h>
 
// Global counter for statistics (in practice, use atomic or proper sync)
volatile sig_atomic_t children_reaped = 0;
 
/**
 * SIGCHLD handler: reap all terminated children
 * 
 * Key requirements:
 * 1. Use WNOHANG - never block in a signal handler
 * 2. Reap ALL children - signals can coalesce
 * 3. Preserve errno - system calls might set it
 * 4. Be async-signal-safe - only call safe functions
 */
void sigchld_handler(int sig) {
    // Preserve errno (signal handlers can interrupt system calls)
    int saved_errno = errno;
    
    int status;
    pid_t pid;
    
    // Loop to reap ALL terminated children
    // Critical: if 3 children exit before handler runs,
    // we only get ONE SIGCHLD but must reap all 3
    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        children_reaped++;
        
        // Do not log with printf() here: it is NOT async-signal-safe.
        // In production, use write(), or record the status and set
        // a flag for the main loop to process.
        if (WIFEXITED(status)) {
            // Would log: Child pid exited with WEXITSTATUS(status)
        } else if (WIFSIGNALED(status)) {
            // Would log: Child pid killed by signal WTERMSIG(status)
        }
    }
    
    // Restore errno
    errno = saved_errno;
}
 
int main() {
    // Install SIGCHLD handler
    struct sigaction sa;
    sa.sa_handler = sigchld_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    // SA_RESTART: restart interrupted system calls
    // SA_NOCLDSTOP: don't signal for stopped children, only terminated
    
    if (sigaction(SIGCHLD, &sa, NULL) < 0) {
        perror("sigaction");
        exit(1);
    }
    
    printf("Parent: Starting (PID %d)\n", getpid());
    
    // Create several children with different lifespans
    for (int i = 0; i < 5; i++) {
        pid_t pid = fork();
        if (pid == 0) {
            sleep(i + 1);  // Children exit at different times
            _exit(i * 10);
        }
        printf("Parent: Created child %d (PID %d)\n", i, pid);
    }
    
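    // Main loop - does work while children run
    // SIGCHLD handler reaps children in the background
    // Note: sleep() returns early when SIGCHLD arrives (SA_RESTART does
    // not restart it), so this loop may finish in under 10 seconds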
    printf("\nParent: Doing main work...\n");
    for (int i = 0; i < 10; i++) {
        printf("Parent: Main loop iteration %d (reaped: %d)\n", 
               i, children_reaped);
        sleep(1);
    }
    
    printf("\nParent: Finished. Total children reaped: %d\n", children_reaped);
    return 0;
}

Critical Signal Handler Requirements:

Use WNOHANG: Never block in a signal handler
Reap in a loop: Multiple children may have terminated before the handler runs. Standard UNIX signals don't queue: if three children exit while SIGCHLD is blocked (for example, while the handler is already running), only one SIGCHLD remains pending and you get a single invocation.
Preserve errno: The signal handler might interrupt code that was about to check errno. Save and restore it.
Async-signal-safety: Only call functions that are async-signal-safe. printf() is NOT safe, which is why the example handler only counts children and leaves logging as comments. Use write() for logging or set a flag for later processing.
SA_RESTART flag: When a signal interrupts a blocking call (like read()), SA_RESTART causes the call to be automatically restarted instead of failing with EINTR.
SA_NOCLDSTOP flag: Only receive SIGCHLD for termination, not when a child is stopped (SIGSTOP/SIGTSTP) or resumed (SIGCONT). This avoids unnecessary handler invocations.
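The "set a flag for later processing" option can be sketched as follows. The names `got_sigchld`, `install_sigchld_flag_handler`, and `reap_if_flagged` are hypothetical; the point is only that the handler does nothing but set a `sig_atomic_t` flag, and all reaping (and any logging) happens in normal program context.

```c
#define _POSIX_C_SOURCE 200809L  /* for sigaction() */
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set by the handler, cleared by the main loop. */
static volatile sig_atomic_t got_sigchld = 0;

/* Handler: the only work done here is an async-signal-safe store. */
static void on_sigchld(int sig) {
    (void)sig;
    got_sigchld = 1;
}

/* Install the flag-setting handler. Returns 0 on success, -1 on error. */
int install_sigchld_flag_handler(void) {
    struct sigaction sa;
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Call from the main loop. Returns how many children were reaped.
 * Because this runs outside the handler, printf() and other
 * non-async-signal-safe functions may be used by the caller. */
int reap_if_flagged(void) {
    if (!got_sigchld) return 0;

    /* Clear the flag BEFORE reaping: a SIGCHLD that arrives during the
     * loop sets it again, so that child is picked up on the next call. */
    got_sigchld = 0;

    int reaped = 0;
    int status;
    while (waitpid(-1, &status, WNOHANG) > 0) {
        reaped++;
    }
    return reaped;
}
```

The tradeoff is latency: children are reaped only when the main loop next calls `reap_if_flagged()`, so this fits programs whose loop already wakes up regularly.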

Signal Coalescing

If 10 children terminate simultaneously, you might receive only 1 SIGCHLD. This is why the loop 'while ((pid = waitpid(-1, ..., WNOHANG)) > 0)' is essential—it reaps ALL available zombies, not just one. Never call waitpid() just once in a signal handler.
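Coalescing can be provoked deliberately by keeping SIGCHLD blocked while several children exit. The sketch below is a demonstration under assumptions, not production code: `run_coalescing_demo` and the counters are hypothetical names, and it relies on `waitid()` with `WNOWAIT` (a POSIX feature) to wait until each child is a zombie without reaping it.

```c
#define _POSIX_C_SOURCE 200809L  /* for sigaction(), waitid(), WNOWAIT */
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_DEMO_CHILDREN 32

static volatile sig_atomic_t handler_calls = 0;
static volatile sig_atomic_t zombies_reaped = 0;

/* Count invocations, and reap every zombie available on each one. */
static void counting_handler(int sig) {
    (void)sig;
    int saved_errno = errno;
    handler_calls++;
    while (waitpid(-1, NULL, WNOHANG) > 0) {
        zombies_reaped++;
    }
    errno = saved_errno;
}

/* Fork n children that exit while SIGCHLD is blocked, then unblock it.
 * Reports how often the handler ran and how many children it reaped. */
int run_coalescing_demo(int n, int *calls, int *reaped) {
    if (n < 1 || n > MAX_DEMO_CHILDREN) return -1;
    handler_calls = 0;
    zombies_reaped = 0;

    struct sigaction sa;
    sa.sa_handler = counting_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    if (sigaction(SIGCHLD, &sa, NULL) < 0) return -1;

    sigset_t block, old;
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, &old) < 0) return -1;

    pid_t pids[MAX_DEMO_CHILDREN];
    int started = 0;
    for (int i = 0; i < n; i++) {
        pids[i] = fork();
        if (pids[i] < 0) break;
        if (pids[i] == 0) _exit(0);  /* child exits at once */
        started++;
    }

    /* Wait until each child is a zombie WITHOUT reaping it. */
    for (int i = 0; i < started; i++) {
        siginfo_t info;
        while (waitid(P_PID, (id_t)pids[i], &info, WEXITED | WNOWAIT) < 0
               && errno == EINTR) {
            /* retry */
        }
    }

    /* The single pending SIGCHLD is delivered as the mask is restored. */
    sigprocmask(SIG_SETMASK, &old, NULL);

    *calls = handler_calls;
    *reaped = zombies_reaped;
    return started == n ? 0 : -1;
}
```

With `n = 5`, the handler is expected to run once yet reap all five children, which is exactly the situation a single `waitpid()` call in the handler would get wrong.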

Event Loop Integration

Modern servers use event loops (select, poll, epoll, kqueue) to wait on multiple I/O sources simultaneously. Integrating child process management into these loops requires careful design.

The Challenge:

select() and poll() wait on file descriptors, not process IDs. You can't directly add a "wait for this child" item to your event loop.

Solution 1: Self-Pipe Trick

Create a pipe that the SIGCHLD handler writes to. The event loop includes the pipe's read end. When a child terminates, the handler writes a byte, waking the event loop.

self_pipe.c
C
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <sys/wait.h>
#include <sys/select.h>
#include <fcntl.h>
#include <errno.h>
 
// Self-pipe for waking event loop on SIGCHLD
int sigchld_pipe[2];
 
/**
 * Signal handler: write to self-pipe to wake event loop
 * Writing a single byte is async-signal-safe
 */
void sigchld_handler(int sig) {
    int saved_errno = errno;
    write(sigchld_pipe[1], "C", 1);  // 'C' for child, any byte works
    errno = saved_errno;
}
 
/**
 * Make a file descriptor non-blocking
 */
void make_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}
 
/**
 * Process all terminated children
 */
void reap_children() {
    int status;
    pid_t pid;
    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        printf("Event loop: Reaped child %d\n", pid);
        if (WIFEXITED(status)) {
            printf("  Exit code: %d\n", WEXITSTATUS(status));
        }
    }
}
 
/**
 * Drain the self-pipe (consume notification bytes)
 */
void drain_pipe() {
    char buf[16];
    while (read(sigchld_pipe[0], buf, sizeof(buf)) > 0) {
        // Discard bytes - we just needed the wake-up
    }
}
 
int main() {
    // Create self-pipe
    if (pipe(sigchld_pipe) < 0) {
        perror("pipe");
        exit(1);
    }
    make_nonblocking(sigchld_pipe[0]);
    make_nonblocking(sigchld_pipe[1]);
    
    // Install signal handler
    struct sigaction sa = {0};
    sa.sa_handler = sigchld_handler;
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    sigaction(SIGCHLD, &sa, NULL);
    
    // Spawn some children
    for (int i = 0; i < 3; i++) {
        pid_t pid = fork();
        if (pid == 0) {
            sleep(i + 1);
            printf("Child %d: exiting\n", getpid());
            _exit(i);
        }
        printf("Spawned child %d\n", pid);
    }
    
    // Event loop
    printf("\nEntering event loop...\n");
    int iterations = 0;
    
    while (iterations < 10) {
        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(sigchld_pipe[0], &readfds);  // Watch for child signals
        // Would also add: FD_SET(socket_fd, &readfds);
        
        struct timeval timeout = {1, 0};  // 1 second timeout
        
        int ready = select(sigchld_pipe[0] + 1, &readfds, NULL, NULL, &timeout);
        
        if (ready > 0 && FD_ISSET(sigchld_pipe[0], &readfds)) {
            printf("Event loop: SIGCHLD notification received\n");
            drain_pipe();
            reap_children();
        } else if (ready == 0) {
            printf("Event loop: Timeout, doing other work...\n");
        }
        
        iterations++;
    }
    
    printf("Event loop completed\n");
    return 0;
}

Solution 2: signalfd() (Linux-specific)

Linux provides signalfd(), which creates a file descriptor that becomes readable when a signal arrives. This integrates directly with epoll/select without the self-pipe complexity:

signalfd_example.c
C
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <sys/signalfd.h>
#include <sys/epoll.h>
#include <sys/wait.h>
 
int main() {
    // Block SIGCHLD (signalfd requires the signal to be blocked)
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGCHLD);
    sigprocmask(SIG_BLOCK, &mask, NULL);
    
    // Create signalfd for SIGCHLD
    int sfd = signalfd(-1, &mask, SFD_NONBLOCK);
    if (sfd < 0) {
        perror("signalfd");
        exit(1);
    }
    
    // Create epoll instance
    int epfd = epoll_create1(0);
    struct epoll_event ev = {.events = EPOLLIN, .data.fd = sfd};
    epoll_ctl(epfd, EPOLL_CTL_ADD, sfd, &ev);
    
    // Spawn children
    for (int i = 0; i < 3; i++) {
        if (fork() == 0) {
            sleep(i + 1);
            _exit(i);
        }
    }
    
    // Event loop with epoll
    struct epoll_event events[10];
    
    for (int i = 0; i < 10; i++) {
        int n = epoll_wait(epfd, events, 10, 1000);
        
        for (int j = 0; j < n; j++) {
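            if (events[j].data.fd == sfd) {
                // SIGCHLD received
                struct signalfd_siginfo si;
                if (read(sfd, &si, sizeof(si)) != (ssize_t)sizeof(si)) {
                    continue;  // Nothing to read (e.g. EAGAIN)
                }
                
                // ssi_pid names only one child; signals coalesce
                printf("SIGCHLD from child %u\n", (unsigned)si.ssi_pid);
                
                // Still need to reap all children
                int status;
                pid_t pid;
                while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
                    if (WIFEXITED(status)) {
                        printf("Reaped %d (exit code %d)\n",
                               (int)pid, WEXITSTATUS(status));
                    } else {
                        printf("Reaped %d (did not exit normally)\n", (int)pid);
                    }
                }
            }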
        }
    }
    
    close(sfd);
    close(epfd);
    return 0;
}

Event Loop Integration Options

•Self-pipe trick — Portable, works on all Unix systems, slight overhead
•signalfd() — Linux-specific, cleaner integration, no pipe overhead
•kqueue EVFILT_SIGNAL — BSD/macOS equivalent to signalfd
•libevent/libuv — High-level libraries that handle this for you

Best Practices and Patterns

Let's consolidate the key patterns and best practices for blocking vs. non-blocking waits:

When to Use Each Approach
Scenario	Recommended Approach	Rationale
Shell waiting for foreground command	Blocking `wait()`	Nothing else to do until command completes
Build system running sequential tasks	Blocking `waitpid()` for specific child	Tasks must complete in order
Parallel build with N workers	SIGCHLD + WNOHANG	Start new tasks as workers finish
Pre-forking server	SIGCHLD handler or self-pipe	Must accept connections while handlers run
Event loop (epoll/select)	signalfd or self-pipe	Unified event handling mechanism
Simple daemon with few children	Periodic polling	Simple, sufficient for few children

Key Implementation Rules

•Always reap ALL children in SIGCHLD handler — Loop with WNOHANG until waitpid returns 0 or -1
•Never block in signal handlers — Use write() to self-pipe, set flags, or signalfd
•Preserve errno in signal handlers — Save at start, restore at end
•Use SA_NOCLDSTOP when appropriate — Avoid spurious signals for stopped children
•Consider SA_RESTART — Avoid EINTR handling complexity for most syscalls
•Don't forget EINTR in main code — Even with SA_RESTART, some calls aren't restartable
•Match fork with reap — Every forked child must eventually be waited for or will zombie
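The EINTR rule is commonly handled with a small retry wrapper. The sketch below assumes hypothetical names (`waitpid_retry`, `install_interrupting_handler`, `interrupts_seen`); the handler is installed deliberately without SA_RESTART so that a blocking `waitpid()` really does fail with EINTR.

```c
#define _POSIX_C_SOURCE 200809L  /* for sigaction() */
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t interrupts_seen = 0;

/* Handler that only counts how often it ran. */
static void note_interrupt(int sig) {
    (void)sig;
    interrupts_seen++;
}

/* Install the counting handler WITHOUT SA_RESTART, so blocking calls
 * interrupted by `signo` fail with EINTR instead of being restarted. */
int install_interrupting_handler(int signo) {
    struct sigaction sa;
    sa.sa_handler = note_interrupt;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(signo, &sa, NULL);
}

/* Hypothetical wrapper: same contract as waitpid(), except that an
 * interruption by a signal is retried instead of reported. */
pid_t waitpid_retry(pid_t pid, int *status, int options) {
    pid_t r;
    do {
        r = waitpid(pid, status, options);
    } while (r < 0 && errno == EINTR);
    return r;
}
```

For example, if SIGALRM fires one second into a two-second blocking wait, the plain call returns -1 with EINTR, while `waitpid_retry()` goes back to waiting and eventually returns the child's PID.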

Double-Fork for Orphaning

If you want to spawn a truly independent process (daemon), use double-fork: the parent forks, the intermediate child forks again and exits immediately, and the grandchild is orphaned to init. The parent waits only for the intermediate child (which exits quickly). This avoids both blocking and zombies.
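The double-fork sequence described above might be sketched like this. `spawn_detached` and `write_marker` are hypothetical names used only for illustration, and a real daemon would usually also call `setsid()`, change directory, and redirect its standard descriptors, none of which is shown here.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run job(arg) in a grandchild that the caller
 * never has to wait for. Returns 0 once the short-lived intermediate
 * child has been reaped, -1 on error. */
int spawn_detached(void (*job)(void *), void *arg) {
    pid_t pid = fork();
    if (pid < 0) return -1;

    if (pid == 0) {
        /* Intermediate child: fork again and exit immediately. */
        pid_t grandchild = fork();
        if (grandchild < 0) _exit(1);
        if (grandchild == 0) {
            job(arg);   /* grandchild: reparented once the intermediate exits */
            _exit(0);
        }
        _exit(0);
    }

    /* Parent: this wait is brief because the intermediate exits at once. */
    int status;
    pid_t r;
    do {
        r = waitpid(pid, &status, 0);
    } while (r < 0 && errno == EINTR);
    if (r < 0) return -1;

    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}

/* Example job (an assumption for the demo): write one byte to the
 * file descriptor that arg points to. */
static void write_marker(void *arg) {
    int fd = *(int *)arg;
    ssize_t n = write(fd, "x", 1);
    (void)n;
}
```

After `spawn_detached(write_marker, &fd)` returns, the caller has no children left to reap; the grandchild's eventual exit status is collected by init (or a subreaper) rather than by the caller.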

Summary: Blocking vs Non-blocking

This page has explored the fundamental distinction between blocking and non-blocking waits, providing you with the tools to build responsive systems that properly manage child processes.

Key Takeaways

•Blocking waits suspend the parent — Simple but makes the process unresponsive
•WNOHANG returns immediately — Returns 0 if no child has terminated yet
•Polling is simple but imprecise — Trade off between latency and CPU usage
•SIGCHLD enables event-driven reaping — Respond immediately to child termination
•Signal handlers must loop with WNOHANG — Multiple children can terminate before handler runs
•Event loops need integration — Use self-pipe, signalfd, or high-level libraries
•Choose the approach for your use case — No single pattern fits all scenarios

What's Next:

We've covered waiting for a single child, but real programs often spawn many children. The next page explores handling multiple children: tracking PIDs, waiting for specific children, and managing parallel worker pools.

Page Complete

You now understand the crucial difference between blocking and non-blocking waits, and have multiple strategies for keeping your applications responsive while managing child processes. Next, we'll tackle the complexities of handling multiple children.