Not every process ends its life gracefully. Programs crash. Bugs trigger undefined behavior. Users press Ctrl+C. Administrators kill runaway processes. The kernel terminates programs that consume too much memory. These are all forms of abnormal termination—when a process ends for reasons other than a deliberate exit() call or return from main().
Understanding abnormal termination is crucial for several reasons:
By the end of this page, you will understand: the complete taxonomy of abnormal termination causes, how signals terminate processes, the mechanics of segmentation faults and other fatal errors, how the abort() function works, kernel-initiated termination (OOM killer), core dump generation, and how to design programs that handle abnormal conditions gracefully.
Abnormal termination can be categorized by its source and cause. Understanding this taxonomy helps in diagnosing issues and designing robust systems.
| Category | Source | Examples | Typical Signal or Outcome |
|---|---|---|---|
| Signal-Induced | External or Internal | SIGTERM, SIGKILL, SIGINT (Ctrl+C) | 128 + signal_number (shell convention) |
| Hardware Exceptions | CPU/MMU | SIGSEGV, SIGBUS, SIGFPE | Typically generates core dump |
| Programmatic Abort | Application | abort(), assert() failure | SIGABRT, usually with core dump |
| Kernel-Initiated | Operating System | OOM killer, resource limits | SIGKILL or SIGXCPU/SIGXFSZ |
| Parent-Child Protocol | Parent Process | SIGHUP when terminal closes | Depends on signal handling |
Signal-Based Termination Model
Most abnormal terminations in Unix-like systems work through the signal mechanism. When a fatal condition occurs:
This unified model means that understanding signals is the key to understanding abnormal termination.
Several signals have termination as their default action. Understanding each is essential for proper process management.
SIGTERM (15) - Polite Termination Request
SIGTERM is the standard signal for requesting process termination. It's "polite" because:
```bash
kill -TERM 1234   # Send SIGTERM to PID 1234
kill 1234         # Same, SIGTERM is the default
```
SIGKILL (9) - Forcible Termination
SIGKILL cannot be caught, blocked, or ignored. When delivered:
```bash
kill -KILL 1234   # Hard kill
kill -9 1234      # Same
```
Always try SIGTERM before SIGKILL. SIGKILL prevents any cleanup, potentially leaving temporary files, database locks, or corrupted state. Use the pattern: SIGTERM → wait → SIGKILL. Many service managers (systemd, Docker) implement this pattern with configurable timeouts.
SIGINT (2) - Interactive Interrupt
Generated when the user presses Ctrl+C at the terminal. Key characteristics:
SIGQUIT (3) - Quit with Core Dump
Generated by Ctrl+\ at the terminal. Similar to SIGINT but:
SIGHUP (1) - Hangup
Originally meant "terminal hung up" (modem disconnection). Now used for:
```c
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>
#include <string.h>
#include <stdbool.h>

volatile sig_atomic_t shutdown_requested = 0;

// Signal handler for graceful shutdown
void shutdown_handler(int signum) {
    // Only async-signal-safe operations here!
    const char* msg = "\nShutdown signal received...\n";
    write(STDOUT_FILENO, msg, strlen(msg));
    shutdown_requested = 1;
}

// Handler for SIGQUIT - save state before terminating
void quit_handler(int signum) {
    const char* msg = "\nSIGQUIT: Saving state before exit...\n";
    write(STDOUT_FILENO, msg, strlen(msg));

    // In real code: dump state to file for debugging
    // Note: This handler allows the core dump to occur

    // Reset to default handler to get core dump
    signal(SIGQUIT, SIG_DFL);
    raise(SIGQUIT);
}

void setup_signal_handlers() {
    struct sigaction sa;

    // Setup SIGTERM and SIGINT for graceful shutdown
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = shutdown_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;

    if (sigaction(SIGTERM, &sa, NULL) == -1) {
        perror("sigaction SIGTERM");
        exit(1);
    }

    if (sigaction(SIGINT, &sa, NULL) == -1) {
        perror("sigaction SIGINT");
        exit(1);
    }

    // Special handling for SIGQUIT
    sa.sa_handler = quit_handler;
    if (sigaction(SIGQUIT, &sa, NULL) == -1) {
        perror("sigaction SIGQUIT");
        exit(1);
    }

    // SIGHUP - reload config or ignore
    signal(SIGHUP, SIG_IGN);

    // Note: SIGKILL cannot be caught or ignored
    // signal(SIGKILL, handler); // This would fail
}

void cleanup() {
    printf("Performing cleanup...\n");
    printf("- Closing database connections\n");
    printf("- Flushing caches\n");
    printf("- Removing temporary files\n");
    printf("- Notifying peers of shutdown\n");
    printf("Cleanup complete.\n");
}

int main() {
    setup_signal_handlers();

    printf("Server running. PID: %d\n", getpid());
    printf("Send SIGTERM or SIGINT to shutdown gracefully.\n");
    printf("Send SIGQUIT for core dump.\n");
    printf("Send SIGKILL to terminate immediately.\n\n");

    // Main loop - check shutdown flag periodically
    while (!shutdown_requested) {
        printf("Working... (try Ctrl+C or kill %d)\n", getpid());
        sleep(2);
    }

    // Graceful shutdown
    cleanup();

    printf("Server shutdown complete.\n");
    return EXIT_SUCCESS;
}

/*
 * Demonstration:
 *
 * $ ./server
 * Server running. PID: 12345
 * Working...
 * Working...
 * ^C
 * Shutdown signal received...
 * Performing cleanup...
 * - Closing database connections
 * - Flushing caches
 * ...
 * Server shutdown complete.
 *
 * $ echo $?
 * 0
 */
```

Some signals don't originate from software but from the CPU itself. When the processor encounters an illegal operation, it raises a hardware exception that the kernel converts into a signal.
SIGSEGV (11) - Segmentation Fault
The most common crash cause. Triggered when a process attempts:
The CPU's Memory Management Unit (MMU) detects the invalid access, raises an exception, and the kernel delivers SIGSEGV.
```c
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>   // write()

// Example 1: NULL pointer dereference
void null_pointer_crash() {
    int* ptr = NULL;
    *ptr = 42; // SIGSEGV: writing to address 0
}

// Example 2: Array out of bounds (when it hits unmapped memory)
void buffer_overflow_crash() {
    char buffer[10];
    for (int i = 0; i < 10000; i++) {
        buffer[i] = 'A'; // Eventually hits unmapped memory
    }
}

// Example 3: Stack overflow via infinite recursion
void infinite_recursion() {
    char buffer[1024]; // Stack allocation
    infinite_recursion(); // Eventually exhausts stack space
}

// Example 4: Use after free
void use_after_free_crash() {
    int* ptr = malloc(sizeof(int));
    *ptr = 42;
    free(ptr);
    // ptr is now a dangling pointer
    *ptr = 100; // Undefined behavior, may cause SIGSEGV
}

// Example 5: Write to read-only memory
void write_to_readonly() {
    char* str = "Hello"; // String literal, stored in read-only section
    str[0] = 'J'; // SIGSEGV: writing to read-only memory
}

// Signal handler to catch SIGSEGV (for demonstration only)
void segfault_handler(int signum) {
    const char* msg = "Caught SIGSEGV! Program will exit.\n";
    write(2, msg, strlen(msg));

    // Cannot safely continue after SIGSEGV
    // Must reset handler and re-raise or exit
    signal(SIGSEGV, SIG_DFL);
    raise(SIGSEGV); // Re-raise to get core dump
}

int main() {
    printf("SIGSEGV demonstration\n");
    printf("Uncomment one of the crash functions to see the effect.\n\n");

    // Optional: Install handler (for demonstration)
    // signal(SIGSEGV, segfault_handler);

    // Uncomment one to trigger:
    // null_pointer_crash();
    // buffer_overflow_crash();
    // infinite_recursion();
    // use_after_free_crash();
    // write_to_readonly();

    printf("No crash triggered.\n");
    return 0;
}
```

While you can install a handler for SIGSEGV, you cannot safely continue execution after receiving it, because the program state may already be corrupted. Best practice: log diagnostic information (carefully, using only async-signal-safe functions), then terminate. Tools like Address Sanitizer (ASan) catch these bugs during development.
SIGBUS (7) - Bus Error
Similar to SIGSEGV but indicates a different class of memory errors:
SIGFPE (8) - Floating Point Exception
Despite the name, SIGFPE covers integer arithmetic errors too:
SIGILL (4) - Illegal Instruction
Raised when the CPU encounters an invalid or privileged instruction:
```c
#define _GNU_SOURCE   // feenableexcept() is a glibc extension
#include <stdio.h>
#include <signal.h>
#include <fenv.h>
#include <string.h>
#include <unistd.h>

void fpe_handler(int signum) {
    const char* msg = "\nCaught SIGFPE: Arithmetic error!\n";
    write(STDERR_FILENO, msg, strlen(msg));
    _exit(1);
}

int main() {
    // Install handler
    signal(SIGFPE, fpe_handler);

    printf("Demonstrating arithmetic exceptions...\n\n");

    // Example 1: Integer division by zero
    // This raises SIGFPE on x86; some architectures (e.g. AArch64) do not trap
    int x = 10;
    int y = 0;

    printf("About to divide %d by %d...\n", x, y);

    // Uncommenting will cause SIGFPE:
    // int result = x / y;
    // printf("Result: %d\n", result);

    // Example 2: Enable floating-point exceptions
    // (Normally masked and result in NaN/Inf)
    feenableexcept(FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW);

    printf("Floating-point exceptions enabled.\n");

    double a = 1.0;
    double b = 0.0;

    printf("About to compute %f / %f...\n", a, b);

    // This will now raise SIGFPE instead of returning infinity:
    // double fresult = a / b;

    printf("No crash triggered.\n");
    return 0;
}
```

The abort() function provides a programmatic way to trigger abnormal termination. It's used when a program detects an unrecoverable internal error: a situation where continuing would cause worse problems than crashing.
How abort() Works:
The double-raise mechanism ensures that abort() terminates the process even if SIGABRT is caught: when the handler returns, abort() restores the default action and raises the signal a second time.
```c
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>
#include <stdbool.h>

// Invariant checking macro
#define INVARIANT(condition, message) \
    do { \
        if (!(condition)) { \
            fprintf(stderr, "INVARIANT VIOLATION: %s\n", message); \
            fprintf(stderr, " At: %s:%d in %s()\n", \
                    __FILE__, __LINE__, __func__); \
            abort(); \
        } \
    } while (0)

// Custom assert with more info
#define ASSERT(condition) \
    do { \
        if (!(condition)) { \
            fprintf(stderr, "ASSERTION FAILED: %s\n", #condition); \
            fprintf(stderr, " At: %s:%d in %s()\n", \
                    __FILE__, __LINE__, __func__); \
            abort(); \
        } \
    } while (0)

// Example: Attempting to catch SIGABRT
static volatile bool handler_called = false;

void sigabrt_handler(int signum) {
    const char* msg = "SIGABRT handler called!\n";
    write(STDERR_FILENO, msg, strlen(msg));
    handler_called = true;

    // Returning from this handler causes abort() to re-raise SIGABRT
    // with default handler, ensuring termination
}

// Example usage
void process_data(int* data, size_t len) {
    INVARIANT(data != NULL, "data pointer must not be NULL");
    INVARIANT(len > 0, "data length must be positive");

    // Process the data...
    printf("Processing %zu elements...\n", len);
}

void check_system_state() {
    int available_memory = 1024; // Example
    int required_memory = 2048;  // Example

    if (available_memory < required_memory) {
        fprintf(stderr, "FATAL: Insufficient memory (%d < %d)\n",
                available_memory, required_memory);
        abort(); // Cannot continue safely
    }
}

int main() {
    printf("Demonstrating abort() behavior...\n\n");

    // Install SIGABRT handler
    signal(SIGABRT, sigabrt_handler);

    // Example 1: Normal operation with invariant checks
    int data[] = {1, 2, 3, 4, 5};
    process_data(data, 5); // OK

    // Example 2: This would trigger invariant violation
    // process_data(NULL, 0); // Abort!

    // Example 3: Explicit abort for unrecoverable error
    // check_system_state(); // Might abort

    // Example 4: Standard library assert (define NDEBUG to disable)
    // assert(1 == 0); // Abort!

    // Example 5: Direct abort
    printf("\nAbout to call abort()...\n");
    // abort(); // Uncomment to see abort behavior

    printf("No abort triggered.\n");
    return 0;
}
```

Use abort() for internal consistency failures: situations that indicate bugs rather than user errors. Examples: corrupted data structures, failed invariants, reaching code paths that should be impossible. For user-facing errors (invalid input, missing files), use exit() with an appropriate status code instead.
The assert() Macro
The standard assert() macro is the most common way to trigger abort():
```c
#include <assert.h>

void process(int* ptr) {
    assert(ptr != NULL); // Aborts if ptr is NULL
    // ... use ptr ...
}
```
Key properties of assert():
- Compiled out entirely when NDEBUG is defined (typically in release builds)

assert() is a development tool. Production code should handle errors gracefully, not abort.
Sometimes the kernel itself decides to terminate a process. This happens to protect system stability when processes misbehave or exceed resource limits.
The OOM Killer (Out of Memory Killer)
When the system runs critically low on memory and cannot allocate more for a requesting process, the Linux kernel invokes the OOM killer to terminate processes and free memory. The OOM killer:
The goal is to kill the minimum number of processes to free enough memory while preserving system stability.
```bash
#!/bin/bash
# Examining and adjusting OOM killer behavior

# View OOM score for a process (higher = more likely to be killed)
cat /proc/$$/oom_score

# View the OOM score adjustment (-1000 to 1000)
cat /proc/$$/oom_score_adj

# Make current process less likely to be killed
# (Requires root for negative values)
echo -500 > /proc/$$/oom_score_adj

# Make a process immune to OOM killer (dangerous!)
# echo -1000 > /proc/PID/oom_score_adj

# Check which process was last killed by OOM
dmesg | grep -i "killed process"

# Example output:
# Out of memory: Killed process 12345 (memory_hog) total-vm:8388608kB

# Monitor OOM events in real-time
dmesg -w | grep -i oom
```

The OOM killer uses SIGKILL, which cannot be caught. There's no opportunity for cleanup. Critical services can be killed unexpectedly. To protect important processes: set oom_score_adj to a negative value (requires root), ensure adequate swap space, or configure cgroups memory limits to kill specific containers before the OOM killer acts.
Resource Limit Violations
Unix systems allow setting resource limits per process. When these are exceeded, the kernel sends signals:
| Limit | Signal | Description |
|---|---|---|
| RLIMIT_CPU | SIGXCPU | CPU time limit exceeded |
| RLIMIT_FSIZE | SIGXFSZ | File size limit exceeded |
| RLIMIT_CORE | (no signal) | Controls core dump size |
| RLIMIT_DATA | (allocation fails) | Data segment size limit |
| RLIMIT_STACK | SIGSEGV | Stack size limit exceeded |
```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <signal.h>
#include <unistd.h>
#include <string.h>

void sigxcpu_handler(int signum) {
    const char* msg = "\nCPU time limit exceeded! Finishing up...\n";
    write(STDERR_FILENO, msg, strlen(msg));

    // SIGXCPU arrives when the soft limit is reached (on Linux it is then
    // repeated once per second); SIGKILL comes at the hard limit
    _exit(1);
}

void print_limits() {
    struct rlimit rl;

    // rlim_t is an unsigned type; casting to long makes RLIM_INFINITY
    // print as -1 on typical Linux systems
    getrlimit(RLIMIT_CPU, &rl);
    printf("CPU time limit: soft=%ld, hard=%ld seconds\n",
           (long)rl.rlim_cur, (long)rl.rlim_max);

    getrlimit(RLIMIT_FSIZE, &rl);
    printf("File size limit: soft=%ld, hard=%ld bytes\n",
           (long)rl.rlim_cur, (long)rl.rlim_max);

    getrlimit(RLIMIT_CORE, &rl);
    printf("Core file size: soft=%ld, hard=%ld bytes\n",
           (long)rl.rlim_cur, (long)rl.rlim_max);

    getrlimit(RLIMIT_STACK, &rl);
    printf("Stack size: soft=%ld, hard=%ld bytes\n",
           (long)rl.rlim_cur, (long)rl.rlim_max);
}

void set_cpu_limit(int seconds) {
    struct rlimit rl;
    rl.rlim_cur = seconds;     // Soft limit: SIGXCPU
    rl.rlim_max = seconds + 5; // Hard limit: SIGKILL

    if (setrlimit(RLIMIT_CPU, &rl) == -1) {
        perror("setrlimit RLIMIT_CPU");
        exit(1);
    }

    printf("Set CPU limit to %d seconds (SIGKILL at %d)\n",
           seconds, seconds + 5);
}

void cpu_intensive_work() {
    double x = 1.1;
    while (1) {
        x *= 1.0000001;
        for (int i = 0; i < 1000000; i++) {
            x = x * 1.0000001 / 1.0000001;
        }
    }
}

int main() {
    printf("Resource Limit Demonstration\n");
    printf("============================\n\n");

    print_limits();
    printf("\n");

    // Set up SIGXCPU handler
    signal(SIGXCPU, sigxcpu_handler);

    // Set a 2-second CPU time limit
    set_cpu_limit(2);

    printf("\nStarting CPU-intensive work...\n");
    printf("Will receive SIGXCPU after 2 seconds\n\n");

    cpu_intensive_work();

    // Never reached
    return 0;
}

/*
 * Example output:
 * Resource Limit Demonstration
 * ============================
 *
 * CPU time limit: soft=-1, hard=-1 seconds   (-1 = unlimited)
 * ...
 * Set CPU limit to 2 seconds (SIGKILL at 7)
 *
 * Starting CPU-intensive work...
 * Will receive SIGXCPU after 2 seconds
 *
 * CPU time limit exceeded! Finishing up...
 */
```

When a process terminates due to certain signals, the kernel can create a core dump: a file containing the process's memory image at the time of death. Core dumps are invaluable for debugging crashes.
Signals That Generate Core Dumps:
```bash
#!/bin/bash
# Core dump configuration and usage

# Check current core dump settings
ulimit -c
# 0 = disabled, unlimited = no size limit

# Enable core dumps for current shell session
ulimit -c unlimited

# Check core pattern (where core files go)
cat /proc/sys/kernel/core_pattern
# Examples:
# core -> ./core
# core.%p -> ./core.1234 (with PID)
# /tmp/core.%e.%p -> /tmp/core.myprogram.1234
# |/usr/lib/systemd/... -> Piped to systemd-coredump (systemd)

# Set custom core pattern (requires root)
# %p = PID, %e = executable name, %t = timestamp
# The target directory must already exist; the kernel does not create it
mkdir -p /tmp/cores
echo "/tmp/cores/core.%e.%p.%t" | sudo tee /proc/sys/kernel/core_pattern

# Create a test crash
cat > /tmp/crash_test.c << 'EOF'
#include <signal.h>
int main() {
    raise(SIGSEGV); // Deliberate crash
    return 0;
}
EOF

gcc -g -o /tmp/crash_test /tmp/crash_test.c
/tmp/crash_test

# Analyze the core dump
ls -la core* /tmp/cores/core*

# Use gdb to analyze
gdb /tmp/crash_test core.XXXXX

# In gdb:
# bt - backtrace
# info registers - CPU registers
# x/20x $sp - examine stack
# list - show source code
```

Core dumps are most useful when the program is compiled with debug symbols (-g flag). Without symbols, you only see memory addresses. With symbols, you see function names, line numbers, and variable values. Keep debug builds available for production debugging, even if you deploy optimized binaries.
systemd-coredump Integration:
Modern Linux systems using systemd often pipe core dumps to systemd-coredump, which:
- Provides the coredumpctl utility for analysis

```bash
#!/bin/bash
# Using coredumpctl on systemd systems

# List recent core dumps
coredumpctl list

# View information about most recent dump
coredumpctl info

# View info for specific PID or executable
coredumpctl info 12345
coredumpctl info /usr/bin/myprogram

# Launch debugger on most recent dump
coredumpctl debug

# Launch debugger on specific dump
coredumpctl debug MATCH

# Export core to file
coredumpctl dump -o /tmp/mycore.core

# Example output of coredumpctl list:
#
# TIME                         PID   UID  GID  SIG COREFILE EXE
# Mon 2024-01-15 10:23:45 EST  12345 1000 1000 11  present  /usr/bin/test
# Mon 2024-01-15 09:15:32 EST  12000 1000 1000 6   present  /usr/bin/myapp

# Trim old journal data (core dump metadata lives in the journal; the core
# files themselves are aged out by systemd-tmpfiles and coredump.conf limits)
sudo journalctl --vacuum-size=500M
```

Robust software must handle abnormal termination gracefully. While we can't always prevent crashes, we can minimize their impact through careful design.
```c
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>

// Global state for signal handler communication
static volatile sig_atomic_t shutdown_flag = 0;
static volatile sig_atomic_t checkpoint_flag = 0;

// Signal handlers (minimal work, async-signal-safe only)
void terminate_handler(int signum) {
    shutdown_flag = 1;
}

void checkpoint_handler(int signum) {
    checkpoint_flag = 1;
}

// Save state to a temporary file, then atomically rename
int save_checkpoint(int iteration, double value) {
    char temp_name[] = "/tmp/checkpoint.XXXXXX";
    int fd = mkstemp(temp_name);
    if (fd < 0) {
        perror("mkstemp");
        return -1;
    }

    // Write checkpoint data
    char buffer[256];
    int len = snprintf(buffer, sizeof(buffer),
                       "iteration=%d\nvalue=%.15f\n",
                       iteration, value);

    if (write(fd, buffer, len) != len) {
        perror("write");
        close(fd);
        unlink(temp_name);
        return -1;
    }

    // Ensure data is on disk
    if (fsync(fd) < 0) {
        perror("fsync");
        close(fd);
        unlink(temp_name);
        return -1;
    }

    close(fd);

    // Atomically replace old checkpoint
    if (rename(temp_name, "/tmp/checkpoint.dat") < 0) {
        perror("rename");
        unlink(temp_name);
        return -1;
    }

    printf("Checkpoint saved: iteration=%d, value=%.6f\n", iteration, value);
    return 0;
}

// Load checkpoint if available
int load_checkpoint(int* iteration, double* value) {
    FILE* f = fopen("/tmp/checkpoint.dat", "r");
    if (!f) {
        if (errno == ENOENT) {
            *iteration = 0;
            *value = 1.0;
            return 0; // No checkpoint, start fresh
        }
        perror("fopen checkpoint");
        return -1;
    }

    if (fscanf(f, "iteration=%d\n", iteration) != 1 ||
        fscanf(f, "value=%lf\n", value) != 1) {
        fclose(f);
        fprintf(stderr, "Corrupted checkpoint\n");
        return -1;
    }

    fclose(f);
    printf("Restored from checkpoint: iteration=%d, value=%.6f\n",
           *iteration, *value);
    return 0;
}

void setup_signal_handlers() {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));

    // SIGTERM/SIGINT: graceful shutdown
    sa.sa_handler = terminate_handler;
    sigaction(SIGTERM, &sa, NULL);
    sigaction(SIGINT, &sa, NULL);

    // SIGUSR1: checkpoint request
    sa.sa_handler = checkpoint_handler;
    sigaction(SIGUSR1, &sa, NULL);

    // Ignore SIGPIPE
    signal(SIGPIPE, SIG_IGN);
}

int main() {
    int iteration;
    double value;

    printf("Robust computation example\n");
    printf("PID: %d\n", getpid());
    printf("Send SIGUSR1 to checkpoint, SIGTERM to shutdown\n\n");

    setup_signal_handlers();

    // Restore from checkpoint if available
    if (load_checkpoint(&iteration, &value) < 0) {
        fprintf(stderr, "Failed to load checkpoint\n");
        return 1;
    }

    // Main processing loop
    while (!shutdown_flag && iteration < 1000000) {
        // Do some work
        value = value * 1.000001;
        iteration++;

        // Check for checkpoint request
        if (checkpoint_flag) {
            checkpoint_flag = 0;
            save_checkpoint(iteration, value);
        }

        // Periodic checkpoint every 100000 iterations
        if (iteration % 100000 == 0) {
            save_checkpoint(iteration, value);
        }

        // Simulate work
        if (iteration % 50000 == 0) {
            printf("Progress: iteration=%d, value=%.6f\n", iteration, value);
        }

        usleep(100); // Throttle for demo
    }

    // Clean shutdown
    printf("\nShutting down...\n");
    save_checkpoint(iteration, value);
    printf("Final state saved. Goodbye.\n");

    return 0;
}
```

Some systems (notably databases like CouchDB) embrace 'crash-only' design: there's no explicit shutdown procedure. You just kill the process. This works because all state transitions are crash-safe. Recovery after a crash is the same as recovery after a 'clean' stop. This eliminates an entire category of bugs related to shutdown races.
We've explored the complete landscape of abnormal process termination—from user-initiated signals to hardware exceptions to kernel intervention. Let's consolidate the essential knowledge:
What's Next:
Now that we understand both normal and abnormal termination, we need to examine what information a terminating process communicates to its parent: the return status. Exit status values have specific meanings, conventions, and mechanisms for encoding termination reasons. Understanding these is essential for shell scripting, process orchestration, and debugging.
You now understand the complete taxonomy of abnormal termination, how signals mediate the process, how hardware exceptions work, and strategies for building robust software that handles crashes gracefully. This knowledge is fundamental for systems programming and debugging production issues.