
Process Termination

Level: Intermediate

Duration: 60 mins

Topic: Process Termination

Page 2 of 5

Abnormal Termination

When Processes Die Unexpectedly

Not every process ends its life gracefully. Programs crash. Bugs trigger undefined behavior. Users press Ctrl+C. Administrators kill runaway processes. The kernel terminates programs that consume too much memory. These are all forms of abnormal termination—when a process ends for reasons other than a deliberate exit() call or return from main().

Understanding abnormal termination is crucial for several reasons:

Debugging: When your program crashes, you need to understand what happened and why
Robustness: You must design programs that handle termination signals gracefully
System Administration: You need to monitor, diagnose, and manage misbehaving processes
Security: Abnormal termination can indicate attacks, memory corruption, or critical failures

What You Will Learn

By the end of this page, you will understand: the complete taxonomy of abnormal termination causes, how signals terminate processes, the mechanics of segmentation faults and other fatal errors, how the abort() function works, kernel-initiated termination (OOM killer), core dump generation, and how to design programs that handle abnormal conditions gracefully.

Taxonomy of Abnormal Termination

Abnormal termination can be categorized by its source and cause. Understanding this taxonomy helps in diagnosing issues and designing robust systems.

Categories of Abnormal Termination
Category	Source	Examples	Typical Outcome
Signal-Induced	External or Internal	SIGTERM, SIGKILL, SIGINT (Ctrl+C)	Shell reports exit status 128 + signal_number
Hardware Exceptions	CPU/MMU	SIGSEGV, SIGBUS, SIGFPE	Core dump (if enabled)
Programmatic Abort	Application	abort(), assert() failure	SIGABRT, usually with core dump
Kernel-Initiated	Operating System	OOM killer, resource limits	SIGKILL or SIGXCPU/SIGXFSZ
Parent-Child Protocol	Kernel (terminal driver) or parent shell	SIGHUP when terminal closes	Depends on signal handling

Signal-Based Termination Model

Most abnormal terminations in Unix-like systems work through the signal mechanism. When a fatal condition occurs:

The condition is detected (by hardware, kernel, or software)
A signal is generated and delivered to the process
The kernel either runs the process's custom handler, if one is installed, or applies the signal's default action
If the default action is termination, the process ends

This unified model means that understanding signals is the key to understanding abnormal termination.


Termination Signals In Depth

Several signals have termination as their default action. Understanding each is essential for proper process management.

SIGTERM (15) - Polite Termination Request

SIGTERM is the standard signal for requesting process termination. It's "polite" because:

The process can catch it and perform cleanup
The process can even ignore it (though this is usually bad practice)
It gives the process a chance to save state, close connections, etc.

kill -TERM 1234   # Send SIGTERM to PID 1234
kill 1234         # Same, SIGTERM is the default

SIGKILL (9) - Forcible Termination

SIGKILL cannot be caught, blocked, or ignored. When delivered:

The process terminates immediately
No cleanup code runs
Resources are forcibly released by the kernel
A shell reports the exit status as 137 (128 + 9)

kill -KILL 1234   # Hard kill
kill -9 1234      # Same

SIGKILL: Last Resort Only

Always try SIGTERM before SIGKILL. SIGKILL prevents any cleanup, potentially leaving temporary files, database locks, or corrupted state. Use the pattern: SIGTERM → wait → SIGKILL. Many service managers and container runtimes (for example, systemd and Docker) implement this pattern with configurable timeouts.

SIGINT (2) - Interactive Interrupt

Generated when the user presses Ctrl+C at the terminal. Key characteristics:

Default action is termination
Can be caught to prompt for confirmation or save work
When generated by Ctrl+C, sent to every process in the terminal's foreground process group

SIGQUIT (3) - Quit with Core Dump

Generated by Ctrl+\ at the terminal. Similar to SIGINT but:

Default action is termination WITH core dump
Useful for debugging: produces a core dump of the running program (if core dumps are enabled)
Often used when SIGINT doesn't work

SIGHUP (1) - Hangup

Originally meant "terminal hung up" (modem disconnection). Now used for:

Notifying the session's controlling process when its terminal closes
Daemon configuration reload (by convention)
Termination of processes tied to a terminal session

signal_handling_demo.c
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>
#include <string.h>
#include <stdbool.h>
 
volatile sig_atomic_t shutdown_requested = 0;
 
// Signal handler for graceful shutdown
void shutdown_handler(int signum) {
    // Only async-signal-safe operations here!
    const char* msg = "\nShutdown signal received...\n";
    write(STDOUT_FILENO, msg, strlen(msg));
    shutdown_requested = 1;
}
 
// Handler for SIGQUIT - save state before terminating
void quit_handler(int signum) {
    const char* msg = "\nSIGQUIT: Saving state before exit...\n";
    write(STDOUT_FILENO, msg, strlen(msg));
    
    // In real code: dump state to file for debugging
    // Note: This handler allows the core dump to occur
    
    // Reset to default handler to get core dump
    signal(SIGQUIT, SIG_DFL);
    raise(SIGQUIT);
}
 
void setup_signal_handlers() {
    struct sigaction sa;
    
    // Setup SIGTERM and SIGINT for graceful shutdown
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = shutdown_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    
    if (sigaction(SIGTERM, &sa, NULL) == -1) {
        perror("sigaction SIGTERM");
        exit(1);
    }
    
    if (sigaction(SIGINT, &sa, NULL) == -1) {
        perror("sigaction SIGINT");
        exit(1);
    }
    
    // Special handling for SIGQUIT
    sa.sa_handler = quit_handler;
    if (sigaction(SIGQUIT, &sa, NULL) == -1) {
        perror("sigaction SIGQUIT");
        exit(1);
    }
    
    // SIGHUP - reload config or ignore
    signal(SIGHUP, SIG_IGN);
    
    // Note: SIGKILL cannot be caught or ignored
    // signal(SIGKILL, handler);  // This would fail
}
 
void cleanup() {
    printf("Performing cleanup...\n");
    printf("- Closing database connections\n");
    printf("- Flushing caches\n");
    printf("- Removing temporary files\n");
    printf("- Notifying peers of shutdown\n");
    printf("Cleanup complete.\n");
}
 
int main() {
    setup_signal_handlers();
    
    printf("Server running. PID: %d\n", getpid());
    printf("Send SIGTERM or SIGINT to shutdown gracefully.\n");
    printf("Send SIGQUIT for core dump.\n");
    printf("Send SIGKILL to terminate immediately.\n\n");
    
    // Main loop - check shutdown flag periodically
    while (!shutdown_requested) {
        printf("Working... (try Ctrl+C or kill %d)\n", getpid());
        sleep(2);
    }
    
    // Graceful shutdown
    cleanup();
    printf("Server shutdown complete.\n");
    
    return EXIT_SUCCESS;
}
 
/*
 * Demonstration:
 * 
 * $ ./server
 * Server running. PID: 12345
 * Working...
 * Working...
 * ^C
 * Shutdown signal received...
 * Performing cleanup...
 * - Closing database connections
 * - Flushing caches
 * ...
 * Server shutdown complete.
 * 
 * $ echo $?
 * 0
 */

Complete Termination Signal Reference

•SIGTERM (15): Termination request - catchable, default for 'kill'
•SIGKILL (9): Forced termination - cannot be caught or ignored
•SIGINT (2): Interactive interrupt (Ctrl+C) - catchable
•SIGQUIT (3): Quit with core dump (Ctrl+\) - catchable
•SIGHUP (1): Terminal hangup - often used for config reload
•SIGPIPE (13): Write to pipe with no readers - catchable
•SIGALRM (14): Alarm timer expired - catchable
•SIGUSR1/SIGUSR2 (10/12 on Linux x86 and ARM; numbering varies by platform): User-defined signals, terminate by default

Hardware Exceptions and Fatal Errors

Some signals don't originate from software but from the CPU itself. When the processor encounters an illegal operation, it raises a hardware exception that the kernel converts into a signal.

SIGSEGV (11) - Segmentation Fault

The most common crash cause. Triggered when a process attempts:

Reading from or writing to an invalid memory address
Accessing memory without proper permissions
Dereferencing a NULL pointer
Stack overflow
Buffer overflow beyond mapped memory

The CPU's Memory Management Unit (MMU) detects the invalid access, raises an exception, and the kernel delivers SIGSEGV.

segfault_examples.c
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>  // for write()
 
// Example 1: NULL pointer dereference
void null_pointer_crash() {
    // volatile stops the optimizer from proving ptr is NULL and
    // replacing the store with a trap instruction
    int* volatile ptr = NULL;
    *ptr = 42;  // SIGSEGV: writing to address 0
}
 
// Example 2: Array out of bounds (when it hits unmapped memory)
void buffer_overflow_crash() {
    char buffer[10];
    for (int i = 0; i < 10000; i++) {
        buffer[i] = 'A';  // Undefined behavior: may hit unmapped memory, or
                          // smash the stack and crash when the function returns
    }
}
 
// Example 3: Stack overflow via infinite recursion
void infinite_recursion() {
    volatile char buffer[1024];  // Stack allocation the compiler cannot drop
    buffer[0] = 'A';
    infinite_recursion();        // Eventually exhausts stack space
    buffer[1] = buffer[0];       // Work after the call prevents tail-call optimization
}
 
// Example 4: Use after free
void use_after_free_crash() {
    int* ptr = malloc(sizeof(int));
    *ptr = 42;
    free(ptr);
    // ptr is now a dangling pointer
    *ptr = 100;  // Undefined behavior, may cause SIGSEGV
}
 
// Example 5: Write to read-only memory
void write_to_readonly() {
    char* str = "Hello";  // String literal, stored in read-only section
    str[0] = 'J';  // SIGSEGV: writing to read-only memory
}
 
// Signal handler to catch SIGSEGV (for demonstration only)
void segfault_handler(int signum) {
    (void)signum;
    const char* msg = "Caught SIGSEGV! Program will exit.\n";
    write(STDERR_FILENO, msg, strlen(msg));
    
    // Cannot safely continue after SIGSEGV
    // Must reset handler and re-raise or exit
    signal(SIGSEGV, SIG_DFL);
    raise(SIGSEGV);  // Re-raise to get core dump
}
 
int main() {
    printf("SIGSEGV demonstration\n");
    printf("Uncomment one of the crash functions to see the effect.\n\n");
    
    // Optional: Install handler (for demonstration)
    // signal(SIGSEGV, segfault_handler);
    
    // Uncomment one to trigger:
    // null_pointer_crash();
    // buffer_overflow_crash();
    // infinite_recursion();
    // use_after_free_crash();
    // write_to_readonly();
    
    printf("No crash triggered.\n");
    return 0;
}

SIGSEGV Cannot Be Meaningfully Handled

While you can install a handler for SIGSEGV, you cannot safely continue execution after receiving it. The program state is corrupted. Best practice: log diagnostic information (carefully, using only async-signal-safe functions), then terminate. Tools like Address Sanitizer (ASan) catch these bugs during development.

SIGBUS (7) - Bus Error

Similar to SIGSEGV but indicates a different class of memory errors:

Accessing improperly aligned memory (on architectures that require alignment)
Accessing memory-mapped files beyond their actual size
Other physical memory access problems

SIGFPE (8) - Floating Point Exception

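Despite the name, SIGFPE covers integer arithmetic errors too:

Integer division by zero (on x86; some architectures, such as AArch64, return a result instead of trapping)
Integer division overflow on some architectures (for example, INT_MIN / -1 on x86)
Floating-point exceptions (invalid operation, divide-by-zero, overflow, underflow), but only when they are unmasked; by default they produce Inf or NaN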

SIGILL (4) - Illegal Instruction

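Raised when the CPU encounters an invalid instruction (and, on some architectures, a privileged one; x86 Linux reports privileged-instruction faults as SIGSEGV instead):

Corrupted code segment
Jumping into bytes that are not valid instructions, for example through a corrupted function pointer or return address (often a sign of memory corruption or an exploit attempt)
Using instructions not supported by the CPU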

arithmetic_crash.c
#define _GNU_SOURCE  // feenableexcept() is a glibc extension
#include <stdio.h>
#include <signal.h>
#include <fenv.h>
#include <string.h>
#include <unistd.h>
 
// Build (glibc): gcc arithmetic_crash.c -lm   (fenv functions live in libm)
 
void fpe_handler(int signum) {
    (void)signum;
    const char* msg = "\nCaught SIGFPE: Arithmetic error!\n";
    write(STDERR_FILENO, msg, strlen(msg));
    _exit(1);  // Returning would re-execute the faulting instruction
}
 
int main() {
    // Install handler
    signal(SIGFPE, fpe_handler);
    
    printf("Demonstrating arithmetic exceptions...\n\n");
    
    // Example 1: Integer division by zero
    // Raises SIGFPE on x86; some architectures (e.g., AArch64) return 0 instead.
    // volatile stops the compiler from folding or removing the division.
    volatile int x = 10;
    volatile int y = 0;
    
    printf("About to divide %d by %d...\n", x, y);
    
    // Uncommenting will cause SIGFPE:
    // int result = x / y;
    // printf("Result: %d\n", result);
    
    // Example 2: Enable floating-point exceptions
    // (Normally masked, so results are NaN/Inf instead of a signal)
    feenableexcept(FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW);
    
    printf("Floating-point exceptions enabled.\n");
    
    volatile double a = 1.0;
    volatile double b = 0.0;
    
    printf("About to compute %f / %f...\n", a, b);
    
    // This will now raise SIGFPE instead of returning infinity:
    // double fresult = a / b;
    
    printf("No crash triggered.\n");
    return 0;
}

The abort() Function

The abort() function provides a programmatic way to trigger abnormal termination. It's used when a program detects an unrecoverable internal error—a situation where continuing would cause worse problems than crashing.

How abort() Works:

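May flush open stdio streams; this is implementation-defined (glibc stopped doing so in version 2.27), so don't rely on it
Raises SIGABRT signal to the calling process, overriding any attempt to block or ignore it
If SIGABRT is caught and the handler returns, abort() resets to default and raises SIGABRT again
Default SIGABRT action: terminate with core dump

The double-raise mechanism ensures that abort() terminates the process even if SIGABRT is caught. The only exception is a handler that never returns, for example one that calls longjmp() to resume elsewhere or _exit() to end the process its own way.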

abort_usage.c
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>
#include <stdbool.h>
#include <assert.h>  // needed for Example 4 (assert)
 
// Invariant checking macro
#define INVARIANT(condition, message) \
    do { \
        if (!(condition)) { \
            fprintf(stderr, "INVARIANT VIOLATION: %s\n", message); \
            fprintf(stderr, "  At: %s:%d in %s()\n", \
                    __FILE__, __LINE__, __func__); \
            abort(); \
        } \
    } while (0)
 
// Custom assert with more info
#define ASSERT(condition) \
    do { \
        if (!(condition)) { \
            fprintf(stderr, "ASSERTION FAILED: %s\n", #condition); \
            fprintf(stderr, "  At: %s:%d in %s()\n", \
                    __FILE__, __LINE__, __func__); \
            abort(); \
        } \
    } while (0)
 
// Example: Attempting to catch SIGABRT
// sig_atomic_t (not bool) is the type guaranteed safe to write from a handler
static volatile sig_atomic_t handler_called = 0;
 
void sigabrt_handler(int signum) {
    (void)signum;
    const char* msg = "SIGABRT handler called!\n";
    write(STDERR_FILENO, msg, strlen(msg));
    handler_called = 1;
    // Returning from this handler causes abort() to reset SIGABRT to its
    // default action and raise it again, ensuring termination
}
 
// Example usage
void process_data(int* data, size_t len) {
    INVARIANT(data != NULL, "data pointer must not be NULL");
    INVARIANT(len > 0, "data length must be positive");
    
    // Process the data...
    printf("Processing %zu elements...\n", len);
}
 
void check_system_state() {
    int available_memory = 1024;  // Example
    int required_memory = 2048;   // Example
    
    if (available_memory < required_memory) {
        fprintf(stderr, "FATAL: Insufficient memory (%d < %d)\n",
                available_memory, required_memory);
        abort();  // Cannot continue safely
    }
}
 
int main() {
    printf("Demonstrating abort() behavior...\n\n");
    
    // Install SIGABRT handler
    signal(SIGABRT, sigabrt_handler);
    
    // Example 1: Normal operation with invariant checks
    int data[] = {1, 2, 3, 4, 5};
    process_data(data, 5);  // OK
    
    // Example 2: This would trigger invariant violation
    // process_data(NULL, 0);  // Abort!
    
    // Example 3: Explicit abort for unrecoverable error
    // check_system_state();  // Might abort
    
    // Example 4: Standard library assert (define NDEBUG to disable)
    // assert(1 == 0);  // Abort!
    
    // Example 5: Direct abort
    printf("\nAbout to call abort()...\n");
    // abort();  // Uncomment to see abort behavior
    
    printf("No abort triggered.\n");
    return 0;
}

When to Use abort()

Use abort() for internal consistency failures—situations that indicate bugs rather than user errors. Examples: corrupted data structures, failed invariants, reaching code paths that should be impossible. For user-facing errors (invalid input, missing files), use exit() with an appropriate status code instead.

The assert() Macro

The standard assert() macro is the most common way to trigger abort():

#include <assert.h>

void process(int* ptr) {
    assert(ptr != NULL);  // Aborts if ptr is NULL
    // ... use ptr ...
}

Key properties of assert():

Disabled when NDEBUG is defined (typically in release builds)
Prints file, line number, and failed expression before aborting
Calls abort(), which produces a core dump if core dumps are enabled
Do NOT use for runtime error handling in production code

assert() is a development tool. Production code should handle errors gracefully, not abort.

Kernel-Initiated Termination

Sometimes the kernel itself decides to terminate a process. This happens to protect system stability when processes misbehave or exceed resource limits.

The OOM Killer (Out of Memory Killer)

When the system runs critically low on memory and cannot allocate more for a requesting process, the Linux kernel invokes the OOM killer to terminate processes and free memory. The OOM killer:

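Calculates a "badness score" for each process
In modern kernels, bases the score mainly on the process's memory footprint (resident memory, swap, and page tables), adjusted by oom_score_adj; older kernels also weighed CPU time and privileges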
Selects the process with the highest score
Sends SIGKILL to terminate it

The goal is to kill the minimum number of processes to free enough memory while preserving system stability.

oom_score_demo.sh
#!/bin/bash
# Examining and adjusting OOM killer behavior
 
# View OOM score for a process (higher = more likely to be killed)
cat /proc/$$/oom_score
 
# View the OOM score adjustment (-1000 to 1000)
cat /proc/$$/oom_score_adj
 
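# Make current process less likely to be killed
# (Requires root for negative values, so write through sudo tee;
#  a plain "echo ... > file" redirection would run without root)
echo -500 | sudo tee /proc/$$/oom_score_adj
 
# Make a process immune to OOM killer (dangerous!)
# echo -1000 | sudo tee /proc/PID/oom_score_adj
 
# Find OOM-killer kills in the kernel log
# (may require root if kernel.dmesg_restrict=1)
dmesg | grep -i "killed process"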
 
# Example output:
# Out of memory: Killed process 12345 (memory_hog) total-vm:8388608kB
 
# Monitor OOM events in real-time
dmesg -w | grep -i oom

OOM Killer Consequences

The OOM killer uses SIGKILL, which cannot be caught. There's no opportunity for cleanup. Critical services can be killed unexpectedly. To protect important processes: set oom_score_adj to a negative value (requires root), ensure adequate swap space, or configure cgroup memory limits so that a runaway workload is OOM-killed inside its own cgroup before the whole system runs out of memory.

Resource Limit Violations

Unix systems allow setting resource limits per process. Exceeding some limits makes the kernel send a signal; exceeding others simply makes the offending operation fail:

Limit	Signal	Description
RLIMIT_CPU	SIGXCPU at the soft limit; SIGKILL at the hard limit (Linux)	CPU time limit exceeded
RLIMIT_FSIZE	SIGXFSZ	File size limit exceeded (write() fails with EFBIG if the signal is caught or ignored)
RLIMIT_CORE	(no signal)	Controls core dump size
RLIMIT_DATA	(allocation fails)	Data segment size limit
RLIMIT_STACK	SIGSEGV	Stack size limit exceeded

resource_limits.c
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <signal.h>
#include <unistd.h>
#include <string.h>
 
void sigxcpu_handler(int signum) {
    (void)signum;
    const char* msg = "\nCPU time limit exceeded! Finishing up...\n";
    write(STDERR_FILENO, msg, strlen(msg));
    // On Linux, SIGXCPU is sent at the soft limit and then about once per
    // second while the process keeps running; SIGKILL arrives at the hard limit.
    _exit(1);
}
 
// rlim_t is unsigned; print RLIM_INFINITY as "unlimited" instead of a huge number
void print_rlim(rlim_t value) {
    if (value == RLIM_INFINITY)
        printf("unlimited");
    else
        printf("%llu", (unsigned long long)value);
}
 
void print_limit(const char* label, int resource, const char* unit) {
    struct rlimit rl;
    if (getrlimit(resource, &rl) == -1) {
        perror("getrlimit");
        return;
    }
    printf("%s: soft=", label);
    print_rlim(rl.rlim_cur);
    printf(", hard=");
    print_rlim(rl.rlim_max);
    printf(" (%s)\n", unit);
}
 
void print_limits() {
    print_limit("CPU time limit", RLIMIT_CPU, "seconds");
    print_limit("File size limit", RLIMIT_FSIZE, "bytes");
    print_limit("Core file size", RLIMIT_CORE, "bytes");
    print_limit("Stack size", RLIMIT_STACK, "bytes");
}
 
void set_cpu_limit(int seconds) {
    struct rlimit rl;
    
    rl.rlim_cur = seconds;       // Soft limit: SIGXCPU
    rl.rlim_max = seconds + 5;   // Hard limit: SIGKILL
    // An unprivileged process cannot raise its hard limit, so this fails
    // if the current hard limit is already below seconds + 5.
    
    if (setrlimit(RLIMIT_CPU, &rl) == -1) {
        perror("setrlimit RLIMIT_CPU");
        exit(1);
    }
    
    printf("Set CPU limit to %d seconds (SIGKILL at %d)\n",
           seconds, seconds + 5);
}
 
void cpu_intensive_work() {
    volatile double x = 1.1;  // volatile: keep the loop from being optimized away
    while (1) {
        x *= 1.0000001;
        for (int i = 0; i < 1000000; i++) {
            x = x * 1.0000001 / 1.0000001;
        }
    }
}
 
int main() {
    printf("Resource Limit Demonstration\n");
    printf("============================\n\n");
    
    print_limits();
    printf("\n");
    
    // Set up SIGXCPU handler
    signal(SIGXCPU, sigxcpu_handler);
    
    // Set a 2-second CPU time limit
    set_cpu_limit(2);
    
    printf("\nStarting CPU-intensive work...\n");
    printf("Will receive SIGXCPU after 2 seconds\n\n");
    fflush(stdout);  // _exit() in the handler will not flush stdio buffers
    
    cpu_intensive_work();
    
    // Never reached
    return 0;
}
 
/*
 * Output (limits vary by system):
 * Resource Limit Demonstration
 * ============================
 * 
 * CPU time limit: soft=unlimited, hard=unlimited (seconds)
 * ...
 * Set CPU limit to 2 seconds (SIGKILL at 7)
 * 
 * Starting CPU-intensive work...
 * Will receive SIGXCPU after 2 seconds
 * 
 * CPU time limit exceeded! Finishing up...
 */

Core Dumps: Post-Mortem Debugging

When a process terminates due to certain signals, the kernel can create a core dump—a file containing the process's memory image at the time of death. Core dumps are invaluable for debugging crashes.

Signals That Generate Core Dumps:

SIGABRT (abort)
SIGBUS (bus error)
SIGFPE (floating-point exception)
SIGILL (illegal instruction)
SIGQUIT (quit)
SIGSEGV (segmentation fault)
SIGSYS (bad system call)
SIGTRAP (trace/breakpoint trap)
SIGXCPU (CPU time limit exceeded; core dump is the default action on Linux)
SIGXFSZ (file size limit exceeded; core dump is the default action on Linux)

core_dump_setup.sh
#!/bin/bash
# Core dump configuration and usage
 
# Check current core dump settings
ulimit -c
# 0 = disabled, unlimited = no size limit
 
# Enable core dumps for this shell and the programs it starts
ulimit -c unlimited
 
# Check core pattern (where core files go)
cat /proc/sys/kernel/core_pattern
# Examples:
# core                    -> ./core (in the crashing process's working directory)
# core.%p                 -> ./core.1234 (with PID)
# /tmp/core.%e.%p         -> /tmp/core.myprogram.1234
# |/usr/lib/systemd/...   -> Piped to systemd-coredump (inspect with coredumpctl)
 
# Set custom core pattern (requires root; replaces any pipe to
# systemd-coredump until the setting is changed back or the system reboots)
# %p = PID, %e = executable name, %t = time of dump (seconds since the Epoch)
mkdir -p /tmp/cores   # the kernel does not create the directory
echo "/tmp/cores/core.%e.%p.%t" | sudo tee /proc/sys/kernel/core_pattern
 
# Create a test crash
cat > /tmp/crash_test.c << 'EOF'
#include <signal.h>
int main() {
    raise(SIGSEGV);  // Deliberate crash
    return 0;
}
EOF
 
gcc -g -o /tmp/crash_test /tmp/crash_test.c
/tmp/crash_test
 
# Find the core dump written by the pattern above
ls -la /tmp/cores/
 
# Use gdb to analyze (replace PID and TIME with the values from ls)
gdb /tmp/crash_test /tmp/cores/core.crash_test.PID.TIME
 
# In gdb:
#   bt             - backtrace
#   info registers - CPU registers
#   x/20x $sp      - examine stack
#   list           - show source code

Debug Builds for Useful Core Dumps

Core dumps are most useful when the program is compiled with debug symbols (-g flag). Without symbols, you only see memory addresses. With symbols, you see function names, line numbers, and variable values. Keep the debug symbols for every build you deploy (for example, as separate debug-info files), so that cores from optimized production binaries can still be symbolized.

systemd-coredump Integration:

Modern Linux systems using systemd often pipe core dumps to systemd-coredump, which:

Compresses and stores core dumps in /var/lib/systemd/coredump/
Records metadata (signal, PID, executable, timestamp)
Provides the coredumpctl utility for analysis
Automatically prunes old dumps to save space

coredumpctl_usage.sh
#!/bin/bash
# Using coredumpctl on systemd systems
 
# List recent core dumps
coredumpctl list
 
# View information about most recent dump
coredumpctl info
 
# View info for specific PID or executable
coredumpctl info 12345
coredumpctl info /usr/bin/myprogram
 
# Launch debugger on most recent dump
coredumpctl debug
 
# Launch debugger on a specific dump (MATCH is a PID, executable path, or command name)
coredumpctl debug MATCH
 
# Export core to file
coredumpctl dump -o /tmp/mycore.core
 
# Example output of coredumpctl list:
#
# TIME                            PID  UID  GID SIG COREFILE EXE
# Thu 2024-01-15 10:23:45 EST   12345 1000 1000  11 present  /usr/bin/test
# Thu 2024-01-15 09:15:32 EST   12000 1000 1000   6 present  /usr/bin/myapp
 
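# Clean up old core dumps: with the default Storage=external, core files in
# /var/lib/systemd/coredump/ are capped by MaxUse= and KeepFree= in
# /etc/systemd/coredump.conf. journalctl --vacuum-size only trims the
# journal (where the dump metadata lives), not those core files.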

Designing for Abnormal Termination

Robust software must handle abnormal termination gracefully. While we can't always prevent crashes, we can minimize their impact through careful design.

Key Defensive Strategies

•Signal Handlers for Cleanup: Catch SIGTERM/SIGINT and perform essential cleanup before exiting. Don't try to catch SIGSEGV for recovery—it rarely works.
•Write-Ahead Logging (WAL): Log intended operations before performing them. If the process crashes mid-operation, the log enables recovery.
•Atomic Operations: Use atomic file operations (rename, link) instead of in-place modification. Crashes during write leave the original intact.
•Crash-Only Design: Design services to recover safely from abrupt termination. Assume you'll crash and design for it.
•Process Supervision: Use supervisors (systemd, supervisord) to restart crashed processes automatically.
•Checkpoint/Restart: Periodically save state to allow resumption after crashes, especially for long-running computations.

graceful_shutdown_pattern.c
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
 
// Global state for signal handler communication
static volatile sig_atomic_t shutdown_flag = 0;
static volatile sig_atomic_t checkpoint_flag = 0;
 
// Signal handlers (minimal work, async-signal-safe only)
void terminate_handler(int signum) {
    shutdown_flag = 1;
}
 
void checkpoint_handler(int signum) {
    checkpoint_flag = 1;
}
 
// Save state to a temporary file, then atomically rename
int save_checkpoint(int iteration, double value) {
    char temp_name[] = "/tmp/checkpoint.XXXXXX";
    int fd = mkstemp(temp_name);
    if (fd < 0) {
        perror("mkstemp");
        return -1;
    }
    
    // Write checkpoint data
    char buffer[256];
    int len = snprintf(buffer, sizeof(buffer), 
                       "iteration=%d\nvalue=%.15f\n", 
                       iteration, value);
    
    if (write(fd, buffer, len) != len) {
        perror("write");
        close(fd);
        unlink(temp_name);
        return -1;
    }
    
    // Ensure data is on disk
    if (fsync(fd) < 0) {
        perror("fsync");
        close(fd);
        unlink(temp_name);
        return -1;
    }
    close(fd);
    
    // Atomically replace old checkpoint (a fully durable version would
    // also fsync() the containing directory after the rename)
    if (rename(temp_name, "/tmp/checkpoint.dat") < 0) {
        perror("rename");
        unlink(temp_name);
        return -1;
    }
    
    printf("Checkpoint saved: iteration=%d, value=%.6f\n", 
           iteration, value);
    return 0;
}
 
// Load checkpoint if available
int load_checkpoint(int* iteration, double* value) {
    FILE* f = fopen("/tmp/checkpoint.dat", "r");
    if (!f) {
        if (errno == ENOENT) {
            *iteration = 0;
            *value = 1.0;
            return 0;  // No checkpoint, start fresh
        }
        perror("fopen checkpoint");
        return -1;
    }
    
    if (fscanf(f, "iteration=%d\n", iteration) != 1 ||
        fscanf(f, "value=%lf\n", value) != 1) {
        fclose(f);
        fprintf(stderr, "Corrupted checkpoint\n");
        return -1;
    }
    
    fclose(f);
    printf("Restored from checkpoint: iteration=%d, value=%.6f\n",
           *iteration, *value);
    return 0;
}
 
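void setup_signal_handlers() {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);  // no extra signals blocked while handlers run
    
    // SIGTERM/SIGINT: graceful shutdown
    sa.sa_handler = terminate_handler;
    if (sigaction(SIGTERM, &sa, NULL) == -1 ||
        sigaction(SIGINT, &sa, NULL) == -1) {
        perror("sigaction");
        exit(1);
    }
    
    // SIGUSR1: checkpoint request
    sa.sa_handler = checkpoint_handler;
    if (sigaction(SIGUSR1, &sa, NULL) == -1) {
        perror("sigaction SIGUSR1");
        exit(1);
    }
    
    // Ignore SIGPIPE so writes to a closed pipe or socket fail with EPIPE
    // instead of killing the process
    signal(SIGPIPE, SIG_IGN);
}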
 
int main() {
    int iteration;
    double value;
    
    printf("Robust computation example\n");
    printf("PID: %d\n", getpid());
    printf("Send SIGUSR1 to checkpoint, SIGTERM to shutdown\n\n");
    
    setup_signal_handlers();
    
    // Restore from checkpoint if available
    if (load_checkpoint(&iteration, &value) < 0) {
        fprintf(stderr, "Failed to load checkpoint\n");
        return 1;
    }
    
    // Main processing loop
    while (!shutdown_flag && iteration < 1000000) {
        // Do some work
        value = value * 1.000001;
        iteration++;
        
        // Check for checkpoint request
        if (checkpoint_flag) {
            checkpoint_flag = 0;
            save_checkpoint(iteration, value);
        }
        
        // Periodic checkpoint every 100000 iterations
        if (iteration % 100000 == 0) {
            save_checkpoint(iteration, value);
        }
        
        // Simulate work
        if (iteration % 50000 == 0) {
            printf("Progress: iteration=%d, value=%.6f\n", 
                   iteration, value);
        }
        
        usleep(100);  // Throttle for demo
    }
    
    // Clean shutdown
    printf("\nShutting down...\n");
    save_checkpoint(iteration, value);
    printf("Final state saved. Goodbye.\n");
    
    return 0;
}

The Crash-Only Philosophy

Some systems (notably databases like CouchDB) embrace 'crash-only' design: there's no explicit shutdown procedure. You just kill the process. This works because all state transitions are crash-safe. Recovery after a crash is the same as recovery after a 'clean' stop. This eliminates an entire category of bugs related to shutdown races.

Summary: Abnormal Termination Mastery

We've explored the complete landscape of abnormal process termination—from user-initiated signals to hardware exceptions to kernel intervention. Let's consolidate the essential knowledge:

Key Takeaways

•Most abnormal terminations work through signals: SIGSEGV for crashes, SIGTERM for shutdown requests, SIGKILL for forced termination
•SIGTERM is catchable; SIGKILL is not: Always handle SIGTERM for cleanup, use SIGKILL only as last resort
•Hardware exceptions (SIGSEGV, SIGFPE, SIGBUS) indicate bugs—catch them for logging only, not recovery
•abort() terminates the process even if SIGABRT is caught, unless the handler never returns (e.g., it calls longjmp()). Use it for internal error detection
•The OOM killer uses SIGKILL: No cleanup possible; protect critical processes with oom_score_adj
•Core dumps enable post-mortem debugging: enable them (ulimit -c, core_pattern or systemd-coredump) and compile with debug symbols (-g)
•Design for crashes: Write-ahead logging, atomic operations, and checkpoint/restart make crashes survivable
•Shell exit codes 128+N indicate signal N: e.g., 139 = 128+11 = SIGSEGV

What's Next:

Now that we understand both normal and abnormal termination, we need to examine what information a terminating process communicates to its parent: the return status. Exit status values have specific meanings, conventions, and mechanisms for encoding termination reasons. Understanding these is essential for shell scripting, process orchestration, and debugging.


You now understand the complete taxonomy of abnormal termination, how signals mediate the process, how hardware exceptions work, and strategies for building robust software that handles crashes gracefully. This knowledge is fundamental for systems programming and debugging production issues.
