Exec Family - Learning Module

Replacing the Process Image

The Complete Transformation

When exec() succeeds, something extraordinary happens: the calling process's entire memory image is destroyed and replaced with a completely new program. The code you were running, your stack, your heap, your data—all gone. In their place: a fresh program ready to execute from its entry point.

This isn't a simple memory copy or overlay. It's a carefully orchestrated transformation involving the kernel, the executable file format, the memory management unit (MMU), and the dynamic linker. Understanding this process reveals why exec() behaves the way it does and explains many of its subtle characteristics.

What You Will Learn

By the end of this page, you will understand exactly what happens when exec() transforms a process: which memory regions are replaced, which kernel resources persist, how the ELF loader works, and why this design enables both flexibility and security.

Understanding Process Memory Layout

Before understanding what exec() replaces, we must understand what a running process's memory looks like. Every Unix process has a virtual address space divided into distinct regions:

Process Memory Segments
Segment	Contents	Permissions	Source
Text (Code)	Machine instructions	Read + Execute	Loaded from executable file
Data	Initialized global/static variables	Read + Write	Loaded from executable file
BSS	Uninitialized global/static vars	Read + Write	Zero-initialized by kernel
Heap	Dynamic memory (malloc)	Read + Write	Allocated at runtime
Stack	Function frames, locals	Read + Write	Grows automatically
Memory Map	Shared libs, mmap files	Varies	Mapped at load/runtime

When exec() executes, it replaces ALL of these user-space memory regions:

The old text segment (your code) → replaced with new program's code
The old data segment → replaced with new program's initialized data
The old BSS → replaced with new program's BSS (zeroed)
The old heap → discarded entirely (new program starts with empty heap)
The old stack → discarded entirely (new stack created with argc/argv/envp)
The old memory mappings → unmapped (except for new program's shared libraries)

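This is a complete wipe of user-space memory. Nothing from the old program's address space survives; the only data carried over are the argument and environment strings, which the kernel copies onto the new program's stack.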

What exec() Destroys

Let's be explicit about what is completely destroyed when exec() succeeds:

Completely Destroyed by exec()

•All code (text segment) — Your program's instructions are gone. There is no way back.
•All global and static variables — Both initialized (data) and uninitialized (BSS) are replaced.
•All dynamically allocated memory (heap) — Everything from malloc/calloc is gone. Memory leaks become irrelevant.
•The entire call stack — All local variables, all function calls in progress, all return addresses.
•All memory-mapped regions — Anonymous mappings, file mappings (except the new executable).
•Signal handlers — Custom signal handlers are reset to default (caught signals → SIG_DFL).
•Memory locks (mlock) — All are released.
•Thread state — In multi-threaded programs, all threads except the one calling exec() are terminated.
•Thread-local storage — Destroyed with the threads.
•Exception: ignored signals — Dispositions set to SIG_IGN are not reset; those signals stay ignored in the new program.
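
The signal rules in this list can be observed from outside the new program. The following is a minimal sketch, assuming a POSIX system with /bin/sh; the helper names (on_usr1, exec_with_sigusr1, handler_was_reset, ignore_was_kept) are made up for illustration. A child installs a disposition for SIGUSR1, execs a shell that signals itself, and the parent inspects how the shell ended.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical handler; it only exists so that SIGUSR1 counts as "caught". */
static void on_usr1(int sig) { (void)sig; }

/* Fork a child, set the given disposition for SIGUSR1, then exec a shell
 * that sends SIGUSR1 to itself. Returns the wait status, or -1 on error. */
int exec_with_sigusr1(void (*disposition)(int)) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        signal(SIGUSR1, disposition);
        execl("/bin/sh", "sh", "-c", "kill -s USR1 $$; exit 0", (char *)NULL);
        _exit(127);  /* only reached if exec failed */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}

/* A caught signal reverts to SIG_DFL, so the shell is killed by SIGUSR1. */
int handler_was_reset(void) {
    int st = exec_with_sigusr1(on_usr1);
    return st != -1 && WIFSIGNALED(st) && WTERMSIG(st) == SIGUSR1;
}

/* SIG_IGN survives exec, so the shell ignores the signal and exits 0. */
int ignore_was_kept(void) {
    int st = exec_with_sigusr1(SIG_IGN);
    return st != -1 && WIFEXITED(st) && WEXITSTATUS(st) == 0;
}
```

Both helpers should return 1 on a typical Linux system, provided SIGUSR1 is not blocked in the calling process (the signal mask is inherited too).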

No Cleanup Happens

Because exec() replaces your code, no cleanup code can run. Destructors don't execute. atexit() handlers don't run. stdio buffers are not flushed, so unwritten output is lost unless you call fflush() first. Open file descriptors aren't closed by your code. This is why exec() has specific guarantees about inherited resources: the new program must be able to function with what it inherits.

no_cleanup_demo.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

void cleanup(void) {
    // This function is registered with atexit()
    // But it will NEVER be called if exec() succeeds!
    printf("Cleanup running...\n");
}

int main(void) {
    atexit(cleanup);  // Register cleanup function

    char *huge_buffer = malloc(1000000000);  // ~1 GB allocation (may return NULL)
    (void)huge_buffer;
    // Memory "leak" - but exec() will reclaim everything anyway

    FILE *fp = fopen("/tmp/test.txt", "w");
    if (fp != NULL) {
        fprintf(fp, "Some data...\n");
        fflush(fp);  // Without this, the buffered data is lost on exec()
    }
    // File not explicitly closed - the stream vanishes, the fd is inherited

    printf("About to exec...\n");
    fflush(stdout);  // exec() does not flush stdio buffers for us
    execl("/bin/echo", "echo", "New program running!", (char *)NULL);

    // Only reached on exec failure:
    perror("execl");
    return 1;  // Returning from main runs atexit handlers, so cleanup() runs here
}

// When exec() succeeds:
// - cleanup() never called
// - 1GB buffer released by kernel (not a leak!)
// - FILE* stream discarded (underlying fd is inherited unless close-on-exec)
// - All user-mode state gone

What Survives exec()

While exec() destroys the memory image, it performs an image replacement within the same process. Certain kernel-level process attributes survive the transformation:

Preserved Across exec()

•Process ID (PID) — The fundamental identity remains. getpid() returns the same value before and after exec().
•Parent Process ID (PPID) — Relationship to parent unchanged.
•Process Group ID and Session ID — Terminal job control relationships persist.
•Real User ID and Real Group ID — The identity of the user who started the process. These never change across exec(); a setuid/setgid executable changes only the effective (and saved) IDs.
•Supplementary Group IDs — Additional group memberships.
•Controlling Terminal — TTY association remains.
•Current Working Directory — getcwd() returns same path.
•Root Directory — Chroot remains in effect.
•File Mode Creation Mask (umask) — Permission mask preserved.
•Resource Limits (rlimit) — CPU/memory/file limits carry over.
•Nice Value — Scheduling priority preserved.
•Pending Alarms — Alarms set with alarm() remain pending.
•File Locks — Advisory locks (fcntl/flock) remain held.
•Process Signal Mask — Which signals are blocked.
•Pending Signals — Signals awaiting delivery.
•Open File Descriptors — Unless marked close-on-exec (FD_CLOEXEC).
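
The first item in this list is easy to check. Below is a minimal sketch, assuming a POSIX system with /bin/sh; pid_survives_exec is a hypothetical helper name. The child execs a shell that prints its own PID through an inherited pipe, and the parent compares that number with the PID fork() returned.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the exec'd program reports the same PID that fork() gave
 * the child, 0 if it differs, -1 on error. */
int pid_survives_exec(void) {
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: route stdout into the pipe, then replace the image. */
        dup2(fds[1], STDOUT_FILENO);
        close(fds[0]);
        close(fds[1]);
        execl("/bin/sh", "sh", "-c", "echo $$", (char *)NULL);
        _exit(127);  /* only reached if exec failed */
    }

    close(fds[1]);
    char buf[32] = {0};
    ssize_t n = read(fds[0], buf, sizeof buf - 1);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    if (n <= 0)
        return -1;
    return strtol(buf, NULL, 10) == (long)pid;
}
```

The same sketch also exercises descriptor inheritance: the shell's output only reaches the parent because file descriptor 1 stayed open across exec().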

The close-on-exec flag (FD_CLOEXEC):

File descriptors are special. By default, they survive exec() and become available to the new program. But this can cause problems:

Security: Child shouldn't inherit sensitive file handles
Resource leaks: Forgotten open files accumulate
Conflicts: File descriptors might interfere with new program

The solution is the close-on-exec flag. When set, the kernel automatically closes that file descriptor during exec():

close_on_exec.c
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int main(void) {
    // Method 1: Set O_CLOEXEC when opening (atomic, preferred)
    int fd1 = open("/etc/passwd", O_RDONLY | O_CLOEXEC);
    // fd1 will be automatically closed when exec() runs

    // Method 2: Set FD_CLOEXEC after opening
    int fd2 = open("/etc/group", O_RDONLY);
    if (fd2 >= 0)
        fcntl(fd2, F_SETFD, FD_CLOEXEC);
    // fd2 will also be automatically closed on exec()

    // File descriptor without CLOEXEC - survives exec()
    int fd3 = open("/tmp/shared.txt", O_RDONLY);  // -1 if the file does not exist
    // fd3 will be available to the new program

    printf("fd1=%d (CLOEXEC), fd2=%d (CLOEXEC), fd3=%d (inherited)\n",
           fd1, fd2, fd3);
    fflush(stdout);  // exec() does not flush stdio buffers

    execl("/bin/ls", "ls", "-l", "/proc/self/fd", (char *)NULL);
    // If exec succeeds:
    // - fd1 closed (CLOEXEC)
    // - fd2 closed (CLOEXEC)
    // - fd3 inherited (still open as fd3 in new process)
    // - stdin(0), stdout(1), stderr(2) also inherited

    perror("exec failed");
    return 1;
}

Best Practice: Always Use O_CLOEXEC

Modern code should use O_CLOEXEC by default when opening files. Only leave it off when you explicitly want the child to inherit the file descriptor. This prevents file descriptor leaks and security issues from forgotten handles.

The ELF Loading Process

Most modern Unix systems use the ELF (Executable and Linkable Format) for executables. When exec() runs, the kernel must parse this format and load the program correctly. Let's trace through this process.

ELF File Structure:

An ELF executable contains:

+------------------------+
| ELF Header             |  <- Magic number, CPU arch, entry point
+------------------------+
| Program Headers        |  <- How to load into memory (segments)
+------------------------+
| Section Headers        |  <- For linkers/debuggers (optional at runtime)
+------------------------+
| .text                  |  <- Executable code
+------------------------+
| .rodata                |  <- Read-only data (string literals, etc.)
+------------------------+
| .data                  |  <- Initialized read-write data
+------------------------+
| .bss                   |  <- Uninitialized data (just size info)
+------------------------+
| .symtab, .strtab, ...  |  <- Symbol tables, debug info
+------------------------+

Step-by-step ELF loading:

1. Validate the ELF Header

// First 4 bytes must be: 0x7f 'E' 'L' 'F'
// e_type must be ET_EXEC (executable) or ET_DYN (PIE/shared object)
// e_machine must match CPU architecture (EM_X86_64, etc.)
// e_version must be EV_CURRENT
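
The same checks can be approximated from user space. This is a rough sketch, not the kernel's code: it assumes Linux's <elf.h>, assumes the file uses the host's byte order, and skips the e_machine comparison because the expected value depends on the build target. The function name elf_header_ok is hypothetical.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the file passes the basic ELF header checks, 0 if it does
 * not, and -1 if the file cannot be opened. */
int elf_header_ok(const char *path) {
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    /* The fields checked here sit at the same offsets in 32-bit and
     * 64-bit headers, so the smaller struct is enough. */
    Elf32_Ehdr eh;
    size_t n = fread(&eh, 1, sizeof eh, f);
    fclose(f);

    if (n != sizeof eh)
        return 0;                                   /* too short to be ELF */
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0)
        return 0;                                   /* magic: 0x7f 'E' 'L' 'F' */
    if (eh.e_type != ET_EXEC && eh.e_type != ET_DYN)
        return 0;                                   /* not a loadable program */
    if (eh.e_version != EV_CURRENT)
        return 0;
    return 1;
}
```

For example, elf_header_ok("/proc/self/exe") inspects the running program's own executable on Linux.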

2. Process Program Headers (PT_LOAD segments)

Each PT_LOAD segment specifies:

p_vaddr: Virtual address to load at
p_offset: Offset in file
p_filesz: Size in file (what to load)
p_memsz: Size in memory (may be > filesz for BSS)
p_flags: Permissions (PF_R, PF_W, PF_X)
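
To see these fields for a real binary, the program headers can be read directly from the file. The sketch below is an illustration under stated assumptions: Linux's <elf.h>, a 64-bit ELF file in host byte order, and a hypothetical function name (list_load_segments). It prints roughly what readelf -l shows for LOAD entries.

```c
#include <elf.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Prints every PT_LOAD entry of a 64-bit ELF file.
 * Returns the number of PT_LOAD segments, or -1 on error. */
int list_load_segments(const char *path) {
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    Elf64_Ehdr eh;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64) {
        fclose(f);
        return -1;
    }

    int count = 0;
    for (unsigned i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        long pos = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);
        if (fseek(f, pos, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1) {
            fclose(f);
            return -1;
        }
        if (ph.p_type != PT_LOAD)
            continue;
        printf("LOAD vaddr=0x%" PRIx64 " offset=0x%" PRIx64
               " filesz=%" PRIu64 " memsz=%" PRIu64 " flags=%c%c%c\n",
               (uint64_t)ph.p_vaddr, (uint64_t)ph.p_offset,
               (uint64_t)ph.p_filesz, (uint64_t)ph.p_memsz,
               (ph.p_flags & PF_R) ? 'r' : '-',
               (ph.p_flags & PF_W) ? 'w' : '-',
               (ph.p_flags & PF_X) ? 'x' : '-');
        count++;
    }
    fclose(f);
    return count;
}
```

A segment whose memsz exceeds its filesz is the one that carries the BSS.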

3. Memory Mapping

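Conceptually, the kernel maps each segment the way a user-space mmap() call would (it uses its internal equivalent, with addresses and offsets rounded to page boundaries):

// For the text segment (read + execute):
mmap(p_vaddr, p_filesz, PROT_READ | PROT_EXEC,
     MAP_PRIVATE | MAP_FIXED, fd, p_offset);

// For the data segment (read + write); only p_filesz bytes come from the file:
mmap(p_vaddr, p_filesz, PROT_READ | PROT_WRITE,
     MAP_PRIVATE | MAP_FIXED, fd, p_offset);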

4. Zero-fill BSS

If p_memsz > p_filesz, the extra bytes (the BSS) are zero-initialized.

5. Dynamic Linking (if applicable)

If the executable has a PT_INTERP segment, the kernel loads the dynamic linker (typically /lib64/ld-linux-x86-64.so.2) and transfers control to it instead of directly to the program.

Demand Paging

The kernel doesn't actually read the entire executable into memory. It records mappings that associate address ranges with the file, and fills in page table entries lazily. When the CPU first tries to execute code or access data on a page, a page fault occurs, and only then does the kernel load that specific page. This is called demand paging, and it makes exec() fast and memory-efficient.

Setting Up the Initial Stack

The kernel doesn't just load the program binary—it also prepares the initial stack with everything the program needs to start running. The stack setup follows a specific layout that the C runtime expects.

Initial Stack Layout (x86-64 Linux):

High addresses
┌─────────────────────────────────────┐
│ Platform-specific info (random)     │
│ ELF auxiliary vector (auxv)         │
│ NULL (end of envp)                  │
│ envp[m-1] -> "LAST_VAR=..."         │
│ ...                                 │
│ envp[1] -> "HOME=/home/user"        │
│ envp[0] -> "PATH=/usr/bin:/bin"     │
│ NULL (end of argv)                  │
│ argv[n-1] -> "last_argument"        │
│ ...                                 │
│ argv[1] -> "first_argument"         │
│ argv[0] -> "program_name"           │
│ argc (argument count)               │  <- %rsp points here
└─────────────────────────────────────┘
Low addresses (stack grows down)

stack_layout_demo.c
#include <stdio.h>

// This program demonstrates how to access the initial stack.
// The three-argument form of main is a common Unix extension, not ISO C.
int main(int argc, char *argv[], char *envp[]) {
    printf("Stack layout demonstration:\n\n");

    // argc: first item on stack
    printf("argc = %d\n\n", argc);

    // argv: array of argument string pointers
    printf("argv array:\n");
    for (int i = 0; i < argc; i++) {
        printf("  argv[%d] = %p -> \"%s\"\n", i, (void *)argv[i], argv[i]);
    }
    printf("  argv[%d] = %p (NULL terminator)\n\n", argc, (void *)argv[argc]);

    // envp: array of environment string pointers
    printf("envp array (first 5):\n");
    for (int i = 0; envp[i] != NULL && i < 5; i++) {
        printf("  envp[%d] = %p -> \"%s\"\n", i, (void *)envp[i], envp[i]);
    }
    printf("  ...\n\n");

    // Demonstrating that envp follows argv
    printf("Memory layout verification:\n");
    printf("  &argv[0]    = %p\n", (void *)&argv[0]);
    printf("  &argv[argc] = %p (NULL)\n", (void *)&argv[argc]);
    printf("  &envp[0]    = %p\n", (void *)&envp[0]);
    // Note: envp typically starts right after argv's NULL terminator

    return 0;
}

/* Example output (addresses vary from run to run):
Stack layout demonstration:

argc = 3

argv array:
  argv[0] = 0x7ffc8b9a3f10 -> "./program"
  argv[1] = 0x7ffc8b9a3f1a -> "arg1"
  argv[2] = 0x7ffc8b9a3f1f -> "arg2"
  argv[3] = (nil) (NULL terminator)

envp array (first 5):
  envp[0] = 0x7ffc8b9a3f24 -> "PATH=/usr/bin:/bin"
  envp[1] = 0x7ffc8b9a3f37 -> "HOME=/home/user"
  ...
*/

The Auxiliary Vector (auxv):

Immediately after the NULL that terminates the envp pointer array (at higher addresses), the kernel places an "auxiliary vector" of key/value pairs containing information the dynamic linker and program might need:

AT_ENTRY: Program entry point address
AT_PHDR: Address of program headers
AT_PHNUM: Number of program headers
AT_PAGESZ: System page size
AT_UID/AT_EUID: Real/effective user ID
AT_GID/AT_EGID: Real/effective group ID
AT_RANDOM: Pointer to 16 random bytes (used by libc to seed stack canaries and pointer protection)
AT_SECURE: Nonzero if the program runs in secure-execution mode (for example, setuid or setgid)
AT_PLATFORM: String identifying CPU platform
AT_HWCAP/AT_HWCAP2: CPU feature flags

Viewing the Auxiliary Vector

You can view the auxiliary vector with: LD_SHOW_AUXV=1 /bin/ls. This causes the dynamic linker to print the auxv entries before running the program. It's useful for understanding what information the kernel provides and for debugging startup issues.
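
A program can also query individual entries itself. The sketch below assumes glibc's getauxval() from <sys/auxv.h> (a Linux-specific interface); the function name show_auxv is hypothetical.

```c
#include <stdio.h>
#include <sys/auxv.h>
#include <unistd.h>

/* Prints a few auxiliary vector entries and returns the AT_PAGESZ value
 * (0 if the entry is missing). */
unsigned long show_auxv(void) {
    unsigned long pagesz = getauxval(AT_PAGESZ);
    printf("AT_PAGESZ = %lu\n", pagesz);
    printf("AT_ENTRY  = %#lx\n", getauxval(AT_ENTRY));
    printf("AT_PHNUM  = %lu\n", getauxval(AT_PHNUM));
    printf("AT_UID    = %lu, AT_EUID = %lu\n",
           getauxval(AT_UID), getauxval(AT_EUID));

    /* AT_RANDOM holds an address, not a number: 16 random bytes live there. */
    const unsigned char *rnd = (const unsigned char *)getauxval(AT_RANDOM);
    if (rnd != NULL)
        printf("AT_RANDOM = %02x %02x %02x %02x ...\n",
               rnd[0], rnd[1], rnd[2], rnd[3]);
    return pagesz;
}
```

The values should agree with what the usual APIs report, for example sysconf(_SC_PAGESIZE) and getuid().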

Dynamic Linking During exec()

Most executables aren't statically linked—they depend on shared libraries like libc.so. When exec() loads such an executable, an additional step occurs: the dynamic linker (also called the interpreter or loader) is invoked.

The Dynamic Linking Process:

1. Kernel detects PT_INTERP segment in ELF headers
   - This segment contains the path to the dynamic linker
   - Typically: /lib64/ld-linux-x86-64.so.2 (Linux x86-64)
2. Kernel loads both the program AND the dynamic linker
   - Both are mapped into the process address space
   - Entry point is set to the dynamic linker, not the program
3. Dynamic linker takes control first
   - Parses the program's dynamic section
   - Loads required shared libraries (DT_NEEDED entries)
   - Performs symbol resolution (binding)
   - Handles relocations
4. Control transferred to program's entry point
   - _start in the program begins executing
   - All libraries are loaded and symbols resolved
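
The first step of this process, locating PT_INTERP, can be reproduced by reading the file. This is an illustrative sketch rather than the kernel's logic: it assumes Linux's <elf.h>, a 64-bit ELF file in host byte order, and a hypothetical function name (read_interp).

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Copies the PT_INTERP path of a 64-bit ELF file into `out`.
 * Returns 1 if found, 0 if the file has no PT_INTERP (for example a
 * statically linked program), and -1 on error. */
int read_interp(const char *path, char *out, size_t outlen) {
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    int result = -1;
    Elf64_Ehdr eh;
    if (fread(&eh, sizeof eh, 1, f) == 1 &&
        memcmp(eh.e_ident, ELFMAG, SELFMAG) == 0 &&
        eh.e_ident[EI_CLASS] == ELFCLASS64) {
        result = 0;
        for (unsigned i = 0; i < eh.e_phnum; i++) {
            Elf64_Phdr ph;
            long pos = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);
            if (fseek(f, pos, SEEK_SET) != 0 ||
                fread(&ph, sizeof ph, 1, f) != 1) {
                result = -1;
                break;
            }
            if (ph.p_type != PT_INTERP)
                continue;
            /* The segment's bytes are the interpreter path, NUL included. */
            if (ph.p_filesz == 0 || ph.p_filesz > outlen ||
                fseek(f, (long)ph.p_offset, SEEK_SET) != 0 ||
                fread(out, 1, ph.p_filesz, f) != ph.p_filesz) {
                result = -1;
                break;
            }
            out[ph.p_filesz - 1] = '\0';
            result = 1;
            break;
        }
    }
    fclose(f);
    return result;
}
```

On a dynamically linked x86-64 Linux binary this typically yields the same path that readelf -l reports as the requested program interpreter.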

view_dynamic_deps.sh
# View the dynamic linker path (PT_INTERP)
$ readelf -l /bin/ls | grep interpreter
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
 
# List shared library dependencies
$ ldd /bin/ls
    linux-vdso.so.1 (0x00007ffcc3bfe000)
    libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1 
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
    /lib64/ld-linux-x86-64.so.2 (0x00007f8a3b600000)
 
# Trace dynamic linker operations
$ LD_DEBUG=libs /bin/ls 2>&1 | head -20
# Shows: finding libraries, loading, binding symbols...

Dynamic Linking Security

Dynamic linking introduces attack vectors. LD_PRELOAD can inject code, and LD_LIBRARY_PATH can redirect library loading. For setuid programs, the kernel sets AT_SECURE=1 in the auxiliary vector, and the dynamic linker then ignores these environment variables. Always be aware of the security implications when designing systems that exec() potentially untrusted code.

Script Execution: The Shebang (#!)

Not everything you exec() is a compiled binary. Scripts (Python, Bash, Perl, etc.) can also be executed. The kernel handles this through the shebang mechanism.

How shebang processing works:

Kernel reads the first two bytes of the file
If they are #! (0x23 0x21), it's an interpreter script
Kernel parses the rest of the first line to get the interpreter path
Kernel performs exec() on the interpreter, passing the script as an argument

shebang_examples.sh
#!/bin/bash
# This script starts with #!/bin/bash
# When you call execve("./script.sh", ["script.sh", NULL], envp)
# Linux transforms it to: execve("/bin/bash", ["/bin/bash", "./script.sh"], envp)
# (argv[0] becomes the interpreter path exactly as written after #!)

#!/usr/bin/env python3
# Using 'env' is portable - it searches PATH for python3
# Transforms to: execve("/usr/bin/env", ["/usr/bin/env", "python3", "./script.py"], envp)
# Then env finds python3 and execs it

#!/bin/bash -x
# Arguments can be included after the interpreter
# Note: on Linux, everything after the path is passed as ONE argument
# This becomes: execve("/bin/bash", ["/bin/bash", "-x", "./script.sh"], envp)

#!/bin/cat
# Even 'cat' works! This script prints itself
# Becomes: execve("/bin/cat", ["/bin/cat", "./script.sh"], envp)

Shebang Execution Transformation
You call	If script starts with	Kernel executes
`execve("./script.sh", ["./script.sh", "arg1"], envp)`	`#!/bin/bash`	`execve("/bin/bash", ["/bin/bash", "./script.sh", "arg1"], envp)`
`execve("./prog.py", ["prog.py"], envp)`	`#!/usr/bin/python3`	`execve("/usr/bin/python3", ["/usr/bin/python3", "./prog.py"], envp)`
`execve("./tool.pl", ["tool.pl", "-v"], envp)`	`#!/usr/bin/perl`	`execve("/usr/bin/perl", ["/usr/bin/perl", "./tool.pl", "-v"], envp)`

Shebang Limitations

The shebang line has limitations: the kernel reads only a limited prefix of the first line (the cap varies by system; Linux has used 127 and, in newer kernels, 255 bytes), everything after the interpreter path is passed as a single argument on Linux, and the interpreter path is not searched in PATH, so it should be absolute (or use /usr/bin/env for PATH searching). Also, scripts must have execute permission even though the interpreter does the actual execution.
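
The single-argument rule is easiest to see in code. The following sketch mimics how Linux splits the line; it is a simplified illustration, not the kernel's parser, and parse_shebang is a hypothetical name.

```c
#include <string.h>

/* Splits a "#!" line into the interpreter path and at most one argument
 * that holds everything else on the line (trailing blanks trimmed).
 * Returns 0 if `line` is not a usable shebang line or a buffer is too
 * small, 1 for interpreter only, 2 for interpreter plus argument. */
int parse_shebang(const char *line, char *interp, size_t ilen,
                  char *arg, size_t alen) {
    if (ilen == 0 || alen == 0 || strncmp(line, "#!", 2) != 0)
        return 0;

    const char *p = line + 2;
    p += strspn(p, " \t");               /* blanks after "#!" are allowed */
    size_t n = strcspn(p, " \t\n");      /* interpreter ends at first blank */
    if (n == 0 || n >= ilen)
        return 0;
    memcpy(interp, p, n);
    interp[n] = '\0';

    p += n;
    p += strspn(p, " \t");
    size_t m = strcspn(p, "\n");         /* the rest of the line is ONE argument */
    while (m > 0 && (p[m - 1] == ' ' || p[m - 1] == '\t'))
        m--;

    arg[0] = '\0';
    if (m == 0)
        return 1;
    if (m >= alen)
        return 0;
    memcpy(arg, p, m);
    arg[m] = '\0';
    return 2;
}
```

For the line "#!/usr/bin/env python3 -u" this produces the interpreter /usr/bin/env and the single argument "python3 -u", which is why multi-flag shebang lines often fail on Linux.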

Security Implications of Process Image Replacement

Process image replacement has profound security implications. The complete destruction and recreation of the memory space serves as a security boundary, but several edge cases require careful attention.

Security Benefits of exec()

•Clean slate memory — All potentially compromised heap/stack data is destroyed
•Code integrity — Fresh code loaded from the executable file, not carried over from the old process's memory
•Privilege separation — setuid can elevate privilege cleanly
•ASLR reset — New random memory layout for each exec
•Signal handler reset — Custom handlers don't carry over (except SIG_IGN)

Security Concerns with exec()

•File descriptor leaks — Forgotten FDs can expose data to child
•Environment manipulation — LD_PRELOAD, PATH attacks possible
•Race conditions — TOCTOU between check and exec
•Setuid complications — Inherited resources + elevated privileges
•Resource limits — May be too permissive for target program

Setuid/Setgid Execution:

When an executable has the setuid or setgid bit set, exec() causes a privilege transition:

File permissions: -rwsr-xr-x (setuid bit = 's')

Before exec:  RUID=1000 (user), EUID=1000 (user)
After exec:   RUID=1000 (user), EUID=0 (root) ← from file owner

This is how programs such as sudo and passwd (and, traditionally, ping) gain elevated privileges. During such an exec():

The kernel sets EUID to the file owner (for setuid)
The kernel sets EGID to the file group (for setgid)
The kernel recomputes the process's capability sets (Linux)
The kernel sets the AT_SECURE auxiliary vector flag
The dynamic linker, seeing AT_SECURE, ignores certain environment variables (LD_PRELOAD, LD_LIBRARY_PATH)
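
A program can check whether it was started through such a transition. The sketch below assumes Linux with glibc's getauxval(); the function names (ids_differ, secure_execution_mode) are hypothetical.

```c
#include <sys/auxv.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if the real and effective user or group IDs differ right now. */
int ids_differ(void) {
    return getuid() != geteuid() || getgid() != getegid();
}

/* Returns 1 if the kernel flagged this exec as secure-execution mode
 * (setuid, setgid, or gained capabilities), 0 otherwise. */
int secure_execution_mode(void) {
    return getauxval(AT_SECURE) != 0;
}
```

AT_SECURE reflects the state at exec() time, while the ID comparison reflects the current state, so a program that has already dropped privileges can still see secure_execution_mode() return 1.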

secure_exec.c
#include <unistd.h>
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>

// Secure exec() pattern for privileged programs
void secure_exec(const char *program, char *const argv[]) {
    // 1. Close all unnecessary file descriptors
    //    (or use closefrom() where available; 1024 is an assumed upper bound)
    for (int fd = 3; fd < 1024; fd++) {
        close(fd);  // Ignore failure for non-open FDs
    }

    // 2. Create clean, controlled environment
    char *safe_env[] = {
        "PATH=/usr/bin:/bin",
        "IFS= \t\n",
        "TERM=dumb",
        NULL
    };

    // 3. Reset signal handlers to default
    //    (exec does this for caught signals, but not SIG_IGN)
    for (int sig = 1; sig < 32; sig++) {
        signal(sig, SIG_DFL);  // Fails harmlessly for SIGKILL and SIGSTOP
    }

    // 4. Clear signal mask
    sigset_t empty;
    sigemptyset(&empty);
    sigprocmask(SIG_SETMASK, &empty, NULL);

    // 5. Change to known directory
    if (chdir("/") != 0) {
        _exit(126);
    }

    // 6. Execute with controlled environment
    execve(program, argv, safe_env);

    // Failure - exit with error
    _exit(127);
}

Never Trust Inherited State in Privileged Programs

If you're writing a privileged program (setuid, daemon, etc.), never assume the inherited environment, file descriptors, or signal state are safe. Sanitize everything. Close FDs you don't need. Build a clean environment. Experienced attackers craft malicious process contexts to exploit careless privileged programs.

Exec Failure Modes and Atomicity

exec() is designed to be atomic with respect to process state: either it succeeds completely and the new program runs, or it fails completely and the original process continues unchanged. There is no partial exec.

The atomicity guarantee:

exec() validates everything before making any changes
The kernel checks permissions, opens files, validates formats first
Only after all checks pass does memory modification begin
Once memory modification begins, there's no rollback—failure = process termination

In practice, if exec() returns at all, it failed: the return value is -1 and errno identifies the cause. After such a failure the original process is intact and can try alternative actions, report errors, or exit gracefully.

exec_failure_recovery.c
#include <unistd.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>

int main(void) {
    // Attempt to exec a program
    execl("/nonexistent/program", "program", (char *)NULL);

    // We only reach here if exec() failed
    // The original process state is completely preserved:
    // - Same PID
    // - Same memory (code, heap, stack all intact)
    // - Same file descriptors
    // - Same everything
    int err = errno;  // Save it now; later library calls may overwrite errno

    // We can now handle the error intelligently
    switch (err) {
        case ENOENT:
            fprintf(stderr, "Program not found\n");
            // Maybe try alternative path?
            execl("/usr/local/bin/program", "program", (char *)NULL);
            break;

        case EACCES:
            fprintf(stderr, "Permission denied\n");
            // Maybe we need to gain privilege first?
            break;

        case ENOEXEC:
            fprintf(stderr, "Not an executable format\n");
            // Maybe it's a script - try shell?
            execl("/bin/sh", "sh", "/path/to/script", (char *)NULL);
            break;

        default:
            fprintf(stderr, "exec failed: %s\n", strerror(err));
    }

    // Final fallback
    fprintf(stderr, "All exec attempts failed, exiting\n");
    return 127;  // Convention for "command not found"
}

When Atomicity Isn't Quite Perfect

In rare edge cases, state can leak across a failed exec(). For example, in multithreaded programs, other threads may observe the exec() in progress before it fails. Signal handlers for asynchronous signals could theoretically see inconsistent state if exec fails mid-way through certain operations. These edge cases are extremely rare in practice.
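
In the usual fork-then-exec arrangement, the parent cannot see the child's errno directly. A widely used workaround combines the all-or-nothing behaviour of exec() with close-on-exec: the child writes errno into a pipe whose write end is marked FD_CLOEXEC, so the parent reads end-of-file on success and an error code on failure. The sketch below is one way to write it; spawn_and_report is a hypothetical name.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks and execs `path`, then waits for the child. Returns 0 if the exec
 * succeeded, the child's errno if it failed, or -1 on setup errors. */
int spawn_and_report(const char *path, char *const argv[]) {
    int fds[2];
    if (pipe(fds) < 0)
        return -1;
    /* The write end closes by itself when exec succeeds. */
    if (fcntl(fds[1], F_SETFD, FD_CLOEXEC) < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        execv(path, argv);
        int err = errno;                  /* only reached if exec failed */
        ssize_t w = write(fds[1], &err, sizeof err);
        (void)w;
        _exit(127);
    }

    close(fds[1]);
    int err = 0;
    ssize_t n;
    do {
        n = read(fds[0], &err, sizeof err);
    } while (n < 0 && errno == EINTR);
    close(fds[0]);
    while (waitpid(pid, NULL, 0) < 0 && errno == EINTR) {
        /* retry if interrupted by a signal */
    }

    if (n < 0)
        return -1;
    return n == (ssize_t)sizeof err ? err : 0;   /* EOF: exec succeeded */
}
```

Calling it with "/nonexistent/program" should report ENOENT, while a path such as /bin/sh with a valid argv reports 0.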

Summary: Process Image Replacement

We've now explored the complete mechanism of process image replacement. Let's consolidate the key concepts:

Key Takeaways

•exec() destroys all memory — Text, data, BSS, heap, stack, and mappings are all replaced.
•PID and kernel resources survive — File descriptors (unless CLOEXEC), IDs, limits, working directory, and more persist.
•ELF loading is demand-paged — The kernel maps the executable file; pages are loaded on first access.
•The stack is pre-populated — argc, argv, envp, and the auxiliary vector are set up by the kernel.
•Dynamic linking adds complexity — The dynamic linker (ld.so) runs first, loading shared libraries.
•Scripts use shebang (#!) — The kernel invokes interpreters automatically based on file content.
•exec() is atomic — Either complete success or the original process continues unchanged.
•Security requires explicit sanitization — Don't trust inherited file descriptors, environment, or signal state in privileged programs.

What's next:

Now that you understand how the process image is replaced, we'll examine how arguments are passed to the new program. We'll dive deep into the argv mechanism, how argument strings are stored and accessed, limits on argument size, and best practices for constructing and parsing command-line arguments.


Replacing the Process Image

The Complete Transformation

What You Will Learn

Understanding Process Memory Layout

Before understanding what exec() replaces, we must understand what a running process's memory looks like. Every Unix process has a virtual address space divided into distinct regions:

Converting Mermaid diagram...

Process Memory Segments
Segment	Contents	Permissions	Source
Text (Code)	Machine instructions	Read + Execute	Loaded from executable file
Data	Initialized global/static variables	Read + Write	Loaded from executable file
BSS	Uninitialized global/static vars	Read + Write	Zero-initialized by kernel
Heap	Dynamic memory (malloc)	Read + Write	Allocated at runtime
Stack	Function frames, locals	Read + Write	Grows automatically
Memory Map	Shared libs, mmap files	Varies	Mapped at load/runtime

When exec() executes, it replaces ALL of these user-space memory regions:

The old text segment (your code) → replaced with new program's code
The old data segment → replaced with new program's initialized data
The old BSS → replaced with new program's BSS (zeroed)
The old heap → discarded entirely (new program starts with empty heap)
The old stack → discarded entirely (new stack created with argc/argv/envp)
The old memory mappings → unmapped (except for new program's shared libraries)

This is a complete wipe. Nothing from your old program's memory survives into the new program.

What exec() Destroys

Let's be explicit about what is completely destroyed when exec() succeeds:

Completely Destroyed by exec()

•All code (text segment) — Your program's instructions are gone. There is no way back.
•All global and static variables — Both initialized (data) and uninitialized (BSS) are replaced.
•All dynamically allocated memory (heap) — Everything from malloc/calloc is gone. Memory leaks become irrelevant.
•The entire call stack — All local variables, all function calls in progress, all return addresses.
•All memory-mapped regions — Anonymous mappings, file mappings (except the new executable).
•Signal handlers — Custom signal handlers are reset to default (caught signals → SIG_DFL).
•Memory locks (mlock) — All are released.
•Thread state — In multi-threaded programs, all threads except the one calling exec() are terminated.
•Thread-local storage — Destroyed with the threads.
•Pending signals for ignored dispositions — Signals set to SIG_IGN remain ignored.

No Cleanup Happens

no_cleanup_demo.c
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
 
void cleanup(void) {
    // This function is registered with atexit()
    // But it will NEVER be called if exec() succeeds!
    printf("Cleanup running...
");
}
 
int main() {
    atexit(cleanup);  // Register cleanup function
    
    char *huge_buffer = malloc(1000000000);  // 1GB allocation
    // Memory "leak" - but exec() will reclaim everything anyway
    
    FILE *fp = fopen("/tmp/test.txt", "w");
    fprintf(fp, "Some data...
");
    // File not explicitly closed - but exec() handles it
    
    printf("About to exec...
");
    execl("/bin/echo", "echo", "New program running!", NULL);
    
    // Only reached on exec failure:
    printf("exec failed!
");
    cleanup();  // Would have to call manually on failure
    return 1;
}
 
// When exec() succeeds:
// - cleanup() never called
// - 1GB buffer released by kernel (not a leak!)
// - FILE* stream discarded (underlying fd may persist)
// - All user-mode state gone

What Survives exec()

While exec() destroys the memory image, it performs an image replacement within the same process. Certain kernel-level process attributes survive the transformation:

Preserved Across exec()

•Process ID (PID) — The fundamental identity remains. getpid() returns the same value before and after exec().
•Parent Process ID (PPID) — Relationship to parent unchanged.
•Process Group ID and Session ID — Terminal job control relationships persist.
•Real User ID and Real Group ID — The identity executing the process (unless setuid/setgid).
•Supplementary Group IDs — Additional group memberships.
•Controlling Terminal — TTY association remains.
•Current Working Directory — getcwd() returns same path.
•Root Directory — Chroot remains in effect.
•File Mode Creation Mask (umask) — Permission mask preserved.
•Resource Limits (rlimit) — CPU/memory/file limits carry over.
•Nice Value — Scheduling priority preserved.
•Pending Alarms — Alarms set with alarm() remain pending.
•File Locks — Advisory locks (fcntl/flock) remain held.
•Process Signal Mask — Which signals are blocked.
•Pending Signals — Signals awaiting delivery.
•Open File Descriptors — Unless marked close-on-exec (FD_CLOEXEC).

The close-on-exec flag (FD_CLOEXEC):

File descriptors are special. By default, they survive exec() and become available to the new program. But this can cause problems:

Security: Child shouldn't inherit sensitive file handles
Resource leaks: Forgotten open files accumulate
Conflicts: File descriptors might interfere with new program

The solution is the close-on-exec flag. When set, the kernel automatically closes that file descriptor during exec():

close_on_exec.c
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
 
int main() {
    // Method 1: Set O_CLOEXEC when opening
    int fd1 = open("/etc/passwd", O_RDONLY | O_CLOEXEC);
    // fd1 will be automatically closed when exec() runs
    
    // Method 2: Set FD_CLOEXEC after opening
    int fd2 = open("/etc/group", O_RDONLY);
    fcntl(fd2, F_SETFD, FD_CLOEXEC);
    // fd2 will also be automatically closed on exec()
    
    // File descriptor without CLOEXEC - survives exec()
    int fd3 = open("/tmp/shared.txt", O_RDONLY);
    // fd3 will be available to the new program
    
    printf("fd1=%d (CLOEXEC), fd2=%d (CLOEXEC), fd3=%d (inherited)
",
           fd1, fd2, fd3);
    
    execl("/bin/ls", "ls", "-l", "/proc/self/fd", NULL);
    // If exec succeeds:
    // - fd1 closed (CLOEXEC)
    // - fd2 closed (CLOEXEC)  
    // - fd3 inherited (still open as fd3 in new process)
    // - stdin(0), stdout(1), stderr(2) also inherited
    
    perror("exec failed");
    return 1;
}

Best Practice: Always Use O_CLOEXEC

The ELF Loading Process

ELF File Structure:

An ELF executable contains:

+------------------------+
| ELF Header             |  <- Magic number, CPU arch, entry point
+------------------------+
| Program Headers        |  <- How to load into memory (segments)
+------------------------+
| Section Headers        |  <- For linkers/debuggers (optional at runtime)
+------------------------+
| .text                  |  <- Executable code
+------------------------+
| .rodata                |  <- Read-only data (string literals, etc.)
+------------------------+
| .data                  |  <- Initialized read-write data
+------------------------+
| .bss                   |  <- Uninitialized data (just size info)
+------------------------+
| .symtab, .strtab, ...  |  <- Symbol tables, debug info
+------------------------+


Step-by-step ELF loading:

1. Validate the ELF Header

// First 4 bytes must be: 0x7f 'E' 'L' 'F'
// e_type must be ET_EXEC (executable) or ET_DYN (PIE/shared object)
// e_machine must match CPU architecture (EM_X86_64, etc.)
// e_version must be EV_CURRENT
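
The same checks can be written against the structures in <elf.h>. This is a user-space sketch, not the kernel's code: elf64_header_ok() is a hypothetical name, it handles only 64-bit files (the EI_CLASS check is an added assumption), and the expected machine type is passed in by the caller.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if buf begins with a plausible 64-bit ELF header for `machine`,
   otherwise 0. Hypothetical helper mirroring the four checks listed above. */
int elf64_header_ok(const unsigned char *buf, size_t len, unsigned machine) {
    Elf64_Ehdr eh;

    if (len < sizeof eh)
        return 0;
    memcpy(&eh, buf, sizeof eh);                    /* avoid unaligned access */

    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0)   /* 0x7f 'E' 'L' 'F' */
        return 0;
    if (eh.e_ident[EI_CLASS] != ELFCLASS64)         /* assumption: 64-bit only */
        return 0;
    if (eh.e_type != ET_EXEC && eh.e_type != ET_DYN)
        return 0;
    if (eh.e_machine != machine)                    /* e.g. EM_X86_64 */
        return 0;
    if (eh.e_version != EV_CURRENT)
        return 0;
    return 1;
}
```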

2. Process Program Headers (PT_LOAD segments)

Each PT_LOAD segment specifies:

p_vaddr: Virtual address to load at
p_offset: Offset in file
p_filesz: Size in file (what to load)
p_memsz: Size in memory (may be > filesz for BSS)
p_flags: Permissions (PF_R, PF_W, PF_X)

3. Memory Mapping

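Conceptually, the kernel maps each segment much as a user-space mmap() call would (internally it uses its own mapping routines rather than the system call, and rounds addresses and offsets to page boundaries):

// For the text segment (read + execute):
mmap(p_vaddr, p_filesz, PROT_READ | PROT_EXEC,
     MAP_PRIVATE | MAP_FIXED, fd, p_offset);

// For the data segment (read + write). The file supplies p_filesz bytes;
// the remainder up to p_memsz is the BSS handled in step 4:
mmap(p_vaddr, p_filesz, PROT_READ | PROT_WRITE,
     MAP_PRIVATE | MAP_FIXED, fd, p_offset);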

4. Zero-fill BSS

If p_memsz > p_filesz, the extra bytes (the BSS) are zero-initialized.
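
Steps 2 through 4 come down to two small computations per PT_LOAD header: turning p_flags into page protections, and working out how many bytes exist only in memory. A minimal sketch using the <elf.h> types follows; segment_prot() and segment_zero_fill() are hypothetical names chosen for illustration.

```c
#include <elf.h>
#include <stdint.h>
#include <sys/mman.h>

/* Translate ELF segment flags (PF_*) into mmap protection bits (PROT_*). */
int segment_prot(const Elf64_Phdr *ph) {
    int prot = 0;
    if (ph->p_flags & PF_R) prot |= PROT_READ;
    if (ph->p_flags & PF_W) prot |= PROT_WRITE;
    if (ph->p_flags & PF_X) prot |= PROT_EXEC;
    return prot;
}

/* Bytes present in memory but not in the file: the part to zero-fill (BSS). */
uint64_t segment_zero_fill(const Elf64_Phdr *ph) {
    if (ph->p_memsz > ph->p_filesz)
        return ph->p_memsz - ph->p_filesz;
    return 0;
}
```

For example, a data segment with p_filesz = 0x100 and p_memsz = 0x300 needs 0x200 bytes of zero-fill after the file-backed part.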

5. Dynamic Linking (if applicable)

If the executable has a PT_INTERP segment, the kernel loads the dynamic linker (typically /lib64/ld-linux-x86-64.so.2) and transfers control to it instead of directly to the program.

Demand Paging

Setting Up the Initial Stack

Initial Stack Layout (x86-64 Linux):

High addresses
┌─────────────────────────────────────┐
│ Platform-specific info (random)     │
│ ELF auxiliary vector (auxv)         │
│ NULL (end of envp)                  │
│ envp[m-1] -> "LAST_VAR=..."         │
│ ...                                 │
│ envp[1] -> "HOME=/home/user"        │
│ envp[0] -> "PATH=/usr/bin:/bin"     │
│ NULL (end of argv)                  │
│ argv[n-1] -> "last_argument"        │
│ ...                                 │
│ argv[1] -> "first_argument"         │
│ argv[0] -> "program_name"           │
│ argc (argument count)               │  <- %rsp points here
└─────────────────────────────────────┘
Low addresses (stack grows down)

stack_layout_demo.c
#include <stdio.h>

// This program demonstrates how to access the initial stack.
// The three-argument form of main (with envp) is a common Unix extension.
int main(int argc, char *argv[], char *envp[]) {
    printf("Stack layout demonstration:\n\n");

    // argc: first item on stack
    printf("argc = %d\n\n", argc);

    // argv: array of argument string pointers
    printf("argv array:\n");
    for (int i = 0; i < argc; i++) {
        printf("  argv[%d] = %p -> \"%s\"\n", i, (void *)argv[i], argv[i]);
    }
    printf("  argv[%d] = %p (NULL terminator)\n\n", argc, (void *)argv[argc]);

    // envp: array of environment string pointers
    printf("envp array (first 5):\n");
    for (int i = 0; envp[i] != NULL && i < 5; i++) {
        printf("  envp[%d] = %p -> \"%s\"\n", i, (void *)envp[i], envp[i]);
    }
    printf("  ...\n\n");

    // Demonstrating that envp follows argv
    printf("Memory layout verification:\n");
    printf("  &argv[0]    = %p\n", (void *)&argv[0]);
    printf("  &argv[argc] = %p (NULL)\n", (void *)&argv[argc]);
    printf("  &envp[0]    = %p\n", (void *)&envp[0]);
    // Note: envp typically starts right after argv's NULL terminator

    return 0;
}
 
/* Example output:
Stack layout demonstration:
 
argc = 3
 
argv array:
  argv[0] = 0x7ffc8b9a3f10 -> "./program"
  argv[1] = 0x7ffc8b9a3f1a -> "arg1"
  argv[2] = 0x7ffc8b9a3f1f -> "arg2"
  argv[3] = (nil) (NULL terminator)
 
envp array (first 5):
  envp[0] = 0x7ffc8b9a3f24 -> "PATH=/usr/bin:/bin"
  envp[1] = 0x7ffc8b9a3f37 -> "HOME=/home/user"
  ...
*/

The Auxiliary Vector (auxv):

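Just above the envp array's NULL terminator (and below the argument and environment strings themselves), the kernel places an "auxiliary vector": a list of type/value pairs carrying information the dynamic linker and program might need: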

AT_ENTRY: Program entry point address
AT_PHDR: Address of program headers
AT_PHNUM: Number of program headers
AT_PAGESZ: System page size
AT_UID/AT_EUID: Real/effective user ID
AT_GID/AT_EGID: Real/effective group ID
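AT_RANDOM: Pointer to 16 random bytes (libc uses them to seed stack canaries and pointer protection)
AT_SECURE: Nonzero if the program runs in secure-execution mode (for example, setuid or setgid execution)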
AT_PLATFORM: String identifying CPU platform
AT_HWCAP/AT_HWCAP2: CPU feature flags

Viewing the Auxiliary Vector

Dynamic Linking During exec()

The Dynamic Linking Process:

1. Kernel detects PT_INTERP segment in ELF headers
- This segment contains the path to the dynamic linker
- Typically: /lib64/ld-linux-x86-64.so.2 (Linux x86-64)

2. Kernel loads both the program AND the dynamic linker
- Both are mapped into the process address space
- Entry point is set to the dynamic linker, not the program

3. Dynamic linker takes control first
- Parses the program's dynamic section
- Loads required shared libraries (DT_NEEDED entries)
- Performs symbol resolution (binding)
- Handles relocations

4. Control transferred to program's entry point
- _start in the program begins executing
- All libraries are loaded and symbols resolved
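
The first step, finding PT_INTERP, can be reproduced in user space by scanning the program headers of an ELF image held in memory. This is a sketch under stated assumptions: elf64_interp() is a hypothetical name, it handles only 64-bit files, and it returns NULL for anything it cannot bounds-check rather than trying to recover.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Returns a pointer to the interpreter path stored inside `image`, or NULL if
   there is no PT_INTERP segment or the headers point outside the buffer. */
const char *elf64_interp(const unsigned char *image, size_t len) {
    Elf64_Ehdr eh;

    if (len < sizeof eh)
        return NULL;
    memcpy(&eh, image, sizeof eh);
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0)
        return NULL;

    for (unsigned i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        size_t off = eh.e_phoff + (size_t)i * eh.e_phentsize;

        if (off > len || len - off < sizeof ph)
            return NULL;                 /* program header out of bounds */
        memcpy(&ph, image + off, sizeof ph);
        if (ph.p_type != PT_INTERP)
            continue;

        /* The path must lie inside the file and be NUL-terminated. */
        if (ph.p_filesz == 0 || ph.p_offset > len ||
            len - ph.p_offset < ph.p_filesz)
            return NULL;
        if (image[ph.p_offset + ph.p_filesz - 1] != '\0')
            return NULL;
        return (const char *)(image + ph.p_offset);
    }
    return NULL;                         /* statically linked: no interpreter */
}
```

A statically linked executable has no PT_INTERP segment, so the function returns NULL and, correspondingly, the kernel jumps straight to the program's own entry point.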


view_dynamic_deps.sh
# View the dynamic linker path (PT_INTERP)
$ readelf -l /bin/ls | grep interpreter
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
 
# List shared library dependencies
$ ldd /bin/ls
    linux-vdso.so.1 (0x00007ffcc3bfe000)
    libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1 
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
    /lib64/ld-linux-x86-64.so.2 (0x00007f8a3b600000)
 
# Trace dynamic linker operations
$ LD_DEBUG=libs /bin/ls 2>&1 | head -20
# Shows: finding libraries, loading, binding symbols...

Dynamic Linking Security

Script Execution: The Shebang (#!)

Not everything you exec() is a compiled binary. Scripts (Python, Bash, Perl, etc.) can also be executed. The kernel handles this through the shebang mechanism.

How shebang processing works:

Kernel reads the first two bytes of the file
If they are #! (0x23 0x21), it's an interpreter script
Kernel parses the rest of the first line to get the interpreter path
Kernel performs exec() on the interpreter, passing the script as an argument
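
The parsing step is simple enough to sketch in user space. The function below follows the Linux convention as described here: the interpreter path ends at the first blank, and whatever remains on the line is passed as a single argument. parse_shebang() is a hypothetical name and this is an illustration, not the kernel's implementation; other systems split the line differently.

```c
#include <stddef.h>
#include <string.h>

/* Split a "#!" line into interpreter path and at most one argument.
   Returns 0 on success, -1 if the line is not a shebang, names no
   interpreter, or an output buffer is too small. */
int parse_shebang(const char *line, char *interp, size_t isz,
                  char *arg, size_t asz) {
    if (strncmp(line, "#!", 2) != 0)
        return -1;
    const char *p = line + 2;
    size_t end = strcspn(p, "\n");            /* only the first line counts */
    while (end > 0 && (p[end - 1] == ' ' || p[end - 1] == '\t'))
        end--;                                 /* trim trailing blanks */
    size_t start = strspn(p, " \t");           /* skip blanks after "#!" */
    if (start >= end)
        return -1;                             /* no interpreter named */
    p += start;
    end -= start;

    size_t ilen = strcspn(p, " \t\n");         /* path ends at first blank */
    if (ilen > end)
        ilen = end;
    if (ilen + 1 > isz)
        return -1;
    memcpy(interp, p, ilen);
    interp[ilen] = '\0';

    size_t askip = ilen + strspn(p + ilen, " \t");
    if (askip > end)
        askip = end;
    size_t alen = end - askip;                 /* rest of line = one argument */
    if (alen + 1 > asz)
        return -1;
    memcpy(arg, p + askip, alen);
    arg[alen] = '\0';                          /* "" means no argument */
    return 0;
}
```

Given "#!/usr/bin/env python3 -u", the sketch yields the interpreter "/usr/bin/env" and the single argument "python3 -u", which is why multi-option shebang lines often fail on Linux.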

shebang_examples.sh
#!/bin/bash
# This script starts with #!/bin/bash
# When you call execve("./script.sh", ["script.sh", NULL], envp)
# Linux transforms it to: execve("/bin/bash", ["/bin/bash", "./script.sh"], envp)
# (argv[0] becomes the interpreter path; the script path replaces the original argv[0])
 
#!/usr/bin/env python3
# Using 'env' is portable - it searches PATH for python3
# Transforms to: execve("/usr/bin/env", ["/usr/bin/env", "python3", "./script.py"], envp)
# Then env finds python3 and execs it
 
#!/bin/bash -x
# An argument can be included after the interpreter
# Note: on Linux, everything after the path is passed as ONE argument
# This becomes: execve("/bin/bash", ["/bin/bash", "-x", "./script.sh"], envp)
 
#!/bin/cat
# Even 'cat' works! This script prints itself
# Becomes: execve("/bin/cat", ["/bin/cat", "./script.sh"], envp)

Shebang Execution Transformation
You call	If script starts with	Kernel executes
`execve("./script.sh", ["./script.sh", "arg1"], envp)`	`#!/bin/bash`	`execve("/bin/bash", ["/bin/bash", "./script.sh", "arg1"], envp)`
`execve("./prog.py", ["prog.py"], envp)`	`#!/usr/bin/python3`	`execve("/usr/bin/python3", ["/usr/bin/python3", "./prog.py"], envp)`
`execve("./tool.pl", ["tool.pl", "-v"], envp)`	`#!/usr/bin/perl`	`execve("/usr/bin/perl", ["/usr/bin/perl", "./tool.pl", "-v"], envp)`

Shebang Limitations

Security Implications of Process Image Replacement

Security Benefits of exec()

•Clean slate memory — All potentially compromised heap/stack data is destroyed
•Code integrity — Fresh code loaded from verified executable file
•Privilege separation — setuid can elevate privilege cleanly
•ASLR reset — New random memory layout for each exec
•Signal handler reset — Custom handlers don't carry over (except SIG_IGN)

Security Concerns with exec()

•File descriptor leaks — Forgotten FDs can expose data to child
•Environment manipulation — LD_PRELOAD, PATH attacks possible
•Race conditions — TOCTOU between check and exec
•Setuid complications — Inherited resources + elevated privileges
•Resource limits — May be too permissive for target program

Setuid/Setgid Execution:

When an executable has the setuid or setgid bit set, exec() causes a privilege transition:

File permissions: -rwsr-xr-x (setuid bit = 's')

Before exec:  RUID=1000 (user), EUID=1000 (user)
After exec:   RUID=1000 (user), EUID=0 (root) ← from file owner

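This is how programs like sudo and passwd (and, historically, ping) gain elevated privileges. During a setuid or setgid exec:

The kernel sets EUID to the file owner (for setuid)
The kernel sets EGID to the file group (for setgid)
The kernel copies the new effective IDs into the saved set-user-ID and saved set-group-ID
The kernel recalculates the process capability sets
The kernel sets the AT_SECURE auxiliary vector flag
The dynamic linker, seeing AT_SECURE, ignores dangerous environment variables (LD_PRELOAD, LD_LIBRARY_PATH)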
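
A program can detect this transition from the inside in two ways: by comparing its real and effective IDs (the RUID/EUID difference shown above), or, on Linux, by reading AT_SECURE from the auxiliary vector. A minimal sketch with hypothetical helper names:

```c
#include <sys/auxv.h>   /* getauxval(), AT_SECURE (Linux-specific) */
#include <sys/types.h>
#include <unistd.h>

/* 1 if the effective user or group ID differs from the real one, as it does
   after exec() of a setuid or setgid file; otherwise 0. */
int ids_elevated(void) {
    return (geteuid() != getuid() || getegid() != getgid()) ? 1 : 0;
}

/* 1 if the kernel flagged this exec as secure-execution mode; otherwise 0.
   This can be set even when the IDs match, e.g. for file capabilities. */
int secure_mode(void) {
    return getauxval(AT_SECURE) != 0 ? 1 : 0;
}
```

Libraries that honor environment variables typically consult the AT_SECURE form of this check before trusting them.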

secure_exec.c
#include <unistd.h>
#include <fcntl.h>
#include <signal.h>   // signal(), sigset_t, sigemptyset(), sigprocmask()
#include <stdlib.h>
 
// Secure exec() pattern for privileged programs
void secure_exec(const char *program, char *const argv[]) {
    // 1. Close all unnecessary file descriptors
    //    (or use closefrom() where available)
    for (int fd = 3; fd < 1024; fd++) {
        close(fd);  // Ignore failure for non-open FDs
    }
    
    // 2. Create clean, controlled environment
    char *safe_env[] = {
        "PATH=/usr/bin:/bin",
        "IFS= \t\n",
        "TERM=dumb",
        NULL
    };
    
    // 3. Reset signal handlers to default
    //    (exec does this for caught signals, but not SIG_IGN)
    for (int sig = 1; sig < 32; sig++) {
        signal(sig, SIG_DFL);
    }
    
    // 4. Clear signal mask
    sigset_t empty;
    sigemptyset(&empty);
    sigprocmask(SIG_SETMASK, &empty, NULL);
    
    // 5. Change to known directory
    chdir("/");
    
    // 6. Execute with controlled environment
    execve(program, argv, safe_env);
    
    // Failure - exit with error
    _exit(127);
}

Never Trust Inherited State in Privileged Programs

Exec Failure Modes and Atomicity

The atomicity guarantee:

exec() validates everything before making any changes
The kernel checks permissions, opens files, validates formats first
Only after all checks pass does memory modification begin
Once memory modification begins, there's no rollback—failure = process termination

Seen from the caller, if exec() returns at all, it failed: the return value is -1 and errno identifies the reason. The original process is then still intact and can try alternative actions, report the error, or exit gracefully.

exec_failure_recovery.c
#include <unistd.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>

int main(void) {
    // Attempt to exec a program
    execl("/nonexistent/program", "program", (char *)NULL);

    // We only reach here if exec() failed
    // The original process state is completely preserved:
    // - Same PID
    // - Same memory (code, heap, stack all intact)
    // - Same file descriptors
    // - Same everything

    // Save errno immediately: later library calls may overwrite it
    int err = errno;

    // We can now handle the error intelligently
    switch (err) {
        case ENOENT:
            fprintf(stderr, "Program not found\n");
            // Maybe try alternative path?
            execl("/usr/local/bin/program", "program", (char *)NULL);
            break;

        case EACCES:
            fprintf(stderr, "Permission denied\n");
            // Maybe we need to gain privilege first?
            break;

        case ENOEXEC:
            fprintf(stderr, "Not an executable format\n");
            // Maybe it's a script - try shell?
            execl("/bin/sh", "sh", "/path/to/script", (char *)NULL);
            break;

        default:
            fprintf(stderr, "exec failed: %s\n", strerror(err));
    }

    // Final fallback
    fprintf(stderr, "All exec attempts failed, exiting\n");
    return 127;  // Convention for "command not found"
}
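
When exec() runs in a forked child, the parent cannot see that return value directly. A common idiom combines the two ideas from this page: a pipe whose write end is close-on-exec. If exec() succeeds, the kernel closes the write end and the parent reads end-of-file; if it fails, the child writes errno into the pipe. The following is a sketch of that idiom; spawn_checked() is a hypothetical name.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork and exec `path`. Returns 0 if exec() succeeded in the child, the
   child's errno if exec() failed, or -1 if pipe/fork/read failed.
   The child is reaped before returning. */
int spawn_checked(const char *path, char *const argv[]) {
    int pfd[2];
    if (pipe(pfd) == -1)
        return -1;
    /* A successful exec closes the write end automatically. */
    if (fcntl(pfd[1], F_SETFD, FD_CLOEXEC) == -1) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }
    if (pid == 0) {                        /* child */
        close(pfd[0]);
        execv(path, argv);
        int child_err = errno;             /* exec returned, so it failed */
        ssize_t w = write(pfd[1], &child_err, sizeof child_err);
        (void)w;                           /* nothing more we can do */
        _exit(127);
    }

    close(pfd[1]);                         /* parent keeps only the read end */
    int err = 0;
    ssize_t n;
    do {
        n = read(pfd[0], &err, sizeof err);
    } while (n == -1 && errno == EINTR);
    close(pfd[0]);

    int status;
    while (waitpid(pid, &status, 0) == -1 && errno == EINTR)
        ;                                  /* retry if interrupted */

    if (n == (ssize_t)sizeof err)
        return err;                        /* exec failed in the child */
    return n == 0 ? 0 : -1;                /* EOF means exec succeeded */
}
```

Calling spawn_checked("/nonexistent/program", argv) should return ENOENT, while a path that exists and is executable returns 0 once the child has finished.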

When Atomicity Isn't Quite Perfect

Summary: Process Image Replacement

We've now explored the complete mechanism of process image replacement. Let's consolidate the key concepts:

Key Takeaways

•exec() destroys all memory — Text, data, BSS, heap, stack, and mappings are all replaced.
•PID and kernel resources survive — File descriptors (unless CLOEXEC), IDs, limits, working directory, and more persist.
•ELF loading is demand-paged — The kernel maps the executable file; pages are loaded on first access.
•The stack is pre-populated — argc, argv, envp, and the auxiliary vector are set up by the kernel.
•Dynamic linking adds complexity — The dynamic linker (ld.so) runs first, loading shared libraries.
•Scripts use shebang (#!) — The kernel invokes interpreters automatically based on file content.
•exec() is atomic — Either complete success or the original process continues unchanged.
•Security requires explicit sanitization — Don't trust inherited file descriptors, environment, or signal state in privileged programs.
