When exec() succeeds, something extraordinary happens: the calling process's entire memory image is destroyed and replaced with a completely new program. The code you were running, your stack, your heap, your data—all gone. In their place: a fresh program ready to execute from its entry point.
This isn't a simple memory copy or overlay. It's a carefully orchestrated transformation involving the kernel, the executable file format, the memory management unit (MMU), and the dynamic linker. Understanding this process reveals why exec() behaves the way it does and explains many of its subtle characteristics.
By the end of this page, you will understand exactly what happens when exec() transforms a process: which memory regions are replaced, which kernel resources persist, how the ELF loader works, and why this design enables both flexibility and security.
Before understanding what exec() replaces, we must understand what a running process's memory looks like. Every Unix process has a virtual address space divided into distinct regions:
| Segment | Contents | Permissions | Source |
|---|---|---|---|
| Text (Code) | Machine instructions | Read + Execute | Loaded from executable file |
| Data | Initialized global/static variables | Read + Write | Loaded from executable file |
| BSS | Uninitialized global/static vars | Read + Write | Zero-initialized by kernel |
| Heap | Dynamic memory (malloc) | Read + Write | Allocated at runtime |
| Stack | Function frames, locals | Read + Write | Grows automatically |
| Memory Map | Shared libs, mmap files | Varies | Mapped at load/runtime |
When exec() succeeds, it replaces all of these user-space memory regions: text, data, BSS, heap, stack, and every memory mapping.
This is a complete wipe. Nothing from your old program's memory survives into the new program.
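Let's be explicit about what is completely destroyed when exec() succeeds: all program code and variables, every heap allocation and the pointers into it, all stack frames, every mapped library or file, and any handlers registered in user space, such as atexit() callbacks.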
Because exec() replaces your code, no cleanup code can run. Destructors don't execute. atexit() handlers don't run. Open file handles aren't explicitly closed by your code. This is why exec() has specific guarantees about inherited resources—the new program must be able to function with what it inherits.
```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

void cleanup(void) {
    // Registered with atexit(), but NEVER called if exec() succeeds
    printf("Cleanup running...\n");
}

int main(void) {
    atexit(cleanup);  // Register cleanup function

    char *huge_buffer = malloc(1000000000);  // ~1 GB allocation
    (void)huge_buffer;
    // Memory "leak" - but exec() will reclaim everything anyway

    FILE *fp = fopen("/tmp/test.txt", "w");
    if (fp != NULL) {
        fprintf(fp, "Some data...\n");
        // Never flushed or closed: this buffered text is lost on exec
    }

    printf("About to exec...\n");
    fflush(stdout);  // Without this, buffered output vanishes with the old image

    execl("/bin/echo", "echo", "New program running!", (char *)NULL);

    // Only reached on exec failure:
    perror("exec failed");
    return 1;  // Normal exit: atexit() runs cleanup() here
}

// When exec() succeeds:
// - cleanup() never called
// - 1GB buffer released by kernel (not a leak!)
// - FILE* stream discarded (underlying fd may persist)
// - All user-mode state gone
```

While exec() destroys the memory image, it performs an image replacement within the same process. Certain kernel-level process attributes survive the transformation:
- Process ID: getpid() returns the same value before and after exec().
- Current working directory: getcwd() returns the same path.
- Pending alarms: timers set with alarm() remain pending.

The close-on-exec flag (FD_CLOEXEC):
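File descriptors are special. By default, they survive exec() and become available to the new program. But this can cause problems: the new program may gain access to files, sockets, or pipes it was never meant to see, and descriptors that nobody remembers to close leak.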
The solution is the close-on-exec flag. When set, the kernel automatically closes that file descriptor during exec():
```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    // Method 1: Set O_CLOEXEC when opening
    int fd1 = open("/etc/passwd", O_RDONLY | O_CLOEXEC);
    // fd1 will be automatically closed when exec() runs

    // Method 2: Set FD_CLOEXEC after opening
    int fd2 = open("/etc/group", O_RDONLY);
    if (fd1 == -1 || fd2 == -1) {
        perror("open");
        return 1;
    }
    fcntl(fd2, F_SETFD, FD_CLOEXEC);
    // fd2 will also be automatically closed on exec()

    // File descriptor without CLOEXEC - survives exec()
    int fd3 = open("/tmp/shared.txt", O_RDONLY);
    // fd3 will be available to the new program (if the file exists)

    printf("fd1=%d (CLOEXEC), fd2=%d (CLOEXEC), fd3=%d (inherited)\n",
           fd1, fd2, fd3);
    fflush(stdout);  // Buffered output would be lost on exec

    execl("/bin/ls", "ls", "-l", "/proc/self/fd", (char *)NULL);

    // If exec succeeds:
    // - fd1 closed (CLOEXEC)
    // - fd2 closed (CLOEXEC)
    // - fd3 inherited (still open under the same number in the new program)
    // - stdin(0), stdout(1), stderr(2) also inherited

    perror("exec failed");
    return 1;
}
```

Modern code should use O_CLOEXEC by default when opening files. Only leave it off when you explicitly want the new program to inherit the file descriptor. This prevents file descriptor leaks and security issues from forgotten handles.
Most modern Unix systems use the ELF (Executable and Linkable Format) for executables. When exec() runs, the kernel must parse this format and load the program correctly. Let's trace through this process.
ELF File Structure:
An ELF executable contains:
+------------------------+
| ELF Header | <- Magic number, CPU arch, entry point
+------------------------+
| Program Headers | <- How to load into memory (segments)
+------------------------+
| Section Headers | <- For linkers/debuggers (optional at runtime)
+------------------------+
| .text | <- Executable code
+------------------------+
| .rodata | <- Read-only data (string literals, etc.)
+------------------------+
| .data | <- Initialized read-write data
+------------------------+
| .bss | <- Uninitialized data (just size info)
+------------------------+
| .symtab, .strtab, ... | <- Symbol tables, debug info
+------------------------+
Step-by-step ELF loading:
1. Validate the ELF Header
// First 4 bytes must be: 0x7f 'E' 'L' 'F'
// e_type must be ET_EXEC (executable) or ET_DYN (PIE/shared object)
// e_machine must match CPU architecture (EM_X86_64, etc.)
// e_version must be EV_CURRENT
2. Process Program Headers (PT_LOAD segments)
Each PT_LOAD segment specifies:
- p_vaddr: Virtual address to load at
- p_offset: Offset in file
- p_filesz: Size in file (what to load)
- p_memsz: Size in memory (may be > filesz for BSS)
- p_flags: Permissions (PF_R, PF_W, PF_X)

3. Memory Mapping
The kernel maps each segment with the same machinery that backs mmap(). Conceptually (ignoring the page alignment of addresses and offsets):
// For .text segment (read + execute):
mmap(p_vaddr, p_filesz, PROT_READ | PROT_EXEC,
MAP_PRIVATE | MAP_FIXED, fd, p_offset);
// For .data segment (read + write); the .bss tail is handled in step 4:
mmap(p_vaddr, p_filesz, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_FIXED, fd, p_offset);
4. Zero-fill BSS
If p_memsz > p_filesz, the extra bytes (the BSS) are zero-initialized.
5. Dynamic Linking (if applicable)
If the executable has a PT_INTERP segment, the kernel loads the dynamic linker (typically /lib64/ld-linux-x86-64.so.2) and transfers control to it instead of directly to the program.
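The header checks and PT_LOAD fields from the steps above can be inspected from user space. This is a rough sketch under several assumptions: a Linux system that provides `<elf.h>`, a 64-bit ELF file in the host's byte order, and hypothetical names (`elf_summary`, `summarize_elf`). It validates the header the way step 1 describes, then walks the program headers and adds up the bytes that step 4 would zero-fill. It does not reproduce what the kernel loader actually does.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical summary of what a loader would need from the file. */
typedef struct {
    int load_segments;              /* number of PT_LOAD entries          */
    unsigned long zero_fill_bytes;  /* sum of p_memsz - p_filesz (BSS)    */
    unsigned long entry;            /* e_entry from the ELF header        */
} elf_summary;

/* Returns 0 on success, -1 if the file is unreadable or not a valid
 * 64-bit ET_EXEC/ET_DYN ELF image. */
int summarize_elf(const char *path, elf_summary *out) {
    FILE *f = fopen(path, "rb");
    if (f == NULL) return -1;

    Elf64_Ehdr eh;
    int ok = fread(&eh, sizeof eh, 1, f) == 1
          && memcmp(eh.e_ident, ELFMAG, SELFMAG) == 0   /* 0x7f 'E' 'L' 'F' */
          && eh.e_ident[EI_CLASS] == ELFCLASS64
          && (eh.e_type == ET_EXEC || eh.e_type == ET_DYN)
          && eh.e_version == EV_CURRENT
          && eh.e_phentsize == sizeof(Elf64_Phdr);
    if (!ok) {
        fclose(f);
        return -1;
    }

    memset(out, 0, sizeof *out);
    out->entry = (unsigned long)eh.e_entry;

    for (unsigned i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        long pos = (long)(eh.e_phoff + (Elf64_Off)i * sizeof ph);
        if (fseek(f, pos, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1) {
            fclose(f);
            return -1;
        }
        if (ph.p_type != PT_LOAD) continue;
        out->load_segments++;
        if (ph.p_memsz > ph.p_filesz)   /* the tail that gets zero-filled */
            out->zero_fill_bytes += (unsigned long)(ph.p_memsz - ph.p_filesz);
    }

    fclose(f);
    return 0;
}
```

On Linux, passing `"/proc/self/exe"` summarizes the running program itself. A real loader would also check e_machine against the CPU; the check is left out here so the sketch is not tied to one architecture.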
The kernel doesn't actually read the entire executable into memory. It only records mappings that tie ranges of the address space to the file. When the CPU first executes code or accesses data on a page, a page fault occurs, and only then does the kernel load that specific page and fill in its page table entry. This is called demand paging, and it makes exec() fast and memory-efficient.
The kernel doesn't just load the program binary—it also prepares the initial stack with everything the program needs to start running. The stack setup follows a specific layout that the C runtime expects.
Initial Stack Layout (x86-64 Linux):
High addresses
┌─────────────────────────────────────┐
│ Platform-specific info (random) │
│ ELF auxiliary vector (auxv) │
│ NULL (end of envp) │
│ envp[m-1] -> "LAST_VAR=..." │
│ ... │
│ envp[1] -> "HOME=/home/user" │
│ envp[0] -> "PATH=/usr/bin:/bin" │
│ NULL (end of argv) │
│ argv[n-1] -> "last_argument" │
│ ... │
│ argv[1] -> "first_argument" │
│ argv[0] -> "program_name" │
│ argc (argument count) │ <- %rsp points here
└─────────────────────────────────────┘
Low addresses (stack grows down)
```c
#include <stdio.h>

// This program demonstrates how to access the initial stack.
// The third parameter (envp) is a common Unix extension, not standard C.
int main(int argc, char *argv[], char *envp[]) {
    printf("Stack layout demonstration:\n\n");

    // argc: first item on stack
    printf("argc = %d\n\n", argc);

    // argv: array of argument string pointers
    printf("argv array:\n");
    for (int i = 0; i < argc; i++) {
        printf("  argv[%d] = %p -> \"%s\"\n", i, (void *)argv[i], argv[i]);
    }
    printf("  argv[%d] = %p (NULL terminator)\n\n", argc, (void *)argv[argc]);

    // envp: array of environment string pointers
    printf("envp array (first 5):\n");
    for (int i = 0; envp[i] != NULL && i < 5; i++) {
        printf("  envp[%d] = %p -> \"%s\"\n", i, (void *)envp[i], envp[i]);
    }
    printf("  ...\n\n");

    // Demonstrating that envp follows argv
    printf("Memory layout verification:\n");
    printf("  &argv[0]    = %p\n", (void *)&argv[0]);
    printf("  &argv[argc] = %p (NULL)\n", (void *)&argv[argc]);
    printf("  &envp[0]    = %p\n", (void *)&envp[0]);
    // Note: envp typically starts right after argv's NULL terminator

    return 0;
}

/* Example output:
Stack layout demonstration:

argc = 3

argv array:
  argv[0] = 0x7ffc8b9a3f10 -> "./program"
  argv[1] = 0x7ffc8b9a3f1a -> "arg1"
  argv[2] = 0x7ffc8b9a3f1f -> "arg2"
  argv[3] = (nil) (NULL terminator)

envp array (first 5):
  envp[0] = 0x7ffc8b9a3f24 -> "PATH=/usr/bin:/bin"
  envp[1] = 0x7ffc8b9a3f37 -> "HOME=/home/user"
  ...
*/
```

The Auxiliary Vector (auxv):
Above the envp pointer array (at higher addresses), the kernel places an "auxiliary vector" containing information the dynamic linker and program might need:
- AT_ENTRY: Program entry point address
- AT_PHDR: Address of program headers
- AT_PHNUM: Number of program headers
- AT_PAGESZ: System page size
- AT_UID/AT_EUID: Real/effective user ID
- AT_GID/AT_EGID: Real/effective group ID
- AT_RANDOM: Pointer to 16 random bytes (used for stack canaries and similar hardening)
- AT_SECURE: Nonzero if the program runs in secure-execution mode (for example, setuid)
- AT_PLATFORM: String identifying CPU platform
- AT_HWCAP/AT_HWCAP2: CPU feature flags

You can view the auxiliary vector with: LD_SHOW_AUXV=1 /bin/ls. This causes the dynamic linker to print the auxv entries before running the program. It's useful for understanding what information the kernel provides and for debugging startup issues.
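A program can also read these entries itself. The sketch below assumes a libc that provides `getauxval()` in `<sys/auxv.h>` (glibc and musl do); the `startup_info` struct and `read_startup_info` function are hypothetical names used only for illustration.

```c
#include <elf.h>        /* AT_* constants */
#include <sys/auxv.h>   /* getauxval() */
#include <unistd.h>

/* Hypothetical container for a few auxiliary vector entries. */
typedef struct {
    unsigned long page_size;            /* AT_PAGESZ */
    unsigned long uid, euid;            /* AT_UID, AT_EUID */
    int secure;                         /* AT_SECURE, as 0 or 1 */
    const unsigned char *random_bytes;  /* AT_RANDOM: 16 bytes from the kernel */
} startup_info;

startup_info read_startup_info(void) {
    startup_info s;
    /* getauxval() returns 0 for entries the kernel did not supply. */
    s.page_size    = getauxval(AT_PAGESZ);
    s.uid          = getauxval(AT_UID);
    s.euid         = getauxval(AT_EUID);
    s.secure       = getauxval(AT_SECURE) != 0;
    s.random_bytes = (const unsigned char *)getauxval(AT_RANDOM);
    return s;
}
```

The values describe the process as it was at exec time. The page size reported here should agree with `sysconf(_SC_PAGESIZE)`, which makes it a convenient sanity check.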
Most executables aren't statically linked—they depend on shared libraries like libc.so. When exec() loads such an executable, an additional step occurs: the dynamic linker (also called the interpreter or loader) is invoked.
The Dynamic Linking Process:
1. Kernel detects the PT_INTERP segment in the ELF headers. It names the interpreter, for example /lib64/ld-linux-x86-64.so.2 (Linux x86-64).
2. Kernel loads both the program AND the dynamic linker.
3. Dynamic linker takes control first. It finds and loads the required shared libraries and binds symbols.
4. Control is transferred to the program's entry point: _start in the program begins executing.
```shell
# View the dynamic linker path (PT_INTERP)
$ readelf -l /bin/ls | grep interpreter
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]

# List shared library dependencies
$ ldd /bin/ls
    linux-vdso.so.1 (0x00007ffcc3bfe000)
    libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
    /lib64/ld-linux-x86-64.so.2 (0x00007f8a3b600000)

# Trace dynamic linker operations
$ LD_DEBUG=libs /bin/ls 2>&1 | head -20
# Shows: finding libraries, loading, binding symbols...
```

Dynamic linking introduces attack vectors. LD_PRELOAD can inject code, and LD_LIBRARY_PATH can redirect library loading. For setuid programs, the kernel sets AT_SECURE=1 and the dynamic linker then ignores these environment variables. Always be aware of the security implications when designing systems that exec() potentially untrusted code.
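The PT_INTERP lookup that readelf performs can be approximated in a few lines. This sketch rests on the same assumptions as any quick ELF reader: Linux's `<elf.h>`, a 64-bit file in host byte order, and a hypothetical function name, `read_interp`. It returns the interpreter path stored in the file, which is the string the kernel uses to find the dynamic linker.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper. Returns 1 and fills buf with the interpreter path,
 * 0 if the file has no PT_INTERP (for example a static binary),
 * -1 if the file is unreadable or not a 64-bit ELF image. */
int read_interp(const char *path, char *buf, size_t cap) {
    FILE *f = fopen(path, "rb");
    if (f == NULL) return -1;

    int result = -1;
    Elf64_Ehdr eh;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_phentsize != sizeof(Elf64_Phdr)) {
        fclose(f);
        return -1;
    }

    result = 0;  /* valid ELF; no interpreter found yet */
    for (unsigned i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        long pos = (long)(eh.e_phoff + (Elf64_Off)i * sizeof ph);
        if (fseek(f, pos, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1) {
            result = -1;
            break;
        }
        if (ph.p_type != PT_INTERP) continue;

        /* The segment holds the path as a NUL-terminated string. */
        if (ph.p_filesz == 0 || ph.p_filesz > cap ||
            fseek(f, (long)ph.p_offset, SEEK_SET) != 0 ||
            fread(buf, 1, ph.p_filesz, f) != ph.p_filesz) {
            result = -1;
            break;
        }
        buf[ph.p_filesz - 1] = '\0';  /* enforce termination */
        result = 1;
        break;
    }

    fclose(f);
    return result;
}
```

With a 256-byte buffer, `read_interp("/bin/ls", buf, sizeof buf)` would typically yield the same path that readelf prints on a dynamically linked system, and 0 for a statically linked binary.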
Not everything you exec() is a compiled binary. Scripts (Python, Bash, Perl, etc.) can also be executed. The kernel handles this through the shebang mechanism.
How shebang processing works:
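The kernel reads the first bytes of the file. If they are #! (0x23 0x21), it's an interpreter script: the kernel takes the interpreter path (and an optional argument) from the rest of that line and executes the interpreter, passing it the script's path.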
```bash
#!/bin/bash
# This script starts with #!/bin/bash
# When you call execve("./script.sh", ["script.sh", NULL], envp)
# Kernel transforms it to: execve("/bin/bash", ["/bin/bash", "./script.sh"], envp)

#!/usr/bin/env python3
# Using 'env' is portable - it searches PATH for python3
# Transforms to: execve("/usr/bin/env", ["/usr/bin/env", "python3", "./script.py"], envp)
# Then env finds python3 and execs it

#!/bin/bash -x
# Arguments can be included after the interpreter
# Note: only ONE argument is allowed after the path
# This becomes: execve("/bin/bash", ["/bin/bash", "-x", "./script.sh"], envp)

#!/bin/cat
# Even 'cat' works! This script prints itself
# Becomes: execve("/bin/cat", ["/bin/cat", "./script.sh"], envp)
```

On Linux, the original argv[0] is replaced by the script's path, and the interpreter path from the shebang line becomes the new argv[0]:

| You call | If script starts with | Kernel executes |
|---|---|---|
| execve("./script.sh", ["./script.sh", "arg1"], envp) | #!/bin/bash | execve("/bin/bash", ["/bin/bash", "./script.sh", "arg1"], envp) |
| execve("./prog.py", ["prog.py"], envp) | #!/usr/bin/python3 | execve("/usr/bin/python3", ["/usr/bin/python3", "./prog.py"], envp) |
| execve("./tool.pl", ["tool.pl", "-v"], envp) | #!/usr/bin/perl | execve("/usr/bin/perl", ["/usr/bin/perl", "./tool.pl", "-v"], envp) |
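The shebang line has limitations: #! must be the first two bytes of the file, the line length is capped (it varies by system; Linux reads at most 255 bytes, and 127 before kernel 5.1), the interpreter receives at most one argument from it (on Linux, everything after the interpreter path is passed as a single string), and the interpreter is not searched for in PATH, so give an absolute path (or use /usr/bin/env for PATH searching). Also, scripts must have execute permission even though the interpreter does the actual execution.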
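The one-argument rule is easier to see in code. The sketch below mimics, in user space, how Linux splits a shebang line as described above: the first word is the interpreter, and the whole remainder, trimmed of surrounding whitespace, becomes a single argument. `parse_shebang` is a hypothetical function written for this illustration; other systems split the line differently.

```c
#include <stddef.h>
#include <string.h>

static int is_blank(char c) { return c == ' ' || c == '\t'; }

/* Hypothetical helper. Returns 1 and fills interp/arg (each of size cap)
 * if `text` starts with a usable shebang line, otherwise 0.
 * arg is set to "" when the line carries no argument. */
int parse_shebang(const char *text, char *interp, char *arg, size_t cap) {
    if (strncmp(text, "#!", 2) != 0) return 0;

    const char *p = text + 2;
    size_t len = strcspn(p, "\n");                   /* first line only */
    while (len > 0 && is_blank(p[len - 1])) len--;   /* trim the right  */
    while (len > 0 && is_blank(*p)) { p++; len--; }  /* trim the left   */
    if (len == 0) return 0;                          /* no interpreter  */

    /* Interpreter path: everything up to the first blank. */
    size_t ilen = 0;
    while (ilen < len && !is_blank(p[ilen])) ilen++;
    if (ilen >= cap) return 0;
    memcpy(interp, p, ilen);
    interp[ilen] = '\0';

    /* The rest of the line is ONE argument, embedded spaces included. */
    p += ilen;
    len -= ilen;
    while (len > 0 && is_blank(*p)) { p++; len--; }
    if (len >= cap) return 0;
    memcpy(arg, p, len);
    arg[len] = '\0';
    return 1;
}
```

For the line `#!/bin/bash -x -e`, this yields the interpreter `/bin/bash` and the single argument `-x -e`, which bash would receive as one string rather than as two options.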
Process image replacement has profound security implications. The complete destruction and recreation of the memory space serves as a security boundary, but several edge cases require careful attention.
Setuid/Setgid Execution:
When an executable has the setuid or setgid bit set, exec() causes a privilege transition:
File permissions: -rwsr-xr-x (setuid bit = 's')
Before exec: RUID=1000 (user), EUID=1000 (user)
After exec: RUID=1000 (user), EUID=0 (root) ← from file owner
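This is how programs like sudo, passwd, and ping gain elevated privileges. The kernel automatically sets the effective UID (or GID) of the process to the file's owner (or group), leaves the real UID unchanged, and marks the execution as secure by setting AT_SECURE=1 in the auxiliary vector.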
```c
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

// Secure exec() pattern for privileged programs
void secure_exec(const char *program, char *const argv[]) {
    // 1. Close all unnecessary file descriptors
    //    (or use closefrom() where available)
    for (int fd = 3; fd < 1024; fd++) {
        close(fd);  // Ignore failure for non-open FDs
    }

    // 2. Create clean, controlled environment
    char *safe_env[] = {
        "PATH=/usr/bin:/bin",
        "IFS= \t\n",
        "TERM=dumb",
        NULL
    };

    // 3. Reset signal handlers to default
    //    (exec does this for caught signals, but not SIG_IGN)
    for (int sig = 1; sig < 32; sig++) {
        signal(sig, SIG_DFL);  // Fails harmlessly for SIGKILL and SIGSTOP
    }

    // 4. Clear signal mask
    sigset_t empty;
    sigemptyset(&empty);
    sigprocmask(SIG_SETMASK, &empty, NULL);

    // 5. Change to known directory
    if (chdir("/") != 0) {
        _exit(127);
    }

    // 6. Execute with controlled environment
    execve(program, argv, safe_env);

    // Failure - exit with error
    _exit(127);
}
```

If you're writing a privileged program (setuid, daemon, etc.), never assume the inherited environment, file descriptors, or signal state are safe. Sanitize everything. Close FDs you don't need. Build a clean environment. Experienced attackers craft malicious process contexts to exploit careless privileged programs.
exec() is designed to be atomic with respect to process state: either it succeeds completely and the new program runs, or it fails completely and the original process continues unchanged. There is no partial exec.
The atomicity guarantee:
In practice, if exec() returns at all, it failed. If exec() fails, the original process is completely intact and can try alternative actions, report errors, or exit gracefully.
```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    // Attempt to exec a program
    execl("/nonexistent/program", "program", (char *)NULL);

    // We only reach here if exec() failed
    // The original process state is completely preserved:
    // - Same PID
    // - Same memory (code, heap, stack all intact)
    // - Same file descriptors
    // - Same everything
    int err = errno;  // Save errno before other calls can overwrite it

    // We can now handle the error intelligently
    switch (err) {
    case ENOENT:
        fprintf(stderr, "Program not found\n");
        // Maybe try alternative path?
        execl("/usr/local/bin/program", "program", (char *)NULL);
        break;

    case EACCES:
        fprintf(stderr, "Permission denied\n");
        // Maybe we need to gain privilege first?
        break;

    case ENOEXEC:
        fprintf(stderr, "Not an executable format\n");
        // Maybe it's a script - try shell?
        execl("/bin/sh", "sh", "/path/to/script", (char *)NULL);
        break;

    default:
        fprintf(stderr, "exec failed: %s\n", strerror(err));
    }

    // Final fallback
    fprintf(stderr, "All exec attempts failed, exiting\n");
    return 127;  // Convention for "command not found"
}
```

In rare edge cases, state can leak across a failed exec(). For example, in multithreaded programs, other threads may observe the exec() in progress before it fails. Signal handlers for asynchronous signals could theoretically see inconsistent state if exec fails mid-way through certain operations. These edge cases are extremely rare in practice.
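The all-or-nothing behaviour combines well with close-on-exec when a parent needs to know whether its child's exec worked. The sketch below shows one common pattern, not the only one; `exec_errno` is a hypothetical name. The child holds the write end of a pipe marked FD_CLOEXEC. A successful exec closes it silently, so the parent reads end-of-file; a failed exec leaves the child running its old code, which sends errno through the pipe.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs argv[0] in a child and waits for it.
 * Returns 0 if exec succeeded, the child's errno if exec failed,
 * or -1 if the pipe or fork could not be set up. */
int exec_errno(char *const argv[]) {
    int fds[2];
    if (pipe(fds) == -1) return -1;

    /* The write end closes automatically on a successful exec. */
    if (fcntl(fds[1], F_SETFD, FD_CLOEXEC) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        close(fds[0]);
        execvp(argv[0], argv);
        /* Still here, so exec failed and the old image is intact. */
        int err = errno;
        if (write(fds[1], &err, sizeof err) != (ssize_t)sizeof err) {
            /* Nothing more can be done; the parent will see a short read. */
        }
        _exit(127);
    }

    close(fds[1]);
    int err = 0;
    ssize_t n;
    do {
        n = read(fds[0], &err, sizeof err);
    } while (n == -1 && errno == EINTR);
    close(fds[0]);

    while (waitpid(pid, NULL, 0) == -1 && errno == EINTR) {
        /* retry if interrupted by a signal */
    }

    if (n == 0) return 0;                         /* EOF: exec succeeded */
    return n == (ssize_t)sizeof err ? err : -1;   /* errno from the child */
}
```

With `argv` set to `{"/bin/sh", "-c", "exit 0", NULL}` the function should return 0, and with a nonexistent program path it should return ENOENT. The parent never has to guess from the exit status alone.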
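We've now explored the complete mechanism of process image replacement. To consolidate the key concepts: a successful exec() discards every user-space memory region; kernel-level attributes such as the PID, the working directory, and file descriptors without close-on-exec persist; the kernel maps the new ELF image and loads it on demand; the dynamic linker runs first when PT_INTERP is present; and scripts are run through the interpreter named on their shebang line.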
What's next:
Now that you understand how the process image is replaced, we'll examine how arguments are passed to the new program. We'll dive deep into the argv mechanism, how argument strings are stored and accessed, limits on argument size, and best practices for constructing and parsing command-line arguments.
You now understand exactly what happens when exec() transforms a process: which memory regions are destroyed, which kernel resources persist, how ELF loading works, and how dynamic linking and script execution fit into the picture. Next, we'll explore the mechanics of passing arguments to the new program.