An executable file sitting on disk is just data—bytes representing code, initialized variables, and metadata. For a program to actually run, the operating system must perform a sophisticated series of operations that transform this static file into a dynamic, executing process with its own memory space, stack, and execution context.
The loading process is this transformation. It involves reading the executable format, creating a virtual address space, mapping file contents into memory, initializing the stack and heap, loading any required shared libraries, performing runtime relocations, and finally transferring control to the program's entry point.
This is where compilation meets execution—where the linker's output becomes the kernel's input.
By the end of this page, you will understand the complete loading sequence—from the exec() system call through process creation, memory mapping, dynamic linking, to the execution of the first user instruction. You'll grasp how the kernel and dynamic linker collaborate to bring programs to life.
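Program loading begins with the exec() family of functions (execve, execl, execvp, etc.), all of which end up in the execve system call. When a process calls exec(), it asks the kernel to replace its current program image with a new one: the process ID stays the same, but the code, data, heap, and stack are replaced.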
This is fundamentally different from fork(): fork creates a new process with a copy of the parent's image, while exec replaces the current image entirely.
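The contrast between the two calls is easiest to see when they are combined, which is how a shell launches a command. The following is a minimal sketch, not a complete launcher: `run_and_wait` is a hypothetical helper name, and it assumes a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: run `path` with `argv` in a child process and
 * return the child's exit status, or -1 on error. */
static int run_and_wait(const char *path, char *const argv[])
{
    pid_t pid = fork();               /* child starts as a copy of our image */
    if (pid < 0)
        return -1;

    if (pid == 0) {
        execve(path, argv, environ);  /* replaces the child's image */
        _exit(127);                   /* reached only if execve failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) /* the parent's image is unaffected */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `run_and_wait("/bin/sh", args)` with `args` set to `{"sh", "-c", "exit 7", NULL}` should return 7, while the calling process keeps running its own code.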
    // The fundamental exec variant
    int execve(const char *pathname,   // Path to executable
               char *const argv[],     // Command-line arguments
               char *const envp[]);    // Environment variables

    // Example usage
    char *args[] = {"ls", "-la", NULL};   // Must be NULL-terminated
    char *env[] = {"PATH=/bin", NULL};    // Environment

    execve("/bin/ls", args, env);
    // If successful, this never returns!
    // The current process image is replaced entirely

    perror("execve failed");   // Only reached if exec fails

When execve() is called, the kernel performs these steps:
    // Linux kernel: fs/exec.c (conceptual)

    SYSCALL_DEFINE3(execve,
                    const char __user *, filename,
                    const char __user *const __user *, argv,
                    const char __user *const __user *, envp)
    {
        // Step 1: Open and read the file
        struct file *file = open_exec(filename);

        // Step 2: Identify format and find handler
        // Magic bytes determine type: ELF ("\x7fELF"), script ("#!"), etc.
        struct linux_binfmt *fmt = search_binary_handler(bprm);

        // Step 3: Format-specific loading (for ELF: load_elf_binary)
        retval = fmt->load_binary(bprm);

        // Step 4: Setup stack with argc, argv, envp
        // Step 5: Start execution at entry point

        return retval;   // The caller never sees a return on success
    }

Linux supports multiple executable formats via the binfmt mechanism. ELF is the primary format, but the kernel also handles scripts (via the #! interpreter line), flat binaries, and others. The binfmt_misc handler extends this further by letting an administrator register a user-space interpreter for a file type, which allows running Java JAR files, Windows executables (via Wine), and custom formats directly.
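The format check in step 2 comes down to comparing the first few bytes of the file against known magic values. The sketch below illustrates the idea for the two formats named above; `classify_magic` and the `exec_format` enum are hypothetical names, and the kernel's real handlers validate much more of the header than this.

```c
#include <stddef.h>
#include <string.h>

enum exec_format { FMT_UNKNOWN, FMT_ELF, FMT_SCRIPT };

/* Classify a file by its leading bytes. Illustrative only: a real
 * binfmt handler also checks the rest of the header before accepting. */
static enum exec_format classify_magic(const unsigned char *buf, size_t len)
{
    if (len >= 4 && memcmp(buf, "\177ELF", 4) == 0)
        return FMT_ELF;                /* 0x7f 'E' 'L' 'F' */
    if (len >= 2 && buf[0] == '#' && buf[1] == '!')
        return FMT_SCRIPT;             /* interpreter line follows */
    return FMT_UNKNOWN;
}
```

A caller would read the first bytes of the file into a buffer and pass them in; anything that matches neither pattern would be left to other handlers.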
For ELF executables, the kernel's load_elf_binary() function orchestrates the loading process. This involves interpreting the program headers—the execution view of the ELF file that tells the kernel how to set up the process's address space.
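While section headers describe the file for linking, program headers describe segments for loading. Each loadable (LOAD) segment specifies its offset in the file, its virtual address, its size in the file (FileSiz), its size in memory (MemSiz), its permission flags, and its alignment.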
    $ readelf -lW /bin/ls

    Elf file type is DYN (Position-Independent Executable)
    Entry point 0x6b10
    There are 13 program headers, starting at offset 64

    Program Headers:
      Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
      PHDR           0x000040 0x0000000000000040 0x0000000000000040 0x0002d8 0x0002d8 R   0x8
      INTERP         0x000318 0x0000000000000318 0x0000000000000318 0x00001c 0x00001c R   0x1
          [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
      LOAD           0x000000 0x0000000000000000 0x0000000000000000 0x003510 0x003510 R   0x1000
      LOAD           0x004000 0x0000000000004000 0x0000000000004000 0x013581 0x013581 R E 0x1000
      LOAD           0x018000 0x0000000000018000 0x0000000000018000 0x004ba8 0x004ba8 R   0x1000
      LOAD           0x01d390 0x000000000001e390 0x000000000001e390 0x001288 0x002548 RW  0x1000
      DYNAMIC        0x01e348 0x000000000001f348 0x000000000001f348 0x0001f0 0x0001f0 RW  0x8
      NOTE           0x000338 0x0000000000000338 0x0000000000000338 0x000030 0x000030 R   0x8
      GNU_EH_FRAME   0x01c8ec 0x000000000001c8ec 0x000000000001c8ec 0x0004dc 0x0004dc R   0x4
      GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0x10
      GNU_RELRO      0x01d390 0x000000000001e390 0x000000000001e390 0x000c70 0x000c70 R   0x1

     Section to Segment mapping:
       00
       01     .interp
       02     .interp .note.gnu.build-id .note.ABI-tag .gnu.hash ...
       03     .init .plt .plt.got .plt.sec .text .fini
       04     .rodata .eh_frame_hdr .eh_frame
       05     .init_array .fini_array .data.rel.ro .dynamic .got .data .bss

The kernel processes these headers to create the memory layout: each LOAD segment becomes a mapping with the listed permissions, and the INTERP entry names the dynamic linker that must be loaded alongside the program.
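The program header table that readelf prints can also be read straight from the file with the structures in `<elf.h>`. The sketch below counts the LOAD entries of a 64-bit ELF file; `count_load_segments` is a hypothetical helper, and it assumes a Linux system and a file in the host's byte order.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Count PT_LOAD entries in a 64-bit ELF file. Returns -1 if the file
 * cannot be read or is not a 64-bit ELF file. */
static int count_load_segments(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    Elf64_Ehdr eh;
    int count = -1;
    if (fread(&eh, sizeof eh, 1, f) == 1 &&
        memcmp(eh.e_ident, ELFMAG, SELFMAG) == 0 &&
        eh.e_ident[EI_CLASS] == ELFCLASS64 &&
        eh.e_phentsize == sizeof(Elf64_Phdr)) {
        count = 0;
        for (unsigned i = 0; i < eh.e_phnum; i++) {
            Elf64_Phdr ph;
            long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);
            if (fseek(f, off, SEEK_SET) != 0 ||
                fread(&ph, sizeof ph, 1, f) != 1) {
                count = -1;           /* truncated or unreadable table */
                break;
            }
            if (ph.p_type == PT_LOAD)
                count++;
        }
    }
    fclose(f);
    return count;
}
```

For the /bin/ls shown above, this would be expected to report four LOAD segments, matching the four LOAD rows in the listing.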
The kernel doesn't actually copy the entire executable into memory. Instead, it uses memory mapping (mmap) to establish a relationship between virtual addresses and the executable file on disk.
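With memory mapping, the kernel only records which range of the file backs each range of virtual addresses; no file data is read at exec time. The first access to a page raises a page fault, and the kernel then brings that page in from disk or from the page cache. Read-only pages can be shared by every process that runs the same file. This means a program can start executing almost immediately, even if the executable is gigabytes in size, because only the pages actually accessed are loaded.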
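Because mappings work in whole pages, a loader first has to round each segment's addresses to page boundaries. The sketch below does that arithmetic for one LOAD segment; `layout_segment`, `struct seg_layout`, and the fixed 4 KiB page size are assumptions made for illustration, not the kernel's own code. When MemSiz exceeds FileSiz, the extra bytes (typically .bss) must read as zero, so they need anonymous pages beyond the file-backed ones.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 0x1000u   /* assumed 4 KiB pages */

struct seg_layout {
    uint64_t map_start;   /* page-aligned virtual address of the mapping */
    uint64_t map_offset;  /* page-aligned file offset backing map_start  */
    uint64_t file_end;    /* first address past the bytes from the file  */
    uint64_t anon_start;  /* first page that needs anonymous zero memory */
    uint64_t anon_end;    /* page-aligned end of the whole segment       */
};

static uint64_t page_down(uint64_t x)
{
    return x & ~(uint64_t)(SKETCH_PAGE_SIZE - 1);
}

static uint64_t page_up(uint64_t x)
{
    return page_down(x + SKETCH_PAGE_SIZE - 1);
}

/* Compute the page-granular layout for one LOAD segment. The bytes
 * between file_end and anon_start share a page with file data, so a
 * loader has to zero them explicitly. */
static struct seg_layout layout_segment(uint64_t vaddr, uint64_t offset,
                                        uint64_t filesz, uint64_t memsz)
{
    struct seg_layout l;
    l.map_start  = page_down(vaddr);
    l.map_offset = offset - (vaddr - l.map_start);
    l.file_end   = vaddr + filesz;
    l.anon_start = page_up(l.file_end);
    l.anon_end   = page_up(vaddr + memsz);
    return l;
}
```

For the RW segment in the readelf listing (Offset 0x01d390, VirtAddr 0x1e390, FileSiz 0x1288, MemSiz 0x2548), this gives a file-backed mapping starting at 0x1e000 from file offset 0x1d000, and one anonymous page from 0x20000 to 0x21000.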
    // Kernel maps segments using mmap-like mechanism

    // Map code segment (read-only, executable, file-backed)
    addr = mmap(
        load_addr,                  // Requested virtual address
        segment_size,               // Size
        PROT_READ | PROT_EXEC,      // Read and execute
        MAP_PRIVATE | MAP_FIXED,    // Private copy, fixed address
        fd,                         // File descriptor of executable
        file_offset                 // Offset in file
    );

    // Map data segment (read-write, file-backed)
    addr = mmap(
        data_load_addr,
        data_size,
        PROT_READ | PROT_WRITE,
        MAP_PRIVATE | MAP_FIXED,
        fd,
        data_file_offset
    );

    // .bss: Additional anonymous memory (not in file)
    // (Conceptual: real addresses are rounded to page boundaries, and the
    // tail of the last file-backed page is zeroed by hand.)
    if (mem_size > file_size) {
        // Zero-initialized expansion for .bss
        mmap(
            data_load_addr + file_size,
            mem_size - file_size,
            PROT_READ | PROT_WRITE,
            MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
            -1, 0
        );
    }

For shared libraries mapped into multiple processes, the kernel uses copy-on-write for writable pages: all processes share one physical copy of a page until one of them writes to it, at which point the writer gets its own private copy.
This optimization dramatically reduces memory usage when many processes use the same libraries.
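Copy-on-write can be observed from user space with a MAP_PRIVATE file mapping, the same kind of mapping the loader uses for data segments. This is a minimal sketch assuming a POSIX system with a writable /tmp; `private_write_stays_private` is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if a write through a MAP_PRIVATE mapping is visible in the
 * mapping but not in the underlying file, 0 if not, -1 on error. */
static int private_write_stays_private(void)
{
    char path[] = "/tmp/cow-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                      /* file lives until fd is closed */

    int result = -1;
    if (write(fd, "A", 1) == 1) {
        char *p = mmap(NULL, 1, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED) {
            p[0] = 'B';                /* the write lands in a private copy */
            char on_disk = 0;
            if (pread(fd, &on_disk, 1, 0) == 1)
                result = (p[0] == 'B' && on_disk == 'A');
            munmap(p, 1);
        }
    }
    close(fd);
    return result;
}
```

The function should return 1: the mapping sees the modified byte, while the file still holds the original one, which is why a process can modify its .data without affecting the executable on disk or other processes.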
    $ cat /proc/$(pidof firefox)/maps | head -30
    5593e7a00000-5593e7a01000 r--p 00000000 103:02 2883844  /usr/lib/firefox/firefox
    5593e7a01000-5593e7a02000 r-xp 00001000 103:02 2883844  /usr/lib/firefox/firefox
    5593e7a02000-5593e7a03000 r--p 00002000 103:02 2883844  /usr/lib/firefox/firefox
    5593e7a03000-5593e7a04000 rw-p 00002000 103:02 2883844  /usr/lib/firefox/firefox
    7f1234500000-7f1234522000 r--p 00000000 103:02 1234567  /lib/x86_64-linux-gnu/libc.so.6
    7f1234522000-7f12346b1000 r-xp 00022000 103:02 1234567  /lib/x86_64-linux-gnu/libc.so.6
    ...
    7ffe12340000-7ffe12361000 rw-p 00000000 00:00 0         [stack]
    7ffe123fe000-7ffe12402000 r--p 00000000 00:00 0         [vvar]
    7ffe12402000-7ffe12404000 r-xp 00000000 00:00 0         [vdso]

    # Columns: address range, permissions, offset, device, inode, pathname
    # Permissions: r=read, w=write, x=execute, p=private(COW), s=shared

Modern systems randomize the base addresses where executables, libraries, stack, and heap are loaded. This is Address Space Layout Randomization (ASLR), a security feature that makes exploits harder by randomizing the memory layout on each run. Position-Independent Executables (PIE) enable full ASLR for the main executable.
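The columns of the maps file listed above are regular enough to parse with a single sscanf call. The sketch below is one way to do it; `struct map_entry` and `parse_maps_line` are hypothetical names, and the fixed 255-character path limit is an assumption made to keep the example short.

```c
#include <stdio.h>
#include <string.h>

struct map_entry {
    unsigned long long start, end, offset;
    char perms[5];        /* e.g. "r-xp" */
    char path[256];       /* empty for anonymous mappings */
};

/* Parse one line of a /proc/PID/maps file.
 * Returns 1 on success, 0 if the line does not match the format. */
static int parse_maps_line(const char *line, struct map_entry *e)
{
    unsigned dev_major, dev_minor;
    unsigned long inode;

    e->path[0] = '\0';
    int n = sscanf(line, "%llx-%llx %4s %llx %x:%x %lu %255[^\n]",
                   &e->start, &e->end, e->perms, &e->offset,
                   &dev_major, &dev_minor, &inode, e->path);
    return n >= 7;        /* the pathname column is optional */
}
```

Reading /proc/self/maps line by line with fgets and passing each line to this function would let a program inspect its own layout, for example to compare addresses between runs and see ASLR at work.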
Before transferring control to the program, the kernel must set up the initial stack. The stack contains critical information the program needs to start:
    // Stack layout after exec (top of stack = low address)
    // (Stack grows downward, so this is from top to bottom)

    // ┌─────────────────────────────────────────┐ ← High address
    // │ Information block                       │
    // │ (strings for argv and envp)             │
    // ├─────────────────────────────────────────┤
    // │ Null auxiliary vector entry             │
    // ├─────────────────────────────────────────┤
    // │ Auxiliary vector entries (AT_*)         │
    // │ AT_PHDR, AT_ENTRY, AT_PHNUM, etc.       │
    // ├─────────────────────────────────────────┤
    // │ NULL word (envp terminator)             │
    // ├─────────────────────────────────────────┤
    // │ Environment pointers (envp[0], ...)     │
    // ├─────────────────────────────────────────┤
    // │ NULL word (argv terminator)             │
    // ├─────────────────────────────────────────┤
    // │ Argument pointers (argv[0], argv[1]...) │
    // ├─────────────────────────────────────────┤
    // │ argc (argument count)                   │
    // └─────────────────────────────────────────┘ ← Initial SP
    //                                             ← Low address

The auxiliary vector is a critical but often overlooked structure. It provides the dynamic linker and C library with essential system information:
    // Important AT_* entries in the auxiliary vector

    AT_PHDR          // Address of program headers in memory
    AT_PHENT         // Size of program header entry
    AT_PHNUM         // Number of program headers
    AT_ENTRY         // Entry point of the program
    AT_BASE          // Base address where interpreter was loaded
    AT_EXECFN        // Filename of executed program
    AT_PAGESZ        // System page size
    AT_UID           // Real user ID
    AT_EUID          // Effective user ID
    AT_GID           // Real group ID
    AT_EGID          // Effective group ID
    AT_RANDOM        // Address of 16 random bytes (for stack canary)
    AT_SYSINFO_EHDR  // Address of vDSO

    // Print the auxiliary vector as a program starts (glibc)
    $ LD_SHOW_AUXV=1 ls
    AT_SYSINFO_EHDR: 0x7ffc3d7fe000
    AT_HWCAP:        bfebfbff
    AT_PAGESZ:       4096
    AT_PHDR:         0x55d4e8a00040
    AT_PHENT:        56
    AT_PHNUM:        13
    AT_BASE:         0x7f8b12a00000
    AT_ENTRY:        0x55d4e8a06b10
    AT_UID:          1000
    ...

The dynamic linker needs auxv because it starts running before any C library setup has happened. To find the program headers for relocation, determine the page size for mapping, and locate the program's entry point, it must read auxv directly rather than calling library functions.
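Ordinary programs do not have to walk the stack to reach these values: glibc (2.16 and later) and musl expose them through getauxval() in `<sys/auxv.h>`. The sketch below prints a few entries and cross-checks two of them against the usual library calls; `check_auxv` is a hypothetical helper name.

```c
#include <stdio.h>
#include <sys/auxv.h>
#include <unistd.h>

/* Print a few auxiliary vector entries and check that they agree with
 * the values reported by sysconf() and getuid(). Returns 1 if they do. */
static int check_auxv(void)
{
    unsigned long pagesz = getauxval(AT_PAGESZ);
    unsigned long uid    = getauxval(AT_UID);

    printf("AT_PAGESZ = %lu\n", pagesz);
    printf("AT_PHNUM  = %lu\n", getauxval(AT_PHNUM));
    printf("AT_ENTRY  = %#lx\n", getauxval(AT_ENTRY));
    printf("AT_UID    = %lu\n", uid);

    return pagesz == (unsigned long)sysconf(_SC_PAGESIZE) &&
           uid == (unsigned long)getuid();
}
```

getauxval() returns 0 for an entry that is not present, so a caller that needs to distinguish "absent" from "zero" has to check errno as well.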
For dynamically-linked executables, the kernel doesn't jump directly to the program's entry point. Instead, it transfers control to the dynamic linker (ld.so), which must complete several tasks before the program can run:
    // Conceptual loading sequence in ld.so

    void _dl_start(void *arg) {
        // 1. Bootstrap: Relocate ourselves
        _dl_start_final(arg);  // Self-relocation before we can use GOT

        // 2. Parse executable's DYNAMIC section
        //    (main_map->l_ld; _DYNAMIC here would be ld.so's own section)
        for (dyn = main_map->l_ld; dyn->d_tag != DT_NULL; dyn++) {
            switch (dyn->d_tag) {
                case DT_NEEDED:   // Library dependency
                    needed_libs.add(dyn->d_un.d_val);
                    break;
                case DT_RPATH:    // Library search path
                case DT_RUNPATH:
                    search_paths.add(dyn->d_un.d_val);
                    break;
                // Many more entries...
            }
        }

        // 3. Load required libraries (recursively)
        for (lib : needed_libs) {
            load_library(lib);  // May trigger more DT_NEEDED
        }

        // 4. Perform relocations
        _dl_relocate_object(main_map);
        for (lib : loaded_libs) {
            _dl_relocate_object(lib);
        }

        // 5. Call constructors (bottom-up: libraries first)
        _dl_init(main_map, argc, argv, envp);

        // 6. Transfer to program entry point
        _dl_start_user(entry_point);
    }

Shared libraries are loaded in breadth-first order based on DT_NEEDED entries: first the executable's direct dependencies, then the dependencies of those libraries, and so on, with each library loaded only once.
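This load order determines the default symbol lookup order. When searching for a symbol, the dynamic linker looks in the executable first, then in any LD_PRELOAD libraries, then in the remaining libraries in the order they were loaded. This list is the "global scope"; libraries opened later with dlopen() add further scopes of their own.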
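Breadth-first loading can be modeled with a small queue. The sketch below is a toy model rather than ld.so's code: `struct obj`, `load_order`, and the `example` dependency table are all hypothetical, and dependencies are given as table indices instead of DT_NEEDED strings.

```c
#include <stddef.h>

#define MAX_OBJS 16
#define MAX_DEPS 4

/* Toy model of a loaded object: a name plus the table indices of its
 * DT_NEEDED dependencies, terminated by -1. */
struct obj {
    const char *name;
    int needed[MAX_DEPS + 1];
};

/* Hypothetical dependency table: program needs libA and libB,
 * and libA in turn needs libC. */
static const struct obj example[] = {
    { "program", { 1, 2, -1 } },
    { "libA.so", { 3, -1 } },
    { "libB.so", { -1 } },
    { "libC.so", { -1 } },
};

/* Fill `order` with object indices in breadth-first DT_NEEDED order,
 * starting from `root`. Each object appears once. Returns the count. */
static size_t load_order(const struct obj *objs, size_t nobjs,
                         int root, int *order)
{
    int seen[MAX_OBJS] = { 0 };
    size_t head = 0, tail = 0;

    if (nobjs > MAX_OBJS)
        return 0;
    order[tail++] = root;
    seen[root] = 1;
    while (head < tail) {              /* `order` doubles as the queue */
        const struct obj *o = &objs[order[head++]];
        for (size_t i = 0; i <= MAX_DEPS && o->needed[i] >= 0; i++) {
            int dep = o->needed[i];
            if (!seen[dep]) {
                seen[dep] = 1;
                order[tail++] = dep;
            }
        }
    }
    return tail;
}
```

For the example table the order is program, libA.so, libB.so, libC.so: libB.so comes before libC.so even though libC.so is reached through libA.so, which is what separates breadth-first from depth-first loading.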
    # Trace shared library loading
    $ LD_DEBUG=libs ./program
        find library=libc.so.6 [0]; searching
         search path=/lib/x86_64-linux-gnu/tls/haswell/...
          trying file=/lib/x86_64-linux-gnu/tls/haswell/libc.so.6
          ...
          trying file=/lib/x86_64-linux-gnu/libc.so.6
        found libc.so.6 at /lib/x86_64-linux-gnu/libc.so.6

    # Trace symbol resolution
    $ LD_DEBUG=symbols ./program
        symbol=printf; lookup in file=./program
        symbol=printf; lookup in file=/lib/x86_64-linux-gnu/libc.so.6
        binding file ./program to /lib/x86_64-linux-gnu/libc.so.6: normal symbol `printf'

The default symbol lookup order enables 'interposition': defining a symbol in your program or in an LD_PRELOAD library to override library functions. While powerful for debugging, this can cause problems if libraries expect their own internal symbols. RTLD_LOCAL and -Bsymbolic can change this behavior.
After the dynamic linker completes its work, control is transferred to the program's entry point. But this isn't main()—it's the C runtime startup code called _start.
_start (entry point, written in assembly)
__libc_start_main (C library initialization)
main() (your code!)
    // glibc's _start (simplified)
    // sysdeps/x86_64/start.S

        .text
        .globl _start
        .type _start, @function
    _start:
        // Clear frame pointer to mark the outermost frame for debuggers
        xorl %ebp, %ebp

        // On entry rdx holds the dynamic linker's cleanup function;
        // save it as the sixth argument (rtld_fini)
        movq %rdx, %r9

        // argc is at top of stack, pop it into the second arg register
        popq %rsi               // argc

        // argv is now at top of stack
        movq %rsp, %rdx         // argv

        // Align stack to 16 bytes (ABI requirement)
        andq $~15, %rsp

        // Push garbage to maintain alignment, then stack_end (7th argument)
        pushq %rax
        pushq %rsp

        // Arguments to __libc_start_main:
        // rdi = main, rsi = argc, rdx = argv, rcx = init
        // r8 = fini, r9 = rtld_fini, stack = stack_end
        // (glibc before 2.34; newer versions pass NULL for init and fini)
        movq main@GOTPCREL(%rip), %rdi
        movq __libc_csu_init@GOTPCREL(%rip), %rcx
        movq __libc_csu_fini@GOTPCREL(%rip), %r8

        call __libc_start_main@PLT

        // __libc_start_main never returns
        // But just in case...
        hlt
    // Simplified view of __libc_start_main
    // glibc: csu/libc-start.c

    int __libc_start_main(
        int (*main)(int, char **, char **),
        int argc,
        char **argv,
        void (*init)(int, char **, char **),  // __libc_csu_init
        void (*fini)(void),                   // __libc_csu_fini
        void (*rtld_fini)(void),              // Dynamic linker cleanup
        void *stack_end)
    {
        // Store stack end for profiling/backtraces
        __libc_stack_end = stack_end;

        // Get environment pointers (after argv)
        char **envp = argv + argc + 1;

        // Initialize threading
        __pthread_initialize_minimal();

        // Register cleanup functions
        __cxa_atexit(rtld_fini, NULL, NULL);
        __cxa_atexit(fini, NULL, NULL);

        // Call constructors
        (*init)(argc, argv, envp);

        // Call main!
        int result = main(argc, argv, envp);

        // Exit (runs destructors, atexit handlers)
        exit(result);
    }

You can write programs without main() by providing your own _start. This is common in minimal programs, exploits, or when avoiding C library dependencies. Use gcc -nostdlib to link without standard startup code. Such a program has to terminate through the exit system call, because _start has no caller to return to.
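The constructor step in the startup sequence above can be observed from ordinary C code. The sketch below relies on the `constructor` function attribute, a GCC and Clang extension rather than standard C; `ctor_ran` and `before_main` are hypothetical names.

```c
#include <stdio.h>

static int ctor_ran;   /* set by the constructor, read later from main() */

/* GCC/Clang extension: a pointer to this function is placed in
 * .init_array, so the startup code calls it before main(). */
__attribute__((constructor))
static void before_main(void)
{
    ctor_ran = 1;
    puts("constructor: running before main()");
}
```

By the time main() begins, `ctor_ran` is already 1 and the message has been printed, which shows that user code can run before main() without replacing _start.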
Let's trace the complete loading sequence from shell command to first user instruction, using /bin/ls as the example:

| Stage | Actor | Key Actions |
|---|---|---|
| 1. Shell | bash/zsh | fork() creates child process, prepare argv/envp |
| 2. execve() | Child process | System call to kernel, path = '/bin/ls' |
| 3. Format Check | Kernel | Read ELF header, verify magic (0x7f 'E' 'L' 'F') |
| 4. Process Image | Kernel | Destroy old mappings, create new address space |
| 5. mmap Segments | Kernel | Map LOAD segments from /bin/ls (code, data) |
| 6. Load ld.so | Kernel | Read INTERP, map /lib64/ld-linux-x86-64.so.2 |
| 7. Stack Setup | Kernel | Push argc, argv, envp, auxv to stack |
| 8. Transfer | Kernel → ld.so | Jump to ld.so entry point |
| 9. Self-reloc | ld.so | Relocate ld.so itself |
| 10. Load Libs | ld.so | Map libc.so.6, libpthread.so, etc. |
| 11. Relocations | ld.so | Fill GOT entries, patch code |
| 12. Constructors | ld.so | Call .init_array functions in libraries |
| 13. Transfer | ld.so → _start | Jump to /bin/ls entry point |
| 14. C Runtime | _start → main | __libc_start_main, then main() |
| 15. User Code | main() | ls program logic executes! |
    $ strace -f /bin/ls 2>&1 | head -25
    execve("/bin/ls", ["ls"], 0x7ffe...) = 0
    brk(NULL) = 0x559b79a00000
    mmap(NULL, 8192, PROT_READ|PROT_WRITE, ...) = 0x7f1234567000
    access("/etc/ld.so.preload", R_OK) = -1 ENOENT
    openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
    ...
    openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|...) = 3
    read(3, "\177ELF\2\1\1\3\0\0\0...", 832) = 832
    mmap(NULL, 2037344, PROT_READ, MAP_PRIVATE|..., 3, 0) = 0x7f...
    mprotect(0x7f..., 1859584, PROT_READ|PROT_EXEC) = 0
    mmap(0x7f..., 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|..., 3, 0x1ef000) = 0x7f...
    ...
    brk(NULL) = 0x559b79a00000
    brk(0x559b79a21000) = 0x559b79a21000   // Heap setup
    openat(AT_FDCWD, ".", O_RDONLY|...) = 3   // ls reads directory

Program loading is the bridge between static executables and dynamic processes. The kernel and dynamic linker work in concert to transform an ELF file into a running program with its own address space, stack, and execution context.
What's next:
With loading understood, we now examine relocatable code—the techniques that allow code to work regardless of where it's loaded in memory. The final page explores position-independent code, relocation mechanics, and why these concepts matter for security and flexibility.
You now understand how executables come to life—from the execve system call through kernel parsing, memory mapping, dynamic linking, and finally reaching main(). This knowledge is essential for debugging, security analysis, and understanding process behavior.