Linkers And Loaders - Learning Module


Loading Process

From Executable File to Running Process

An executable file sitting on disk is just data—bytes representing code, initialized variables, and metadata. For a program to actually run, the operating system must perform a sophisticated series of operations that transform this static file into a dynamic, executing process with its own memory space, stack, and execution context.

The loading process is this transformation. It involves reading the executable format, creating a virtual address space, mapping file contents into memory, initializing the stack and heap, loading any required shared libraries, performing runtime relocations, and finally transferring control to the program's entry point.

This is where compilation meets execution—where the linker's output becomes the kernel's input.

What You Will Learn

By the end of this page, you will understand the complete loading sequence—from the exec() system call through process creation, memory mapping, dynamic linking, to the execution of the first user instruction. You'll grasp how the kernel and dynamic linker collaborate to bring programs to life.

The exec() System Call: Triggering the Load

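Program loading begins with the exec() family of calls (execve, execl, execlp, execvp, and others). On Linux, execve is the underlying system call and the other variants are C library wrappers around it. When a process calls exec(), it asks the kernel to replace its current program image with a new one: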

The calling process's code, data, and stack are replaced
The process ID (PID) remains the same
Open file descriptors remain open unless marked close-on-exec (FD_CLOEXEC)
Signals with handlers installed are reset to their default action; ignored signals stay ignored

This is fundamentally different from fork(): fork creates a new process with a copy of the parent's image, while exec replaces the current image entirely.

execve System Call
#include <stdio.h>
#include <unistd.h>

// The fundamental exec variant, declared in <unistd.h>:
// int execve(const char *pathname,    // Path to executable
//            char *const argv[],      // Command-line arguments
//            char *const envp[]);     // Environment variables

int main(void)
{
    // Example usage
    char *args[] = {"ls", "-la", NULL};  // argv[0] is the program name; must be NULL-terminated
    char *env[] = {"PATH=/bin", NULL};   // Environment, also NULL-terminated

    execve("/bin/ls", args, env);
    // If successful, this never returns!
    // The current process image is replaced entirely

    perror("execve failed");  // Only reached if exec fails
    return 1;
}

What Happens Inside the Kernel

When execve() is called, the kernel performs these steps:

Validate the executable path — Check existence and permissions
Read the file header — Determine executable format (ELF, script, etc.)
Handle interpreters — For scripts, load the interpreter instead
Check credentials — Handle setuid/setgid bits
Prepare the new process image — Set up memory, mappings
Transfer control — Jump to the entry point or dynamic linker

Kernel's execve Entry Point (simplified)
// Linux kernel: fs/exec.c (conceptual pseudocode, not the actual source)
 
SYSCALL_DEFINE3(execve, const char __user *, filename,
                const char __user *const __user *, argv,
                const char __user *const __user *, envp)
{
    // Step 1: Open the file and fill in a linux_binprm descriptor ("bprm"),
    // which also holds the first bytes of the file for format detection
    struct file *file = open_exec(filename);
    
    // Step 2: Identify format and find handler
    // Magic bytes determine type: ELF ("\x7fELF"), script ("#!"), etc.
    struct linux_binfmt *fmt = search_binary_handler(bprm);
    
    // Step 3: Format-specific loading (for ELF: load_elf_binary)
    retval = fmt->load_binary(bprm);
    
    // Step 4: Setup stack with argc, argv, envp
    // Step 5: Start execution at entry point
    
    return retval;  // On success, user space resumes in the new image, not in the caller
}

Binary Formats

Linux supports multiple executable formats via the binfmt mechanism. ELF is the primary format, but the kernel also handles scripts (via the #! interpreter line), flat binaries, and others. Through binfmt_misc, administrators can register additional handlers keyed on magic bytes or file extension, which is how Java JAR files, Windows executables (via Wine), and custom formats can be launched like native programs.

ELF Loading Sequence

For ELF executables, the kernel's load_elf_binary() function orchestrates the loading process. This involves interpreting the program headers—the execution view of the ELF file that tells the kernel how to set up the process's address space.

Program Headers (Segments)

While section headers describe the file for linking, program headers describe segments for loading. Each loadable segment specifies:

Virtual address: Where in memory to place the segment
File offset and size: Where in the file the segment data lives
Memory size: How much memory to allocate (may exceed file size for .bss)
Permissions: Read, write, execute flags
Alignment: Memory alignment requirements

Viewing Program Headers (readelf -l)
$ readelf -l /bin/ls
 
Elf file type is DYN (Position-Independent Executable)
Entry point 0x6b10
There are 13 program headers, starting at offset 64
 
Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR           0x000040 0x0000000000000040 0x0000000000000040 0x0002d8 0x0002d8 R   0x8
  INTERP         0x000318 0x0000000000000318 0x0000000000000318 0x00001c 0x00001c R   0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x000000 0x0000000000000000 0x0000000000000000 0x003510 0x003510 R   0x1000
  LOAD           0x004000 0x0000000000004000 0x0000000000004000 0x013581 0x013581 R E 0x1000
  LOAD           0x018000 0x0000000000018000 0x0000000000018000 0x004ba8 0x004ba8 R   0x1000
  LOAD           0x01d390 0x000000000001e390 0x000000000001e390 0x001288 0x002548 RW  0x1000
  DYNAMIC        0x01e348 0x000000000001f348 0x000000000001f348 0x0001f0 0x0001f0 RW  0x8
  NOTE           0x000338 0x0000000000000338 0x0000000000000338 0x000030 0x000030 R   0x8
  GNU_EH_FRAME   0x01c8ec 0x000000000001c8ec 0x000000000001c8ec 0x0004dc 0x0004dc R   0x4
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0x10
  GNU_RELRO      0x01d390 0x000000000001e390 0x000000000001e390 0x000c70 0x000c70 R   0x1
 
 Section to Segment mapping:
  00     
  01     .interp
  02     .interp .note.gnu.build-id .note.ABI-tag .gnu.hash ...
  03     .init .plt .plt.got .plt.sec .text .fini
  04     .rodata .eh_frame_hdr .eh_frame
  05     .init_array .fini_array .data.rel.ro .dynamic .got .data .bss

Key Segment Types

PT_LOAD: Loadable segment, mapped from the file into memory
PT_INTERP: Specifies the dynamic linker (interpreter) to use
PT_DYNAMIC: Contains dynamic linking information
PT_GNU_STACK: Indicates whether the stack should be executable
PT_GNU_RELRO: Region to be made read-only after relocations

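The kernel processes these headers to create the process's memory layout: each PT_LOAD entry becomes one or more mappings, PT_INTERP names the dynamic linker to load alongside the executable, and PT_GNU_STACK sets the stack permissions.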

Memory Mapping: Demand Paging in Action

The kernel doesn't actually copy the entire executable into memory. Instead, it uses memory mapping (mmap) to establish a relationship between virtual addresses and the executable file on disk.

Demand Paging

With memory mapping:

Virtual address ranges are reserved but not immediately populated
Page table entries are initially marked not present
On first access: page fault occurs
Kernel handles fault by loading the page from disk
Subsequent accesses hit memory directly (no fault)

This means a program can start executing almost immediately, even if the executable is gigabytes large. Only the pages actually accessed are loaded.

Memory Mapping Conceptual View
// Kernel maps segments using mmap-like mechanism
 
// Map code segment (read-only, executable, file-backed)
addr = mmap(
    load_addr,              // Requested virtual address
    segment_size,           // Size
    PROT_READ | PROT_EXEC,  // Read and execute
    MAP_PRIVATE | MAP_FIXED,// Private copy, fixed address
    fd,                     // File descriptor of executable
    file_offset             // Offset in file
);
 
// Map data segment (read-write, file-backed)
addr = mmap(
    data_load_addr,
    data_size,
    PROT_READ | PROT_WRITE,
    MAP_PRIVATE | MAP_FIXED,
    fd,
    data_file_offset
);
 
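// .bss: Additional anonymous memory (not in file)
if (mem_size > file_size) {
    // Zero-initialized expansion for .bss. Conceptual: a real loader zeroes
    // the rest of the last file-backed page and starts this mapping at the
    // next page boundary, because mappings must be page-aligned
    mmap(
        data_load_addr + file_size,
        mem_size - file_size,
        PROT_READ | PROT_WRITE,
        MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
        -1, 0
    );
}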

Copy-on-Write (COW)

For shared libraries mapped into multiple processes, the kernel uses copy-on-write for writable pages:

Read-only code pages are shared directly (same physical frame)
Writable data pages are marked COW initially
On write attempt: Page fault triggers duplication
Each process gets its own copy only when modified

This optimization dramatically reduces memory usage when many processes use the same libraries.

Viewing Process Memory Map
$ cat /proc/$(pidof firefox)/maps | head -30
5593e7a00000-5593e7a01000 r--p 00000000 103:02 2883844 /usr/lib/firefox/firefox
5593e7a01000-5593e7a02000 r-xp 00001000 103:02 2883844 /usr/lib/firefox/firefox
5593e7a02000-5593e7a03000 r--p 00002000 103:02 2883844 /usr/lib/firefox/firefox
5593e7a03000-5593e7a04000 rw-p 00002000 103:02 2883844 /usr/lib/firefox/firefox
7f1234500000-7f1234522000 r--p 00000000 103:02 1234567 /lib/x86_64-linux-gnu/libc.so.6
7f1234522000-7f12346b1000 r-xp 00022000 103:02 1234567 /lib/x86_64-linux-gnu/libc.so.6
...
7ffe12340000-7ffe12361000 rw-p 00000000 00:00 0        [stack]
7ffe123fe000-7ffe12402000 r--p 00000000 00:00 0        [vvar]
7ffe12402000-7ffe12404000 r-xp 00000000 00:00 0        [vdso]
 
# Columns: address range, permissions, offset, device, inode, pathname
# Permissions: r=read, w=write, x=execute, p=private(COW), s=shared

ASLR - Address Space Layout Randomization

Modern systems randomize the base addresses where executables, libraries, stack, and heap are loaded. This is ASLR—a security feature that makes exploits harder by randomizing memory layout each run. Position-Independent Executables (PIE) enable full ASLR for the main executable.

Stack Setup: Arguments and Environment

Before transferring control to the program, the kernel must set up the initial stack. The stack contains critical information the program needs to start:

argc: Argument count
argv: Array of argument string pointers
envp: Array of environment string pointers
Auxiliary vector (auxv): System information for the dynamic linker

Initial Stack Layout
// Stack layout after exec, drawn from high addresses (top) to low addresses (bottom)
// (The stack grows downward, so the initial stack pointer is at the bottom of the diagram)
 
// ┌─────────────────────────────────────────┐ ← High address
// │ Information block                        │
// │   (strings for argv and envp)           │
// ├─────────────────────────────────────────┤
// │ Null auxiliary vector entry              │
// ├─────────────────────────────────────────┤
// │ Auxiliary vector entries (AT_*)          │
// │   AT_PHDR, AT_ENTRY, AT_PHNUM, etc.     │
// ├─────────────────────────────────────────┤
// │ NULL word (envp terminator)              │
// ├─────────────────────────────────────────┤
// │ Environment pointers (envp[0], ...)      │
// ├─────────────────────────────────────────┤
// │ NULL word (argv terminator)              │
// ├─────────────────────────────────────────┤
// │ Argument pointers (argv[0], argv[1]...)  │
// ├─────────────────────────────────────────┤
// │ argc (argument count)                    │
// └─────────────────────────────────────────┘ ← Initial SP
//                                             ← Low address

Auxiliary Vector (auxv)

The auxiliary vector is a critical but often overlooked structure. It provides the dynamic linker and C library with essential system information:

Auxiliary Vector Entries
// Important AT_* entries in the auxiliary vector
 
AT_PHDR     // Address of program headers in memory
AT_PHENT    // Size of program header entry
AT_PHNUM    // Number of program headers
AT_ENTRY    // Entry point of the program
AT_BASE     // Base address where interpreter was loaded
AT_EXECFN   // Filename of executed program
AT_PAGESZ   // System page size
AT_UID      // Real user ID
AT_EUID     // Effective user ID
AT_GID      // Real group ID
AT_EGID     // Effective group ID
AT_RANDOM   // Address of 16 random bytes (for stack canary)
AT_SYSINFO_EHDR  // Address of vDSO
 
// View the auxiliary vector passed to a newly started program (glibc dynamic linker)
$ LD_SHOW_AUXV=1 ls
AT_SYSINFO_EHDR: 0x7ffc3d7fe000
AT_HWCAP:        bfebfbff
AT_PAGESZ:       4096
AT_PHDR:         0x55d4e8a00040
AT_PHENT:        56
AT_PHNUM:        13
AT_BASE:         0x7f8b12a00000
AT_ENTRY:        0x55d4e8a06b10
AT_UID:          1000
...

Why the Dynamic Linker Needs auxv

The dynamic linker starts running before any C library initialization has happened, so it cannot rely on library functions to discover its environment. Instead it reads auxv directly to find the executable's program headers (AT_PHDR, AT_PHNUM), the page size used for mapping (AT_PAGESZ), its own load address (AT_BASE), and the program's entry point (AT_ENTRY).

Dynamic Linker Execution

For dynamically-linked executables, the kernel doesn't jump directly to the program's entry point. Instead, it transfers control to the dynamic linker (ld.so), which must complete several tasks before the program can run:

Dynamic Linker Initialization

Self-relocation: ld.so must relocate itself (it's also a shared object!)
Parse the executable: Read program headers and dynamic section
Load dependencies: Map all required shared libraries
Perform relocations: Patch GOT entries, resolve symbols
Initialize libraries: Run constructors (.init_array functions)
Transfer control: Jump to the program's actual entry point

Dynamic Linker Loading Sequence
// Conceptual loading sequence in ld.so (pseudocode; names loosely follow glibc)
 
void _dl_start(void *arg) {
    // 1. Bootstrap: Relocate ourselves
    _dl_start_final(arg);  // Self-relocation before we can use GOT
    
    // 2. Parse executable's DYNAMIC section
    for (dyn = _DYNAMIC; dyn->d_tag != DT_NULL; dyn++) {
        switch (dyn->d_tag) {
            case DT_NEEDED:  // Library dependency
                needed_libs.add(dyn->d_un.d_val);
                break;
            case DT_RPATH:   // Library search path
            case DT_RUNPATH:
                search_paths.add(dyn->d_un.d_val);
                break;
            // Many more entries...
        }
    }
    
    // 3. Load required libraries (recursively)
    for (lib : needed_libs) {
        load_library(lib);  // May trigger more DT_NEEDED
    }
    
    // 4. Perform relocations
    _dl_relocate_object(main_map);
    for (lib : loaded_libs) {
        _dl_relocate_object(lib);
    }
    
    // 5. Call constructors (bottom-up: libraries first)
    _dl_init(main_map, argc, argv, envp);
    
    // 6. Transfer to program entry point
    _dl_start_user(entry_point);
}

Library Loading Order

Shared libraries are loaded in breadth-first order based on DT_NEEDED entries:

Libraries needed by the executable
Libraries needed by those libraries
And so on, recursively

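This load order also determines the default symbol lookup order: the dynamic linker searches the executable first, then each library in the order it was loaded. The full rule, known as the "global scope", has additional cases; for example, LD_PRELOAD libraries are searched right after the executable.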

Tracing Library Loading
# Trace shared library loading
$ LD_DEBUG=libs ./program
     find library=libc.so.6 [0]; searching
      search path=/lib/x86_64-linux-gnu/tls/haswell/...
       trying file=/lib/x86_64-linux-gnu/tls/haswell/libc.so.6
       ...
       trying file=/lib/x86_64-linux-gnu/libc.so.6
      found libc.so.6 at /lib/x86_64-linux-gnu/libc.so.6
 
# Trace symbol resolution
$ LD_DEBUG=symbols ./program
     symbol=printf;  lookup in file=./program
     symbol=printf;  lookup in file=/lib/x86_64-linux-gnu/libc.so.6
     binding file ./program to /lib/x86_64-linux-gnu/libc.so.6: normal symbol `printf'

Symbol Interposition

The default symbol lookup order enables 'interposition'—defining a symbol in your program or LD_PRELOAD library to override library functions. While powerful for debugging, this can cause problems if libraries expect their own internal symbols. RTLD_LOCAL and -Bsymbolic can change this behavior.

The Entry Point: Where Execution Begins

After the dynamic linker completes its work, control is transferred to the program's entry point. But this isn't main()—it's the C runtime startup code called _start.

The Path to main()

_start (entry point, written in assembly)
- Called by kernel/dynamic linker
- Minimal setup, calls __libc_start_main
__libc_start_main (C library initialization)
- Sets up threading
- Registers atexit handlers
- Calls constructors
- Calls main()
- Calls exit() with main's return value
main() (your code!)
- Finally, user code executes
- Receives argc, argv, envp as arguments

_start Entry Point (x86-64)
// glibc's _start (simplified)
// sysdeps/x86_64/start.S
// Note: glibc 2.34 and later no longer pass __libc_csu_init/__libc_csu_fini;
// this listing follows the older layout
 
    .text
    .globl _start
    .type _start, @function
_start:
    // Clear frame pointer to mark the outermost frame
    xorl    %ebp, %ebp
    
    // On entry, %rdx holds the dynamic linker's cleanup function
    movq    %rdx, %r9               // rtld_fini (sixth argument)
    
    // argc is at top of stack, put in second arg register
    popq    %rsi                    // argc
    
    // argv is now at top of stack, put in third arg register
    movq    %rsp, %rdx              // argv
    
    // Align stack to 16 bytes (ABI requirement)
    andq    $~15, %rsp
    
    // Push garbage so the stack stays aligned after the next push
    pushq   %rax
    
    // Seventh argument is passed on the stack
    pushq   %rsp                    // stack_end
    
    // Arguments to __libc_start_main:
    // rdi = main, rsi = argc, rdx = argv, rcx = init
    // r8 = fini, r9 = rtld_fini, stack = stack_end
    movq    main@GOTPCREL(%rip), %rdi
    movq    __libc_csu_init@GOTPCREL(%rip), %rcx
    movq    __libc_csu_fini@GOTPCREL(%rip), %r8
    
    call    __libc_start_main@PLT
    
    // __libc_start_main never returns
    // But just in case...
    hlt

__libc_start_main Overview
// Simplified view of __libc_start_main
// glibc: csu/libc-start.c (conceptual; glibc 2.34 and later ignore the
// init/fini arguments and run the executable's constructors internally)
 
int __libc_start_main(
    int (*main)(int, char **, char **),
    int argc,
    char **argv,
    void (*init)(int, char **, char **),  // __libc_csu_init
    void (*fini)(void),      // __libc_csu_fini
    void (*rtld_fini)(void), // Dynamic linker cleanup
    void *stack_end)
{
    // Store stack end for profiling/backtraces
    __libc_stack_end = stack_end;
    
    // Get environment pointers (after argv and its NULL terminator)
    char **envp = argv + argc + 1;
    
    // Initialize threading
    __pthread_initialize_minimal();
    
    // Register cleanup functions (they run in reverse order at exit)
    __cxa_atexit((void (*)(void *))rtld_fini, NULL, NULL);
    __cxa_atexit((void (*)(void *))fini, NULL, NULL);
    
    // Call the executable's own constructors
    // (shared library constructors were already run by ld.so)
    (*init)(argc, argv, envp);
    
    // Call main!
    int result = main(argc, argv, envp);
    
    // Exit (runs destructors, atexit handlers)
    exit(result);
}

Programs Without main()

You can write programs without main() by providing your own _start. This is common in minimal programs, exploits, or when avoiding C library dependencies. Use gcc -nostdlib to link without standard startup code.

Complete Loading Timeline

Let's trace the complete loading sequence from shell command to first user instruction:

Timeline: Running `/bin/ls`

Loading Sequence Timeline
Stage	Actor	Key Actions
1. Shell	bash/zsh	fork() creates child process, prepare argv/envp
2. execve()	Child process	System call to kernel, path = '/bin/ls'
3. Format Check	Kernel	Read ELF header, verify magic (0x7f ELF)
4. Process Image	Kernel	Destroy old mappings, create new address space
5. mmap Segments	Kernel	Map LOAD segments from /bin/ls (code, data)
6. Load ld.so	Kernel	Read INTERP, map /lib64/ld-linux-x86-64.so.2
7. Stack Setup	Kernel	Push argc, argv, envp, auxv to stack
8. Transfer	Kernel → ld.so	Jump to ld.so entry point
9. Self-reloc	ld.so	Relocate ld.so itself
10. Load Libs	ld.so	Map libc.so.6, libpthread.so, etc.
11. Relocations	ld.so	Apply relocations: fill GOT entries, fix up data pointers
12. Constructors	ld.so	Call .init_array functions in libraries
13. Transfer	ld.so → _start	Jump to /bin/ls entry point
14. C Runtime	_start → main	__libc_start_main, then main()
15. User Code	main()	ls program logic executes!

Tracing with strace
$ strace -f /bin/ls 2>&1 | head -25
execve("/bin/ls", ["ls"], 0x7ffe...) = 0
brk(NULL)                               = 0x559b79a00000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, ...) = 0x7f1234567000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
...
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|...) = 3
read(3, "\177ELF\2\1\1\3\0\0\0...", 832) = 832
mmap(NULL, 2037344, PROT_READ, MAP_PRIVATE|..., 3, 0) = 0x7f...
mprotect(0x7f..., 1859584, PROT_READ|PROT_EXEC) = 0
mmap(0x7f..., 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|..., 3, 0x1ef000) = 0x7f...
...
brk(NULL)                               = 0x559b79a00000
brk(0x559b79a21000)                     = 0x559b79a21000  // Heap setup
openat(AT_FDCWD, ".", O_RDONLY|...)     = 3               // ls reads directory

Summary: The Complete Loading Picture

Program loading is the bridge between static executables and dynamic processes. The kernel and dynamic linker work in concert to transform an ELF file into a running program with its own address space, stack, and execution context.

Key Takeaways

• execve() initiates loading by asking the kernel to replace the current process image with a new executable.
• Program headers guide loading, telling the kernel what to map where, with what permissions.
• Memory mapping with demand paging means only accessed pages are loaded, enabling fast startup even for large executables.
• The stack is initialized with argc, argv, envp, and the auxiliary vector containing system information.
• The dynamic linker (ld.so) completes loading by mapping shared libraries, performing relocations, and running constructors.
• Execution finally reaches _start → __libc_start_main → main(), the path from entry point to user code.

What's next:

With loading understood, we now examine relocatable code—the techniques that allow code to work regardless of where it's loaded in memory. The final page explores position-independent code, relocation mechanics, and why these concepts matter for security and flexibility.

Page Complete

You now understand how executables come to life—from the execve system call through kernel parsing, memory mapping, dynamic linking, and finally reaching main(). This knowledge is essential for debugging, security analysis, and understanding process behavior.

Loading Process

From Executable File to Running Process

This is where compilation meets execution—where the linker's output becomes the kernel's input.

What You Will Learn

The exec() System Call: Triggering the Load

The calling process's code, data, and stack are replaced
The process ID (PID) remains the same
Open file descriptors may be preserved (unless marked close-on-exec)
Signal handlers are reset to defaults

This is fundamentally different from fork(): fork creates a new process with a copy of the parent's image, while exec replaces the current image entirely.

execve System Call
1
2
3
4
5
6
7
8
9
10
11
12
13
14
// The fundamental exec variant
int execve(const char *pathname,    // Path to executable
           char *const argv[],       // Command-line arguments
           char *const envp[]);      // Environment variables
 
// Example usage
char *args[] = {"ls", "-la", NULL};  // Must be NULL-terminated
char *env[] = {"PATH=/bin", NULL};   // Environment
 
execve("/bin/ls", args, env);
// If successful, this never returns!
// The current process image is replaced entirely
 
perror("execve failed");  // Only reached if exec fails

What Happens Inside the Kernel

When execve() is called, the kernel performs these steps:

Validate the executable path — Check existence and permissions
Read the file header — Determine executable format (ELF, script, etc.)
Handle interpreters — For scripts, load the interpreter instead
Check credentials — Handle setuid/setgid bits
Prepare the new process image — Set up memory, mappings
Transfer control — Jump to the entry point or dynamic linker

Kernel's execve Entry Point (simplified)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
// Linux kernel: fs/exec.c (conceptual)
 
SYSCALL_DEFINE3(execve, const char __user *, filename,
                const char __user *const __user *, argv,
                const char __user *const __user *, envp)
{
    // Step 1: Open and read the file
    struct file *file = open_exec(filename);
    
    // Step 2: Identify format and find handler
    // Magic bytes determine type: ELF (0x7f ELF), script (#!), etc.
    struct linux_binfmt *fmt = search_binary_handler(bprm);
    
    // Step 3: Format-specific loading (for ELF: load_elf_binary)
    retval = fmt->load_binary(bprm);
    
    // Step 4: Setup stack with argc, argv, envp
    // Step 5: Start execution at entry point
    
    return retval;  // Never returns on success
}

Binary Formats

ELF Loading Sequence

Program Headers (Segments)

While section headers describe the file for linking, program headers describe segments for loading. Each loadable segment specifies:

Virtual address: Where in memory to place the segment
File offset and size: Where in the file the segment data lives
Memory size: How much memory to allocate (may exceed file size for .bss)
Permissions: Read, write, execute flags
Alignment: Memory alignment requirements

Viewing Program Headers (readelf -l)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
$ readelf -l /bin/ls
 
Elf file type is DYN (Position-Independent Executable)
Entry point 0x6b10
There are 13 program headers, starting at offset 64
 
Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR           0x000040 0x0000000000000040 0x0000000000000040 0x0002d8 0x0002d8 R   0x8
  INTERP         0x000318 0x0000000000000318 0x0000000000000318 0x00001c 0x00001c R   0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x000000 0x0000000000000000 0x0000000000000000 0x003510 0x003510 R   0x1000
  LOAD           0x004000 0x0000000000004000 0x0000000000004000 0x013581 0x013581 R E 0x1000
  LOAD           0x018000 0x0000000000018000 0x0000000000018000 0x004ba8 0x004ba8 R   0x1000
  LOAD           0x01d390 0x000000000001e390 0x000000000001e390 0x001288 0x002548 RW  0x1000
  DYNAMIC        0x01e348 0x000000000001f348 0x000000000001f348 0x0001f0 0x0001f0 RW  0x8
  NOTE           0x000338 0x0000000000000338 0x0000000000000338 0x000030 0x000030 R   0x8
  GNU_EH_FRAME   0x01c8ec 0x000000000001c8ec 0x000000000001c8ec 0x0004dc 0x0004dc R   0x4
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0x10
  GNU_RELRO      0x01d390 0x000000000001e390 0x000000000001e390 0x000c70 0x000c70 R   0x1
 
 Section to Segment mapping:
  00     
  01     .interp
  02     .interp .note.gnu.build-id .note.ABI-tag .gnu.hash ...
  03     .init .plt .plt.got .plt.sec .text .fini
  04     .rodata .eh_frame_hdr .eh_frame
  05     .init_array .fini_array .data.rel.ro .dynamic .got .data .bss

Key Segment Types

PT_LOAD: Loadable segment—data copied from file to memory
PT_INTERP: Specifies the dynamic linker (interpreter) to use
PT_DYNAMIC: Contains dynamic linking information
PT_GNU_STACK: Indicates whether stack should be executable
PT_GNU_RELRO: Section to be made read-only after relocations

The kernel processes these headers to create the memory layout:

Converting Mermaid diagram...

Memory Mapping: Demand Paging in Action

The kernel doesn't actually copy the entire executable into memory. Instead, it uses memory mapping (mmap) to establish a relationship between virtual addresses and the executable file on disk.

Demand Paging

With memory mapping:

Virtual address ranges are reserved but not immediately populated
Page table entries marked invalid initially
On first access: page fault occurs
Kernel handles fault by loading the page from disk
Subsequent accesses hit memory directly (no fault)

This means a program can start executing almost immediately, even if the executable is gigabytes large. Only the pages actually accessed are loaded.

Memory Mapping Conceptual View
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
// Kernel maps segments using mmap-like mechanism
 
// Map code segment (read-only, executable, file-backed)
addr = mmap(
    load_addr,              // Requested virtual address
    segment_size,           // Size
    PROT_READ | PROT_EXEC,  // Read and execute
    MAP_PRIVATE | MAP_FIXED,// Private copy, fixed address
    fd,                     // File descriptor of executable
    file_offset             // Offset in file
);
 
// Map data segment (read-write, file-backed)
addr = mmap(
    data_load_addr,
    data_size,
    PROT_READ | PROT_WRITE,
    MAP_PRIVATE | MAP_FIXED,
    fd,
    data_file_offset
);
 
// .bss: Additional anonymous memory (not in file)
if (mem_size > file_size) {
    // Zero-initialized expansion for .bss
    mmap(
        data_load_addr + file_size,
        mem_size - file_size,
        PROT_READ | PROT_WRITE,
        MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
        -1, 0
    );
}

Copy-on-Write (COW)

For shared libraries mapped into multiple processes, the kernel uses copy-on-write for writable pages:

Read-only code pages are shared directly (same physical frame)
Writable data pages are marked COW initially
On write attempt: Page fault triggers duplication
Each process gets its own copy only when modified

This optimization dramatically reduces memory usage when many processes use the same libraries.

Viewing Process Memory Map
1
2
3
4
5
6
7
8
9
10
11
12
13
14
$ cat /proc/$(pidof firefox)/maps | head -30
5593e7a00000-5593e7a01000 r--p 00000000 103:02 2883844 /usr/lib/firefox/firefox
5593e7a01000-5593e7a02000 r-xp 00001000 103:02 2883844 /usr/lib/firefox/firefox
5593e7a02000-5593e7a03000 r--p 00002000 103:02 2883844 /usr/lib/firefox/firefox
5593e7a03000-5593e7a04000 rw-p 00002000 103:02 2883844 /usr/lib/firefox/firefox
7f1234500000-7f1234522000 r--p 00000000 103:02 1234567 /lib/x86_64-linux-gnu/libc.so.6
7f1234522000-7f12346b1000 r-xp 00022000 103:02 1234567 /lib/x86_64-linux-gnu/libc.so.6
...
7ffe12340000-7ffe12361000 rw-p 00000000 00:00 0        [stack]
7ffe123fe000-7ffe12402000 r--p 00000000 00:00 0        [vvar]
7ffe12402000-7ffe12404000 r-xp 00000000 00:00 0        [vdso]
 
# Columns: address range, permissions, offset, device, inode, pathname
# Permissions: r=read, w=write, x=execute, p=private(COW), s=shared

ASLR - Address Space Layout Randomization

Stack Setup: Arguments and Environment

Before transferring control to the program, the kernel must set up the initial stack. The stack contains critical information the program needs to start:

argc: Argument count
argv: Array of argument string pointers
envp: Array of environment string pointers
Auxiliary vector (auxv): System information for the dynamic linker

Initial Stack Layout
// Stack layout after exec (top of stack = low address)
// (Stack grows downward, so this is from top to bottom)
 
// ┌─────────────────────────────────────────┐ ← High address
// │ Information block                        │
// │   (strings for argv and envp)           │
// ├─────────────────────────────────────────┤
// │ Null auxiliary vector entry              │
// ├─────────────────────────────────────────┤
// │ Auxiliary vector entries (AT_*)          │
// │   AT_PHDR, AT_ENTRY, AT_PHNUM, etc.     │
// ├─────────────────────────────────────────┤
// │ NULL word (envp terminator)              │
// ├─────────────────────────────────────────┤
// │ Environment pointers (envp[0], ...)      │
// ├─────────────────────────────────────────┤
// │ NULL word (argv terminator)              │
// ├─────────────────────────────────────────┤
// │ Argument pointers (argv[0], argv[1]...)  │
// ├─────────────────────────────────────────┤
// │ argc (argument count)                    │
// └─────────────────────────────────────────┘ ← Initial SP
//                                             ← Low address

Auxiliary Vector (auxv)

The auxiliary vector is a critical but often overlooked structure. It provides the dynamic linker and C library with essential system information:

Auxiliary Vector Entries
// Important AT_* entries in the auxiliary vector
 
AT_PHDR     // Address of program headers in memory
AT_PHENT    // Size of program header entry
AT_PHNUM    // Number of program headers
AT_ENTRY    // Entry point of the program
AT_BASE     // Base address where interpreter was loaded
AT_EXECFN   // Filename of executed program
AT_PAGESZ   // System page size
AT_UID      // Real user ID
AT_EUID     // Effective user ID
AT_GID      // Real group ID
AT_EGID     // Effective group ID
AT_RANDOM   // Address of 16 random bytes (for stack canary)
AT_SYSINFO_EHDR  // Address of vDSO
 
// Print the auxiliary vector of a program as it starts (ld.so honors LD_SHOW_AUXV)
$ LD_SHOW_AUXV=1 ls
AT_SYSINFO_EHDR: 0x7ffc3d7fe000
AT_HWCAP:        bfebfbff
AT_PAGESZ:       4096
AT_PHDR:         0x55d4e8a00040
AT_PHENT:        56
AT_PHNUM:        13
AT_BASE:         0x7f8b12a00000
AT_ENTRY:        0x55d4e8a06b10
AT_UID:          1000
...

Why the Dynamic Linker Needs auxv

Dynamic Linker Execution

Dynamic Linker Initialization

Self-relocation: ld.so must relocate itself (it's also a shared object!)
Parse the executable: Read program headers and dynamic section
Load dependencies: Map all required shared libraries
Perform relocations: Patch GOT entries, resolve symbols
Initialize libraries: Run constructors (.init_array functions)
Transfer control: Jump to the program's actual entry point

Dynamic Linker Loading Sequence
// Conceptual loading sequence in ld.so (pseudocode, loosely modeled on glibc)
 
void _dl_start(void *arg) {
    // 1. Bootstrap: ld.so relocates itself before it can use its own GOT.
    //    (In glibc, _dl_start does this and then continues in _dl_start_final.)
    _dl_start_final(arg);
    
    // 2. Parse the executable's DYNAMIC section
    //    (main_map->l_ld; ld.so's own section is _DYNAMIC)
    for (dyn = main_map->l_ld; dyn->d_tag != DT_NULL; dyn++) {
        switch (dyn->d_tag) {
            case DT_NEEDED:  // Library dependency (d_val is an offset into .dynstr)
                needed_libs.add(dyn->d_un.d_val);
                break;
            case DT_RPATH:   // Library search path
            case DT_RUNPATH:
                search_paths.add(dyn->d_un.d_val);
                break;
            // Many more entries...
        }
    }
    
    // 3. Load required libraries (each one may add further DT_NEEDED entries)
    for (lib : needed_libs) {
        load_library(lib);
    }
    
    // 4. Perform relocations
    _dl_relocate_object(main_map);
    for (lib : loaded_libs) {
        _dl_relocate_object(lib);
    }
    
    // 5. Call constructors (dependencies before the objects that need them)
    _dl_init(main_map, argc, argv, envp);
    
    // 6. Transfer to program entry point (AT_ENTRY)
    _dl_start_user(entry_point);
}

Library Loading Order

Shared libraries are loaded in breadth-first order based on DT_NEEDED entries:

Libraries needed by the executable
Libraries needed by those libraries
And so on, recursively

This order also drives symbol lookup. By default the dynamic linker searches the global scope for each symbol: the executable first, then any LD_PRELOAD libraries, then the dependencies in the breadth-first order above. The first definition found wins.

Tracing Library Loading
# Trace shared library loading
$ LD_DEBUG=libs ./program
     find library=libc.so.6 [0]; searching
      search path=/lib/x86_64-linux-gnu/tls/haswell/...
       trying file=/lib/x86_64-linux-gnu/tls/haswell/libc.so.6
       ...
       trying file=/lib/x86_64-linux-gnu/libc.so.6
      found libc.so.6 at /lib/x86_64-linux-gnu/libc.so.6
 
# Trace symbol resolution
$ LD_DEBUG=symbols ./program
     symbol=printf;  lookup in file=./program
     symbol=printf;  lookup in file=/lib/x86_64-linux-gnu/libc.so.6
     binding file ./program to /lib/x86_64-linux-gnu/libc.so.6: normal symbol `printf'

Symbol Interposition

The Entry Point: Where Execution Begins

After the dynamic linker completes its work, control is transferred to the program's entry point. But this isn't main()—it's the C runtime startup code called _start.

The Path to main()

_start (entry point, written in assembly)
- Called by kernel/dynamic linker
- Minimal setup, calls __libc_start_main
__libc_start_main (C library initialization)
- Sets up threading
- Registers atexit handlers
- Calls constructors
- Calls main()
- Calls exit() with main's return value
main() (your code!)
- Finally, user code executes
- Receives argc, argv, envp as arguments

_start Entry Point (x86-64)
// glibc's _start (simplified)
// sysdeps/x86_64/start.S
 
    .text
    .globl _start
    .type _start, @function
_start:
    // Clear frame pointer to mark the outermost frame for debuggers
    xorl    %ebp, %ebp
    
    // On entry %rdx holds the dynamic linker's cleanup function
    // (rtld_fini); move it to the sixth argument register
    movq    %rdx, %r9
    
    // argc is at top of stack, pop it into the second arg register
    popq    %rsi                    // argc
    
    // argv is now at top of stack
    movq    %rsp, %rdx              // argv
    
    // Align stack to 16 bytes (ABI requirement)
    andq    $~15, %rsp
    
    // Push 8 bytes of padding, then stack_end (the seventh argument),
    // so the stack is still 16-byte aligned at the call
    pushq   %rax
    pushq   %rsp
    
    // Arguments to __libc_start_main:
    // rdi = main, rsi = argc, rdx = argv, rcx = init
    // r8 = fini, r9 = rtld_fini, stack = stack_end
    // (glibc 2.34 and later pass NULL for init and fini instead)
    movq    __libc_csu_fini@GOTPCREL(%rip), %r8
    movq    __libc_csu_init@GOTPCREL(%rip), %rcx
    movq    main@GOTPCREL(%rip), %rdi
    
    call    __libc_start_main@PLT
    
    // __libc_start_main never returns
    // But just in case...
    hlt

__libc_start_main Overview
// Simplified view of __libc_start_main
// glibc: csu/libc-start.c
 
int __libc_start_main(
    int (*main)(int, char **, char **),
    int argc,
    char **argv,
    void (*init)(int, char **, char **),  // __libc_csu_init
    void (*fini)(void),      // __libc_csu_fini
    void (*rtld_fini)(void), // Dynamic linker cleanup
    void *stack_end)
{
    // Store stack end for profiling/backtraces
    __libc_stack_end = stack_end;
    
    // Get environment pointers (after argv and its NULL terminator)
    char **envp = argv + argc + 1;
    
    // Initialize threading
    __pthread_initialize_minimal();
    
    // Register cleanup functions (either may be NULL, e.g. rtld_fini
    // in a statically linked program)
    if (rtld_fini != NULL)
        __cxa_atexit((void (*)(void *)) rtld_fini, NULL, NULL);
    if (fini != NULL)
        __cxa_atexit((void (*)(void *)) fini, NULL, NULL);
    
    // Call constructors
    if (init != NULL)
        (*init)(argc, argv, envp);
    
    // Call main!
    int result = main(argc, argv, envp);
    
    // Exit (runs destructors, atexit handlers)
    exit(result);
}

Programs Without main()

Complete Loading Timeline

Let's trace the complete loading sequence from shell command to first user instruction:

Timeline: Running `/bin/ls`

Loading Sequence Timeline
Stage	Actor	Key Actions
1. Shell	bash/zsh	fork() creates child process, prepare argv/envp
2. execve()	Child process	System call to kernel, path = '/bin/ls'
3. Format Check	Kernel	Read ELF header, verify magic bytes (0x7f 'E' 'L' 'F')
4. Process Image	Kernel	Destroy old mappings, create new address space
5. mmap Segments	Kernel	Map LOAD segments from /bin/ls (code, data)
6. Load ld.so	Kernel	Read INTERP, map /lib64/ld-linux-x86-64.so.2
7. Stack Setup	Kernel	Push argc, argv, envp, auxv to stack
8. Transfer	Kernel → ld.so	Jump to ld.so entry point
9. Self-reloc	ld.so	Relocate ld.so itself
10. Load Libs	ld.so	Map libc.so.6, libpthread.so, etc.
11. Relocations	ld.so	Fill GOT entries, apply data relocations
12. Constructors	ld.so	Call .init_array functions in libraries
13. Transfer	ld.so → _start	Jump to /bin/ls entry point
14. C Runtime	_start → main	__libc_start_main, then main()
15. User Code	main()	ls program logic executes!

Tracing with strace
$ strace -f /bin/ls 2>&1 | head -25
execve("/bin/ls", ["ls"], 0x7ffe...) = 0
brk(NULL)                               = 0x559b79a00000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, ...) = 0x7f1234567000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
...
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|...) = 3
read(3, "\177ELF\2\1\1\3\0\0\0...", 832) = 832
mmap(NULL, 2037344, PROT_READ, MAP_PRIVATE|..., 3, 0) = 0x7f...
mprotect(0x7f..., 1859584, PROT_READ|PROT_EXEC) = 0
mmap(0x7f..., 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|..., 3, 0x1ef000) = 0x7f...
...
brk(NULL)                               = 0x559b79a00000
brk(0x559b79a21000)                     = 0x559b79a21000  // Heap setup
openat(AT_FDCWD, ".", O_RDONLY|...)     = 3               // ls reads directory

Summary: The Complete Loading Picture

Key Takeaways

• execve() initiates loading by asking the kernel to replace the current process image with a new executable.
• Program headers guide loading, telling the kernel what to map where, with what permissions.
• Memory mapping with demand paging means only accessed pages are loaded, enabling fast startup even for large executables.
• The stack is initialized with argc, argv, envp, and the auxiliary vector containing system information.
• The dynamic linker (ld.so) completes loading by mapping shared libraries, performing relocations, and running constructors.
• Execution finally reaches _start → __libc_start_main → main(), the path from entry point to user code.
