Linkers And Loaders - Learning Module

Relocatable Code

Code That Works Anywhere

Early computer programs were written for specific memory addresses. If the program assumed it would be loaded at address 0x1000, loading it anywhere else would cause immediate failure—jumps would go to wrong locations, data accesses would read garbage. This rigidity was acceptable when one program had exclusive control of memory, but modern systems run many programs simultaneously, each needing its own address range.

Relocatable code solves this problem. It's code written (or processed) so that it functions correctly regardless of its load address. This capability underlies:

Shared libraries that load at different addresses in different processes
ASLR that randomizes addresses for security
Position-Independent Executables (PIE) for maximum flexibility
Dynamic loading of plugins at runtime

What You Will Learn

By the end of this page, you will understand the mechanics of relocatable code—from static relocation by the linker to position-independent code (PIC) for shared libraries. You'll grasp PC-relative addressing, GOT/PLT mechanics, and how these techniques enable modern software flexibility and security.

The Relocation Problem

To understand relocation, we must first understand why it's necessary. Consider a simple function that references a global variable:

int counter = 0;
void increment() {
    counter++;
}

When compiled, the increment function must access counter. But what address should it use? At compile time, we don't know where the program will be loaded in memory.

Address Problem Illustration
// Object file assembly (addresses relative to start of file)
// Assume counter is at offset 0x100 in .data section
 
increment:
    movl    $0x????????, %eax    ; Load address of counter
    incl    (%eax)               ; Increment value at that address
    ret
 
; The ???????? can't be filled in because:
; 1. We don't know where .data will be loaded
; 2. We don't know where this code will be loaded
; 3. The final address depends on all object files being combined
 
; The assembler generates a PLACEHOLDER (usually 0)
; Plus a RELOCATION ENTRY saying "patch this with counter's address"

Two Approaches to Relocation

There are fundamentally two ways to handle this problem:

1. Load-Time Relocation (Static Relocation)

Compile code with placeholder addresses
Linker/loader patches in final addresses when positions are known
Simple but requires modifying code at load time

2. Position-Independent Code (PIC)

Compile code to determine addresses at runtime
Uses PC-relative addressing and indirection tables
More complex but allows true code sharing

Relocation Approaches Comparison
Aspect	Load-Time Relocation	Position-Independent Code
Code modification	Patched at load time	Never modified
Code sharing	Each process gets modified copy	Code pages shared across processes
Load time	Slower (every absolute reference in the code is patched)	Faster (only GOT and data entries are relocated)
Runtime overhead	None after loading	Minor (GOT indirection)
ASLR compatible	Yes, but wastes memory	Yes, with full sharing
Used for	Main executable (PIE disabled)	Shared libraries, PIE executables

Historical Context

Early systems used only load-time relocation because PIC required architectural features (like PC-relative addressing) that weren't always available. Modern architectures like x86-64 make PIC efficient, leading to its widespread adoption.

Load-Time Relocation: Patching Addresses

Load-time relocation (also called static relocation or text relocation) involves the linker or loader modifying the code to insert final addresses. This is the simplest approach conceptually.

How It Works

Compiler generates code with placeholder addresses (typically 0)
Assembler creates relocation entries describing where patches are needed
Linker determines final addresses based on how it lays out sections
Linker/Loader patches the code by computing and inserting final addresses
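The third step, deciding where each section ends up, can be sketched in a few lines of C. This is only an illustration under simple assumptions: sections are placed back to back from a base address with one shared alignment, and `layout_sections` and `align_up` are hypothetical names rather than part of any real linker.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/*
 * Hypothetical helper: place n input sections back to back starting at base,
 * aligning each one, and record the address chosen for each in out[].
 * Returns the first free address after the last section.
 */
uint64_t layout_sections(uint64_t base, const uint64_t *sizes, size_t n,
                         uint64_t align, uint64_t *out)
{
    uint64_t cursor = base;
    for (size_t i = 0; i < n; i++) {
        cursor = align_up(cursor, align);
        out[i] = cursor;      /* final address of section i */
        cursor += sizes[i];
    }
    return cursor;
}
```

With a base of 0x401000, sizes 0x123 and 0x40, and 16-byte alignment, the second section lands at 0x401130. Only once such addresses are fixed can symbol values be computed and patched in.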

Relocation Entry Processing
// Relocation entry describes a patch location
struct Elf64_Rela {
    Elf64_Addr  r_offset;   // Where to patch (offset in section)
    Elf64_Xword r_info;     // Symbol + Type (what kind of patch)
    Elf64_Sxword r_addend;  // Constant to add to computed address
};
 
// Example: Patching reference to counter variable
 
// Object file has:
// Offset 0x05: movl $0x00000000, %eax  <- placeholder
// Relocation: {offset: 0x06, symbol: counter, type: R_X86_64_32}
 
// Linker determines:
// - .text will be loaded at 0x401000
// - counter will be at 0x404000
 
// After linking:
// Offset 0x05: movl $0x00404000, %eax  <- patched!
// The 0x00000000 at offset 0x06 is replaced with 0x00404000
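A minimal sketch of the patching step follows. It assumes a little-endian target and handles only the absolute 32-bit case from the example; `struct rela`, `apply_rela`, and the `sym_values` table are hypothetical names introduced here, with the record laid out like `Elf64_Rela` so that no system header is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as Elf64_Rela, redeclared so the sketch needs no <elf.h>. */
struct rela {
    uint64_t r_offset;  /* where to patch, as an offset into the section */
    uint64_t r_info;    /* symbol index in the high 32 bits, type in the low 32 */
    int64_t  r_addend;  /* constant added to the symbol value */
};

#define RELA_SYM(info)  ((uint32_t)((info) >> 32))
#define RELA_TYPE(info) ((uint32_t)((info) & 0xffffffffu))
#define TYPE_ABS32 10u  /* numeric value of R_X86_64_32 */

/*
 * Apply one absolute 32-bit relocation to a section image.
 * sym_values[] is a hypothetical table of final symbol addresses.
 * Returns 0 on success, -1 if the type is unsupported or the value
 * does not fit in 32 bits.
 */
int apply_rela(uint8_t *section, size_t section_len,
               const struct rela *r, const uint64_t *sym_values)
{
    assert(r->r_offset + 4 <= section_len);
    if (RELA_TYPE(r->r_info) != TYPE_ABS32)
        return -1;

    uint64_t value = sym_values[RELA_SYM(r->r_info)] + (uint64_t)r->r_addend;
    if (value > UINT32_MAX)
        return -1;

    /* x86-64 is little-endian: store the least significant byte first. */
    for (int i = 0; i < 4; i++)
        section[r->r_offset + i] = (uint8_t)(value >> (8 * i));
    return 0;
}
```

Applied to the example above (offset 0x06, symbol value 0x404000, addend 0), the four bytes after the opcode become 00 40 40 00.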

Common Relocation Types (x86-64)

Different relocation types specify different calculations:

Common x86-64 Relocation Types
Type	Size	Calculation	Use Case
R_X86_64_64	8 bytes	S + A	Absolute 64-bit address
R_X86_64_32	4 bytes	S + A (must fit in 32 bits, zero-extended)	Absolute 32-bit address
R_X86_64_PC32	4 bytes	S + A - P	PC-relative 32-bit offset
R_X86_64_PLT32	4 bytes	L + A - P	PLT-relative (function calls)
R_X86_64_GOTPCREL	4 bytes	G + GOT + A - P	PC-relative to GOT entry

Where:

S = Symbol value (final address)
A = Addend from relocation entry
P = Place (address being patched)
L = PLT entry address
G = Offset in GOT for this symbol
GOT = Address of Global Offset Table
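The formulas in the table translate directly into code. The sketch below is illustrative only: `reloc_kind`, `reloc_inputs`, and `reloc_value` are hypothetical names, the enum tags are not the ELF numeric type values, and overflow checking (which a real linker performs for the 32-bit types) is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tags for the five table rows; not the ELF numeric values. */
enum reloc_kind { ABS64, ABS32, PC32, PLT32, GOTPCREL };

struct reloc_inputs {
    uint64_t S;    /* symbol value */
    int64_t  A;    /* addend */
    uint64_t P;    /* place being patched */
    uint64_t L;    /* PLT entry address */
    uint64_t G;    /* offset of the symbol's GOT slot */
    uint64_t GOT;  /* address of the GOT */
};

/* Compute the value to store; 32-bit kinds keep only the low 32 bits. */
uint64_t reloc_value(enum reloc_kind kind, const struct reloc_inputs *in)
{
    uint64_t a = (uint64_t)in->A;  /* unsigned arithmetic wraps, as intended */
    switch (kind) {
    case ABS64:    return in->S + a;
    case ABS32:    return (uint32_t)(in->S + a);
    case PC32:     return (uint32_t)(in->S + a - in->P);
    case PLT32:    return (uint32_t)(in->L + a - in->P);
    case GOTPCREL: return (uint32_t)(in->G + in->GOT + a - in->P);
    }
    assert(!"unknown relocation kind");
    return 0;
}
```

For PC-relative types on x86-64 the addend is typically -4, because the CPU measures the displacement from the end of the 4-byte field rather than from its start. With S = 0x2000, A = -4, and P = 0x1002, the PC32 result is 0xffa.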

Text Relocations and Security

Load-time relocation requires modifying the .text section, which prevents code sharing and requires code pages to be writable while they are patched. Writable code pages are a security risk, so some hardened systems reject binaries with text relocations or require special permissions to load them. Always use -fPIC for shared libraries.
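A binary announces text relocations in its dynamic section, so they are easy to detect. The sketch below scans a DT_NULL-terminated array of dynamic entries; `struct dyn_entry` and `has_text_relocations` are hypothetical names, while the numeric constants are the ELF values for DT_TEXTREL, DT_FLAGS, and DF_TEXTREL.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as Elf64_Dyn, redeclared so the sketch needs no <elf.h>. */
struct dyn_entry {
    int64_t  d_tag;
    uint64_t d_val;
};

/* Values from the ELF specification. */
#define TAG_NULL     0    /* DT_NULL: end of the array */
#define TAG_TEXTREL  22   /* DT_TEXTREL */
#define TAG_FLAGS    30   /* DT_FLAGS */
#define FLAG_TEXTREL 0x4  /* DF_TEXTREL bit inside DT_FLAGS */

/* Return 1 if the DT_NULL-terminated array declares text relocations. */
int has_text_relocations(const struct dyn_entry *dyn)
{
    assert(dyn != 0);
    for (; dyn->d_tag != TAG_NULL; dyn++) {
        if (dyn->d_tag == TAG_TEXTREL)
            return 1;
        if (dyn->d_tag == TAG_FLAGS && (dyn->d_val & FLAG_TEXTREL))
            return 1;
    }
    return 0;
}
```

The same information is visible from the command line with `readelf -d`, which lists a TEXTREL entry for affected binaries.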

Position-Independent Code (PIC)

Position-Independent Code takes a different approach: instead of patching addresses into code, it uses techniques that compute addresses at runtime relative to the current position. The key insight is that while absolute addresses change when code is relocated, relative offsets between parts of the same module remain constant.

PC-Relative Addressing

The foundation of PIC is PC-relative addressing—computing addresses as offsets from the Program Counter (instruction pointer). On x86-64, the RIP register (64-bit instruction pointer) can be used as a base for addressing:

PC-Relative vs Absolute Addressing
// Accessing a local static variable
 
// NON-PIC (absolute addressing)
// If loaded at different address, this breaks!
movl    $0x404000, %eax      ; Hardcoded address
movl    (%eax), %edx
 
// PIC (RIP-relative addressing)
// Works regardless of load address!
movl    local_var(%rip), %edx    ; Offset from the address of the next instruction
 
// The assembler and linker calculate:
// If the 6-byte instruction is at offset 0x1000 and local_var is at offset 0x2000,
// RIP during execution points at the next instruction (offset 0x1006)
// Encoded as: movl 0xffa(%rip), %edx  (displacement = 0x2000 - 0x1006)
// At runtime: 
//   If loaded at base 0x7f0000000000
//   RIP = 0x7f0000001006 (address of the next instruction)
//   Address = 0x7f0000001006 + 0xffa = 0x7f0000002000 ← correct!
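The arithmetic behind RIP-relative addressing can be checked with a short sketch. It assumes a 6-byte instruction (a 2-byte opcode and ModRM followed by a 4-byte displacement), and `rip_displacement` and `rip_target` are hypothetical helper names. The key point is that the CPU adds the displacement to the address of the next instruction, so the same encoded displacement resolves correctly at any load base.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Displacement the assembler/linker would encode: distance from the end
 * of the instruction to the target, both as offsets within the module.
 */
int32_t rip_displacement(uint64_t insn_off, unsigned insn_len, uint64_t target_off)
{
    int64_t d = (int64_t)target_off - (int64_t)(insn_off + insn_len);
    assert(d >= INT32_MIN && d <= INT32_MAX);  /* must fit in disp32 */
    return (int32_t)d;
}

/* Address the CPU computes at run time: next-instruction address + disp. */
uint64_t rip_target(uint64_t load_base, uint64_t insn_off, unsigned insn_len, int32_t disp)
{
    uint64_t rip = load_base + insn_off + insn_len;
    return rip + (uint64_t)(int64_t)disp;
}
```

For an instruction at offset 0x1000 and a variable at offset 0x2000 the displacement is 0xffa; loading the module at 0x7f0000000000 or at 0x555500000000 yields a target of base + 0x2000 in both cases.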

The Challenge: External Symbols

PC-relative addressing works for symbols within the same module because their relative positions are known at link time. But what about external symbols—functions and data from other shared libraries? Their positions can't be known until runtime.

This is where the Global Offset Table (GOT) comes in.

Compile with -fPIC

To generate position-independent code, use gcc -fPIC (for shared libraries) or gcc -fpic. On some architectures -fpic produces smaller code but limits the size of the GOT; on x86 the two options behave the same. On x86-64, position-independent code is highly efficient due to native RIP-relative addressing support.

Global Offset Table: The Indirection Layer

The Global Offset Table (GOT) is the key data structure enabling position-independent access to external symbols. It's a table of pointers, with one entry for each external symbol the module references.

How GOT Works

Code uses PC-relative addressing to access GOT (GOT position is known at link time)
GOT contains pointers to actual symbol locations
Dynamic linker fills GOT with correct addresses at load time
Each process has its own GOT (it's in writable data, not shared code)

GOT Access Pattern
// Accessing external variable through GOT
 
// C code:
extern int external_counter;
void increment_external() {
    external_counter++;
}
 
// PIC assembly:
increment_external:
    ; Step 1: Load the address of external_counter from its GOT entry
    ; (the GOT entry is found PC-relative; its offset is known at link time)
    movq    external_counter@GOTPCREL(%rip), %rax
    
    ; %rax now contains the actual address of external_counter
    ; (the pointer stored in the GOT, not the address of the GOT entry)
    
    ; Step 2: Increment the value at that address
    incl    (%rax)
    ret
 
// At runtime:
// %rip + offset → GOT entry at 0x7f0000403ff8
// GOT[external_counter] contains 0x7f0000501000
// (filled by dynamic linker from the library providing external_counter)
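The indirection can be modeled in plain C with a table of pointers. This is a simulation, not how a compiler emits GOT accesses: `got`, `fill_got_slot`, and `GOT_EXTERNAL_COUNTER` are hypothetical names, and `fill_got_slot` stands in for the dynamic linker.

```c
#include <assert.h>
#include <stddef.h>

/* One slot per external symbol; lives in writable data, one copy per process. */
enum { GOT_EXTERNAL_COUNTER, GOT_SLOTS };
static int *got[GOT_SLOTS];

/* Stand-in for the dynamic linker: store the symbol's run-time address. */
void fill_got_slot(size_t slot, int *address)
{
    assert(slot < GOT_SLOTS);
    got[slot] = address;
}

/* "Code" that never embeds the variable's address, only the slot index. */
void increment_external(void)
{
    int *p = got[GOT_EXTERNAL_COUNTER];  /* load pointer from the GOT */
    assert(p != NULL);                   /* slot must have been filled */
    (*p)++;                              /* then access the variable */
}
```

Pointing the slot at a different variable changes what `increment_external` modifies without touching the function itself, which is why the code pages can stay read-only and shared.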

GOT Sections

ELF files may have multiple GOT-related sections:

.got: Standard GOT for data references
.got.plt: GOT entries used by PLT for function calls (supports lazy binding)
.plt.got: PLT stubs that jump through regular .got entries instead of .got.plt (despite the name, this section holds code, not pointers)

The separation allows different treatment: .got must be fully initialized at load time, while .got.plt can be lazily bound.

Examining GOT Contents
$ objdump -R /usr/bin/ls | head -20
/usr/bin/ls:     file format elf64-x86-64
 
DYNAMIC RELOCATION RECORDS
OFFSET           TYPE              VALUE 
000000000001e548 R_X86_64_GLOB_DAT  __gmon_start__
000000000001e550 R_X86_64_GLOB_DAT  _ITM_deregisterTMCloneTable
000000000001e558 R_X86_64_GLOB_DAT  stdout@GLIBC_2.2.5
000000000001e560 R_X86_64_GLOB_DAT  stderr@GLIBC_2.2.5
000000000001e568 R_X86_64_GLOB_DAT  program_invocation_short_name@@GLIBC_2.2.5
...
 
$ readelf -x .got.plt /usr/bin/ls | head
Hex dump of section '.got.plt':
  0x0001f000 901e0100 00000000 00000000 00000000 ................
  0x0001f010 00000000 00000000 36400000 00000000 ........6@......
  0x0001f020 46400000 00000000 56400000 00000000 F@......V@......
 
# Initial GOT.PLT entries point back to PLT (for lazy binding)
# After first call, they're patched to actual function addresses

GOT Security Considerations

The GOT must be writable (for the dynamic linker to fill it), making it an attack target. GOT overwrite attacks modify entries to redirect function calls. The direct mitigation is RELRO (marking the GOT read-only after initialization); stack canaries help only indirectly, by stopping some of the memory-corruption bugs an attacker would use to reach the GOT.

Procedure Linkage Table (PLT) for Functions

While the GOT provides access to external data, the Procedure Linkage Table (PLT) optimizes external function calls with lazy binding—deferring symbol resolution until the first call to each function.

Why Lazy Binding?

A program might link against libraries with hundreds of functions but only call a few during typical execution. Resolving all symbols at startup would waste time. Lazy binding defers this work until needed.

PLT Structure

Each external function has a PLT entry (stub). A call through the stub proceeds as follows:

On first call: Invokes the dynamic linker's resolver
Resolver finds function: Searches libraries for the symbol
Updates GOT: Writes actual address to GOT entry
Jumps to function: First call completes
Subsequent calls: PLT jumps directly through updated GOT

PLT Entry Walk-through
// PLT entry for puts function
 
// Before first call to puts:
// .got.plt entry contains: address of PLT stub's push instruction
 
puts@plt:
    jmp     *puts@GOTPLT(%rip)    ; Jump through GOT entry
                                   ; Initially → next instruction (not resolved)
    pushq   $0                     ; Push relocation index
    jmp     .plt                   ; Jump to resolver
 
// First call execution:
// 1. jmp *GOT → lands on pushq (GOT points here initially)
// 2. pushq $0 → push relocation index for puts
// 3. jmp .plt → go to PLT[0] (resolver trampoline)
// 4. Resolver finds puts in libc at 0x7f...puts
// 5. Resolver writes 0x7f...puts to GOT entry
// 6. Resolver jumps to 0x7f...puts
// 7. puts executes and returns to caller
 
// After first call:
// .got.plt entry contains: 0x7f...puts (actual address)
 
// Second call:
// 1. jmp *GOT → goes directly to puts in libc
// No resolver invocation!
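The same control flow can be imitated in C with a function pointer that starts out aimed at a stub and is patched on first use. This is a simulation under simplifying assumptions: `strlen` stands in for the external function, the "lookup" is trivial, and every name other than `strlen` is hypothetical.

```c
#include <assert.h>
#include <string.h>

typedef size_t (*strlen_fn)(const char *);

static size_t strlen_stub(const char *s);   /* plays the PLT stub's slow path */

/* Plays the .got.plt slot: initially points back at the stub. */
static strlen_fn strlen_slot = strlen_stub;
static int resolver_calls = 0;

/* Plays the resolver: look the symbol up (here: trivially), patch the slot. */
static strlen_fn resolve_strlen(void)
{
    resolver_calls++;
    strlen_slot = strlen;      /* later calls skip the resolver */
    return strlen_slot;
}

static size_t strlen_stub(const char *s)
{
    strlen_fn target = resolve_strlen();
    return target(s);          /* finish the first call */
}

/* What a call site does: always jump through the slot. */
size_t call_strlen(const char *s)
{
    assert(strlen_slot != NULL);
    return strlen_slot(s);
}

int resolver_call_count(void) { return resolver_calls; }
```

Calling `call_strlen` twice runs the resolver once: the first call goes through the stub and patches the slot, and the second call reaches `strlen` directly.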

PLT[0]: The Resolver Trampoline
// PLT[0] - Common entry point for all lazy resolutions
 
.plt:
    pushq   GOT[1](%rip)          ; Push GOT[1] - link_map pointer
                                  ; (identifies this shared object)
    jmpq    *GOT[2](%rip)         ; Jump to _dl_runtime_resolve
                                  ; (the actual resolver function)
 
// _dl_runtime_resolve (in ld.so):
// 1. Receives: relocation index (from individual PLT stub)
// 2. Receives: link_map pointer (identifies which library)
// 3. Looks up symbol using relocation info
// 4. Finds definition in loaded libraries
// 5. Updates GOT entry with found address
// 6. Jumps to resolved function
 
// The resolver is careful to preserve all registers
// so the function call appears seamless

BIND_NOW and LD_BIND_NOW

Lazy binding can be disabled with the DF_BIND_NOW flag or LD_BIND_NOW=1 environment variable. This resolves all symbols at load time, increasing startup time but providing more predictable behavior and enabling full RELRO protection.

Position-Independent Executables (PIE)

While shared libraries have always required PIC, traditional executables used fixed addresses (the main executable was loaded at a predetermined base address). Position-Independent Executables (PIE) apply PIC techniques to the main executable, enabling ASLR for everything.

ASLR (Address Space Layout Randomization)

ASLR is a security technique that randomizes the memory locations of:

Stack base
Heap base
Shared library mappings
Main executable (if PIE)

Without PIE, the main executable loads at a fixed address, giving attackers a reliable target. With PIE, even the main program's addresses are unpredictable.

Creating and Identifying PIE
# Compile as PIE (default on many modern systems)
gcc -pie -fPIE source.c -o program_pie
 
# Compile as non-PIE (traditional)
gcc -no-pie -fno-PIE source.c -o program_nopie
 
# Check if binary is PIE
$ file program_pie
program_pie: ELF 64-bit LSB pie executable, x86-64, ...
                                   ^^^
$ file program_nopie
program_nopie: ELF 64-bit LSB executable, x86-64, ...
                             ^^^^^^^^^^
 
# Using readelf
$ readelf -h program_pie | grep Type
  Type:                              DYN (Position-Independent Executable)
$ readelf -h program_nopie | grep Type
  Type:                              EXEC (Executable file)
 
# ASLR in action
$ ./program_pie &
[1] 12345
$ cat /proc/12345/maps | head -1
5622a3400000-5622a3401000 r--p ...   ← Random base
 
$ ./program_pie &  
[2] 12346
$ cat /proc/12346/maps | head -1
563f12800000-563f12801000 r--p ...   ← Different random base
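The Type line that readelf prints comes from the e_type field of the ELF header, which can also be read by hand. The sketch below assumes a little-endian file, as on x86-64; `classify_elf` and `enum elf_kind` are hypothetical names, while the values 2 and 3 are the ELF constants ET_EXEC and ET_DYN.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum elf_kind { KIND_NOT_ELF, KIND_EXEC, KIND_DYN, KIND_OTHER };

/*
 * Classify an ELF image from its first bytes.
 * Assumes a little-endian file (as on x86-64); e_type is the 16-bit
 * field right after the 16-byte e_ident array.
 */
enum elf_kind classify_elf(const uint8_t *hdr, size_t len)
{
    assert(hdr != NULL);
    if (len < 18 || hdr[0] != 0x7f || hdr[1] != 'E' || hdr[2] != 'L' || hdr[3] != 'F')
        return KIND_NOT_ELF;

    uint16_t e_type = (uint16_t)(hdr[16] | (hdr[17] << 8));
    switch (e_type) {
    case 2:  return KIND_EXEC;  /* ET_EXEC: fixed load address */
    case 3:  return KIND_DYN;   /* ET_DYN: PIE or shared library */
    default: return KIND_OTHER;
    }
}
```

ET_DYN is also the type of shared libraries, so this check alone cannot tell a PIE from a .so file; it only rules out a fixed-address executable.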

PIE Performance

PIE has minimal overhead on x86-64 due to efficient RIP-relative addressing. The GOT indirection for external references is the same as in shared libraries. For internal references, PC-relative addressing has zero overhead.

On older architectures (32-bit x86) without RIP-relative addressing, PIE required dedicating a register as a GOT base pointer, causing noticeable overhead.

PIE Benefits

• Full ASLR coverage including main executable
• Enhanced security against code reuse attacks
• Works with RELRO for GOT protection
• Minimal overhead on x86-64
• Default on most Linux distributions now

PIE Considerations

• Slightly larger code (GOT references)
• Minor load-time overhead (more relocations)
• Debugging slightly more complex
• Some embedded systems may prefer fixed addresses
• Legacy tools may not support PIE analysis

Always Use PIE

Unless you have specific compatibility requirements, always compile with PIE enabled. The security benefits (full ASLR) far outweigh the minimal performance cost. Most modern distributions enable PIE by default.

Security: RELRO and Beyond

The writable nature of GOT creates security vulnerabilities. Several protections mitigate these risks:

RELRO (Relocation Read-Only)

RELRO makes portions of the GOT read-only after the dynamic linker finishes initialization.

Partial RELRO (default):

.dynamic, .ctors, .dtors, etc. made read-only
.got.plt remains writable (lazy binding works)
Protects against some overwrites but not overwrites of .got.plt entries

Full RELRO (with -Wl,-z,relro,-z,now):

All GOT entries resolved at load time (no lazy binding)
Entire GOT marked read-only
Complete protection but slower startup

Checking and Enabling RELRO
# Compile with full RELRO
gcc -Wl,-z,relro,-z,now source.c -o secure_program
 
# Check RELRO status using checksec (pwntools)
$ checksec program_partial_relro
    Arch:     amd64-64-little
    RELRO:    Partial RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      PIE enabled
 
$ checksec program_full_relro
    Arch:     amd64-64-little
    RELRO:    Full RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      PIE enabled
 
# Check using readelf
$ readelf -l program | grep GNU_RELRO
  GNU_RELRO      0x001e90 0x000000000001fe90 0x000000000001fe90
 
# Check for BIND_NOW
$ readelf -d program | grep BIND_NOW
 0x000000000000001e (FLAGS)    BIND_NOW
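The two readelf checks above combine into a simple decision rule, sketched here in C. The function assumes the caller has already extracted the program header types and the BIND_NOW status; `classify_relro` and `enum relro_level` are hypothetical names, and 0x6474e552 is the numeric value of PT_GNU_RELRO.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PT_RELRO_TYPE 0x6474e552u  /* numeric value of PT_GNU_RELRO */

enum relro_level { RELRO_NONE, RELRO_PARTIAL, RELRO_FULL };

/*
 * p_types[] holds the p_type field of each program header;
 * bind_now is nonzero if the dynamic section requests eager binding.
 */
enum relro_level classify_relro(const uint32_t *p_types, size_t n, int bind_now)
{
    assert(p_types != NULL || n == 0);
    int has_relro = 0;
    for (size_t i = 0; i < n; i++)
        if (p_types[i] == PT_RELRO_TYPE)
            has_relro = 1;

    if (!has_relro)
        return RELRO_NONE;
    return bind_now ? RELRO_FULL : RELRO_PARTIAL;
}
```

A GNU_RELRO segment without BIND_NOW leaves .got.plt writable for lazy binding, which is why it counts only as partial protection.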

Security Protections Summary
Protection	Purpose	Enable Flag
PIE	ASLR for main executable	`-pie -fPIE`
Partial RELRO	Protect some GOT sections	`-Wl,-z,relro` (often default)
Full RELRO	Protect all GOT, disable lazy binding	`-Wl,-z,relro,-z,now`
NX/DEP	Non-executable stack/data	Default (use `-z noexecstack`)
Stack Canaries	Detect stack buffer overflows	`-fstack-protector-strong`
FORTIFY	Buffer overflow checks in libc	`-D_FORTIFY_SOURCE=2` (requires optimization, e.g. `-O2`)

Defense in Depth

No single protection is perfect. ASLR can be defeated by information leaks. Canaries can be bypassed with non-linear overwrites. Use all available protections together for defense in depth.

Summary: Relocatable Code Mastery

Relocatable code is the foundation of modern software flexibility. From shared libraries to ASLR security, the techniques of position-independent code enable the dynamic, secure systems we rely on daily.

Key Takeaways

• The relocation problem arises because final addresses aren't known at compile time, so addresses must be resolved at link or load time.
• Load-time relocation patches addresses directly into code; it is simple but prevents sharing and requires writable code.
• Position-independent code (PIC) uses PC-relative addressing and GOT indirection to work at any address without modification.
• The GOT provides indirection for external symbols, with entries filled by the dynamic linker at load time.
• The PLT enables lazy binding for function calls, deferring symbol resolution until first use for faster startup.
• PIE extends PIC to executables, enabling full ASLR and significantly improving security.
• RELRO makes the GOT read-only after initialization, preventing GOT overwrite attacks.

Module Complete:

Congratulations! You've completed the Linkers and Loaders module. You now understand the complete journey from source code to executing process—compilation, object files, linking, loading, and relocatable code. This knowledge is foundational for operating systems development, security research, performance optimization, and advanced debugging.
