When the assembler completes its work, it produces an object file—a structured container holding machine code, data, metadata, and instructions for the linker. Object files are the lingua franca of compiled programming: they provide a standardized format that allows code compiled from different languages, by different compilers, at different times, to be combined into a single executable.
Understanding object file internals is not merely academic curiosity. This knowledge is essential for:
- gdb, objdump, and readelf expose object file structures

By the end of this page, you will understand object file formats (ELF, COFF, Mach-O), their internal structure, sections, symbol tables, relocation entries, and how these components enable linking and loading. You'll be able to analyze object files using standard tools.
Different operating systems use different object file formats, though the core concepts are remarkably similar. The three dominant formats are:
ELF (Executable and Linkable Format)
COFF (Common Object File Format)
- .obj, .exe, and .dll files on Windows
Mach-O (Mach Object)
- .o, executables, and .dylib (dynamic libraries)

| Format | Platform | Object Extension | Executable | Dynamic Library |
|---|---|---|---|---|
| ELF | Linux, BSD, Solaris | .o | No extension | .so |
| PE/COFF | Windows | .obj | .exe | .dll |
| Mach-O | macOS, iOS | .o | No extension | .dylib |
Why ELF matters most:
We'll focus primarily on ELF because:
The principles we explore apply universally—COFF and Mach-O have analogous structures with different terminology.
ELF was designed in the late 1980s as part of System V Release 4 to replace older formats like a.out (Assembler OUTput). Its name reflects its dual purpose: the same format serves both 'Executable' (ready to run) and 'Linkable' (ready for linking) roles.
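An ELF file consists of several distinct parts organized hierarchically: an ELF header at the start of the file, an optional program header table, the sections themselves, and a section header table.
The key insight is that ELF provides two views of the same file: a linking view, in which the section header table divides the file into sections for the linker, and an execution view, in which the program header table groups the contents into segments for the loader.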

    $ readelf -h hello.o
    ELF Header:
      Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
      Class:                             ELF64
      Data:                              2's complement, little endian
      Version:                           1 (current)
      OS/ABI:                            UNIX - System V
      ABI Version:                       0
      Type:                              REL (Relocatable file)
      Machine:                           Advanced Micro Devices X86-64
      Version:                           0x1
      Entry point address:               0x0
      Start of program headers:          0 (bytes into file)
      Start of section headers:          528 (bytes into file)
      Flags:                             0x0
      Size of this header:               64 (bytes)
      Size of program headers:           0 (bytes)
      Number of program headers:         0
      Size of section headers:           64 (bytes)
      Number of section headers:         13
      Section header string table index: 12

Key fields in the ELF header:
- Magic (0x7f ELF): Identifies the file as ELF
- Type: REL (relocatable), EXEC (executable), DYN (shared object)

ELF files are divided into sections: contiguous chunks of data with specific purposes. Understanding the standard sections is essential for comprehending program memory layout.
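To make the header fields concrete, here is a minimal sketch in C that checks the magic bytes and reads the file type from a buffer holding the start of a file. The function name `elf_file_type` is hypothetical, and the sketch assumes a little-endian ELF image; it parses the bytes by hand rather than relying on a system header.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns "REL", "EXEC", "DYN", or "OTHER" for a little-endian ELF image,
 * or NULL if buf does not look like one. Hypothetical helper for illustration. */
const char *elf_file_type(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    if (len < 18 || memcmp(buf, magic, 4) != 0)
        return NULL;                 /* too short, or magic mismatch */
    if (buf[5] != 1)
        return NULL;                 /* byte 5 (data encoding): 1 = little endian */

    /* The 16-bit type field follows the 16-byte identification array
     * in both ELF32 and ELF64. */
    uint16_t e_type = (uint16_t)(buf[16] | (buf[17] << 8));

    switch (e_type) {
    case 1:  return "REL";           /* relocatable object file */
    case 2:  return "EXEC";          /* executable */
    case 3:  return "DYN";           /* shared object */
    default: return "OTHER";
    }
}
```

Passing the first 18 bytes of the hello.o header shown earlier (magic `7f 45 4c 46 02 01 01 ...` followed by a type of 1) should yield "REL", matching the `Type:` line that readelf prints.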
The .text section contains executable machine code—the actual instructions the CPU will execute. Key characteristics:

    $ objdump -d hello.o

    hello.o:     file format elf64-x86-64

    Disassembly of section .text:

    0000000000000000 <main>:
       0:   55                      push   %rbp
       1:   48 89 e5                mov    %rsp,%rbp
       4:   48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi
       b:   e8 00 00 00 00          call   10 <main+0x10>
      10:   b8 00 00 00 00          mov    $0x0,%eax
      15:   5d                      pop    %rbp
      16:   c3                      ret

The .data section contains initialized global and static variables. These are variables with explicit initial values that persist for the program's lifetime.

    // These go in the .data section
    int global_count = 42;          // Initialized global
    static double pi = 3.14159;     // Initialized static

    int main() {
        // Zero-initialized local static: compilers often place this in .bss instead
        static int call_count = 0;
        call_count++;
        return call_count;
    }

The .bss section (Block Started by Symbol) contains uninitialized global and static variables. The key insight: .bss doesn't store the actual zeros; it just records how many bytes are needed. This saves file space.
BSS optimization: A file with 1MB of uninitialized globals doesn't need 1MB of zeros in the object file. The .bss section header simply says "reserve 1MB at runtime."

    // These go in .bss section (no space in file, zeroed at load)
    int uninitialized_counter;      // Implicitly zero
    static char buffer[4096];       // 4KB reserved, not stored
    int zero_initialized = 0;       // May go in .bss (compiler choice)

The .rodata section contains constant data that should never be modified:
- Global and static data declared const

Placing data in .rodata enables the memory to be mapped read-only, catching accidental writes and enabling sharing.
| Section | Contents | In File? | Runtime Permissions |
|---|---|---|---|
| .text | Machine code | Yes | Read + Execute |
| .data | Initialized globals/statics | Yes | Read + Write |
| .bss | Uninitialized globals/statics | No (size only) | Read + Write |
| .rodata | Constants, string literals | Yes | Read only |
Use the size command to see section sizes: size hello.o. It reports the text, data, and bss sizes, which are essential for understanding a program's memory footprint and diagnosing binary bloat.
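The difference between what a section costs in the file and what it costs in memory can be sketched in a few lines of C. The struct and function names below are hypothetical; only the two section type values (1 for PROGBITS, 8 for NOBITS) come from the ELF specification. The sketch ignores alignment padding and assumes every listed section is loaded at run time.

```c
#include <stddef.h>
#include <stdint.h>

/* Section type values from the ELF specification. */
#define SEC_PROGBITS 1u   /* contents are stored in the file */
#define SEC_NOBITS   8u   /* occupies memory only, e.g. .bss */

/* Hypothetical, simplified view of a section header. */
struct section {
    const char *name;
    uint32_t    type;     /* SEC_PROGBITS or SEC_NOBITS */
    uint64_t    size;     /* size in bytes */
};

/* Bytes the sections occupy in the object file: NOBITS sections add nothing. */
uint64_t file_bytes(const struct section *s, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        if (s[i].type != SEC_NOBITS)
            total += s[i].size;
    return total;
}

/* Bytes the same sections need at run time: every section counts. */
uint64_t memory_bytes(const struct section *s, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += s[i].size;
    return total;
}
```

For a 23-byte .text, a 4-byte .data, a 14-byte .rodata, and a 1 MB .bss, `file_bytes` gives 41 while `memory_bytes` gives 1048617: the megabyte of zeros exists only after loading.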
The symbol table is arguably the most important metadata structure in an object file. It is the registry of all named entities (functions, global variables, constants) that are either defined in this file or referenced by it and defined elsewhere.
Symbol tables enable separate compilation: each source file is compiled independently, and the linker uses symbol tables to connect references to definitions across files.

    // ELF64 Symbol Table Entry (24 bytes each)
    typedef struct {
        Elf64_Word    st_name;   // String table offset for name
        unsigned char st_info;   // Type and binding attributes
        unsigned char st_other;  // Visibility
        Elf64_Half    st_shndx;  // Section index (where defined)
        Elf64_Addr    st_value;  // Address or offset
        Elf64_Xword   st_size;   // Size of the symbol
    } Elf64_Sym;

    // st_info encodes both type and binding:
    // Type: NOTYPE, OBJECT (data), FUNC, SECTION, FILE
    // Binding: LOCAL, GLOBAL, WEAK

Binding determines visibility and linkage behavior:
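- LOCAL: Visible only within the object file that defines it (static in C)
- GLOBAL: Visible to every object file being linked
- WEAK: Like GLOBAL, but a GLOBAL definition of the same name takes precedence

Weak symbols are powerful for providing default implementations that users can override: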

    // library.c - Provides weak default
    #include <stdio.h>
    #include <stdlib.h>

    __attribute__((weak)) void error_handler(const char* msg) {
        fprintf(stderr, "Error: %s\n", msg);
        exit(1);
    }

    // user_code.c - Can override with strong symbol
    // (log_to_file and attempt_recovery stand in for the user's own functions)
    void error_handler(const char* msg) {
        // Custom error handling
        log_to_file(msg);
        attempt_recovery();
    }

    // If user doesn't define error_handler, weak default is used
    // If user defines it, their version wins

    $ readelf -s example.o

    Symbol table '.symtab' contains 12 entries:
       Num:    Value          Size Type    Bind   Vis      Ndx Name
         0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
         1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS example.c
         2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1 .text
         3: 0000000000000000     4 OBJECT  LOCAL  DEFAULT    3 local_static
         4: 0000000000000000    28 FUNC    GLOBAL DEFAULT    1 main
         5: 0000000000000000     4 OBJECT  GLOBAL DEFAULT    4 global_var
         6: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND printf
         7: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND external_func

    # Ndx column meanings:
    # UND = Undefined (external reference)
    # ABS = Absolute (not associated with section)
    # Number = Section index where symbol is defined

Symbol names aren't stored in the symbol table entries directly. Instead, st_name is an offset into a string table (.strtab). The string table is simply a sequence of null-terminated strings. This design keeps every symbol table entry a fixed size while allowing names of any length.
Every undefined symbol (UND) must be resolved during linking. If the linker can't find a definition, you get the dreaded 'undefined reference' error. Common causes: missing library, typo in function name, or forgetting to link an object file.
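The two encodings described above, the packed st_info byte and the st_name offset into the string table, can be sketched as follows. The helper names are hypothetical; the bit layout (binding in the upper four bits, type in the lower four) follows the ELF specification.

```c
#include <stddef.h>
#include <stdint.h>

/* Binding lives in the upper four bits of st_info: 0 = LOCAL, 1 = GLOBAL, 2 = WEAK. */
unsigned sym_bind(unsigned char st_info) { return st_info >> 4; }

/* Type lives in the lower four bits: 0 = NOTYPE, 1 = OBJECT, 2 = FUNC, 3 = SECTION, 4 = FILE. */
unsigned sym_type(unsigned char st_info) { return st_info & 0x0f; }

/* Returns the name stored at offset st_name in a string table, or NULL if the
 * offset is out of range or no terminating NUL exists before the table ends. */
const char *strtab_name(const char *strtab, size_t strtab_size, uint32_t st_name)
{
    if (st_name >= strtab_size)
        return NULL;
    for (size_t i = st_name; i < strtab_size; i++)
        if (strtab[i] == '\0')
            return strtab + st_name;
    return NULL;
}
```

With a string table holding `"\0main\0printf"`, offset 1 resolves to "main" and offset 6 to "printf". An st_info value of 0x12 decodes to binding 1 (GLOBAL) and type 2 (FUNC), which is how a symbol such as main appears in the readelf listing.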
Relocation entries are the linker's instructions for fixing addresses. When code references a symbol whose address isn't yet known—an external function, a global variable, or even code in the same file at an undetermined location—the assembler generates a relocation entry.
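Each relocation entry tells the linker where to patch (the offset), which symbol the patch refers to, how to compute the value (the relocation type), and what constant to add (the addend).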

    // ELF64 Relocation Entry with Addend (24 bytes each)
    typedef struct {
        Elf64_Addr   r_offset;  // Where to apply relocation
        Elf64_Xword  r_info;    // Symbol index and relocation type
        Elf64_Sxword r_addend;  // Constant addend for calculation
    } Elf64_Rela;

    // r_info encodes symbol index and type:
    // Symbol index: upper 32 bits
    // Relocation type: lower 32 bits

    // Common x86-64 relocation types:
    // R_X86_64_PC32     : 32-bit PC-relative
    // R_X86_64_PLT32    : 32-bit PLT-relative (for function calls)
    // R_X86_64_64       : 64-bit absolute address
    // R_X86_64_GOTPCREL : 32-bit PC-relative to GOT entry

The relocation type specifies how to compute the final address. Different types exist because instructions encode addresses differently: as absolute or PC-relative values, and in fields of different widths.
PC-relative relocations are crucial for position-independent code (PIC). Instead of storing absolute addresses, code uses offsets from the current instruction pointer. This allows code to work regardless of where it's loaded in memory.
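The packing of r_info can be illustrated with a short C sketch. The function names are hypothetical; the numeric type values are the ones the x86-64 psABI assigns to the four relocation types listed above.

```c
#include <stdint.h>

/* Symbol index: upper 32 bits of r_info. */
uint32_t rela_sym(uint64_t r_info)  { return (uint32_t)(r_info >> 32); }

/* Relocation type: lower 32 bits of r_info. */
uint32_t rela_type(uint64_t r_info) { return (uint32_t)(r_info & 0xffffffffu); }

/* Maps the x86-64 type values covered in this section to their names. */
const char *rela_type_name(uint32_t type)
{
    switch (type) {
    case 1:  return "R_X86_64_64";
    case 2:  return "R_X86_64_PC32";
    case 4:  return "R_X86_64_PLT32";
    case 9:  return "R_X86_64_GOTPCREL";
    default: return "(other)";
    }
}
```

An r_info value of 0x000500000002 splits into symbol index 5 and type 2, which is R_X86_64_PC32. Likewise 0x000b00000004 is symbol index 11 with type 4, R_X86_64_PLT32.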

    $ readelf -r example.o

    Relocation section '.rela.text' at offset 0x1d8 contains 3 entries:
      Offset          Info           Type           Sym. Value    Sym. Name + Addend
    000000000007  000500000002 R_X86_64_PC32     0000000000000000 .rodata - 4
    00000000000c  000b00000004 R_X86_64_PLT32    0000000000000000 printf - 4
    000000000015  000c00000004 R_X86_64_PLT32    0000000000000000 external_func - 4

    # Interpretation:
    # At offset 0x07: reference to .rodata (string constant)
    # At offset 0x0c: call to printf via PLT
    # At offset 0x15: call to external_func via PLT

    # The -4 addend compensates for x86-64 instruction encoding
    # (the relocation target is computed from the *end* of the instruction)

When the linker processes relocations, it looks up the final address of the referenced symbol, computes a value according to the relocation type, and writes that value at the relocation offset.
For a PC-relative relocation (R_X86_64_PC32):
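final_value = symbol_address + addend - relocation_offset
Here relocation_offset stands for the final address of the field being patched, not its offset within the object file. The result is the distance from the patched field to the target; with the addend of -4 used for a 4-byte field at the end of an instruction, it becomes the distance from the end of that instruction, which is exactly what a relative jump or call needs.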
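The PC-relative formula can be worked through in code. The sketch below is a simplified illustration, not how any particular linker is implemented: the function name `apply_pc32` is hypothetical, and it assumes the section's final address is already known.

```c
#include <stddef.h>
#include <stdint.h>

/* Computes symbol_addr + r_addend - (address of the patched field) and stores
 * the 32-bit result little-endian at section[r_offset].
 * Returns 0 on success, -1 if the field lies outside the section or the
 * result does not fit in a signed 32-bit value. */
int apply_pc32(unsigned char *section, size_t section_size,
               uint64_t section_addr,  /* final address of the section */
               uint64_t r_offset,      /* offset of the 4-byte field in the section */
               uint64_t symbol_addr,   /* final address of the target symbol */
               int64_t  r_addend)      /* addend from the relocation entry */
{
    if (r_offset > section_size || section_size - r_offset < 4)
        return -1;

    uint64_t place = section_addr + r_offset;   /* address being patched */
    int64_t  value = (int64_t)(symbol_addr + (uint64_t)r_addend - place);
    if (value < INT32_MIN || value > INT32_MAX)
        return -1;

    uint32_t v = (uint32_t)value;
    for (int i = 0; i < 4; i++)
        section[r_offset + i] = (unsigned char)(v >> (8 * i));
    return 0;
}
```

As a worked example with made-up addresses: a section placed at 0x401000, a call whose displacement field sits at offset 0x0c, a target at 0x401030, and an addend of -4 give 0x401030 - 4 - 0x40100c = 0x20. The instruction ends at 0x401010, and 0x401010 + 0x20 is 0x401030, the target.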
The relocation mechanism is why you can compile code without knowing where it will be loaded. The linker and loader handle address resolution. This is fundamental to modern operating systems, shared libraries, and ASLR (Address Space Layout Randomization) for security.
Beyond the core sections, object files contain numerous auxiliary sections for debugging, exception handling, and other purposes:
When compiled with -g, the compiler generates DWARF debugging information in sections like:
- .debug_info: Type information, variable locations
- .debug_line: Source line to address mapping
- .debug_abbrev: Abbreviations for compact encoding
- .debug_str: Debug string table
- .debug_frame: Call frame information for stack unwinding

    $ readelf -S example.o
    There are 14 section headers, starting at offset 0x390:

    Section Headers:
      [Nr] Name              Type             Address           Offset
           Size              EntSize          Flags  Link  Info  Align
      [ 0]                   NULL             0000000000000000  00000000
           0000000000000000  0000000000000000           0     0     0
      [ 1] .text             PROGBITS         0000000000000000  00000040
           0000000000000017  0000000000000000  AX       0     0     1
      [ 2] .rela.text        RELA             0000000000000000  00000280
           0000000000000048  0000000000000018   I      11     1     8
      [ 3] .data             PROGBITS         0000000000000000  00000058
           0000000000000004  0000000000000000  WA       0     0     4
      [ 4] .bss              NOBITS           0000000000000000  0000005c
           0000000000000004  0000000000000000  WA       0     0     4
      [ 5] .rodata           PROGBITS         0000000000000000  0000005c
           000000000000000e  0000000000000000   A       0     0     1
      [ 6] .comment          PROGBITS         0000000000000000  0000006a
           0000000000000013  0000000000000001  MS       0     0     1
      [ 7] .note.GNU-stack   PROGBITS         0000000000000000  0000007d
           0000000000000000  0000000000000000           0     0     1
    Key to Flags:
      W (write), A (alloc), X (execute), M (merge), S (strings)
      I (info), L (link order), G (group), T (TLS)

For languages with exception support (C++, Rust), special sections enable stack unwinding:
- .eh_frame: Exception handling frame information (runtime unwinding)
- .eh_frame_hdr: Lookup table for .eh_frame
- .gcc_except_table: Language-specific exception tables

Other notable sections:
- .note.GNU-stack: Indicates whether the stack should be executable (security)
- .comment: Compiler version string
- .init_array/.fini_array: Constructor/destructor function pointers
- .got/.plt: Global Offset Table and Procedure Linkage Table (dynamic linking)

| Section | Purpose | Loaded? |
|---|---|---|
| .text | Executable code | Yes |
| .rodata | Read-only data, string literals | Yes |
| .data | Initialized read-write data | Yes |
| .bss | Uninitialized data (zeroed) | Yes (as zeros) |
| .symtab | Symbol table | No (stripped in release) |
| .strtab | Symbol name strings | No (stripped in release) |
| .rela.* | Relocation entries | No (consumed by linker) |
| .debug_* | Debug information | No (stripped in release) |
| .eh_frame | Exception handling | Yes |
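Whether a section is loaded corresponds to the A (alloc) flag in its section header. As a rough sketch, the C code below turns a flags value into the letters readelf prints and tests the alloc bit. The macro and function names are hypothetical; the bit values are those the ELF specification assigns to the write, alloc, execute, merge, and strings flags. Only these five flags are handled.

```c
#include <stddef.h>
#include <stdint.h>

#define F_WRITE     0x1u    /* W: writable at run time */
#define F_ALLOC     0x2u    /* A: occupies memory at run time */
#define F_EXECINSTR 0x4u    /* X: contains executable instructions */
#define F_MERGE     0x10u   /* M: contents may be merged */
#define F_STRINGS   0x20u   /* S: contains NUL-terminated strings */

/* Writes the flag letters for sh_flags into out (at least 6 bytes) and returns out. */
char *section_flag_letters(uint64_t sh_flags, char out[6])
{
    static const struct { uint64_t bit; char letter; } map[] = {
        { F_WRITE, 'W' }, { F_ALLOC, 'A' }, { F_EXECINSTR, 'X' },
        { F_MERGE, 'M' }, { F_STRINGS, 'S' },
    };
    size_t n = 0;

    for (size_t i = 0; i < sizeof map / sizeof map[0]; i++)
        if (sh_flags & map[i].bit)
            out[n++] = map[i].letter;
    out[n] = '\0';
    return out;
}

/* A section is mapped into memory only if the alloc flag is set. */
int section_is_loaded(uint64_t sh_flags)
{
    return (sh_flags & F_ALLOC) != 0;
}
```

A flags value of 0x6 renders as "AX" (the .text row), 0x3 as "WA" (.data and .bss), and 0x30 as "MS" (.comment). The last has no alloc bit, so `section_is_loaded` returns 0 for it.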
Production executables often remove debug symbols and symbol tables using strip. This reduces file size and provides some obfuscation. However, .dynsym (dynamic symbol table) must be preserved for shared library function calls to work.
Mastering object file analysis tools is essential for systems programming, debugging, and security research. Here's your toolkit:
readelf is the Swiss Army knife for ELF files. It handles only ELF, but it is comprehensive:

    # File header
    readelf -h binary

    # Section headers
    readelf -S binary

    # Symbol table
    readelf -s binary

    # Relocations
    readelf -r binary

    # Program headers (segments)
    readelf -l binary

    # Dynamic section (shared libraries)
    readelf -d binary

    # All information
    readelf -a binary

objdump is a disassembler and object file dumper. It works with various formats:

    # Disassemble .text section
    objdump -d binary

    # Disassemble with source (if debug info available)
    objdump -S binary

    # All headers
    objdump -x binary

    # Section contents in hex
    objdump -s binary

    # Disassemble all sections
    objdump -D binary

    # Display relocations
    objdump -r binary

nm is a compact symbol table viewer:

    # List all symbols
    nm binary

    # Show only undefined symbols
    nm -u binary

    # Show symbol sizes
    nm -S binary

    # Demangle C++ names
    nm -C binary

    # Sort by size (find large symbols)
    nm -S --size-sort binary

Object files are the critical intermediaries between compilation and execution. Their structured format enables the modular compilation and linking model that makes large software development practical.
- .text for code, .data/.bss for variables, .rodata for constants, plus many auxiliary sections.

What's next:
With a solid understanding of object files, we're ready to explore linking—the process that combines multiple object files into a single executable. The next page covers static and dynamic linking in detail, explaining how symbols are resolved and how shared libraries work.
You now understand object file structure at a deep level—ELF format, sections, symbol tables, and relocations. This knowledge is foundational for understanding linking, loading, and ultimately how the operating system creates and manages processes.