Every line of code you write exists in a form that CPUs cannot directly understand. Processors speak the language of binary instructions—sequences of zeros and ones representing operations on registers and memory. The compilation process is the elaborate transformation that bridges human-readable source code and machine-executable binaries.
This transformation is not a simple translation. It involves multiple sophisticated stages, each performing specific analyses and transformations that progressively convert high-level abstractions into low-level machine code. Understanding this process is fundamental to operating systems because it directly determines how programs are organized in memory, how they reference code and data, and how the operating system ultimately loads and executes them.
By the end of this page, you will understand the complete compilation pipeline—preprocessing, compilation proper, assembly, and linking. You'll grasp how each stage contributes to the final executable, why each stage exists, and how this knowledge impacts memory management and linking.
Before examining each stage in detail, let's establish the complete picture. When you invoke a compiler like gcc or clang on a C source file, it doesn't directly produce an executable. Instead, the compilation system executes four distinct phases:
This modular design is intentional. Each phase has a well-defined input and output, enabling tools to be developed independently and allowing developers to inspect intermediate results for debugging or optimization.
| Phase | Input | Output | Tool | Purpose |
|---|---|---|---|---|
| Preprocessing | Source code (.c, .cpp) | Preprocessed source (.i) | cpp (C Preprocessor) | Macro expansion, includes, conditional compilation |
| Compilation | Preprocessed source (.i) | Assembly (.s) | cc1 (Compiler proper) | Syntax analysis, optimization, code generation |
| Assembly | Assembly (.s) | Object code (.o) | as (Assembler) | Convert mnemonics to machine code |
| Linking | Object files (.o) | Executable | ld (Linker) | Resolve symbols, combine into executable |
This pipeline enables separate compilation—each source file can be compiled independently into an object file. Only at the linking stage are all pieces combined. This dramatically improves build times for large projects, as unchanged files don't need recompilation.
The preprocessing stage operates purely at the textual level. The preprocessor (historically cpp for C) knows nothing about the language's syntax or semantics—it performs pattern-matching and text substitution based on preprocessor directives.
Key operations performed during preprocessing:

- File inclusion — #include directives are replaced by the full text of the named file
- Macro expansion — object-like and function-like macros are substituted as text
- Conditional compilation — #ifdef/#ifndef/#if decide which code survives
- Comment removal and line markers for later diagnostics
```c
// example.c - Before preprocessing
#include <stdio.h>

#define MAX_BUFFER 1024
#define SQUARE(x) ((x) * (x))

#ifdef DEBUG
    #define LOG(msg) printf("[DEBUG] %s", msg)
#else
    #define LOG(msg) ((void)0)
#endif

int main() {
    char buffer[MAX_BUFFER];
    int result = SQUARE(5);
    LOG("Computation complete");
    return 0;
}
```

After preprocessing (use the -E flag to see the output), the file transforms dramatically. The #include <stdio.h> expands to thousands of lines from the stdio header and its transitive includes, and all macros are expanded inline:
```c
// ... thousands of lines from stdio.h ...
// ... type definitions, function declarations ...

int main() {
    char buffer[1024];         // MAX_BUFFER replaced
    int result = ((5) * (5));  // SQUARE(5) expanded
    ((void)0);                 // LOG expanded (DEBUG not defined)
    return 0;
}
```

Use gcc -E source.c -o source.i to see the preprocessed file—invaluable for debugging macro-related issues. The output can be enormous due to header expansion: a simple 10-line program might produce 50,000+ lines after preprocessing.
Header guards and pragma once:
Because the preprocessor performs literal text inclusion, the same header might be included multiple times (through different paths), causing redefinition errors. Header guards prevent this:
```c
#ifndef MYHEADER_H   // If not already defined
#define MYHEADER_H   // Define the guard macro

// Header contents here
typedef struct {
    int x;
    int y;
} Point;

void process_point(Point* p);

#endif // MYHEADER_H
```

Modern compilers also support #pragma once as a simpler alternative, though it's not part of the C standard. The preprocessor's textual nature has important implications:
- No type checking — the preprocessor runs before the compiler sees any types
- Side effects in arguments are dangerous — SQUARE(x++) would evaluate x++ twice
- Unparenthesized macros change meaning — operator precedence can rebind parts of the expanded expression

Understanding these limitations is crucial for writing robust code and debugging preprocessor-related issues.
The compilation proper stage is where the real linguistic transformation occurs. The compiler takes preprocessed source code and produces assembly language—a human-readable representation of machine instructions. This stage itself involves multiple sophisticated sub-phases:
Frontend phases:
1. Lexical Analysis — Breaking the character stream into tokens
2. Parsing — Building the Abstract Syntax Tree (AST)
3. Semantic Analysis — Type checking and meaning validation

Middle-end phases:
4. Intermediate Representation (IR) — Platform-independent code form
5. Optimization — Improving code efficiency

Backend phases:
6. Code Generation — Producing assembly for target architecture
7. Target-specific Optimization — Architecture-specific improvements
The lexer (or scanner) reads the character stream and groups characters into tokens—the fundamental units of the language. Each token has a type and value:
```c
// Source code
int count = 42;

// Token stream produced by lexer:
// TOKEN_KEYWORD_INT     : "int"
// TOKEN_IDENTIFIER      : "count"
// TOKEN_ASSIGN          : "="
// TOKEN_INTEGER_LITERAL : "42"
// TOKEN_SEMICOLON       : ";"
```

The parser consumes the token stream and constructs an Abstract Syntax Tree (AST)—a hierarchical representation of the program's grammatical structure. The AST captures the essential structure while discarding syntactic noise like parentheses and semicolons.
```c
// Expression: a + b * c

// AST structure (+ has lower precedence than *)
//      (+)
//     /   \
//   (a)   (*)
//        /   \
//      (b)   (c)

// This correctly represents that b*c is computed first
```

Semantic analysis adds meaning to the syntax tree. This phase performs:

- Type checking — verifying operations are applied to compatible types
- Symbol resolution — connecting each identifier to its declaration and scope
- Validity checks — implicit conversions, lvalue rules, function-call arity
After semantic analysis, the compiler has a complete understanding of the program's meaning and can detect errors such as type mismatches, references to undeclared identifiers, and calls with the wrong number or types of arguments.
Errors detected in the frontend (lexer, parser, semantic analyzer) are compile-time errors that prevent code generation. These include syntax errors, type mismatches, and uses of undeclared identifiers (distinct from the linker's "undefined reference" errors, which arise later). Understanding this distinction helps in debugging: if the compiler rejects your code, the issue is in these early phases.
Modern compilers don't translate directly from AST to assembly. Instead, they use an Intermediate Representation (IR)—a platform-independent code format that serves as the target for optimization. Popular IRs include:

- LLVM IR — used by Clang, Rust, Swift, and the rest of the LLVM ecosystem
- GIMPLE and RTL — GCC's high-level and low-level intermediate forms
The IR provides a uniform representation that allows optimizations to be written once and applied regardless of source language or target architecture.
```llvm
; Original C code:
;   int square(int x) { return x * x; }

define i32 @square(i32 %x) {
entry:
  %mul = mul nsw i32 %x, %x
  ret i32 %mul
}

; LLVM IR uses typed registers (%x, %mul)
; Instructions are in SSA (Static Single Assignment) form
; Each register is assigned exactly once
```

The optimizer transforms the IR to produce faster and/or smaller code. Optimizations operate at different levels:
Local optimizations (within a basic block):

- Constant folding — evaluating 3 + 4 at compile time to 7
- Strength reduction — replacing x * 2 with x << 1

Global optimizations (across basic blocks):

- Common subexpression elimination — computing a + b once if used multiple times
- Dead code elimination — removing code whose result is never used
- Loop-invariant code motion — hoisting computations out of loops

Interprocedural optimizations (across functions):

- Inlining — replacing a call with the body of the called function
- Interprocedural constant propagation — pushing known argument values into callees
Before optimization:

```c
int example(int n) {
    int x = 3 + 4;          // Constant expression
    int y = x * 2;          // Multiply by power of 2
    int sum = 0;
    for (int i = 0; i < n; i++) {
        int base = 10;      // Loop invariant
        sum += base * i;
    }
    int unused = 42;        // Dead code
    return sum + y;
}
```

After optimization (conceptually):

```c
int example(int n) {
    // x = 7 (constant folded)
    // y = 14 (7 * 2, constant folded)
    int sum = 0;
    // base = 10 hoisted outside loop
    for (int i = 0; i < n; i++) {
        sum += 10 * i;  // Or: sum += (i << 1) + (i << 3)
    }
    // unused eliminated
    return sum + 14;
}
```

Compilers provide optimization flags: -O0 (no optimization, fast compiles), -O1 (basic optimizations), -O2 (most optimizations), -O3 (aggressive, including vectorization), -Os (optimize for size). Higher levels improve performance but increase compile time and may make debugging harder.
The code generator transforms optimized IR into assembly language for the target architecture. This phase must handle:

- Instruction selection — mapping IR operations onto the target's instruction set
- Register allocation — assigning values to physical registers
- Instruction scheduling — ordering instructions to suit the target's pipeline
- Calling conventions — placing arguments and return values as the ABI requires
The output is assembly language—a human-readable representation of machine instructions.
```asm
; C source: int add(int a, int b) { return a + b; }
; Compiled with gcc -S -O2 for x86-64

        .text
        .globl  add
        .type   add, @function
add:
        .cfi_startproc
        leal    (%rdi,%rsi), %eax   ; Compute a + b using LEA
        ret                         ; Return (result in %eax)
        .cfi_endproc
        .size   add, .-add

; Note: Arguments passed in %edi (a) and %esi (b) per System V ABI
; Result returned in %eax
; LEA (Load Effective Address) performs addition without affecting flags
```

One of the most challenging aspects of code generation is register allocation. The IR uses unlimited virtual registers, but physical CPUs have limited registers (x86-64 has 16 general-purpose registers). The allocator must:

- Map the unlimited virtual registers onto the small physical set
- Track which values are live at each program point (liveness analysis)
- Spill values to memory when more values are live than registers available
Poor register allocation forces values to be stored in memory ("spilled"), dramatically slowing execution, since a round trip to main memory can be on the order of 100x slower than a register access.
| Register | 64-bit | 32-bit | 16-bit | 8-bit low | Purpose/Convention |
|---|---|---|---|---|---|
| RAX | %rax | %eax | %ax | %al | Return value, accumulator |
| RBX | %rbx | %ebx | %bx | %bl | Callee-saved, base pointer |
| RCX | %rcx | %ecx | %cx | %cl | 4th argument, counter |
| RDX | %rdx | %edx | %dx | %dl | 3rd argument, I/O |
| RSI | %rsi | %esi | %si | %sil | 2nd argument, source index |
| RDI | %rdi | %edi | %di | %dil | 1st argument, dest index |
| RSP | %rsp | %esp | %sp | %spl | Stack pointer (reserved) |
| RBP | %rbp | %ebp | %bp | %bpl | Base pointer, callee-saved |
The Application Binary Interface (ABI) defines calling conventions, register usage, and stack layout. On Linux x86-64, the System V ABI specifies that the first six integer arguments go in RDI, RSI, RDX, RCX, R8, R9. Understanding the ABI is essential for debugging, writing assembly, or interfacing with libraries.
The assembler (as on Unix systems) translates assembly language into machine code, producing an object file. This transformation is relatively straightforward—mostly a one-to-one mapping of mnemonics to binary instruction encodings—but involves several important operations: encoding each instruction into its binary form, building the symbol table, and emitting relocation entries for addresses that cannot be resolved yet.
```asm
; Assembly instruction
mov $42, %eax        ; Move immediate 42 into EAX

; Binary encoding (little-endian x86-64):
;   B8 2A 00 00 00
;   ^^ ^^^^^^^^^^^
;   |  42 as a 32-bit little-endian immediate (0x0000002A)
;   Opcode for "mov imm32, eax"

; The assembler performs this translation for every instruction
; and produces the final object file with:
;   - .text section: machine code
;   - .data section: initialized data
;   - .bss section:  uninitialized data
;   - .symtab:       symbol table
;   - .rela.*:       relocation entries
```

The assembler creates a symbol table recording every label, function, and global variable defined or referenced in the file. Each entry contains:

- Name — the symbol's identifier
- Value — its address (or offset within its section)
- Size and type — function vs. data object, and how many bytes it occupies
- Binding — local to this file or globally visible
- Section — which section (.text, .data, .bss) holds its definition
Symbols can be defined (have an address in this file) or undefined (reference external symbols to be resolved during linking).
```
$ nm example.o
0000000000000000 T main          ; Defined in .text, global
                 U printf        ; Undefined, external reference
0000000000000000 D global_var    ; Defined in .data, global
0000000000000004 d local_static  ; Defined in .data, local
0000000000000000 B uninitialized ; Defined in .bss, global

# Symbol types:
# T/t = text (code), D/d = data, B/b = BSS
# Uppercase = global, lowercase = local
# U = undefined (external reference)
```

When the assembler encounters a reference to a symbol whose address isn't yet known (such as an external function, or a global in the same file whose final location isn't settled), it generates a relocation entry. This entry tells the linker:

- Where — the offset of the bytes that need patching
- What — which symbol's final address to use
- How — the relocation type, i.e. the formula for computing the value (absolute, PC-relative, PLT-relative, ...)
Relocation allows object files to be compiled independently without knowing the final memory layout.
Object files (.o) contain machine code but cannot be executed directly. They have unresolved symbols (external references) and contain addresses that assume the code starts at address 0. The linker must combine object files, resolve symbols, and perform relocations to create an executable.
Let's trace a simple program through the entire compilation process to solidify understanding. We'll use GCC with flags to observe each stage.
```c
#include <stdio.h>

#define GREETING "Hello, World!\n"

int main() {
    printf("%s", GREETING);
    return 0;
}
```
```shell
# Step 1: Preprocessing only
gcc -E hello.c -o hello.i
# hello.i contains expanded stdio.h and the GREETING macro

# Step 2: Compilation to assembly
gcc -S hello.i -o hello.s
# hello.s contains x86-64 assembly

# Step 3: Assembly to object file
gcc -c hello.s -o hello.o
# hello.o is an ELF object file

# Step 4: Linking to executable
gcc hello.o -o hello
# hello is the final executable

# Or all at once:
gcc hello.c -o hello
```
```asm
        .file   "hello.c"
        .section .rodata
.LC0:
        .string "Hello, World!"      ; GREETING expanded (puts adds the newline)
        .text
        .globl  main
        .type   main, @function
main:
        pushq   %rbp                 ; Save frame pointer
        movq    %rsp, %rbp           ; Set up stack frame
        leaq    .LC0(%rip), %rdi     ; Load address of string
        call    puts@PLT             ; Call puts (printf optimized away)
        movl    $0, %eax             ; Return 0
        popq    %rbp                 ; Restore frame pointer
        ret
        .size   main, .-main
```

Notice that GCC replaced the printf call with puts. This is a common optimization: when a format string reduces to a plain string ending in a newline, puts is simpler and faster, so the compiler recognizes the pattern and substitutes the more efficient function.
We can use tools to inspect the generated object file:
```
$ objdump -d hello.o

hello.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <main>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi   # Address TBD
   b:   e8 00 00 00 00          call   10 <main+0x10>   # Needs relocation
  10:   b8 00 00 00 00          mov    $0x0,%eax
  15:   5d                      pop    %rbp
  16:   c3                      ret

$ readelf -r hello.o

Relocation section '.rela.text' at offset 0x1c0 contains 2 entries:
  Offset          Type            Symbol
000000000007  R_X86_64_PC32    .rodata-0x4   ; Address of string
00000000000c  R_X86_64_PLT32   puts-0x4      ; Call to puts
```

The compilation process is a sophisticated pipeline that transforms human-readable source code into machine-executable object files. Each stage has a precise purpose:

- Preprocessing expands macros, includes, and conditionals — pure text transformation
- Compilation translates preprocessed source into optimized assembly
- Assembly converts assembly mnemonics into relocatable machine code
- Linking resolves symbols and combines object files into an executable
What's next:
Now that we understand how individual source files are compiled into object files, the next page explores object files in detail—their structure, formats (ELF, COFF, Mach-O), sections, and the metadata they contain that enables linking and loading.
You now understand the complete compilation process—from preprocessing through assembly. This knowledge is foundational for understanding how the operating system manages programs in memory and how linking resolves the relationships between separately-compiled modules.