Every time a function is called, it enters a world of its own. It has parameters passed to it, local variables it creates, and computations it performs—all somehow isolated from every other function currently on the call stack. How does a function executing deep in a call chain not corrupt the local variables of functions above it? How can recursive functions—calling themselves—maintain separate copies of their variables for each invocation?
The answer is the stack frame (also called an activation record or call frame). A stack frame is the structured block of memory allocated on the stack for each function invocation. It contains everything the function needs: parameters, local variables, saved registers, and metadata for returning control. Understanding stack frames is understanding how functions truly work.
By the end of this page, you will understand the precise structure and layout of stack frames, how local variables are allocated and accessed, how the frame pointer and stack pointer work together, and why stack-allocated memory is so remarkably fast. This knowledge is essential for debugging, security analysis, and systems programming.
A stack frame is a contiguous block of memory on the stack that represents a single function invocation. It's the function's private workspace—containing all the data that function needs to execute independently.
Conceptual model:
Think of the call stack as a tower of blocks, where each block is a stack frame:
┌─────────────────────────────┐
│ Stack Frame: main() │ ← Oldest (first called)
├─────────────────────────────┤
│ Stack Frame: processData() │
├─────────────────────────────┤
│ Stack Frame: calculate() │ ← Newest (currently executing)
└─────────────────────────────┘
Each frame is independent. When calculate() accesses its local variable x, it touches memory within its own frame—not the memory of processData() or main(). This isolation is what makes function calls predictable and composable.
Let's examine the precise layout of a stack frame. The layout varies by platform and compiler, but the x86-64 System V ABI provides a reference model:
Stack Frame Structure (System V AMD64):
Higher Memory Addresses
┌───────────────────────────┐
│ Caller's Stack Frame │
├───────────────────────────┤
│ Argument 7+ (if any) │ ← Arguments beyond register capacity
├───────────────────────────┤
│ Return Address │ ← Pushed by CALL instruction
├───────────────────────────┤
│ Saved RBP (optional) │ ← Old frame pointer (if used)
├───────────────────────────┤ ← RBP points here (frame base)
│ Saved Registers │ ← Callee-saved registers
├───────────────────────────┤
│ Local Variable 1 │
│ Local Variable 2 │
│ Local Variable 3 │
│ ... │ ← Local storage area
├───────────────────────────┤
│ Temporary/Spill Space │
├───────────────────────────┤
│ Alignment Padding │
├───────────────────────────┤ ← RSP points here (stack top)
│ (Next frame will go │
│ below this point) │
Lower Memory Addresses
Key observations:
The stack grows downward — Newer data is at lower addresses. Pushing decrements RSP.
Return address is always at a known offset — It's immediately above the saved RBP (or at a fixed offset from RSP).
Local variables are at negative offsets from RBP — If RBP is the frame base, local variables are at [RBP - 8], [RBP - 16], etc.
Arguments passed on stack are at positive offsets — They're in the caller's frame, at [RBP + 16] and higher.
Everything is at fixed, compile-time-known offsets — The compiler knows exactly where each variable lives relative to the frame pointers.
Debuggers use this layout to navigate stacks. Security researchers examine it to understand vulnerabilities. Systems programmers need it for low-level code. When you see a stack trace or analyze a crash dump, you're looking at information derived from these frame structures.
The frame pointer (RBP on x86-64, also called base pointer) points to a fixed location within the current stack frame, providing a stable reference point even as the stack pointer moves during function execution.
Why do we need it?
The stack pointer (RSP) changes throughout a function's execution—pushing saves, making space for temporaries, aligning for calls. If local variables were accessed only via RSP, the compiler would need to track every RSP change and adjust offsets. The frame pointer provides stability:
```
;; Standard function prologue (with frame pointer)
my_function:
    push rbp              ; Save caller's frame pointer
    mov rbp, rsp          ; Set our frame pointer to current stack top
    sub rsp, 32           ; Allocate 32 bytes for local variables

    ;; Now the frame layout is:
    ;; [rbp + 16] = first stack argument (if any)
    ;; [rbp + 8]  = return address (pushed by CALL)
    ;; [rbp + 0]  = saved RBP (pushed by us)
    ;; [rbp - 8]  = first local variable
    ;; [rbp - 16] = second local variable
    ;; [rbp - 24] = third local variable
    ;; [rbp - 32] = fourth local variable
    ;; [rsp]      = current stack top (same as [rbp - 32])

    ;; Access local variables:
    mov [rbp - 8], rdi    ; Store first argument in local var 1
    mov rax, [rbp - 16]   ; Load local var 2 into rax

    ;; Epilogue
    mov rsp, rbp          ; Deallocate locals (RSP = RBP)
    pop rbp               ; Restore caller's frame pointer
    ret
```

Frame pointer omission:
Modern optimizing compilers often eliminate the frame pointer to free up RBP as a general-purpose register. This is called frame pointer omission (FPO). With FPO enabled, local variables are addressed relative to RSP instead, and the compiler tracks every RSP adjustment to keep those offsets correct; stack walking then requires separate unwind metadata.
Production code often uses FPO for performance; debug builds often preserve the frame pointer for easier debugging.
Local variables are the most common use of stack frame space. Understanding how they're allocated and accessed reveals why stack-based memory is so efficient.
Allocation mechanism:
Local variables are allocated by adjusting the stack pointer. No malloc, no heap traversal, no fragmentation concerns—just decrement RSP by the total size needed:
void example() {
int a; // 4 bytes
double b; // 8 bytes
char buffer[32]; // 32 bytes
int c; // 4 bytes
}
// Total: possibly 48+ bytes, aligned
```
;; For a function with: int a, double b, char buffer[32], int c
example:
    push rbp
    mov rbp, rsp
    sub rsp, 64               ; Allocate space (48 bytes + padding for alignment)

    ;; Variable layout (compiler may reorder for alignment):
    ;; [rbp - 4]  = int c (4 bytes)
    ;; [rbp - 8]  = padding (4 bytes, for alignment of double)
    ;; [rbp - 16] = double b (8 bytes, 8-byte aligned)
    ;; [rbp - 48] = char buffer[32] (32 bytes)
    ;; [rbp - 52] = int a (4 bytes)
    ;; [rbp - 64] = padding (for 16-byte stack alignment)

    ;; Access examples:
    mov dword [rbp - 52], 42       ; a = 42
    movsd qword [rbp - 16], xmm0   ; b = (some double from xmm0)
    lea rdi, [rbp - 48]            ; rdi = &buffer[0]
    mov dword [rbp - 4], 100       ; c = 100
```

| Characteristic | Stack (Local Variables) | Heap (Dynamic Allocation) |
|---|---|---|
| Allocation Speed | O(1) - just pointer arithmetic | Variable - malloc may traverse free lists |
| Deallocation Speed | O(1) - automatic on return | Variable - free may coalesce adjacent blocks |
| Fragmentation | None - stack is linear | Possible - depends on allocation patterns |
| Cache Behavior | Excellent - sequential, temporal locality | Varies - scattered across heap |
| Lifetime | Automatic - exists while function runs | Manual - exists until explicitly freed |
| Thread Safety | Each thread has own stack | Shared heap requires synchronization |
While stack allocation is fast and automatic, stack size is limited (typically 1-8 MB per thread). Allocating large arrays locally or recursing deeply can exhaust the stack. For large data, use heap allocation despite the overhead.
The stack frame perfectly embodies the relationship between scope (where a name is visible) and lifetime (when memory exists). For stack-allocated variables, these align exactly: the variable exists as long as the frame exists, and the frame exists as long as the function is executing.
Lifetime begins when the function is called:
When a function is invoked, its stack frame is created. Local variables now have memory allocated—but they're uninitialized (containing whatever garbage was at those memory locations).
Lifetime ends when the function returns:
When a function returns, its frame is popped. The stack pointer moves back up, and that memory is now available for the next function call. The local variables don't "exist" anymore—the memory might still contain their values, but it could be overwritten at any moment.
```c
// DANGER: Returning a pointer to a local variable
int* bad_function() {
    int local_value = 42;
    return &local_value;  // Returns address within stack frame
}

// When bad_function returns, its frame is gone.
// The returned pointer points to memory that will be
// overwritten by the next function call!

void caller() {
    int* ptr = bad_function();
    // ptr points to "dead" stack memory
    printf("%d\n", *ptr);     // Might print 42... or garbage!
    some_other_function();    // This function's frame overwrites the memory
    printf("%d\n", *ptr);     // Almost certainly garbage now!
}
```

Returning a pointer or reference to a local variable is a classic bug. The pointer is syntactically valid but semantically broken—pointing to memory that no longer holds the intended value. Modern compilers warn about this, and sanitizers can detect it at runtime.
Block-scoped variables:
In languages with block scope (C, C++, Rust), variables declared inside blocks (like if or for) have scope limited to that block. Historically, their stack space was allocated and deallocated with the block. Modern compilers typically allocate all local storage in the prologue and reuse space for non-overlapping scopes:
void example() {
{
int x = 1; // Block 1 scope
use(x);
}
{
int y = 2; // Block 2 scope
use(y); // x and y might share the same stack slot!
}
}
This optimization reduces maximum stack usage without affecting semantics.
Recursion is where the stack frame concept truly shines. When a function calls itself, a new stack frame is created for each invocation. Each frame has its own independent copy of local variables, enabling the function to maintain distinct state at each recursion level.
Recursive factorial example:
int factorial(int n) {
if (n <= 1) return 1;
return n * factorial(n - 1);
}
Calling factorial(4) creates 4 stack frames, each with its own n:
```
=== Call: factorial(4) ===
Stack:
┌─────────────────────────────┐
│ [caller's frame]            │
├─────────────────────────────┤
│ [factorial frame #1]        │ ← n = 4
│   return addr → caller      │
│   local n = 4               │
└─────────────────────────────┘

=== Call: factorial(3) ===
Stack:
┌─────────────────────────────┐
│ [caller's frame]            │
├─────────────────────────────┤
│ [factorial frame #1]        │   n = 4 (still exists!)
├─────────────────────────────┤
│ [factorial frame #2]        │ ← n = 3
│   return addr → frame #1    │
│   local n = 3               │
└─────────────────────────────┘

=== Call: factorial(2) ===
Stack:
┌─────────────────────────────┐
│ [caller's frame]            │
├─────────────────────────────┤
│ [factorial frame #1]        │   n = 4
├─────────────────────────────┤
│ [factorial frame #2]        │   n = 3
├─────────────────────────────┤
│ [factorial frame #3]        │ ← n = 2
│   return addr → frame #2    │
│   local n = 2               │
└─────────────────────────────┘

=== Call: factorial(1) ===
Stack:
┌─────────────────────────────┐
│ [caller's frame]            │
├─────────────────────────────┤
│ [factorial frame #1]        │   n = 4
├─────────────────────────────┤
│ [factorial frame #2]        │   n = 3
├─────────────────────────────┤
│ [factorial frame #3]        │   n = 2
├─────────────────────────────┤
│ [factorial frame #4]        │ ← n = 1, base case!
│   return addr → frame #3    │   Returns 1
│   local n = 1               │
└─────────────────────────────┘

=== Returning ===
Frame #4 returns 1  → Frame #3 computes 2 * 1 = 2
Frame #3 returns 2  → Frame #2 computes 3 * 2 = 6
Frame #2 returns 6  → Frame #1 computes 4 * 6 = 24
Frame #1 returns 24 → Original caller receives 24
```

Key insight:
Each frame's n is at the same offset from its own frame pointer, but at a different absolute address. Frame #1's n (value 4) is at address X. Frame #2's n (value 3) is at address X-64 (or whatever the frame size is). The code is identical, but it operates on different memory.
This is why recursion "just works"—the stack provides automatic per-call storage. Without the stack, recursion would require manual management of multiple copies of state, which was exactly the problem before stacks were invented.
Stack walking (or stack crawling) is the process of traversing the call stack from the current frame back to the original caller. This is used by debuggers, exception handlers, profilers, and crash reporters. Understanding how it works reveals why the frame pointer is valuable.
Walking with frame pointers:
When each frame saves the previous frame's RBP, the saved RBP values form a linked list:
Current Frame:
RBP → [Saved RBP of caller] → [Saved RBP of caller's caller] → ...
```c
// Simplified stack walking (with frame pointers preserved)
void print_stack_trace() {
    // Get current frame pointer
    void** frame_pointer;
    asm("mov %%rbp, %0" : "=r" (frame_pointer));

    while (frame_pointer != NULL) {
        // Return address is at [RBP + 8]
        void* return_address = *(frame_pointer + 1);

        // Look up function name from return address
        // (in practice, uses debug symbols or DWARF info)
        printf("  at %p\n", return_address);

        // Previous frame's RBP is at [RBP + 0]
        frame_pointer = (void**)*frame_pointer;

        // Safety: stop at some sentinel or after N frames
    }
}

// The chain of RBP values:
// Current frame's RBP points to:
//   [0]: Caller's saved RBP
//   [1]: Return address to caller
// Caller's saved RBP points to:
//   [0]: Caller's caller's saved RBP
//   [1]: Return address to caller's caller
// ... and so on up the stack
```

Walking without frame pointers (FPO):
When frame pointers are omitted, walking the stack requires unwind information—metadata describing how to find the previous frame from each instruction address. Formats like DWARF CFI (on Unix/Linux) or .pdata/.xdata (on Windows) encode, for every instruction address, how to recover the caller's stack pointer, the return address, and the locations of saved registers.
This metadata allows precise unwinding but requires debug info or exception handling tables. Without this information, stack walking becomes unreliable or impossible—which is why production crashes without debug symbols often show incomplete stack traces.
When you see a stack trace in an exception or debugger, you're seeing the result of stack walking. Each line represents a frame, discovered by following the chain of saved frame pointers (or using unwind tables). Understanding this helps you interpret stack traces accurately.
Modern CPUs and ABIs enforce stack alignment requirements. On x86-64, the stack must be 16-byte aligned before a CALL instruction. This affects how stack frames are laid out and how much padding is added.
Why alignment matters:
SIMD Instructions: SSE/AVX operations often require 16-byte or 32-byte aligned operands. Misaligned memory access either crashes or incurs severe performance penalties.
CPU Optimization: Aligned memory accesses can be faster, crossing fewer cache line boundaries.
ABI Compliance: The calling convention guarantees alignment. Functions rely on this guarantee to safely use aligned instructions.
The 16-byte alignment rule (System V AMD64):
The ABI requires that RSP be 16-byte aligned when CALL is executed. After CALL (which pushes an 8-byte return address), RSP becomes 8-byte aligned. A standard prologue that pushes RBP (another 8 bytes) restores 16-byte alignment:
Before CALL: RSP = 0x...XXX0 (16-byte aligned)
After CALL: RSP = 0x...XXX8 (8-byte aligned, return address pushed)
After PUSH RBP: RSP = 0x...XXX0 (16-byte aligned again)
If additional local space is allocated, the compiler ensures the total adjustment maintains proper alignment.
```
;; Function needing 20 bytes of locals
;; Must pad to maintain 16-byte alignment

my_function:
    push rbp                   ; RSP now 16-byte aligned
    mov rbp, rsp
    sub rsp, 32                ; Allocate 32 bytes (rounds up 20 to next 16)

    ;; [rbp - 20] through [rbp - 1]:  actual local storage (20 bytes)
    ;; [rbp - 32] through [rbp - 21]: padding (12 bytes)
    ;; RSP is at [rbp - 32], which is 16-byte aligned

    ;; If we call another function, RSP is correctly aligned
    call some_other_function   ; RSP is 16-byte aligned before this

    mov rsp, rbp
    pop rbp
    ret
```

Calling functions with a misaligned stack (on platforms requiring alignment) causes crashes or corruption. On Linux/macOS, calling a function with non-16-byte-aligned RSP may crash when SSE instructions are used. Assembly programmers must carefully maintain alignment; the compiler handles this automatically for high-level code.
While the fundamental concept of stack frames is universal, different language implementations have variations in how they structure and use frames.
| Language | Frame Implementation | Notable Features |
|---|---|---|
| C/C++ | Direct machine stack frames | Exact control, can access frames via inline assembly, object addresses can be on stack |
| Java | JVM operand stack + frame | Each frame has operand stack, local variable array, and frame data; GC-managed object references |
| Python | PyFrameObject (heap-allocated) | Frames are Python objects on heap; enables introspection and modification at runtime |
| JavaScript | Engine-dependent frames | V8 uses optimized frames; closures may capture frame data; async alters frame semantics |
| Go | Segmented/copied stacks | Small initial stack grows dynamically; goroutines enable millions of concurrent stacks |
| Rust | Machine frames with ownership | Compiler tracks ownership through frames; no runtime overhead but strict compile-time rules |
Closures and captured variables:
In languages with closures (JavaScript, Python, Rust, etc.), local variables captured by a closure must outlive the function's stack frame. This requires "escaping" the variable to heap storage:
function outer() {
let x = 42; // Normally would be on stack
return function inner() {
return x; // Captures x - it must live beyond outer's return
};
}
let fn = outer(); // outer's frame is gone, but x must survive!
fn(); // Returns 42 - x was moved to heap
The compiler or runtime uses escape analysis to detect when a captured variable outlives its defining scope, and then allocates it on the heap instead of the stack, or copies it into a closure object.
We've thoroughly explored stack frames—the structured memory contexts that make functions work. This knowledge connects hardware, compilers, and your code.
What's next:
Now that we understand stack frames and local variables, the final page of this module explores what happens when things go wrong—stack overflow errors. We'll examine why stacks have limited size, how overflow occurs, what symptoms to look for, and how to prevent or handle these critical errors in your code.
You now understand stack frames as the structured contexts that make function-based programming work. From return addresses to local variables, from recursion to stack walking, the stack frame is the unsung hero of program execution. Next, we'll explore the dark side—what happens when the stack runs out of space.