When you write result = calculateTotal(items), something remarkable happens behind the scenes. Your program must:
- Remember exactly where to resume once calculateTotal finishes
- Jump to calculateTotal's code, which lives somewhere else in memory
- Deliver the argument values (the items parameter) to that new location
- Receive the computed result and pick up where it left off
This seamless transition happens millions of times per second in running programs. The mechanism that enables it is the call stack, and understanding its precise operation transforms you from a developer who uses functions to one who truly understands them.
By the end of this page, you will understand the exact sequence of operations that occur during a function call and return. You'll know how arguments are passed, how return addresses are saved, how execution context is preserved, and how all this information is managed on the stack—knowledge that illuminates debugging, performance optimization, and low-level programming.
Before examining the solution, let's understand the problem deeply. For a function call to work, the system must solve several non-trivial challenges:
Challenge 1: Saving the Return Address
When you call a function, you leave one location in code and jump to another. But you need to come back. Where do you come back to? The instruction immediately after the call. This "return address" must be saved somewhere before the jump—and the stack is that somewhere.
Challenge 2: Passing Arguments
Functions need input. How does add(3, 5) get the values 3 and 5 to the add function? These values must be placed somewhere the function can find them. Different calling conventions solve this differently (registers, stack, or both).
Challenge 3: Providing Return Value
The function computes a result. How does it give that result back to the caller? There must be a designated location—often a specific register—where the return value is placed.
Challenge 4: Nested Calls
Functions call other functions which call other functions. Each level must independently handle all the above challenges. This is where the stack's LIFO nature becomes essential—each call adds its own context, and each return removes it.
Early computers didn't have stacks. Return addresses were often stored in the function itself—making recursion impossible since a second call would overwrite the first return address. The invention of the call stack is what made recursive algorithms practical.
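To make this concrete, here is a minimal C sketch of recursion: each call to factorial gets its own saved return address and its own copy of n on the stack, which is exactly what a single fixed storage slot per function could never provide.

#include <stdio.h>

/* Each call to factorial pushes its own return address and its own copy
   of n, so the pending multiplications can resume correctly, most recent
   call first, as the stack unwinds. */
unsigned long factorial(unsigned int n) {
    if (n <= 1)
        return 1;                    /* base case: unwinding starts here */
    return n * factorial(n - 1);     /* nested call: a brand-new frame   */
}

int main(void) {
    printf("5! = %lu\n", factorial(5));   /* prints: 5! = 120 */
    return 0;
}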
Let's trace exactly what happens during a function call. We'll use a simplified model based on x86-64 calling conventions, though the principles apply across architectures.
The Scenario:
void caller() {
int result = add(3, 5);
// ... continue with result
}
int add(int a, int b) {
int sum = a + b;
return sum;
}
The CALL instruction's atomic push-and-jump is critical. If these were separate operations, an interrupt between them could corrupt the stack or return address. CPU designers recognized this need early and made CALL and RET atomic operations.
The CALL instruction is the heart of function invocation. Let's examine it in detail:
What CALL does (pseudocode):
CALL target_address:
PUSH (address of next instruction) ; Save return address
JMP target_address ; Transfer control
In concrete terms:
If the CALL instruction is at memory address 0x1000 and is 5 bytes long, and the target function is at 0x2000:
- The return address 0x1005 (the next instruction) is pushed onto the stack
- The instruction pointer is set to 0x2000
- Execution continues at 0x2000 (the function entry)
; Before CALL
; RSP (Stack Pointer) = 0x7FFF1000 (example address)
; RIP (Instruction Pointer) = 0x400500 (at the CALL instruction)

; The CALL instruction
0x400500: call 0x400700      ; Call function at 0x400700

; What happens:
; 1. RSP decremented by 8:   RSP = 0x7FFF0FF8
; 2. Return address stored:  [0x7FFF0FF8] = 0x400505
;    (0x400505 is the address right after the CALL)
; 3. Jump to function:       RIP = 0x400700

; After CALL, execution continues at 0x400700 (the function)
; Stack now contains the return address for later retrieval

Variations in CALL:
Direct Call: The target address is encoded in the instruction itself. Example: call 0x400700
Indirect Call: The target address comes from a register or memory location. Example: call rax or call [rbx]. This enables virtual function dispatch, callbacks, and function pointers (illustrated in the C sketch after this list).
Relative Call: The instruction encodes an offset from the current position rather than an absolute address. This is common for position-independent code (PIC) in shared libraries.
Regardless of the variation, the stack behavior—pushing the return address—remains consistent.
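Indirect calls are easiest to see from C through function pointers. Here is a minimal sketch; the helper names add_ints and mul_ints are just for illustration.

#include <stdio.h>

static int add_ints(int a, int b) { return a + b; }
static int mul_ints(int a, int b) { return a * b; }

int main(void) {
    /* op holds an address; calling through it typically compiles to an
       indirect call (e.g. call rax) rather than a direct call. The stack
       behavior, pushing the return address, is identical either way. */
    int (*op)(int, int) = add_ints;
    printf("%d\n", op(3, 5));   /* 8  */

    op = mul_ints;
    printf("%d\n", op(3, 5));   /* 15 */
    return 0;
}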
The RET (return) instruction is the counterpart to CALL, completing the round trip:
What RET does (pseudocode):
RET:
POP temp ; Remove return address from stack
JMP temp ; Jump to that address
This simple operation undoes what CALL did—pops the saved return address and jumps to it.
; At the end of the function
; RSP = 0x7FFF0FF8 (pointing to saved return address)
; [0x7FFF0FF8] = 0x400505 (the return address from CALL)

; The RET instruction
0x400750: ret

; What happens:
; 1. Pop return address:     temp = [RSP] = 0x400505
; 2. Increment RSP:          RSP = 0x7FFF1000
; 3. Jump to return address: RIP = 0x400505

; Execution continues at 0x400505 (instruction after original CALL)
; Stack is restored to its state before the CALL

Symmetry of CALL and RET:
Notice the beautiful symmetry:
| CALL | RET |
|---|---|
| Decrement RSP | Increment RSP |
| Push return address | Pop return address |
| Jump to callee | Jump to caller |
This symmetry ensures that every CALL is perfectly undone by RET, maintaining stack integrity across arbitrarily deep call chains.
RET blindly pops whatever is at the top of the stack and jumps there. If the function corrupted the stack—say, by pushing something without popping it—RET will jump to the wrong address, typically causing a crash. This is a common bug in assembly programming and a security concern (return-oriented programming exploits this).
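A classic way this goes wrong in practice is a stack buffer overflow. The minimal C sketch below is deliberately unsafe: copying an over-long string into a small stack buffer can overwrite the saved return address, so the RET at the end of vulnerable() jumps somewhere bogus (modern compilers and stack protectors will usually detect or complicate this).

#include <string.h>

/* Deliberately unsafe: for illustration only. */
void vulnerable(const char *input) {
    char buf[8];            /* lives in this function's stack frame        */
    strcpy(buf, input);     /* no bounds check: a long input writes past   */
                            /* buf, toward the saved return address        */
}                           /* RET pops whatever the overflow left behind  */

int main(void) {
    vulnerable("AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA");  /* likely crashes here */
    return 0;
}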
How arguments travel from caller to callee is defined by calling conventions—agreements about where arguments are placed. Different platforms and languages have different conventions, but they all solve the same problem.
The Three Primary Approaches:
| Strategy | How It Works | Pros | Cons |
|---|---|---|---|
| Stack-Based | All arguments pushed onto stack before CALL | Simple, unlimited arguments, consistent | Slower due to memory access, stack overhead |
| Register-Based | Arguments passed in CPU registers | Fast (registers are CPU-internal), no memory access | Limited to available registers, complex spilling |
| Hybrid | First N arguments in registers, rest on stack | Best of both approaches, common in modern ABIs | More complex calling convention |
Common Calling Conventions:
System V AMD64 ABI (Linux, macOS, BSD): the first six integer/pointer arguments go in RDI, RSI, RDX, RCX, R8, and R9; additional arguments go on the stack; the return value goes in RAX.
Microsoft x64 (Windows): the first four integer/pointer arguments go in RCX, RDX, R8, and R9; additional arguments go on the stack, and the caller also reserves 32 bytes of "shadow space" for the callee.
cdecl (32-bit x86): all arguments are pushed onto the stack right-to-left, the return value goes in EAX, and the caller cleans up the stack after the call.
;; C function call: result = add(3, 5);
;; Using System V AMD64 ABI

;; Caller code:
mov edi, 3          ; First argument (a = 3) in EDI (lower 32 bits of RDI)
mov esi, 5          ; Second argument (b = 5) in ESI (lower 32 bits of RSI)
call add            ; Call the function
mov [result], eax   ; Return value is in EAX

;; Callee (add function):
add:
    ;; EDI contains 'a' (3)
    ;; ESI contains 'b' (5)
    lea eax, [edi + esi]   ; EAX = a + b = 8
    ret                    ; Return (result in EAX)

When arguments are pushed right-to-left (as in stack-based conventions such as cdecl), the first argument ends up at the lowest stack address (nearest the top after all pushes). This allows the callee to find the first arguments at consistent offsets from the stack pointer, regardless of how many arguments were passed, which is what enables variadic functions like printf.
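To see the callee's side of this, here is a minimal C sketch of a variadic function: sum() walks however many int arguments it is given, relying on the calling convention (via stdarg.h) to locate each one.

#include <stdarg.h>
#include <stdio.h>

/* The first parameter tells the callee how many variadic arguments follow;
   va_arg then fetches them one by one according to the platform's calling
   convention. */
static int sum(int count, ...) {
    va_list args;
    int total = 0;

    va_start(args, count);
    for (int i = 0; i < count; i++)
        total += va_arg(args, int);
    va_end(args);

    return total;
}

int main(void) {
    printf("%d\n", sum(3, 3, 5, 7));   /* prints: 15 */
    return 0;
}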
Returning a value from a function involves placing it where the caller can find it. This is simpler than argument passing because there's only one return value (in most languages), but there are still important considerations.
Standard Return Value Locations:
| Value Type | Typical Location | Notes |
|---|---|---|
| Integer (up to 64-bit) | RAX register | Fast, direct, most common |
| Integer (128-bit) | RAX:RDX pair | Lower 64 bits in RAX, upper in RDX |
| Floating-point | XMM0 register | SSE/AVX registers for float/double |
| Small struct (≤16 bytes) | RAX:RDX pair or XMM registers | Returned in registers if it fits |
| Large struct | Caller-allocated memory, pointer passed | Caller provides space; function fills it |
The Hidden First Argument:
When a function returns a large struct (too big for registers), the compiler implements a clever workaround:
// What you write:
Struct getBigData() {
Struct result;
// ... fill result
return result;
}
// What the compiler generates (conceptually):
void getBigData(Struct* __hidden_return_ptr) {
// ... fill *__hidden_return_ptr
return;
}
The caller allocates space for the return value and passes a pointer to it as a hidden first argument. The function fills that memory directly. When the compiler goes further and constructs the result straight into that caller-provided space, eliminating the intermediate copy entirely, this is called Return Value Optimization (RVO).
Returning by register is fast—no memory access needed. Returning large structs by value incurs memory overhead. When performance matters, consider designing APIs that avoid returning large data structures, or use pointers/references for in-place modification.
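As a minimal C sketch of that trade-off (BigRecord and both function names are hypothetical), compare returning a large struct by value with filling caller-provided storage in place:

#include <string.h>

typedef struct {
    char payload[4096];    /* far too large to come back in registers */
} BigRecord;

/* Return by value: the compiler passes a hidden pointer to space the
   caller allocated; without RVO an extra copy may still occur. */
BigRecord make_record_by_value(void) {
    BigRecord r;
    memset(r.payload, 0, sizeof r.payload);
    return r;
}

/* Out-parameter style: the destination is explicit, so there is no hidden
   argument and no chance of an intermediate copy. */
void make_record_in_place(BigRecord *out) {
    memset(out->payload, 0, sizeof out->payload);
}

int main(void) {
    BigRecord a = make_record_by_value();   /* hidden-pointer mechanism */
    BigRecord b;
    make_record_in_place(&b);               /* explicit in-place fill   */
    (void)a;
    (void)b;
    return 0;
}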
The true power of the call stack emerges with nested function calls. Each call adds a layer to the stack; each return removes a layer. Let's trace through a nested scenario:
Example:
int main() {
int x = outer(10);
return x;
}
int outer(int n) {
int y = inner(n + 5);
return y * 2;
}
int inner(int m) {
return m + 1;
}
=== State 1: In main(), about to call outer(10) ===
Stack:
┌─────────────────────────────┐
│ [main's stack frame]        │ ← Stack Pointer
│ - return address to OS      │
│ - local variable x          │
└─────────────────────────────┘

=== State 2: In outer(), about to call inner(15) ===
Stack:
┌─────────────────────────────┐
│ [main's stack frame]        │
│ - return address to OS      │
│ - local variable x          │
├─────────────────────────────┤
│ [outer's stack frame]       │ ← Stack Pointer
│ - return address to main    │
│ - parameter n = 10          │
│ - local variable y          │
└─────────────────────────────┘

=== State 3: In inner(), executing ===
Stack:
┌─────────────────────────────┐
│ [main's stack frame]        │
│ - return address to OS      │
│ - local variable x          │
├─────────────────────────────┤
│ [outer's stack frame]       │
│ - return address to main    │
│ - parameter n = 10          │
│ - local variable y          │
├─────────────────────────────┤
│ [inner's stack frame]       │ ← Stack Pointer
│ - return address to outer   │
│ - parameter m = 15          │
└─────────────────────────────┘

=== State 4: inner() returns 16, back in outer() ===
Stack:
┌─────────────────────────────┐
│ [main's stack frame]        │
│ - return address to OS      │
│ - local variable x          │
├─────────────────────────────┤
│ [outer's stack frame]       │ ← Stack Pointer (restored)
│ - return address to main    │
│ - parameter n = 10          │
│ - local variable y = 16     │
└─────────────────────────────┘

=== State 5: outer() returns 32, back in main() ===
Stack:
┌─────────────────────────────┐
│ [main's stack frame]        │ ← Stack Pointer (restored)
│ - return address to OS      │
│ - local variable x = 32     │
└─────────────────────────────┘

Stack Unwinding:
The process of returning through nested calls—popping stack frames in reverse order—is called stack unwinding. This concept becomes especially important for:
Exception handling: When an exception is thrown, the runtime unwinds the stack, searching for a catch handler and calling destructors along the way.
Debugging: Stack traces show unwound call chains, revealing how execution reached the current point (see the sketch after this list).
Profilers: Sampling profilers capture stack frames periodically, then unwind them to attribute time to calling functions.
The LIFO nature of the stack ensures unwinding happens in exactly the right order—most recent call first, then its caller, and so on.
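You can watch unwinding from inside a program. The minimal sketch below assumes a Linux/glibc toolchain (execinfo.h is a glibc extension, not standard C) and prints the current call chain; linking with -rdynamic makes the symbol names readable.

#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

static void print_trace(void) {
    void *frames[16];
    int count = backtrace(frames, 16);               /* capture raw frames */
    char **names = backtrace_symbols(frames, count); /* resolve to strings */
    for (int i = 0; i < count; i++)
        printf("  %s\n", names[i]);
    free(names);
}

static void inner(void) { print_trace(); }
static void outer(void) { inner(); }

int main(void) {
    outer();   /* prints frames for print_trace, inner, outer, main, ... */
    return 0;
}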
Functions use CPU registers for computation. But when function A calls function B, B will use the same registers. If A had important values in those registers, they'd be overwritten. How is this managed?
The Solution: Dividing Registers into Two Categories
| Register Type | Saved By | Registers | Use Case |
|---|---|---|---|
| Caller-Saved (Volatile) | Caller, if needed after call | RAX, RCX, RDX, RSI, RDI, R8-R11 | Scratch values, temporary computations |
| Callee-Saved (Non-Volatile) | Callee, if it modifies them | RBX, RBP, R12-R15 | Long-lived values that persist across calls |
How it works in practice:
Caller-saved registers: The caller assumes these registers may be modified by any function it calls. If the caller needs a value in RAX to persist across a call, it must save that value (typically on the stack) before the call and restore it after.
Callee-saved registers: The callee promises that these registers will have the same values after it returns as they did when it was called. If the callee wants to use RBX, it must save the original RBX value (on the stack) at entry and restore it before returning.
This division minimizes total saving:
If all registers were callee-saved, every function would have to save them all. If all were caller-saved, every caller would have to save everything. The split lets functions that use few registers avoid saving, while functions that need many registers bear the cost of saving only what they modify.
;; Function that uses callee-saved registers RBX and R12

my_function:
    ;; Prologue: save callee-saved registers we'll modify
    push rbx               ; Save RBX
    push r12               ; Save R12

    ;; Now we can freely use RBX and R12
    mov rbx, [some_data]
    mov r12, [other_data]
    ;; ... more computation using RBX and R12 ...

    ;; Compute result
    add rax, rbx

    ;; Epilogue: restore callee-saved registers (reverse order!)
    pop r12                ; Restore R12
    pop rbx                ; Restore RBX
    ret

Notice that registers are restored in reverse order of how they were saved. This is the LIFO principle in action—push A, push B, then pop B, pop A. Getting this wrong corrupts register values and causes mysterious bugs.
Let's synthesize everything into a complete picture of what happens during a function call and return: the caller places the arguments, CALL pushes the return address and jumps, the callee's prologue saves any callee-saved registers it will use, the body does its work, the epilogue restores those registers, and RET pops the return address and jumps back. In unoptimized compiled code, this is the sequence you will actually see.
This is a simplified model. Modern compilers may optimize away the frame pointer (using RSP directly), skip saving registers that aren't used, inline small functions entirely eliminating the call, or reorder operations for performance. But the fundamental logic—push context on call, pop context on return—remains the conceptual foundation.
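A practical way to observe these mechanics, and the optimizations that can remove them, is to inspect the compiler's assembly output. The sketch below assumes gcc or clang is available: compiling with gcc -S -O0 shows an explicit call/ret pair for add(), while gcc -S -O2 will typically inline it and drop the call entirely.

/* add_demo.c: compile with "gcc -S -O0 add_demo.c" and then
   "gcc -S -O2 add_demo.c", and compare the two add_demo.s files. */
static int add(int a, int b) {
    return a + b;
}

int caller(void) {
    return add(3, 5);   /* a direct call at -O0; usually inlined at -O2 */
}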
We've examined the precise mechanics that enable functions to work. This knowledge connects abstract understanding to concrete implementation.
What's next:
Now that we understand function calls and returns, the next page examines what's actually inside each stack frame. We'll explore stack frames in detail—the structure that holds local variables, saved registers, and everything a function needs to execute independently. This is where the stack becomes a sophisticated memory manager for each function's execution context.
You now understand the precise mechanics of function calls and returns. The CALL and RET instructions, argument passing conventions, register saving rules, and stack unwinding all work together to enable the seamless function invocations you write every day. Next, we'll explore the structure of stack frames themselves.