When you write result = calculateTotal(items), something remarkable happens behind the scenes. Your program must:
- Remember exactly where to resume once calculateTotal finishes
- Jump to calculateTotal's code, which lives somewhere else in memory
- Deliver the argument values (the items parameter) to that new location
- Receive the computed result and pick up where it left off
This seamless transition happens millions of times per second in running programs. The mechanism that enables it is the call stack, and understanding its precise operation transforms you from a developer who uses functions to one who truly understands them.
By the end of this page, you will understand the exact sequence of operations that occur during a function call and return. You'll know how arguments are passed, how return addresses are saved, how execution context is preserved, and how all this information is managed on the stack—knowledge that illuminates debugging, performance optimization, and low-level programming.
Before examining the solution, let's understand the problem deeply. For a function call to work, the system must solve several non-trivial challenges:
Challenge 1: Saving the Return Address
When you call a function, you leave one location in code and jump to another. But you need to come back. Where do you come back to? The instruction immediately after the call. This "return address" must be saved somewhere before the jump—and the stack is that somewhere.
Challenge 2: Passing Arguments
Functions need input. How does add(3, 5) get the values 3 and 5 to the add function? These values must be placed somewhere the function can find them. Different calling conventions solve this differently (registers, stack, or both).
Challenge 3: Providing Return Value
The function computes a result. How does it give that result back to the caller? There must be a designated location—often a specific register—where the return value is placed.
Challenge 4: Nested Calls
Functions call other functions which call other functions. Each level must independently handle all the above challenges. This is where the stack's LIFO nature becomes essential—each call adds its own context, and each return removes it.
Early computers didn't have stacks. Return addresses were often stored in the function itself—making recursion impossible since a second call would overwrite the first return address. The invention of the call stack is what made recursive algorithms practical.
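To make this concrete, here is a minimal C sketch of recursion: each call to factorial gets its own saved return address and its own copy of n on the stack, which is exactly what a single fixed storage slot per function could never provide.

#include <stdio.h>

/* Each call to factorial pushes its own return address and its own copy
   of n, so the pending multiplications can resume correctly, most recent
   call first, as the stack unwinds. */
unsigned long factorial(unsigned int n) {
    if (n <= 1)
        return 1;                    /* base case: unwinding starts here */
    return n * factorial(n - 1);     /* nested call: a brand-new frame   */
}

int main(void) {
    printf("5! = %lu\n", factorial(5));   /* prints: 5! = 120 */
    return 0;
}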
Let's trace exactly what happens during a function call. We'll use a simplified model based on x86-64 calling conventions, though the principles apply across architectures.
The Scenario:
void caller() {
int result = add(3, 5);
// ... continue with result
}
int add(int a, int b) {
int sum = a + b;
return sum;
}
The CALL instruction's atomic push-and-jump is critical. If these were separate operations, an interrupt between them could corrupt the stack or return address. CPU designers recognized this need early and made CALL and RET atomic operations.
The CALL instruction is the heart of function invocation. Let's examine it in detail:
What CALL does (pseudocode):
CALL target_address:
PUSH (address of next instruction) ; Save return address
JMP target_address ; Transfer control
In concrete terms:
If the CALL instruction is at memory address 0x1000 and is 5 bytes long, and the target function is at 0x2000:
- The return address 0x1005 (the next instruction) is pushed onto the stack
- The instruction pointer is set to 0x2000
- Execution continues at 0x2000 (the function entry)
; Before CALL
; RSP (Stack Pointer) = 0x7FFF1000 (example address)
; RIP (Instruction Pointer) = 0x400500 (at the CALL instruction)

; The CALL instruction
0x400500: call 0x400700      ; Call function at 0x400700

; What happens:
; 1. RSP decremented by 8:   RSP = 0x7FFF0FF8
; 2. Return address stored:  [0x7FFF0FF8] = 0x400505
;    (0x400505 is the address right after the CALL)
; 3. Jump to function:       RIP = 0x400700

; After CALL, execution continues at 0x400700 (the function)
; Stack now contains the return address for later retrieval

Variations in CALL:
Direct Call: The target address is encoded in the instruction itself. Example: call 0x400700
Indirect Call: The target address comes from a register or memory location. Example: call rax or call [rbx]. This enables virtual function dispatch, callbacks, and function pointers (illustrated in the C sketch after this list).
Relative Call: The instruction encodes an offset from the current position rather than an absolute address. This is common for position-independent code (PIC) in shared libraries.
Regardless of the variation, the stack behavior—pushing the return address—remains consistent.
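Indirect calls are easiest to see from C through function pointers. Here is a minimal sketch; the helper names add_ints and mul_ints are just for illustration.

#include <stdio.h>

static int add_ints(int a, int b) { return a + b; }
static int mul_ints(int a, int b) { return a * b; }

int main(void) {
    /* op holds an address; calling through it typically compiles to an
       indirect call (e.g. call rax) rather than a direct call. The stack
       behavior, pushing the return address, is identical either way. */
    int (*op)(int, int) = add_ints;
    printf("%d\n", op(3, 5));   /* 8  */

    op = mul_ints;
    printf("%d\n", op(3, 5));   /* 15 */
    return 0;
}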
The RET (return) instruction is the counterpart to CALL, completing the round trip:
What RET does (pseudocode):
RET:
POP temp ; Remove return address from stack
JMP temp ; Jump to that address
This simple operation undoes what CALL did—pops the saved return address and jumps to it.
; At the end of the function
; RSP = 0x7FFF0FF8 (pointing to saved return address)
; [0x7FFF0FF8] = 0x400505 (the return address from CALL)

; The RET instruction
0x400750: ret

; What happens:
; 1. Pop return address:     temp = [RSP] = 0x400505
; 2. Increment RSP:          RSP = 0x7FFF1000
; 3. Jump to return address: RIP = 0x400505

; Execution continues at 0x400505 (instruction after original CALL)
; Stack is restored to its state before the CALL

Symmetry of CALL and RET:
Notice the beautiful symmetry:
| CALL | RET |
|---|---|
| Decrement RSP | Increment RSP |
| Push return address | Pop return address |
| Jump to callee | Jump to caller |
This symmetry ensures that every CALL is perfectly undone by RET, maintaining stack integrity across arbitrarily deep call chains.
RET blindly pops whatever is at the top of the stack and jumps there. If the function corrupted the stack—say, by pushing something without popping it—RET will jump to the wrong address, typically causing a crash. This is a common bug in assembly programming and a security concern (return-oriented programming exploits this).
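A classic way this goes wrong in practice is a stack buffer overflow. The minimal C sketch below is deliberately unsafe: copying an over-long string into a small stack buffer can overwrite the saved return address, so the RET at the end of vulnerable() jumps somewhere bogus (modern compilers and stack protectors will usually detect or complicate this).

#include <string.h>

/* Deliberately unsafe: for illustration only. */
void vulnerable(const char *input) {
    char buf[8];            /* lives in this function's stack frame        */
    strcpy(buf, input);     /* no bounds check: a long input writes past   */
                            /* buf, toward the saved return address        */
}                           /* RET pops whatever the overflow left behind  */

int main(void) {
    vulnerable("AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA");  /* likely crashes here */
    return 0;
}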
How arguments travel from caller to callee is defined by calling conventions—agreements about where arguments are placed. Different platforms and languages have different conventions, but they all solve the same problem.
The Three Primary Approaches:
| Strategy | How It Works | Pros | Cons |
|---|---|---|---|
| Stack-Based | All arguments pushed onto stack before CALL | Simple, unlimited arguments, consistent | Slower due to memory access, stack overhead |
| Register-Based | Arguments passed in CPU registers | Fast (registers are CPU-internal), no memory access | Limited to available registers, complex spilling |
| Hybrid | First N arguments in registers, rest on stack | Best of both approaches, common in modern ABIs | More complex calling convention |
Common Calling Conventions:
System V AMD64 ABI (Linux, macOS, BSD): the first six integer/pointer arguments go in RDI, RSI, RDX, RCX, R8, and R9; additional arguments go on the stack; the return value goes in RAX.
Microsoft x64 (Windows): the first four integer/pointer arguments go in RCX, RDX, R8, and R9; additional arguments go on the stack, and the caller also reserves 32 bytes of "shadow space" for the callee.
cdecl (32-bit x86): all arguments are pushed onto the stack right-to-left, the return value goes in EAX, and the caller cleans up the stack after the call.
;; C function call: result = add(3, 5);
;; Using System V AMD64 ABI

;; Caller code:
mov edi, 3          ; First argument (a = 3) in EDI (lower 32 bits of RDI)
mov esi, 5          ; Second argument (b = 5) in ESI (lower 32 bits of RSI)
call add            ; Call the function
mov [result], eax   ; Return value is in EAX

;; Callee (add function):
add:
    ;; EDI contains 'a' (3)
    ;; ESI contains 'b' (5)
    lea eax, [edi + esi]   ; EAX = a + b = 8
    ret                    ; Return (result in EAX)

When arguments are pushed right-to-left (as in stack-based conventions such as cdecl), the first argument ends up at the lowest stack address (nearest the top after all pushes). This allows the callee to find the first arguments at consistent offsets from the stack pointer, regardless of how many arguments were passed, which is what enables variadic functions like printf.
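To see the callee's side of this, here is a minimal C sketch of a variadic function: sum() walks however many int arguments it is given, relying on the calling convention (via stdarg.h) to locate each one.

#include <stdarg.h>
#include <stdio.h>

/* The first parameter tells the callee how many variadic arguments follow;
   va_arg then fetches them one by one according to the platform's calling
   convention. */
static int sum(int count, ...) {
    va_list args;
    int total = 0;

    va_start(args, count);
    for (int i = 0; i < count; i++)
        total += va_arg(args, int);
    va_end(args);

    return total;
}

int main(void) {
    printf("%d\n", sum(3, 3, 5, 7));   /* prints: 15 */
    return 0;
}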
Returning a value from a function involves placing it where the caller can find it. This is simpler than argument passing because there's only one return value (in most languages), but there are still important considerations.
Standard Return Value Locations:
| Value Type | Typical Location | Notes |
|---|---|---|
| Integer (up to 64-bit) | RAX register | Fast, direct, most common |
| Integer (128-bit) | RAX:RDX pair | Lower 64 bits in RAX, upper in RDX |
| Floating-point | XMM0 register | SSE/AVX registers for float/double |
| Small struct (≤16 bytes) | RAX:RDX pair or XMM registers | Returned in registers if it fits |
| Large struct | Caller-allocated memory, pointer passed | Caller provides space; function fills it |
The Hidden First Argument:
When a function returns a large struct (too big for registers), the compiler implements a clever workaround:
// What you write:
Struct getBigData() {
Struct result;
// ... fill result
return result;
}
// What the compiler generates (conceptually):
void getBigData(Struct* __hidden_return_ptr) {
// ... fill *__hidden_return_ptr
return;
}
The caller allocates space for the return value and passes a pointer to it as a hidden first argument. The function fills that memory directly. When the compiler goes further and constructs the result straight into that caller-provided space, eliminating the intermediate copy entirely, this is called Return Value Optimization (RVO).
Returning by register is fast—no memory access needed. Returning large structs by value incurs memory overhead. When performance matters, consider designing APIs that avoid returning large data structures, or use pointers/references for in-place modification.
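As a minimal C sketch of that trade-off (BigRecord and both function names are hypothetical), compare returning a large struct by value with filling caller-provided storage in place:

#include <string.h>

typedef struct {
    char payload[4096];    /* far too large to come back in registers */
} BigRecord;

/* Return by value: the compiler passes a hidden pointer to space the
   caller allocated; without RVO an extra copy may still occur. */
BigRecord make_record_by_value(void) {
    BigRecord r;
    memset(r.payload, 0, sizeof r.payload);
    return r;
}

/* Out-parameter style: the destination is explicit, so there is no hidden
   argument and no chance of an intermediate copy. */
void make_record_in_place(BigRecord *out) {
    memset(out->payload, 0, sizeof out->payload);
}

int main(void) {
    BigRecord a = make_record_by_value();   /* hidden-pointer mechanism */
    BigRecord b;
    make_record_in_place(&b);               /* explicit in-place fill   */
    (void)a;
    (void)b;
    return 0;
}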
The true power of the call stack emerges with nested function calls. Each call adds a layer to the stack; each return removes a layer. Let's trace through a nested scenario:
Example:
int main() {
int x = outer(10);
return x;
}
int outer(int n) {
int y = inner(n + 5);
return y * 2;
}
int inner(int m) {
return m + 1;
}
=== State 1: In main(), about to call outer(10) ===
Stack:
┌─────────────────────────────┐
│ [main's stack frame]        │ ← Stack Pointer
│ - return address to OS      │
│ - local variable x          │
└─────────────────────────────┘

=== State 2: In outer(), about to call inner(15) ===
Stack:
┌─────────────────────────────┐
│ [main's stack frame]        │
│ - return address to OS      │
│ - local variable x          │
├─────────────────────────────┤
│ [outer's stack frame]       │ ← Stack Pointer
│ - return address to main    │
│ - parameter n = 10          │
│ - local variable y          │
└─────────────────────────────┘

=== State 3: In inner(), executing ===
Stack:
┌─────────────────────────────┐
│ [main's stack frame]        │
│ - return address to OS      │
│ - local variable x          │
├─────────────────────────────┤
│ [outer's stack frame]       │
│ - return address to main    │
│ - parameter n = 10          │
│ - local variable y          │
├─────────────────────────────┤
│ [inner's stack frame]       │ ← Stack Pointer
│ - return address to outer   │
│ - parameter m = 15          │
└─────────────────────────────┘

=== State 4: inner() returns 16, back in outer() ===
Stack:
┌─────────────────────────────┐
│ [main's stack frame]        │
│ - return address to OS      │
│ - local variable x          │
├─────────────────────────────┤
│ [outer's stack frame]       │ ← Stack Pointer (restored)
│ - return address to main    │
│ - parameter n = 10          │
│ - local variable y = 16     │
└─────────────────────────────┘

=== State 5: outer() returns 32, back in main() ===
Stack:
┌─────────────────────────────┐
│ [main's stack frame]        │ ← Stack Pointer (restored)
│ - return address to OS      │
│ - local variable x = 32     │
└─────────────────────────────┘

Stack Unwinding:
The process of returning through nested calls—popping stack frames in reverse order—is called stack unwinding. This concept becomes especially important for:
Exception handling: When an exception is thrown, the runtime unwinds the stack, searching for a catch handler and calling destructors along the way.
Debugging: Stack traces show unwound call chains, revealing how execution reached the current point (see the sketch after this list).
Profilers: Sampling profilers capture stack frames periodically, then unwind them to attribute time to calling functions.
The LIFO nature of the stack ensures unwinding happens in exactly the right order—most recent call first, then its caller, and so on.
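You can watch unwinding from inside a program. The minimal sketch below assumes a Linux/glibc toolchain (execinfo.h is a glibc extension, not standard C) and prints the current call chain; linking with -rdynamic makes the symbol names readable.

#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

static void print_trace(void) {
    void *frames[16];
    int count = backtrace(frames, 16);               /* capture raw frames */
    char **names = backtrace_symbols(frames, count); /* resolve to strings */
    for (int i = 0; i < count; i++)
        printf("  %s\n", names[i]);
    free(names);
}

static void inner(void) { print_trace(); }
static void outer(void) { inner(); }

int main(void) {
    outer();   /* prints frames for print_trace, inner, outer, main, ... */
    return 0;
}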
Functions use CPU registers for computation. But when function A calls function B, B will use the same registers. If A had important values in those registers, they'd be overwritten. How is this managed?
The Solution: Dividing Registers into Two Categories
| Register Type | Saved By | Registers | Use Case |
|---|---|---|---|
| Caller-Saved (Volatile) | Caller, if needed after call | RAX, RCX, RDX, RSI, RDI, R8-R11 | Scratch values, temporary computations |
| Callee-Saved (Non-Volatile) | Callee, if it modifies them | RBX, RBP, R12-R15 | Long-lived values that persist across calls |
How it works in practice:
Caller-saved registers: The caller assumes these registers may be modified by any function it calls. If the caller needs a value in RAX to persist across a call, it must save that value (typically on the stack) before the call and restore it after.
Callee-saved registers: The callee promises that these registers will have the same values after it returns as they did when it was called. If the callee wants to use RBX, it must save the original RBX value (on the stack) at entry and restore it before returning.
This division minimizes total saving:
If all registers were callee-saved, every function would have to save them all. If all were caller-saved, every caller would have to save everything. The split lets functions that use few registers avoid saving, while functions that need many registers bear the cost of saving only what they modify.
;; Function that uses callee-saved registers RBX and R12

my_function:
    ;; Prologue: save callee-saved registers we'll modify
    push rbx               ; Save RBX
    push r12               ; Save R12

    ;; Now we can freely use RBX and R12
    mov rbx, [some_data]
    mov r12, [other_data]
    ;; ... more computation using RBX and R12 ...

    ;; Compute result
    add rax, rbx

    ;; Epilogue: restore callee-saved registers (reverse order!)
    pop r12                ; Restore R12
    pop rbx                ; Restore RBX
    ret

Notice that registers are restored in reverse order of how they were saved. This is the LIFO principle in action—push A, push B, then pop B, pop A. Getting this wrong corrupts register values and causes mysterious bugs.
Let's synthesize everything into a complete picture of what happens during a function call and return: the caller places the arguments, CALL pushes the return address and jumps, the callee's prologue saves any callee-saved registers it will use, the body does its work, the epilogue restores those registers, and RET pops the return address and jumps back. In unoptimized compiled code, this is the sequence you will actually see.
This is a simplified model. Modern compilers may optimize away the frame pointer (using RSP directly), skip saving registers that aren't used, inline small functions entirely eliminating the call, or reorder operations for performance. But the fundamental logic—push context on call, pop context on return—remains the conceptual foundation.
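A practical way to observe these mechanics, and the optimizations that can remove them, is to inspect the compiler's assembly output. The sketch below assumes gcc or clang is available: compiling with gcc -S -O0 shows an explicit call/ret pair for add(), while gcc -S -O2 will typically inline it and drop the call entirely.

/* add_demo.c: compile with "gcc -S -O0 add_demo.c" and then
   "gcc -S -O2 add_demo.c", and compare the two add_demo.s files. */
static int add(int a, int b) {
    return a + b;
}

int caller(void) {
    return add(3, 5);   /* a direct call at -O0; usually inlined at -O2 */
}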
We've examined the precise mechanics that enable functions to work. This knowledge connects abstract understanding to concrete implementation.
What's next:
Now that we understand function calls and returns, the next page examines what's actually inside each stack frame. We'll explore stack frames in detail—the structure that holds local variables, saved registers, and everything a function needs to execute independently. This is where the stack becomes a sophisticated memory manager for each function's execution context.
You now understand the precise mechanics of function calls and returns. The CALL and RET instructions, argument passing conventions, register saving rules, and stack unwinding all work together to enable the seamless function invocations you write every day. Next, we'll explore the structure of stack frames themselves.