Defense Mechanisms - Learning Module

Stack Protection

Fortifying the Stack

The stack is the most frequently attacked memory region in a process. It contains the most valuable targets for exploitation—return addresses, saved frame pointers, function arguments, and local variables including security-critical data. Every function call pushes new control information onto the stack; every return trusts that information hasn't been tampered with.

We've already explored stack canaries, which detect linear buffer overflows before a corrupted return address is used. But canaries are just one tool in a comprehensive stack protection arsenal. Modern systems deploy multiple layers of stack hardening:

Stack Canaries — Detect buffer overflow before return (already covered in Page 1)
Safe Stack — Separate stacks for control flow and data
Shadow Stack — Hardware-protected backup of return addresses
Stack Clash Protection — Prevent stack-heap collision attacks
Stack Bounds Checking — Hardware memory tagging for stack allocations
Compiler Reordering — Strategic placement of arrays and pointers

Each mechanism addresses different attack vectors, and together they create a stack that's remarkably difficult to compromise. This page explores these protections in depth, examining their implementation, performance impact, and security guarantees.

What You Will Learn

By the end of this page, you will understand:
• The complete stack threat model and attack surface
• Safe Stack architecture and implementation
• Shadow Stack hardware and software implementations
• Stack Clash Protection and guard pages
• Compiler-based stack variable reordering
• Memory tagging for stack allocations
• Performance-security tradeoffs for each protection

The Complete Stack Threat Model

Before discussing protections, we must understand the full scope of stack-based attacks. Stack canaries address only one vector—linear overflow overwriting the return address. But attackers have developed numerous other techniques:

Stack Attack Vectors
Attack Type	Mechanism	Target	Canary Protection?
Linear Overflow	Writing past buffer end	Return address	YES - canary is corrupted
Off-by-One	Single byte overwrite	Saved frame pointer (LSB)	MAYBE - depends on layout
Format String	Arbitrary read/write via %n	Any stack location	NO - writes can skip canary
Stack Pivot	Redirect RSP to attacker memory	Stack pointer itself	NO - bypasses entire stack
Local Variable Corruption	Overwrite sensitive locals	Security flags, pointers	NO - below canary
Use-After-Return	Access stack frame after return	Dangling pointers	NO - different attack class
Stack Clash	Collide stack into heap/mmap	Memory layout	NO - no overflow needed
Return Address Overwrite (ROP)	Corrupt return for code reuse	Return address	YES, if the overwrite is a linear overflow

The key insight is that canaries protect only the return address path. Many attacks exploit stack data without touching the return address, or bypass the canary through non-linear writes.

Stack Attack Surfaces
Standard Stack Frame (with canary):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 
High Address (bottom of stack)
┌─────────────────────────────────┐
│         Return Address          │ ← Canary protects this
├─────────────────────────────────┤
│         Saved RBP               │ ← Canary protects this
├─────────────────────────────────┤
│     ★★★ STACK CANARY ★★★       │
├─────────────────────────────────┤
│         security_flag           │ ← VULNERABLE: Below canary!
├─────────────────────────────────┤
│         is_admin (int)          │ ← VULNERABLE: Below canary!
├─────────────────────────────────┤
│         function_ptr            │ ← VULNERABLE: Below canary!
├─────────────────────────────────┤
│     │                     │     │
│     │   char buffer[64]   │     │ ← Overflow source
│     │                     │     │
├─────────────────────────────────┤
│         More locals...          │
└─────────────────────────────────┘
Low Address (top of stack)
 
Attack: Overflow buffer by exactly enough bytes to:
1. Overwrite function_ptr → Redirect call to shellcode/ROP
2. Overwrite is_admin → Bypass permission check
3. NOT touch the canary → Attack succeeds!
 
This is why we need ADDITIONAL protections beyond canaries.

The Local Variable Problem

Buffer overflows can corrupt security-critical local variables (permission flags, function pointers, object vtable pointers) without ever reaching the canary, in which case nothing is detected at all. Even when the overflow does reach the canary, the check runs only when the function returns; by then, the corrupted locals may have already been used, leading to exploitation.
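The effect can be reproduced without undefined behavior by modelling the frame shown above as a struct. This is a minimal sketch under that assumption: `frame_model`, `copy_into_frame`, and `demo_local_corruption` are hypothetical names, and the fixed field order stands in for a layout that a real compiler would choose itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Model of the frame: buffer at the lowest address, then a
 * security-critical local, then the canary. Real locals are laid out by
 * the compiler; the struct only fixes the order for this demo. */
struct frame_model {
    char          buffer[64];
    int           is_admin;
    unsigned long canary;
};

#define CANARY_VALUE 0xC0FFEE11UL

/* Copies len bytes starting at buffer, the way an unchecked copy would.
 * The length is clamped to the canary's offset, so this demo never
 * touches the canary and never writes outside the struct. */
static void copy_into_frame(struct frame_model *f, const void *src, size_t len)
{
    size_t limit = offsetof(struct frame_model, canary);
    if (len > limit)
        len = limit;
    memcpy((unsigned char *)f, src, len);
}

/* Returns 1 if the simulated overflow changed is_admin while the canary
 * kept its original value. */
static int demo_local_corruption(void)
{
    struct frame_model f;
    memset(&f, 0, sizeof f);
    f.canary = CANARY_VALUE;

    /* Enough bytes to fill buffer and run over is_admin, but no further. */
    unsigned char payload[offsetof(struct frame_model, is_admin) + sizeof(int)];
    memset(payload, 'A', sizeof payload);

    copy_into_frame(&f, payload, sizeof payload);

    return f.is_admin != 0 && f.canary == CANARY_VALUE;
}
```

Calling `demo_local_corruption()` reports that the flag changed while the canary comparison would still pass, which is the situation a canary alone cannot detect.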

Safe Stack Architecture

Safe Stack addresses the local variable problem by physically separating the stack into two regions:

Safe Stack — Contains return addresses, saved registers, and other control data. Only accessed through normal function prologue/epilogue. Never contains potentially overflowable buffers.
Unsafe Stack — Contains local arrays, large structures, and any variables whose address is taken (which might be used unsafely). Overflows here cannot reach control data.

This separation ensures that even if an attacker overflows a buffer on the unsafe stack, they cannot reach return addresses because they're on an entirely different memory region.

Safe Stack Layout
Traditional Single Stack:          Safe Stack Separation:
━━━━━━━━━━━━━━━━━━━━━━━━━          ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 
┌────────────────────┐             SAFE STACK (Control Data)
│   Return Address   │ ←┐          ┌────────────────────┐
├────────────────────┤  │          │   Return Address   │ Protected!
│   Saved RBP        │  │          ├────────────────────┤
├────────────────────┤  │          │   Saved RBP        │
│   Canary           │  │ Overflow ├────────────────────┤
├────────────────────┤  │ path     │   Register saves   │
│   function_ptr     │  │          └────────────────────┘
├────────────────────┤  │                Not reachable!
│                    │  │                      ↑
│   buffer[64]       │ ─┘                No connection
│                    │                         ↓
└────────────────────┘             UNSAFE STACK (Data Only)
       ↑                           ┌────────────────────┐
 Overflow corrupts                 │   function_ptr     │ May be
 return address!                   ├────────────────────┤ corrupted,
                                   │                    │ but no
                                   │   buffer[64]       │ control
                                   │                    │ flow impact
                                   └────────────────────┘
                                          ↑
                                   Overflow stays here!

Implementation Details

Safe Stack is implemented in LLVM/Clang and works as follows:

Two stack pointers: The regular RSP points to the safe stack. A separate thread-local pointer tracks the unsafe stack.
Compiler analysis: For each function, the compiler classifies local variables:
- Safe: Scalars not referenced by address → stay on safe stack
- Unsafe: Arrays, address-taken variables → allocated on unsafe stack
Function prologue/epilogue: Safe variables use normal push/pop. Unsafe variables are allocated via the unsafe stack pointer and freed on return.

safe_stack_example.c
// Compile with: clang -fsanitize=safe-stack -o example example.c
 
#include <stdio.h>
#include <string.h>
 
void process_input(const char *input) {
    int loop_counter = 0;     // SAFE: scalar, not address-taken
    char buffer[256];          // UNSAFE: array
    char *ptr = buffer;        // SAFE: pointer value itself is safe
    int result = 0;            // SAFE: scalar
    
    // buffer and ptr's target are on unsafe stack
    // loop_counter and result are on safe stack
    // Return address is on safe stack
    
    strcpy(buffer, input);     // Overflow stays on unsafe stack!
    
    // Even if buffer overflows by 1000 bytes,
    // it CANNOT reach loop_counter, result, or return address
    // because they're on a completely different memory region
    
    printf("Processed: %s\n", buffer);
}
 
// Generated code (conceptual sketch, not actual compiler output):
void process_input_safestack(const char *input) {
    // Unsafe stack allocation via the runtime's thread-local pointer
    extern __thread void *__safestack_unsafe_stack_ptr;
    char *unsafe_sp = __safestack_unsafe_stack_ptr;
    unsafe_sp -= 256;  // Allocate buffer on unsafe stack
    char *buffer = unsafe_sp;
    __safestack_unsafe_stack_ptr = unsafe_sp;
    
    // Safe local variables (on regular stack via RSP)
    int loop_counter = 0;
    int result = 0;
    
    strcpy(buffer, input);  // Overflow confined to unsafe stack!
    printf("Processed: %s\n", buffer);
    
    // Restore unsafe stack pointer
    __safestack_unsafe_stack_ptr = unsafe_sp + 256;
}
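
The conceptual code above depends on Clang's runtime. To experiment with the same allocate-on-entry, release-on-return discipline in plain C, the sketch below swaps the runtime for a small hand-rolled region. All names (`unsafe_region`, `unsafe_alloc`, `unsafe_release`, `process_input_sim`, `unsafe_in_use`) are hypothetical, the 4 KB size is arbitrary, and nothing here provides real isolation; it only models the bookkeeping.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define UNSAFE_STACK_SIZE 4096

/* Stand-in for the runtime-managed unsafe stack. */
static _Alignas(16) char unsafe_region[UNSAFE_STACK_SIZE];
static _Thread_local char *unsafe_sp = unsafe_region + UNSAFE_STACK_SIZE;

/* Reserve n bytes (rounded up to 16) by moving the pointer down. */
static char *unsafe_alloc(size_t n)
{
    n = (n + 15u) & ~(size_t)15u;
    if ((size_t)(unsafe_sp - unsafe_region) < n)
        return NULL;                     /* simulated unsafe stack exhausted */
    unsafe_sp -= n;
    return unsafe_sp;
}

/* Release the most recent n-byte reservation (LIFO, like a stack frame). */
static void unsafe_release(size_t n)
{
    n = (n + 15u) & ~(size_t)15u;
    unsafe_sp += n;
}

/* Bytes currently reserved on the simulated unsafe stack. */
static size_t unsafe_in_use(void)
{
    return (size_t)((unsafe_region + UNSAFE_STACK_SIZE) - unsafe_sp);
}

/* Copies input into a 256-byte buffer on the simulated unsafe stack.
 * Returns the number of characters stored, or -1 if it does not fit. */
static int process_input_sim(const char *input)
{
    int stored = -1;                     /* scalar: would stay on the safe stack */
    char *buffer = unsafe_alloc(256);    /* array: lives on the unsafe stack */
    if (buffer != NULL) {
        size_t len = strlen(input);
        if (len < 256) {
            memcpy(buffer, input, len + 1);
            stored = (int)len;
        }
        unsafe_release(256);             /* "epilogue": restore the pointer */
    }
    return stored;
}
```

After `process_input_sim("hello")` returns, `unsafe_in_use()` is back to zero, mirroring how the generated epilogue restores the unsafe stack pointer.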

Performance and Compatibility

Safe Stack has minimal performance overhead because:

Most functions don't have unsafe variables (no change)
Functions with unsafe variables add only a few instructions for allocation
No runtime checks (unlike canaries)

Performance impact: ~0.1% average overhead in benchmarks.

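Compatibility: Requires compiler support (Clang) and its runtime, which manages the unsafe stack. Code built without Safe Stack can still be linked in, but it keeps buffers and return addresses on a single stack and gains no protection, so libraries should be recompiled to benefit.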

Deployed in Production

Safe Stack has seen production use, for example in Google's Fuchsia and in HardenedBSD. It provides strong protection against return address corruption from stack buffer overflows with negligible overhead, a rare combination in security engineering.

Shadow Stack: Hardware-Protected Return Addresses

While Safe Stack separates control and data, Shadow Stack takes a different approach: maintain a hardware-protected backup copy of return addresses. Every CALL instruction pushes the return address to both the regular stack and the shadow stack. Every RET instruction compares the two—any mismatch indicates tampering.

Shadow Stack provides strong protection against return address corruption, including ROP chains built on overwritten return addresses. Because the CPU enforces the check, an attacker who can only write ordinary memory cannot bypass it.

Shadow Stack Operation
Shadow Stack Operation:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 
CALL instruction execution:
┌──────────────────────────────────────────────────────────────┐
│  1. Push return address to REGULAR STACK (RSP)              │
│     RSP -= 8                                                 │
│     [RSP] = return_address                                   │
│                                                              │
│  2. Push return address to SHADOW STACK (SSP) [HARDWARE!]   │
│     SSP -= 8                                                 │
│     [SSP] = return_address                                   │
│                                                              │
│  3. Jump to called function                                  │
└──────────────────────────────────────────────────────────────┘
 
RET instruction execution:
┌──────────────────────────────────────────────────────────────┐
│  1. Pop return address from REGULAR STACK                    │
│     regular_ret = [RSP]                                      │
│     RSP += 8                                                 │
│                                                              │
│  2. Pop return address from SHADOW STACK [HARDWARE!]         │
│     shadow_ret = [SSP]                                       │
│     SSP += 8                                                 │
│                                                              │
│  3. COMPARE: regular_ret == shadow_ret?                      │
│     YES → Jump to return address (normal return)             │
│     NO  → #CP (Control Protection) Exception → CRASH!        │
└──────────────────────────────────────────────────────────────┘
 
Attack scenario:
┌────────────────────┐    ┌────────────────────┐
│   Regular Stack    │    │   Shadow Stack     │
├────────────────────┤    ├────────────────────┤
│  0xDEADBEEF ★      │    │  0x401234          │  ← Original return addr
│   (attacker's      │    │   (unchanged -     │
│    address)        │    │    read-only!)     │
└────────────────────┘    └────────────────────┘
        ↑                          ↑
   Attacker overwrote       Attacker CANNOT write here!
   
On RET: 0xDEADBEEF ≠ 0x401234 → #CP Exception → Attack failed!
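
The CALL/RET steps above can be modelled in a few lines of C. This is a software sketch only: `cpu_model`, `model_call`, and `model_ret` are hypothetical names, both stacks are ordinary arrays that grow upward by index, and nothing here is protected the way real shadow stack pages are.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEPTH 64

/* Two parallel stacks; rsp and ssp are the index of the next free slot. */
struct cpu_model {
    uint64_t regular[DEPTH];
    size_t   rsp;
    uint64_t shadow[DEPTH];
    size_t   ssp;
};

enum ret_status { RET_OK = 0, RET_CONTROL_PROTECTION = 1, RET_UNDERFLOW = 2 };

static void model_init(struct cpu_model *c)
{
    c->rsp = 0;
    c->ssp = 0;
}

/* CALL: push the return address on both stacks. Returns 0 on success,
 * -1 if either modelled stack is full. */
static int model_call(struct cpu_model *c, uint64_t return_address)
{
    if (c->rsp == DEPTH || c->ssp == DEPTH)
        return -1;
    c->regular[c->rsp++] = return_address;
    c->shadow[c->ssp++]  = return_address;
    return 0;
}

/* RET: pop both copies and compare them. On a match the address is
 * stored in *target; on a mismatch the model reports the equivalent of
 * a #CP exception and leaves *target untouched. */
static enum ret_status model_ret(struct cpu_model *c, uint64_t *target)
{
    if (c->rsp == 0 || c->ssp == 0)
        return RET_UNDERFLOW;
    uint64_t regular_ret = c->regular[--c->rsp];
    uint64_t shadow_ret  = c->shadow[--c->ssp];
    if (regular_ret != shadow_ret)
        return RET_CONTROL_PROTECTION;
    *target = regular_ret;
    return RET_OK;
}
```

Overwriting `c.regular[c.rsp - 1]` between a `model_call` and the matching `model_ret` reproduces the attack scenario: the return is refused with `RET_CONTROL_PROTECTION` because the shadow copy still holds the original address.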

Intel CET Shadow Stack

Intel Control-flow Enforcement Technology (CET) includes hardware shadow stack support, available since 11th Gen Core (Tiger Lake) and 4th Gen Xeon Scalable (Sapphire Rapids). AMD processors have supported compatible shadow stacks since Zen 3:

intel_cet_shadow_stack.txt
# Check for CET shadow stack support in the CPU
# (Linux 6.6+ reports user-space support as "user_shstk")
grep -o -E '(user_)?shstk' /proc/cpuinfo | head -1
# user_shstk  ← Shadow Stack supported
 
# Check whether a process actually runs with shadow stack (Linux 6.6+)
grep x86_Thread_features /proc/self/status
# x86_Thread_features: shstk  ← enabled (empty if not enabled)
 
# Compile with CET support
# (-mshstk is only needed when using the shadow stack intrinsics)
gcc -fcf-protection=full -mshstk -o secure_app app.c
 
# Verify CET marking in binary
readelf -n ./secure_app | grep -i shstk
# Properties: x86 feature: IBT, SHSTK
 
# The shadow stack is only active when CPU, kernel, C library,
# and every loaded binary all support and enable it.
 
# New CPU state for shadow stack:
# SSP - Shadow Stack Pointer (like RSP, but for shadow stack)
# MSR_IA32_PL3_SSP - Holds the user mode SSP
# MSR_IA32_U_CET - User CET configuration
 
# New instructions:
# INCSSP - Increment SSP (unwind shadow stack)  
# RDSSP - Read SSP value
# SAVEPREVSSP/RSTORSSP - Save/restore SSP when switching shadow stacks
# SETSSBSY/CLRSSBSY - Busy-token management for supervisor shadow stacks

Shadow Stack Memory Protection

The shadow stack lives in special memory pages that have unique properties:

Shadow Stack Page Properties

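•Cannot be written by normal instructions: Regular MOV, PUSH, etc. fault on shadow stack pages. Writes come from CALL and from a few dedicated instructions (such as WRSS, which the OS must enable explicitly).
•Readable, but not a secret store: Ordinary loads can read shadow stack pages, so the guarantee is integrity of return addresses, not their confidentiality.
•Created through the kernel: User code cannot turn ordinary memory into a shadow stack; the OS sets one up at thread creation or on explicit request (on Linux, the map_shadow_stack system call).
•Token-based validation: Switching to a different shadow stack requires a valid restore token on that stack, which prevents pointing SSP at arbitrary memory.
•Per-thread: Each thread has its own shadow stack, sized appropriately for the call depth.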

Performance Characteristics

Shadow stack operations are performed by the CPU as part of CALL and RET, so normal code executes no extra instructions. The main cost is the additional memory traffic for the second copy of each return address. Typical overhead: <0.5% on real workloads, varying with how call-heavy the code is.

Stack Clash Protection

Stack Clash is a class of vulnerabilities where an attacker causes the stack to grow large enough to collide with other memory regions (typically the heap or memory-mapped areas). This collision can corrupt critical data or enable code execution—without any traditional buffer overflow.

The attack exploits systems where:

Stack guard pages are not probed during large allocations
The attacker can control stack allocation size (recursion, alloca, VLAs)
The stack grows into adjacent writable memory

Stack Clash Attack
Memory Layout (Vulnerable):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 
High Address
┌─────────────────────────────────────┐
│              Kernel                 │
├─────────────────────────────────────┤
│              Stack                  │ ← Stack grows DOWN
│               ↓                     │
│               ↓                     │
│               ↓                     │
├ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─┤ Guard page
│               ↓                     │
│               ↓ Large alloca()      │
│               ↓ JUMPS the guard!    │
│ ═══════════════════════════════════ │ ← CLASH!
│               ↑                     │
│      Heap / mmap regions            │ ← Heap grows UP
│               ↑                     │
├─────────────────────────────────────┤
│        Data (.data / .bss)          │
├─────────────────────────────────────┤
│          Program code               │
└─────────────────────────────────────┘
Low Address
 
Attack mechanism:
1. Attacker triggers large stack allocation (array, alloca, recursion)
2. Stack pointer jumps over guard page (no page fault!)
3. Stack pointer lands in heap region
4. Stack writes corrupt heap metadata/data
5. Heap operations become controllable → code execution
 
Example vulnerable code:
void vulnerable(size_t size) {
    // Attacker controls 'size'
    char buffer[size];  // VLA - allocated on stack
    // If size = 1MB and stack is only 1MB from heap...
    // Stack pointer JUMPS into heap!
    buffer[0] = 'A';    // Writes to heap, not stack!
}

Stack Clash Protection Mechanisms

Modern systems implement multiple protections against stack clash:

stack_clash_protection.c
// GCC/Clang: -fstack-clash-protection
 
void use(char *buf);  // placeholder: any function that consumes the buffer
 
// What the compiler generates for large stack allocations:
 
// Without protection:
void vulnerable(void) {
    char buffer[100000];  // 100KB allocation
    // Compiler: sub rsp, 100000
    // RSP jumps 100KB - may skip guard pages!
    use(buffer);
}
 
// With -fstack-clash-protection:
void protected(void) {
    char buffer[100000];
    // Compiler generates PROBING code:
    // 
    // Instead of: sub rsp, 100000
    // 
    // It does:
    // probe_loop:
    //   sub rsp, 4096        ; Allocate one page
    //   mov [rsp], 0         ; PROBE: Touch the page (triggers guard fault!)
    //   cmp allocation_left, 0
    //   jg probe_loop
    //
    // Each page is touched before the next is allocated.
    // If we hit the guard page, we get SIGSEGV immediately.
    // Stack CANNOT jump over the guard!
    
    use(buffer);
}
 
// The key insight: By touching each page as we grow the stack,
// we guarantee hitting the guard page before reaching other regions.
 
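// Conceptual C equivalent (illustration only: the real probes are emitted
// by the compiler while RSP moves, not written by hand).
// For 100KB allocation with 4KB pages = 25 probes
void equivalent_protected(void) {
    char buffer[100000];
    volatile char *probe = buffer + sizeof buffer;  // start at the highest address
    unsigned long left = sizeof buffer;
    while (left > 0) {
        unsigned long step = left < 4096 ? left : 4096;
        probe -= step;      // move down by at most one page
        left -= step;
        *probe = 0;         // touch it, as the compiler's probe would
    }
    use(buffer);
}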
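
Why page-sized steps cannot skip the guard is easy to check with a small address-arithmetic model. The sketch below is an assumption-laden simulation, not kernel or compiler code: `layout`, `alloc_unprobed_faults`, and `alloc_probed_faults` are hypothetical names, addresses are plain integers, and a "fault" simply means the touched address lies inside the guard range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* [guard_lo, guard_hi) is unmapped guard memory directly below the stack.
 * Anything below guard_lo belongs to some other mapping. */
struct layout {
    uint64_t sp;
    uint64_t guard_lo;
    uint64_t guard_hi;
};

static bool in_guard(const struct layout *m, uint64_t addr)
{
    return addr >= m->guard_lo && addr < m->guard_hi;
}

/* Unprobed allocation: one large subtraction, first touch at the new SP.
 * Returns true if that touch faults, i.e. the guard caught the growth. */
static bool alloc_unprobed_faults(struct layout m, uint64_t size)
{
    m.sp -= size;
    return in_guard(&m, m.sp);
}

/* Probed allocation: move SP by at most one page per step and touch each
 * new position. Returns true as soon as a probe lands in the guard. */
static bool alloc_probed_faults(struct layout m, uint64_t size)
{
    while (size > 0) {
        uint64_t step = size < PAGE_SIZE ? size : PAGE_SIZE;
        m.sp -= step;
        size -= step;
        if (in_guard(&m, m.sp))
            return true;        /* SIGSEGV would be delivered here */
    }
    return false;
}
```

With `sp = 0x100000` and a single guard page at `[0xFE000, 0xFF000)`, a 100000-byte unprobed allocation lands below the guard without faulting, while the probed version faults on its second step. Because each step is at most one page, a guard of at least one page cannot be stepped over.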

Kernel-Level Stack Clash Protection

The kernel implements additional protections:

Kernel Stack Clash Mitigations

•Large guard regions: Since the 2017 fixes, Linux keeps a 1MB guard gap below the main stack by default (previously a single 4KB page), making it much harder to jump over.
•Stack limit checking: The kernel tracks stack boundaries and prevents allocations that would exceed them.
•RLIMIT_STACK enforcement: Hard limits on stack size prevent infinite growth.
•Address space layout: Randomization (ASLR) makes targeting harder even if clash occurs.
•Alternate signal stacks: Memory passed to sigaltstack() is supplied by the application, so it is not covered by the main stack's guard gap and needs its own guard page.

stack_limits.txt
# Check current stack limit
ulimit -s
# 8192 (8MB default on most Linux systems)
 
# Check guard gap size (a kernel boot parameter, counted in pages)
grep -o 'stack_guard_gap=[0-9]*' /proc/cmdline || echo "Default: 256 pages"
# Default: 256 pages (256 * 4KB = 1MB guard region)
 
# Verify stack clash protection in binary
objdump -d ./app | grep -A5 'sub.*rsp' | head -20
# Look for probing patterns after large RSP subtractions
 
# Compile with explicit protection
gcc -fstack-clash-protection -o safe_app app.c
 
# Test: Try to trigger stack clash with your own test program
# (test_stack_clash is a placeholder name)
./test_stack_clash
# Segmentation fault (core dumped)  ← Guard page caught the clash!
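
The same limit that `ulimit -s` prints can be read from inside a program with the POSIX `getrlimit` call. The sketch below assumes a POSIX system; the helper names `read_stack_limit`, `print_limit`, and `report_stack_limit` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Fetches the soft and hard stack limits.
 * Returns 0 on success, -1 on error (errno is set by getrlimit). */
static int read_stack_limit(struct rlimit *out)
{
    return getrlimit(RLIMIT_STACK, out);
}

/* Prints one limit in KB, the unit `ulimit -s` uses. */
static void print_limit(const char *label, rlim_t value)
{
    if (value == RLIM_INFINITY)
        printf("%s: unlimited\n", label);
    else
        printf("%s: %llu KB\n", label, (unsigned long long)(value / 1024));
}

/* Prints both limits. Returns 0 on success, -1 on error. */
static int report_stack_limit(void)
{
    struct rlimit rl;
    if (read_stack_limit(&rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    print_limit("soft", rl.rlim_cur);
    print_limit("hard", rl.rlim_max);
    return 0;
}
```

The soft limit is the one enforced on stack growth; a process may raise it up to the hard limit with `setrlimit`.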

CVE-2017-1000364: The Stack Clash

In 2017, Qualys discovered that stack clash attacks affected nearly all Unix-like systems. The vulnerability allowed local privilege escalation because the 4KB guard pages were easily jumped. The fix required both compiler-level probing and kernel-level guard region expansion. Modern systems with -fstack-clash-protection and large guard gaps are protected.

Compiler-Based Stack Enhancements

Beyond canaries and safe stack, compilers implement numerous other stack protection techniques. These operate at compile-time, transforming code to reduce stack vulnerability.

Variable Reordering

Compilers can strategically reorder local variables to minimize the impact of buffer overflows:

variable_reordering.c
// Source code as written:
void vulnerable(char *input) {
    int is_admin = 0;           // Security-critical variable
    char buffer[64];            // Overflow source
    void (*callback)(void);     // Function pointer
    int counter = 0;            // Non-critical variable
    
    strcpy(buffer, input);      // Overflow!
    
    if (is_admin) { /* grant access */ }
    callback();
}
 
// Without reordering (locals laid out in declaration order), stack layout
// from high to low addresses:
// [return address]
// [saved rbp]
// [canary]
// [is_admin]     ← Vulnerable: above buffer, in the overflow path
// [buffer]       ← Overflow starts here and runs toward higher addresses
// [callback]     ← Below buffer: out of reach of a forward overflow
// [counter]
 
// With compiler reordering (MSVC /GS, GCC/Clang -fstack-protector*):
// [return address]
// [saved rbp]  
// [canary]
// [buffer]       ← Arrays placed FIRST (highest address, next to canary)
// [is_admin]     ← MOVED: now below buffer
// [callback]     ← Still below buffer
// [counter]
 
// Now a forward overflow from buffer hits:
// 1. canary (attack detected at function return)
// 2. saved rbp and return address (already covered by the canary)
//
// is_admin and callback sit at lower addresses than buffer, so a linear
// overflow cannot reach them. Reordering does not help against buffer
// underflows, non-linear writes, or arrays embedded in structs, whose
// field order the compiler must preserve.

FORTIFY_SOURCE

The _FORTIFY_SOURCE feature replaces dangerous standard library functions with bounds-checked versions:

fortify_source.c
// Compile with: gcc -D_FORTIFY_SOURCE=2 -O2 -o app app.c
 
#include <string.h>
 
void example(char *input) {
    char buffer[64];
    
    // Without FORTIFY:
    // strcpy(buffer, input);  // No checking at all!
    
    // With FORTIFY_SOURCE=2, this becomes:
    // __strcpy_chk(buffer, input, 64);
    // If strlen(input) >= 64, __chk_fail() is called → ABORT
    
    strcpy(buffer, input);
}
 
// FORTIFY levels:
// -D_FORTIFY_SOURCE=1 → Compile-time and runtime checks that do not
//                        change the behavior of conforming programs
// -D_FORTIFY_SOURCE=2 → Stricter checks (e.g. sub-object sizes for string
//                        functions); some conforming programs may fail
// -D_FORTIFY_SOURCE=3 → Also checks sizes known only at run time, via
//                        __builtin_dynamic_object_size (GCC 12+)
// All levels need optimization enabled (-O1 or higher) to take effect.
 
// Functions fortified:
// memcpy, memmove, memset, strcpy, strncpy, strcat, strncat,
// sprintf, snprintf, vsprintf, vsnprintf, gets, read, recv, etc.
 
// Example of compiled code (conceptual):
// Original: strcpy(buffer, input);
// Fortified:
//   size_t dst_size = __builtin_object_size(buffer, 1);  // 64 here
//   if (dst_size != (size_t)-1) {
//       // Size known: the checked variant calls __chk_fail() if the
//       // copy, including the terminating NUL, would exceed dst_size
//       __strcpy_chk(buffer, input, dst_size);
//   } else {
//       // Size unknown at compile time: plain call, no check
//       strcpy(buffer, input);
//   }
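
To see the check itself without relying on glibc internals, the sketch below hand-writes the comparison a fortified `strcpy` performs. It is an illustration under stated assumptions: `checked_strcpy` and `example_checked` are hypothetical names, the destination size is passed explicitly instead of being computed by the compiler, and the function returns -1 where the real `__strcpy_chk` would abort the process.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a fortified strcpy. dst_size plays the role of the
 * compiler-computed object size. Returns 0 on success, -1 if the copy
 * (including the terminating NUL) would not fit. */
static int checked_strcpy(char *dst, size_t dst_size, const char *src)
{
    size_t needed = strlen(src) + 1;   /* characters plus the NUL */
    if (needed > dst_size)
        return -1;                     /* would overflow: refuse to copy */
    memcpy(dst, src, needed);
    return 0;
}

/* Same shape as example() above, with the size passed explicitly. */
static int example_checked(const char *input)
{
    char buffer[64];
    return checked_strcpy(buffer, sizeof buffer, input);
}
```

A 63-character input fits exactly (63 characters plus the NUL), while a 64-character input is rejected, matching the `strlen(input) >= 64` condition noted in the listing above.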

Zero-Initialization of Stack Variables

Uninitialized stack variables can leak sensitive data or contain exploitable values. Modern compilers can automatically initialize local (automatic) variables that the source code leaves uninitialized:

stack_init.txt
# GCC 12+/Clang: Zero-initialize automatic (stack) variables
gcc -ftrivial-auto-var-init=zero -o app app.c
 
# Pattern initialization (more likely to crash on use; also in GCC 12+)
clang -ftrivial-auto-var-init=pattern -o app app.c
 
# Linux kernel uses this for security
# CONFIG_INIT_STACK_ALL_ZERO=y
 
# Benefits:
# 1. Prevents information leaks from uninitialized stack
# 2. Makes uninitialized-variable bugs behave deterministically
#    (pattern mode makes them more likely to crash)
# 3. Prevents exploitation via stack garbage
 
# Cost: roughly 1-5% overhead, depending on workload
# Often worth it for security-critical code

Compiler Stack Protection Flags
Flag	Compiler	Protection	Typical overhead (approximate)
-fstack-protector-strong	GCC/Clang	Stack canaries for vulnerable functions	~1%
-fsanitize=safe-stack	Clang	Separate safe/unsafe stacks	~0.1%
-fstack-clash-protection	GCC/Clang	Probe stack on large allocations	~1%
-D_FORTIFY_SOURCE=2	GCC/Clang	Bounds-check stdlib functions	~0.5%
-ftrivial-auto-var-init=zero	GCC/Clang	Zero all stack variables	~3%
-fcf-protection=full	GCC/Clang	CET shadow stack + IBT	~0.5%
/GS	MSVC	Stack canaries + reordering	~1%
/guard:cf	MSVC	Control Flow Guard	~1%

Recommended Production Flags

For maximum stack protection, use:
GCC/Clang: -O2 -fstack-protector-strong -fstack-clash-protection -D_FORTIFY_SOURCE=2 -fcf-protection=full
MSVC: /GS /guard:cf (compiler) with /DYNAMICBASE /NXCOMPAT (linker)

Combined overhead is typically <5%, providing excellent security-performance tradeoff.

Memory Tagging for Stack Protection

Memory Tagging is an emerging hardware feature that assigns random tags to memory allocations. Pointers carry the tag in unused high bits, and memory access validates that the pointer tag matches the memory tag. This provides fine-grained detection of spatial (overflow) and temporal (use-after-free) memory errors.

For stack protection, each stack allocation gets a unique tag. Accessing beyond the allocation (overflow) or after return (use-after-return) is detected with high probability.

Memory Tagging for Stack
ARM Memory Tagging Extension (MTE) for Stack:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 
Without MTE:
┌───────────────────────────────┐
│ buffer[64]                    │ ← Write buffer[100] = 'X'
│                               │   Works! No error detected.
│ other_data                    │ ← Corrupted!
└───────────────────────────────┘
 
With MTE:
┌───────────────────────────────┐         Tagged Pointer
│ buffer[64]  [Tag: 0xA]        │         ┌──────────────────┐
│                               │         │ 0xA | 0x7fff1234 │
│ other_data  [Tag: 0xB]        │         └──────────────────┘
└───────────────────────────────┘           ↑         ↑
                                          Tag     Address
 
Access buffer[100]:
1. Pointer has tag 0xA (for buffer)
2. Memory at buffer+100 has tag 0xB (belongs to other_data)
3. 0xA ≠ 0xB → Tag mismatch!
4. Hardware raises exception → BUG DETECTED!
 
Memory tags are stored separately (one 4-bit tag per 16-byte granule of memory).
Tags are 4 bits, so 16 possible values.
Probability of matching wrong tag by chance: 1/16 = 6.25%
 
For security: Randomized tags make exploitation probabilistic.
Attacker must guess correct tag, with 93.75% chance of detection per attempt.
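
The tag comparison in the steps above can be modelled in software. The sketch below is a simulation under stated assumptions, not MTE code: `granule_tag`, `tag_region`, and `access_ok` are hypothetical names, "memory" is just an array of per-granule tags, and addresses are offsets into a 256-byte region. The tag is placed in pointer bits 59:56, where MTE keeps it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GRANULE   16u    /* bytes covered by one tag */
#define MEM_SIZE  256u   /* size of the modelled memory */
#define TAG_SHIFT 56     /* tag lives in pointer bits 59:56 */

/* One 4-bit tag per 16-byte granule. */
static uint8_t granule_tag[MEM_SIZE / GRANULE];

/* Tags every granule overlapping [offset, offset + len) and returns a
 * tagged "pointer" to the start. len must be nonzero and the range must
 * lie inside the modelled memory. */
static uint64_t tag_region(uint64_t offset, uint64_t len, uint8_t tag)
{
    uint64_t first = offset / GRANULE;
    uint64_t last  = (offset + len - 1) / GRANULE;
    for (uint64_t g = first; g <= last; g++)
        granule_tag[g] = tag & 0xF;
    return ((uint64_t)(tag & 0xF) << TAG_SHIFT) | offset;
}

/* The comparison performed on each access: pointer tag vs. memory tag. */
static bool access_ok(uint64_t tagged_ptr)
{
    uint8_t  ptr_tag = (uint8_t)((tagged_ptr >> TAG_SHIFT) & 0xF);
    uint64_t offset  = tagged_ptr & ((UINT64_C(1) << TAG_SHIFT) - 1);
    if (offset >= MEM_SIZE)
        return false;    /* outside the modelled memory */
    return granule_tag[offset / GRANULE] == ptr_tag;
}
```

Tagging `buffer` as 0xA over bytes 0..63 and `other_data` as 0xB over bytes 64..127 reproduces the diagram: `access_ok(buffer + 63)` succeeds, while `access_ok(buffer + 100)` fails because the pointer still carries 0xA but the granule it lands in is tagged 0xB.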

Use-After-Return Detection

Memory tagging excels at detecting use-after-return vulnerabilities:

use_after_return.c
// Use-After-Return vulnerability (tag values are illustrative)
 
#include <stdio.h>
 
int *get_pointer(void) {
    int local = 42;      // Tag: 0x3 assigned
    return &local;       // Returning pointer to stack!
}
 
void other_function(void) {
    int x = 100;         // NEW Tag: 0x7 (different random tag!)
    // ...
}
 
int main() {
    int *ptr = get_pointer();  // ptr has tag 0x3
    
    other_function();  // Stack frame reused, tag changed to 0x7
    
    // Use-after-return!
    printf("%d\n", *ptr);  // Dereference with tag 0x3
                            // Memory now has tag 0x7
                            // 0x3 ≠ 0x7 → EXCEPTION!
    
    return 0;
}
 
// Without MTE: This might print garbage, corrupt data,
// or appear to work (undefined behavior).
//
// With MTE: Detected with high probability. With random 4-bit
// tags, the stale and new tags differ 15 times out of 16
// (93.75%); detection is not guaranteed.

Hardware Support

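Hardware tag checking is currently available only on ARM processors; the x86 features listed below provide pointer-bit masking without tag checks:

Memory Tagging Hardware Support
Platform	Feature	Availability	Status
ARM	MTE (Memory Tagging Extension)	ARMv8.5-A	Shipping in Pixel 8 and later Pixel phones
ARM	Top-Byte Ignore (TBI)	ARMv8.0-A	Widely available, software-based tagging
Intel	Linear Address Masking (LAM)	Recent CPUs	Pointer-bit masking only (comparable to TBI), no hardware tag check
AMD	Upper Address Ignore (UAI)	Zen 4 and later	Pointer-bit masking only (comparable to TBI), no hardware tag check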

Android MTE Adoption

Android has included MTE support since Android 12, and devices with compatible hardware, such as the Pixel 8, allow it to be turned on. The intended targets are heavily attacked components such as Bluetooth, NFC, and media codecs; which processes run with tagging by default varies by device and release.

Summary: Comprehensive Stack Protection

The stack remains a critical attack surface, but modern systems deploy multiple layers of protection that make exploitation extremely difficult. Each mechanism addresses different attack vectors:

Key Takeaways

•Stack canaries detect linear overflows but don't protect local variables below the canary.
•Safe Stack separates control data from overflowable data, providing strong isolation with minimal overhead.
•Shadow Stack (Intel CET) provides hardware-guaranteed return address integrity, defeating ROP.
•Stack Clash Protection prevents stack-heap collision attacks through probing and large guard regions.
•Variable reordering reduces the chance of corrupting critical locals before detection.
•FORTIFY_SOURCE adds runtime bounds checking to dangerous standard library functions.
•Memory tagging (MTE) detects both spatial and temporal stack memory errors probabilistically.

What's Next:

The stack is only one part of the attack surface. Compilers also provide protections for heap allocations, control flow integrity, and other security-critical aspects. The next page explores Compiler Protections more broadly, examining Control Flow Integrity (CFI), sanitizers, and other toolchain-based security features.

Page Complete

You now have comprehensive knowledge of stack protection mechanisms beyond basic canaries. From Safe Stack to Shadow Stack, Stack Clash Protection to Memory Tagging, you understand the multi-layered approach modern systems use to protect this critical memory region.