Intel X86 Memory Model - Learning Module

Modern x86-64 Changes

The 64-bit Revolution and Segmentation's Twilight

When AMD introduced the x86-64 architecture (originally called AMD64) in 2003, it represented the most significant evolution of the x86 platform since the 80386. While maintaining backward compatibility with 32-bit protected mode, x86-64 fundamentally reimagined memory management for the 64-bit era.

The most striking change? Segmentation was largely deprecated. The elaborate segment base addresses, limits, and type checking that defined protected mode became irrelevant artifacts in 64-bit long mode. Instead, x86-64 embraced a flat memory model with mandatory paging—the model that modern operating systems had been using anyway, now with hardware that matched the software reality.

This page explores the evolution to x86-64, what changed, what survived, and why. Understanding these changes is essential for anyone working with modern systems, as they explain why segment registers still exist but are mostly vestigial in 64-bit code.

What You Will Learn

By the end of this page, you will understand the architectural motivations behind x86-64's segmentation changes, the distinction between long mode and compatibility mode, which segmentation features are disabled or ignored in 64-bit code, how FS and GS bases remain useful for TLS and per-CPU data, the evolution of system segment descriptors to 16 bytes, and practical implications for modern operating system design.

The Path to x86-64

To understand why x86-64 simplified segmentation, we must understand the historical context and the practical realities that drove the change.

The State of Segmentation in the 1990s:

By the mid-1990s, despite the elaborate segmentation architecture in protected mode:

All major operating systems used flat memory models: Windows NT, Linux, FreeBSD, and others set all segments to base=0, limit=4GB
Paging provided finer-grained protection: 4KB pages with independent permissions were more flexible than variable-size segments
Programming models had evolved: C compilers and applications assumed flat addressing
Segment arithmetic was a burden: Near vs. far pointers, 64KB limitations in 16-bit code, and segment arithmetic were legacy headaches
Other architectures didn't have segmentation: ARM, MIPS, PowerPC, Alpha all used flat models with paging

Segmentation had effectively become overhead—descriptors had to be maintained, limits had to be set to maximum, but the actual protection came from paging.
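
To make the "base=0, limit=4GB" convention concrete, here is a minimal sketch of how a flat legacy descriptor is packed into its 8-byte form. The helper names `make_descriptor` and `segment_size` are hypothetical and exist only for illustration; the bit positions follow the standard descriptor layout.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a legacy 8-byte code/data segment descriptor (illustrative helper).
 * base:   32-bit linear base address
 * limit:  20-bit limit (counted in 4 KB units when the G flag is set)
 * access: byte 5 (P, DPL, S, type)
 * flags:  upper nibble of byte 6 (bit 3 = G, bit 2 = D/B, bit 1 = L, bit 0 = AVL) */
uint64_t make_descriptor(uint32_t base, uint32_t limit,
                         uint8_t access, uint8_t flags)
{
    assert(limit <= 0xFFFFF);   /* the limit field is only 20 bits wide */
    assert(flags <= 0xF);       /* the flags occupy a single nibble */

    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFF);               /* limit[15:0]  -> bits 0-15  */
    d |= (uint64_t)(base & 0xFFFFFF) << 16;        /* base[23:0]   -> bits 16-39 */
    d |= (uint64_t)access << 40;                   /* access byte  -> bits 40-47 */
    d |= (uint64_t)((limit >> 16) & 0xF) << 48;    /* limit[19:16] -> bits 48-51 */
    d |= (uint64_t)flags << 52;                    /* flags nibble -> bits 52-55 */
    d |= (uint64_t)(base >> 24) << 56;             /* base[31:24]  -> bits 56-63 */
    return d;
}

/* Bytes covered by a limit: (limit + 1) units, where a unit is
 * 4 KB if G=1 and 1 byte otherwise. */
uint64_t segment_size(uint32_t limit, uint8_t flags)
{
    uint64_t units = (uint64_t)limit + 1;
    return (flags & 0x8) ? units << 12 : units;
}
```

Calling `make_descriptor(0, 0xFFFFF, 0x9A, 0xC)` produces the classic flat 32-bit kernel code descriptor, and `segment_size(0xFFFFF, 0xC)` shows why a 20-bit limit with G=1 spans the full 4 GB.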

x86 Architecture Evolution
Processor	Year	Address Bus	Segmentation Model	Key Changes
8086/8088	1978	20-bit (1 MB)	Real mode only	Original segment:offset
80286	1982	24-bit (16 MB)	Protected mode intro	Descriptor tables, protection rings
80386	1985	32-bit (4 GB)	Full protected mode	32-bit segments, paging introduced
80486	1989	32-bit (4 GB)	No change	Performance, on-chip cache
Pentium	1993	32-bit (4 GB)	No change	Superscalar, faster operations
Pentium Pro	1995	36-bit (64 GB)	PAE paging	Out-of-order execution
AMD64/x86-64	2003	48-bit virtual (256 TB)	Flat model mandatory	64-bit mode, simplified segments

AMD's Design Philosophy:

When AMD designed the x86-64 extension (Intel later adopted it as EM64T/Intel 64), they made a deliberate choice to simplify segmentation:

Keep backward compatibility: 32-bit code must run unchanged in compatibility mode
Eliminate unnecessary complexity: Features no one used shouldn't burden 64-bit code
Embrace the flat model: What OSes actually use should be what hardware supports
Preserve useful features: Privilege levels and FS/GS bases serve real purposes
Require paging: Virtual memory should be mandatory, not optional

The result was a clean slate for 64-bit code while maintaining an escape hatch (compatibility mode) for legacy software.

The IA-64 Cautionary Tale

Intel's own 64-bit architecture, IA-64 (Itanium), was a radical redesign incompatible with x86. Its market failure demonstrated that backward compatibility matters enormously. AMD64's success came precisely from its ability to run existing 32-bit software natively while adding 64-bit capabilities incrementally.

Long Mode Architecture

x86-64 introduces a new operating mode called Long Mode, which encompasses two sub-modes:

64-bit Mode: Native 64-bit code execution with simplified segmentation
Compatibility Mode: 32-bit (and 16-bit) code execution with full legacy segmentation

The system as a whole is in long mode when the EFER.LME (Long Mode Enable) bit is set and paging is enabled; the processor reports this state by setting EFER.LMA (Long Mode Active). Within long mode, the current sub-mode is determined by the L (Long) bit in the current code segment descriptor.
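
The sub-mode selection can be sketched as a small decision function. This is an illustrative model, not processor code: `classify_submode` and the enum names are hypothetical, and the function takes the raw 8-byte CS descriptor plus a flag saying whether long mode is active.

```c
#include <assert.h>
#include <stdint.h>

enum cpu_submode {
    MODE_LEGACY,      /* long mode not active: legacy protected mode */
    MODE_COMPAT_16,   /* long mode active, CS.L=0, CS.D=0 */
    MODE_COMPAT_32,   /* long mode active, CS.L=0, CS.D=1 */
    MODE_64BIT,       /* long mode active, CS.L=1, CS.D=0 */
    MODE_INVALID      /* CS.L=1 together with CS.D=1 is reserved */
};

#define DESC_L_BIT (1ULL << 53)   /* L flag of a code segment descriptor */
#define DESC_D_BIT (1ULL << 54)   /* D (default size) flag */

/* Decide the execution sub-mode from the long-mode state and the L/D
 * bits of the current code segment descriptor. */
enum cpu_submode classify_submode(int long_mode_active, uint64_t cs_descriptor)
{
    assert(long_mode_active == 0 || long_mode_active == 1);

    if (!long_mode_active)
        return MODE_LEGACY;

    int l = (cs_descriptor & DESC_L_BIT) != 0;
    int d = (cs_descriptor & DESC_D_BIT) != 0;

    if (l)
        return d ? MODE_INVALID : MODE_64BIT;
    return d ? MODE_COMPAT_32 : MODE_COMPAT_16;
}
```

For example, a descriptor with flags nibble 0xA (G=1, L=1) classifies as 64-bit mode, while the same descriptor with flags 0xC (G=1, D=1) classifies as 32-bit compatibility mode.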

Enabling Long Mode:

The transition to long mode requires:

1. Disable paging (if enabled)
2. Enable PAE (Physical Address Extension) in CR4
3. Load CR3 with the physical address of the PML4 table (4-level page table)
4. Enable long mode by setting EFER.LME (bit 8 of the EFER MSR)
5. Enable paging by setting CR0.PG
6. Far-jump to a 64-bit code segment (a CS descriptor with L=1) to begin executing 64-bit code

Once in long mode, the processor can switch between 64-bit and compatibility modes simply by changing the CS selector to one with a different L bit—an ordinary far jump or call suffices.
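
The real enabling sequence needs ring 0 and inline assembly, so the sketch below only models the register updates as plain integer operations. The `struct ctrl_regs` type and `enable_long_mode` function are hypothetical; the bit positions (CR0.PG = bit 31, CR4.PAE = bit 5, EFER.LME = bit 8, EFER.LMA = bit 10) are the architectural ones.

```c
#include <assert.h>
#include <stdint.h>

#define CR0_PG    (1u << 31)   /* CR0.PG: paging enable */
#define CR4_PAE   (1u << 5)    /* CR4.PAE: physical address extension */
#define EFER_LME  (1u << 8)    /* EFER.LME: long mode enable (set by software) */
#define EFER_LMA  (1u << 10)   /* EFER.LMA: long mode active (set by the CPU) */

/* Software model of the registers involved; not real hardware access. */
struct ctrl_regs {
    uint64_t cr0, cr3, cr4, efer;
};

/* Apply the enabling steps in order and return the resulting state. */
struct ctrl_regs enable_long_mode(struct ctrl_regs r, uint64_t pml4_phys)
{
    assert((pml4_phys & 0xFFF) == 0);   /* the PML4 table is 4 KB aligned */

    r.cr0 &= ~(uint64_t)CR0_PG;         /* 1. paging off */
    r.cr4 |= CR4_PAE;                   /* 2. PAE on */
    r.cr3 = pml4_phys;                  /* 3. point CR3 at the PML4 */
    r.efer |= EFER_LME;                 /* 4. request long mode */
    r.cr0 |= CR0_PG;                    /* 5. paging on */

    /* The processor sets LMA itself once paging is enabled with LME=1. */
    if ((r.efer & EFER_LME) && (r.cr0 & CR0_PG) && (r.cr4 & CR4_PAE))
        r.efer |= EFER_LMA;

    return r;   /* 6. a far jump to an L=1 code segment would follow */
}
```

After this sequence the processor is in long mode but still executing compatibility-mode code until CS is reloaded with a descriptor whose L bit is set.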

Key Constraints of Long Mode:

Long Mode Requirements and Constraints

•Paging is mandatory: Cannot operate in long mode with paging disabled
•PAE paging required: Must use 4-level (or 5-level) page tables
•GDT still required: Even with simplified segmentation, a GDT must exist
•64-bit addresses: Virtual addresses are 48-bit canonical (or 57-bit with LA57)
•System segments are 16 bytes: TSS and LDT descriptors expand to 16 bytes

Canonical Addresses

Although x86-64 uses 64-bit virtual addresses, only the low 48 bits are translated under 4-level paging (an addressable space of 256 TB). Bits 63-48 must be copies of bit 47 (all 0s or all 1s), which leaves an unusable gap in the middle of the address space. Addresses satisfying this rule are 'canonical'; a memory reference to a non-canonical address raises a general-protection fault (#GP), or a stack fault (#SS) when the reference goes through the stack segment. The scheme leaves room for expansion: LA57 with 5-level paging widens the translated portion to 57 bits (128 PB).
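
The canonical-address rule reduces to a short bit test. The sketch below uses a hypothetical helper, `is_canonical`, parameterized on the number of implemented virtual-address bits so the same check covers both 48-bit and 57-bit (LA57) configurations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return true if addr is canonical for a CPU that implements va_bits
 * bits of virtual address (48 with 4-level paging, 57 with LA57).
 * Canonical means bits 63 down to va_bits-1 are all equal. */
bool is_canonical(uint64_t addr, unsigned va_bits)
{
    assert(va_bits >= 2 && va_bits <= 64);

    if (va_bits == 64)
        return true;                                    /* every address qualifies */

    uint64_t upper    = addr >> (va_bits - 1);          /* sign bit and everything above */
    uint64_t all_ones = UINT64_MAX >> (va_bits - 1);    /* the same field with all bits set */

    return upper == 0 || upper == all_ones;
}
```

With `va_bits` set to 48, the last canonical address of the lower half is 0x00007FFFFFFFFFFF and the first of the upper half is 0xFFFF800000000000; everything in between is rejected.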

Segmentation in 64-bit Mode: What Changed

In 64-bit mode, segmentation is dramatically simplified. Most segment descriptor fields that mattered in 32-bit protected mode are ignored. Let's examine each segment register and what changed:

Segment Register Behavior in 64-bit Mode
Register	Base Address	Limit	Access Rights	Usage
CS	Fixed at 0	Ignored	L bit checked (must be 1)	Privilege level (CPL), 64-bit mode indicator
SS	Fixed at 0	Ignored	DPL still checked	Stack operations, privilege transitions
DS	Fixed at 0	Ignored	Minimal checks	Often set to null (0)
ES	Fixed at 0	Ignored	Minimal checks	Often set to null (0)
FS	User-defined	Ignored	Minimal checks	Thread-Local Storage (TLS)
GS	User-defined	Ignored	Minimal checks	Per-CPU data (kernel), TLS (user)

What's Ignored:

Base Address (except FS/GS): For CS, SS, DS, ES, the base is treated as 0 regardless of descriptor contents
Segment Limits: No limit checking occurs for 64-bit code; offsets can be any 64-bit value (subject to canonical address rules)
Expand-Down Attribute: Meaningless without limit checking
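D/B Bit: No longer selects the operand size in 64-bit mode. For a code segment with L=1 the D bit must be 0; the default operand size is 32 bits and the default address size is 64 bits, adjusted per instruction by REX.W and the legacy size prefixes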
Type Checking (partially): Many type checks are relaxed or eliminated

What's Preserved:

Privilege Levels: CPL is still derived from CS selector bits 0-1; DPL checks still occur for segment access and call gates
L Bit: Critically determines 64-bit vs. compatibility mode execution
Present Bit: Non-present segments still generate #NP fault
FS/GS Base Addresses: The only actively used segment base addresses in 64-bit mode
System Segment Descriptors: TSS, LDT, and gate descriptors are still processed
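
Since privilege checks still start from the selector, a minimal sketch of selector decoding may help. The names `decode_selector` and `make_selector` are hypothetical; the field layout (RPL in bits 0-1, table indicator in bit 2, index in bits 3-15) is the standard one, and for the selector currently in CS the low two bits equal the CPL.

```c
#include <assert.h>
#include <stdint.h>

struct selector_fields {
    unsigned index;   /* descriptor index within the GDT or LDT */
    unsigned ti;      /* table indicator: 0 = GDT, 1 = LDT */
    unsigned rpl;     /* requested privilege level (CPL when read from CS) */
};

/* Split a 16-bit segment selector into its three fields. */
struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;
    f.rpl   = sel & 0x3;
    f.ti    = (sel >> 2) & 0x1;
    f.index = sel >> 3;
    return f;
}

/* Build a selector from its fields; the byte offset into the
 * descriptor table is index * 8. */
uint16_t make_selector(unsigned index, unsigned ti, unsigned rpl)
{
    assert(index < 8192);   /* 13-bit index */
    assert(ti <= 1);
    assert(rpl <= 3);
    return (uint16_t)((index << 3) | (ti << 2) | rpl);
}
```

For instance, selector 0x23 decodes to GDT index 4 with RPL 3, which is why user-mode selectors conventionally end in 3 or B.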

64bit_segment_behavior.c
C
/* Demonstrating 64-bit mode segment behavior */
 
#include <stdint.h>  /* uint32_t, uint64_t, uintptr_t */
 
/* In 64-bit mode, these accesses work even with DS=null: */
void access_memory_without_ds(void) {
    int value;
    
    /* DS is typically null (0) in 64-bit Linux user space */
    /* But memory access still works - base is forced to 0 */
    __asm__ volatile(
        "xor %%eax, %%eax\n"
        "mov %%ax, %%ds\n"         /* DS = null */
        "mov (%%rsp), %0\n"        /* Still works! Base = 0 */
        : "=r"(value)
        : : "ax"
    );
}
 
/* FS and GS retain their special purpose */
void access_tls_via_fs(void) {
    void *tls_ptr;
    
    /* glibc stores TLS pointer at offset 0 from FS base */
    __asm__ volatile(
        "mov %%fs:0, %0"
        : "=r"(tls_ptr)
    );
}
 
/* Linux kernel uses GS for per-CPU data */
unsigned long get_percpu_variable(void) {
    unsigned long value;
    
    /* GS base points to current CPU's per-cpu area */
    __asm__ volatile(
        "mov %%gs:0, %0"
        : "=r"(value)
    );
    return value;
}
 
/* Setting FS/GS base - multiple methods */
void set_fs_base_msrs(void *base) {
    /* Method 1: WRMSR (ring 0 only; works on every x86-64 CPU, but slow) */
    uint64_t value = (uintptr_t)base;
    __asm__ volatile(
        "wrmsr"
        :
        : "c"(0xC0000100u),             /* ECX = IA32_FS_BASE MSR index */
          "a"((uint32_t)value),         /* EAX = low 32 bits of the base */
          "d"((uint32_t)(value >> 32))  /* EDX = high 32 bits of the base */
        : "memory"
    );
}
 
void set_fs_base_wrfsbase(void *base) {
    /* Method 2: WRFSBASE (fast; needs CPUID FSGSBASE support and
       CR4.FSGSBASE=1, otherwise the instruction raises #UD) */
    __asm__ volatile(
        "wrfsbase %0"
        :
        : "r"(base)
        : "memory"
    );
}

Compatibility Mode is Different

Everything above applies to 64-bit mode (CS.L=1). In compatibility mode (CS.L=0), full 32-bit segmentation is in effect: segment bases are used, limits are checked, type validation occurs. This is how 32-bit applications run on 64-bit operating systems—they operate in compatibility mode with legacy segmentation semantics.

FS and GS in Modern Systems

While most segmentation is disabled in 64-bit mode, FS and GS remain fully functional with their base addresses. This seemingly minor exception is actually critical for modern operating system and runtime library design.

Why FS and GS Survived:

FS and GS provide a mechanism for efficient per-context data access without pointer indirection:

Access via %fs:offset or %gs:offset adds the segment base automatically
No register is consumed for a base pointer
The access pattern is very common (every thread needs its own pointer to TLS)
Changing the base (via MSR or WRFSBASE/WRGSBASE) is fast

FS Usage: Thread-Local Storage (User space)

glibc and other C runtimes use FS for TLS access:

fs_tls_example.c
C
/* Thread-Local Storage via FS segment */
 
#include <stdint.h>
 
/* glibc TCB (Thread Control Block) structure (simplified) */
struct tcbhead_t {
    void *tcb;           /* offset 0: Pointer to the TCB */
    void *dtv;           /* offset 8: Dynamic Thread Vector */
    void *self;          /* offset 16: Pointer to this structure */
    int multiple_threads;/* offset 24: Thread count > 1? */
    int gscope_flag;     /* offset 28: Global scope flag */
    uintptr_t sysinfo;    /* offset 32: sysinfo field */
    uintptr_t stack_guard;/* offset 40: Stack canary value */
    /* ... more fields ... */
};
 
/* Accessing stack canary (used by GCC -fstack-protector) */
uintptr_t get_stack_canary(void) {
    uintptr_t canary;
    /* Linux x86-64 ABI: stack canary at %fs:40 */
    __asm__ volatile(
        "mov %%fs:40, %0"
        : "=r"(canary)
    );
    return canary;
}
 
/* Getting a TLS variable address 
   Compiler generates: mov %fs:offset, %reg 
   where offset is determined at link time */
__thread int my_tls_variable;
 
void increment_tls_var(void) {
    my_tls_variable++;  /* Compiles to %fs-relative access */
}
 
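/* Setting FS base for a new thread */
#include <asm/prctl.h>    /* ARCH_SET_FS */
#include <sys/syscall.h>  /* SYS_arch_prctl */
#include <unistd.h>       /* syscall() */
 
int setup_thread_tls(struct tcbhead_t *tcb) {
    /* glibc headers declare no arch_prctl() prototype, so invoke the
       raw system call; returns 0 on success, -1 on error */
    return (int)syscall(SYS_arch_prctl, ARCH_SET_FS, (unsigned long)tcb);
}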

GS Usage: Per-CPU Data (Kernel space)

The Linux kernel uses GS for per-CPU data structures. Each CPU has its own distinct GS base pointing to its local per-CPU area:

gs_percpu_kernel.c
C
/* Linux kernel per-CPU data access via GS */
 
/* Kernel per-CPU variables are accessed through GS */
/* The GS base is different on each CPU */
 
/* Reading the current task struct pointer */
struct task_struct *get_current(void) {
    struct task_struct *current;
    /* current_task is at a fixed offset in per-CPU area */
    __asm__ volatile(
        "mov %%gs:current_task, %0"
        : "=r"(current)
    );
    return current;
}
 
/* Per-CPU variable declaration (simplified from kernel) */
#define DEFINE_PER_CPU(type, name) \
    __attribute__((section(".data..percpu"))) type name
 
/* Reading a per-CPU variable */
#define this_cpu_read(var) \
    ({ typeof(var) __ret; \
       __asm__ volatile("mov %%gs:%1, %0" \
                        : "=r"(__ret) \
                        : "m"(var)); \
       __ret; })
 
/* SWAPGS: Switching between user and kernel GS */
/* Called at every syscall entry/exit */
 
void syscall_entry(void) {
    /* Swap user's GS (TLS) with kernel's GS (per-CPU) */
    __asm__ volatile("swapgs");
    
    /* Now GS points to current CPU's per-CPU area */
    /* Can access per-CPU data */
}
 
void syscall_exit(void) {
    /* Restore user's GS before returning */
    __asm__ volatile("swapgs");
    
    /* Now GS points to user's TLS again */
}

FSGSBASE Extension:

Originally, setting FS/GS base required privileged MSR operations from kernel code. Intel introduced the FSGSBASE extension (available from Ivy Bridge onward), adding four new instructions:

RDFSBASE: Read FS base into a register
WRFSBASE: Write FS base from a register
RDGSBASE: Read GS base into a register
WRGSBASE: Write GS base from a register

All four are usable at any privilege level, user mode included, but only after the kernel sets CR4.FSGSBASE; with that bit clear they raise #UD. They are much faster than MSR-based access and let user space switch TLS bases without kernel involvement.
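
The two conditions for using these instructions can be written as a pair of bit tests. This is a sketch over raw register values rather than a live query: the function names are hypothetical, and the caller is assumed to supply the EBX output of CPUID leaf 7 (subleaf 0) and the current CR4 value. The bit positions (CPUID.07H:EBX bit 0, CR4 bit 16) are architectural.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPUID7_EBX_FSGSBASE (1u << 0)     /* CPUID.(EAX=7,ECX=0):EBX bit 0 */
#define CR4_FSGSBASE        (1ULL << 16)  /* CR4 bit 16 */

/* Does the processor implement RDFSBASE/WRFSBASE/RDGSBASE/WRGSBASE? */
bool cpu_has_fsgsbase(uint32_t cpuid7_ebx)
{
    return (cpuid7_ebx & CPUID7_EBX_FSGSBASE) != 0;
}

/* May code actually execute them? The CPU must implement the
 * instructions and the kernel must have set CR4.FSGSBASE. */
bool fsgsbase_usable(uint32_t cpuid7_ebx, uint64_t cr4)
{
    assert((cr4 >> 32) == 0);   /* CR4[63:32] are reserved and must be zero */
    return cpu_has_fsgsbase(cpuid7_ebx) && (cr4 & CR4_FSGSBASE) != 0;
}
```

A runtime that wants to use WRFSBASE would fall back to the system-call path whenever `fsgsbase_usable` is false, since executing the instruction in that state faults.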

GS Base Swap Mechanism

The SWAPGS instruction atomically exchanges the current GS base with the value in IA32_KERNEL_GS_BASE MSR. This enables efficient kernel entry/exit: user mode has its GS base (for TLS), kernel mode has its GS base (for per-CPU data), and SWAPGS switches between them. This must be done exactly once at each kernel entry and exit—getting this wrong causes subtle and catastrophic bugs.

System Segment Descriptors in Long Mode

While code and data segments are simplified to flat models in 64-bit mode, system segment descriptors (TSS and LDT) required expansion to support 64-bit addresses. In long mode, these descriptors are 16 bytes rather than 8 bytes.

Why 16 Bytes?

The TSS and LDT contain base addresses that point to memory structures. In 64-bit mode, these structures can be located anywhere in the 64-bit address space, but the original 8-byte descriptor format only has room for a 32-bit base address. The solution: expand system segment descriptors to 16 bytes to accommodate the additional 32 bits.

TSS in Long Mode:

The Task State Segment in 64-bit mode serves a different purpose than in 32-bit mode:

Hardware task switching is disabled (no task gates, no automatic TSS saving)
TSS is still required for the RSP0 stack pointer (kernel stack for Ring 3 → Ring 0 transitions)
Interrupt Stack Table (IST) is new: provides dedicated stacks for specific interrupt handlers

64-bit System Segment Descriptor Format

Layout

64-bit TSS/LDT Descriptor (16 bytes, occupies 2 GDT slots):
 
Bytes 0-7 (Standard format, similar to 32-bit):
┌────────────────────────────────────────────────────────────────────┐
│ Byte 7       │ Byte 6        │ Byte 5       │ Byte 4               │
│ Base[31:24]  │ G 0 0 AVL     │ P DPL 0 Type │ Base[23:16]          │
│              │ Limit[19:16]  │              │                      │
├──────────────┴───────────────┴──────────────┴──────────────────────┤
│ Bytes 3-2                    │ Bytes 1-0                           │
│ Base Address [15:0]          │ Segment Limit [15:0]                │
└────────────────────────────────────────────────────────────────────┘
 
Bytes 8-15 (64-bit extension):
┌────────────────────────────────────────────────────────────────────┐
│ Bytes 12-15                                                        │
│ Reserved (must be 0)                                               │
├────────────────────────────────────────────────────────────────────┤
│ Bytes 8-11                                                         │
│ Base Address [63:32]                                               │
└────────────────────────────────────────────────────────────────────┘
 
Type field for 64-bit system segments:
  0x9 = 64-bit TSS (Available)
  0xB = 64-bit TSS (Busy)
  0x2 = 64-bit LDT
 
Example: TSS at address 0x0000000100200000
  Bytes 0-1:  Limit (e.g., 0x0067 for minimum TSS)
  Bytes 2-3:  Base[15:0]  = 0x0000
  Byte 4:     Base[23:16] = 0x20
  Byte 5:     Access (0x89 = P=1, DPL=0, Type=9)
  Byte 6:     Flags/Limit[19:16] (0x00)
  Byte 7:     Base[31:24] = 0x00
  Bytes 8-11: Base[63:32] = 0x00000001
  Bytes 12-15: Reserved = 0x00000000
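
The layout above can be exercised with a small packing routine. This is a sketch under stated assumptions: `struct sys_desc64`, `make_tss_descriptor`, and `desc_byte` are hypothetical names, the descriptor is always built as an available 64-bit TSS at DPL 0 (access byte 0x89), and the G and AVL flags are left clear.

```c
#include <assert.h>
#include <stdint.h>

/* A 16-byte system descriptor as two little-endian 64-bit halves. */
struct sys_desc64 {
    uint64_t low;    /* bytes 0-7: legacy-format half */
    uint64_t high;   /* bytes 8-15: Base[63:32], then reserved zeros */
};

/* Build an available 64-bit TSS descriptor (P=1, DPL=0, type=0x9). */
struct sys_desc64 make_tss_descriptor(uint64_t base, uint32_t limit)
{
    assert(limit >= 0x67);      /* a 64-bit TSS is at least 104 bytes */
    assert(limit <= 0xFFFFF);   /* 20-bit limit field */

    struct sys_desc64 d;
    d.low  = (uint64_t)(limit & 0xFFFF);              /* Limit[15:0]  */
    d.low |= (base & 0xFFFFFFULL) << 16;              /* Base[23:0]   */
    d.low |= 0x89ULL << 40;                           /* access byte  */
    d.low |= (uint64_t)((limit >> 16) & 0xF) << 48;   /* Limit[19:16] */
    d.low |= ((base >> 24) & 0xFFULL) << 56;          /* Base[31:24]  */
    d.high = base >> 32;                              /* Base[63:32]; upper 32 bits stay 0 */
    return d;
}

/* Read byte i (0-15) of the descriptor as it would appear in memory. */
uint8_t desc_byte(struct sys_desc64 d, unsigned i)
{
    assert(i < 16);
    uint64_t half = (i < 8) ? d.low : d.high;
    return (uint8_t)(half >> (8 * (i % 8)));
}
```

Calling `make_tss_descriptor(0x0000000100200000, 0x67)` and reading the result back with `desc_byte` reproduces the worked example byte by byte.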

64-bit TSS Structure:

The 64-bit TSS is significantly different from the 32-bit version:

tss64_structure.c
C
/* 64-bit Task State Segment structure */
#include <stdint.h>
 
struct tss64 {
    uint32_t reserved0;       /* 0x00: Reserved */
    
    /* Stack pointers for ring transitions */
    uint64_t rsp0;            /* 0x04: RSP for Ring 0 (kernel stack) */
    uint64_t rsp1;            /* 0x0C: RSP for Ring 1 (rarely used) */
    uint64_t rsp2;            /* 0x14: RSP for Ring 2 (rarely used) */
    
    uint64_t reserved1;       /* 0x1C: Reserved */
    
    /* Interrupt Stack Table (IST) - new in 64-bit mode */
    /* Dedicated stacks for specific interrupt handlers */
    uint64_t ist1;            /* 0x24: IST entry 1 */
    uint64_t ist2;            /* 0x2C: IST entry 2 */
    uint64_t ist3;            /* 0x34: IST entry 3 */
    uint64_t ist4;            /* 0x3C: IST entry 4 */
    uint64_t ist5;            /* 0x44: IST entry 5 */
    uint64_t ist6;            /* 0x4C: IST entry 6 */
    uint64_t ist7;            /* 0x54: IST entry 7 */
    
    uint64_t reserved2;       /* 0x5C: Reserved */
    uint16_t reserved3;       /* 0x64: Reserved */
    
    uint16_t iopb_offset;     /* 0x66: I/O Permission Bitmap offset */
    
    /* Optional: I/O Permission Bitmap follows */
    /* uint8_t iopb[...]; */
} __attribute__((packed));
 
/* Minimum 64-bit TSS size is 104 bytes (0x68) */
/* iopb_offset points beyond TSS if no IOPB is used */
 
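/* Example TSS setup (simplified). The IST slot assignments below are
   illustrative, and MAX_CPUS, the *_STACK_SIZE constants, and the
   per-CPU stack arrays are assumed to be defined elsewhere. */
struct tss64 cpu_tss[MAX_CPUS];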
 
void setup_tss(int cpu) {
    struct tss64 *tss = &cpu_tss[cpu];
    
    /* Set kernel stack pointer */
    tss->rsp0 = (uint64_t)&kernel_stacks[cpu][KERNEL_STACK_SIZE];
    
    /* Set up IST for critical interrupts */
    tss->ist1 = (uint64_t)&nmi_stacks[cpu][NMI_STACK_SIZE];     /* NMI */
    tss->ist2 = (uint64_t)&df_stacks[cpu][DF_STACK_SIZE];       /* #DF */
    tss->ist3 = (uint64_t)&debug_stacks[cpu][DEBUG_STACK_SIZE]; /* #DB */
    
    /* No I/O permission bitmap */
    tss->iopb_offset = sizeof(struct tss64);
    
    /* Load TSS into GDT and TR register */
    /* ... */
}

Interrupt Stack Table (IST)

The IST is a powerful new feature in 64-bit mode. IDT entries can specify an IST index (1-7), and when that interrupt fires, the CPU unconditionally switches to the corresponding IST stack; an index of 0 keeps the ordinary stack-switching behavior. This is crucial for handling interrupts and faults that arrive when the regular stack may be unusable (like an NMI on top of a page fault, or a double fault). Each IST entry provides an independent, known-good stack.
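
To show where the IST index lives, here is a minimal sketch of a 64-bit IDT gate. The struct and function names are hypothetical; the field layout follows the 16-byte long-mode gate format, and the example always builds a present, DPL 0 interrupt gate (type/attribute byte 0x8E).

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit IDT gate descriptor (16 bytes). The fields are naturally
 * aligned, so no packing attribute is required. */
struct idt_gate64 {
    uint16_t offset_low;    /* handler address bits 15:0 */
    uint16_t selector;      /* code segment selector for the handler */
    uint8_t  ist;           /* bits 2:0 = IST index (0 = no IST switch) */
    uint8_t  type_attr;     /* P, DPL, gate type */
    uint16_t offset_mid;    /* handler address bits 31:16 */
    uint32_t offset_high;   /* handler address bits 63:32 */
    uint32_t reserved;      /* must be zero */
};

_Static_assert(sizeof(struct idt_gate64) == 16, "IDT gate must be 16 bytes");

/* Build a present, DPL 0 interrupt gate that runs on IST stack ist_index. */
struct idt_gate64 make_interrupt_gate(uint64_t handler, uint16_t selector,
                                      unsigned ist_index)
{
    assert(ist_index <= 7);   /* the IST field is 3 bits wide */

    struct idt_gate64 g;
    g.offset_low  = (uint16_t)(handler & 0xFFFF);
    g.selector    = selector;
    g.ist         = (uint8_t)ist_index;
    g.type_attr   = 0x8E;     /* P=1, DPL=0, type=0xE (interrupt gate) */
    g.offset_mid  = (uint16_t)((handler >> 16) & 0xFFFF);
    g.offset_high = (uint32_t)(handler >> 32);
    g.reserved    = 0;
    return g;
}
```

A kernel would typically build the gates for NMI and double fault with a nonzero `ist_index` and leave ordinary device interrupts at 0.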

Compatibility Mode Details

Compatibility mode allows 32-bit (and 16-bit) code to run within a 64-bit operating system. This is how Windows runs 32-bit applications via WoW64 (Windows on Windows 64-bit) and how Linux runs 32-bit binaries on x86-64 systems.

Entering Compatibility Mode:

The system is in long mode (EFER.LME=1, paging enabled), but code executes in compatibility mode when the current CS has L=0. The OS creates separate code segment descriptors:

compatibility_mode_segments.c
C
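/* Segment descriptors for 64-bit and compatibility mode */
 
#include <stdint.h>
 
/* Legacy 8-byte descriptor layout used by the entries below */
struct gdt_entry {
    uint16_t limit_low;    /* Limit[15:0] */
    uint16_t base_low;     /* Base[15:0] */
    uint8_t  base_middle;  /* Base[23:16] */
    uint8_t  access;       /* P, DPL, S, type */
    uint8_t  granularity;  /* G, D, L, AVL, Limit[19:16] */
    uint8_t  base_high;    /* Base[31:24] */
} __attribute__((packed));
 
/* 64-bit User Code Segment (L=1, D=0) */
/* Access: 0xFB = P(1) DPL(3) S(1) E(1) C(0) R(1) A(1) */
/* Flags:  0xA = G=1, D=0, L=1, AVL=0 */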
struct gdt_entry user64_cs = {
    .limit_low = 0xFFFF,
    .base_low = 0x0000,
    .base_middle = 0x00,
    .access = 0xFB,        /* Ring 3, code, readable */
    .granularity = 0xAF,   /* G=1, L=1, D=0 (64-bit mode) */
    .base_high = 0x00
};
 
/* 32-bit User Code Segment for Compatibility Mode (L=0, D=1) */
/* Access: 0xFB = P(1) DPL(3) S(1) E(1) C(0) R(1) A(1) */
/* Flags:  0xC = G=1, D=1, L=0, AVL=0 */
struct gdt_entry user32_cs = {
    .limit_low = 0xFFFF,
    .base_low = 0x0000,
    .base_middle = 0x00,
    .access = 0xFB,        /* Ring 3, code, readable */
    .granularity = 0xCF,   /* G=1, L=0, D=1 (32-bit compat mode) */
    .base_high = 0x00
};
 
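/* Switching from 64-bit to compatibility mode (illustrative fragment).
   Assumes selector 0x23 names a 32-bit user code segment, that
   compat_entry lies below 4 GB, and that the caller already runs at
   CPL 3. Real code must also supply 32-bit code that switches back. */
void switch_to_compat_mode(void) {
    /* Far return: pops the target RIP, then CS, loading the L=0 segment */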
    __asm__ volatile(
        "push $0x23\n"      /* USER32_CS selector with RPL=3 */
        "push $compat_entry\n"
        "lretq\n"           /* Far return loads new CS */
        ".code32\n"
        "compat_entry:\n"
        /* Now executing in 32-bit compatibility mode */
        ".code64\n"
        :
    );
}

Segmentation in Compatibility Mode:

When running in compatibility mode, full 32-bit segmentation is active:

Segment bases are used (not fixed to 0)
Segment limits are checked
Type validation occurs
Expand-down segments work
All 32-bit protected mode rules apply

This is essential for running legacy 32-bit applications that may have used segment-based features (rare but possible, especially in older software).

The Thunk Layer:

When a 32-bit application makes a system call on a 64-bit OS, a transition occurs:

The 32-bit application invokes a syscall (via SYSENTER, INT 0x80, or similar)
The kernel traps the call and switches to 64-bit mode
The kernel processes the syscall using 64-bit code
Results are transformed back to 32-bit format
The kernel returns to compatibility mode

This thunk layer handles differences in calling conventions, pointer sizes, and data structure layouts.
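
One concrete piece of such a thunk is widening 32-bit structures into their 64-bit equivalents. The sketch below uses hypothetical types modeled on an I/O vector (a pointer plus a length); the names are assumptions for illustration and are not taken from any particular kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Layout as seen by the 32-bit application: 4-byte pointer and length. */
struct compat_iovec32 {
    uint32_t iov_base;
    uint32_t iov_len;
};

/* Layout the 64-bit kernel code works with: 8-byte pointer and length. */
struct native_iovec64 {
    uint64_t iov_base;
    uint64_t iov_len;
};

/* Widen one entry. 32-bit user pointers are zero-extended, never
 * sign-extended, so they always land in the low 4 GB. */
struct native_iovec64 widen_iovec(struct compat_iovec32 c)
{
    struct native_iovec64 n;
    n.iov_base = (uint64_t)c.iov_base;
    n.iov_len  = (uint64_t)c.iov_len;
    return n;
}

/* Can a 64-bit result be handed back to the 32-bit caller unchanged? */
bool fits_in_compat(uint64_t value)
{
    return value <= UINT32_MAX;
}

/* Narrow a result for the return trip; the caller checks the range first. */
uint32_t narrow_to_compat(uint64_t value)
{
    assert(fits_in_compat(value));
    return (uint32_t)value;
}
```

The two structs differ in size (8 versus 16 bytes), which is why the kernel cannot simply reinterpret a 32-bit caller's memory and must copy field by field.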

64-bit Mode vs. Compatibility Mode
Aspect	64-bit Mode (L=1)	Compatibility Mode (L=0)
Register width	64-bit (RAX, RBX, ...)	32-bit (EAX, EBX, ...)
Default operand size	32-bit (64-bit with REX.W)	32-bit (16-bit with 0x66 prefix)
Default address size	64-bit	32-bit
Segment bases	Fixed at 0 (except FS/GS)	Used from descriptor
Segment limits	Ignored	Checked
RIP-relative addressing	Available	Not available
New registers (R8-R15)	Available	Not available
REX prefix	Valid	Not available (bytes 0x40-0x4F decode as INC/DEC)

Mixed-Mode Binaries

A single process can theoretically switch between 64-bit and 32-bit code (and even 16-bit code) by changing the CS selector. This is exploited by some code (including malware and some sandboxing techniques) to confuse analysis tools. It's known as 'Heaven's Gate' in the Windows security community, referring to the technique of calling 64-bit system calls from 32-bit code by temporarily switching to 64-bit mode.

Practical Implications for Modern OS Design

The x86-64 segmentation simplifications have significant implications for operating system design and security.

Memory Protection Architecture:

With segment limits no longer functional in 64-bit mode, all memory protection comes from paging:

User/Supervisor separation: Page table entries have a U/S bit
Read/Write protection: Page table entries have an R/W bit
Execute protection (NX/XD): Page table entries have an NX bit
SMAP (Supervisor Mode Access Prevention): Prevents kernel from accessing user pages
SMEP (Supervisor Mode Execution Prevention): Prevents kernel from executing user pages

These paging features provide finer-grained protection than segmentation ever could.
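
How these bits combine can be sketched as a permission check on a single page table entry. This is a deliberately simplified model with hypothetical function names: it looks only at the leaf entry (real hardware combines permissions across all paging levels), and it assumes EFER.NXE=1 and CR0.WP=1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_P   (1ULL << 0)    /* present */
#define PTE_RW  (1ULL << 1)    /* writable */
#define PTE_US  (1ULL << 2)    /* user-accessible */
#define PTE_NX  (1ULL << 63)   /* no-execute */

enum access_kind { ACCESS_READ, ACCESS_WRITE, ACCESS_EXEC };

static bool valid_kind(enum access_kind k)
{
    return k == ACCESS_READ || k == ACCESS_WRITE || k == ACCESS_EXEC;
}

/* Would a ring 3 access of the given kind succeed on this page? */
bool user_access_allowed(uint64_t pte, enum access_kind kind)
{
    assert(valid_kind(kind));
    if (!(pte & PTE_P) || !(pte & PTE_US))
        return false;                      /* not mapped, or supervisor-only */
    if (kind == ACCESS_WRITE)
        return (pte & PTE_RW) != 0;
    if (kind == ACCESS_EXEC)
        return (pte & PTE_NX) == 0;
    return true;                           /* reads need only P and U/S */
}

/* Would a ring 0 access succeed, given the SMEP/SMAP settings? */
bool supervisor_access_allowed(uint64_t pte, enum access_kind kind,
                               bool smep, bool smap, bool eflags_ac)
{
    assert(valid_kind(kind));
    if (!(pte & PTE_P))
        return false;

    bool user_page = (pte & PTE_US) != 0;
    if (kind == ACCESS_EXEC)
        return !(user_page && smep) && (pte & PTE_NX) == 0;
    if (user_page && smap && !eflags_ac)
        return false;                      /* SMAP blocks data access to user pages */
    if (kind == ACCESS_WRITE)
        return (pte & PTE_RW) != 0;        /* CR0.WP=1: kernel honors read-only */
    return true;
}
```

Note how SMEP and SMAP only ever remove permissions from the supervisor side; user-mode checks depend on the page table bits alone.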

Modern OS Memory Protection Stack

•Privilege Rings (from segmentation): Still used—Ring 0 for kernel, Ring 3 for user
•Page Tables: Primary protection mechanism—U/S, R/W, NX bits per page
•SMEP: Hardware blocks execution of user pages from kernel mode
•SMAP: Hardware blocks access to user pages from kernel mode (unless AC flag set)
•KAISER/KPTI: Unmaps kernel pages from user page tables (Meltdown mitigation)
•ASLR: Randomizes address space layout (software, uses paging)
•Intel MPK/PKU: Memory Protection Keys for user-space compartmentalization

Simplified GDT Layout:

A minimal 64-bit GDT can be remarkably simple compared to elaborate 32-bit configurations:

minimal_gdt64.c
C
/* Minimal 64-bit GDT for a modern OS */
 
/* With flat memory model, only need: */
/*  0x00: Null descriptor (required) */
/*  0x08: Kernel code segment (64-bit, Ring 0) */
/*  0x10: Kernel data segment (Ring 0) */
/*  0x18: User code segment (64-bit, Ring 3) */
/*  0x20: User data segment (Ring 3) */
/*  0x28: TSS descriptor (16 bytes, Ring 0) */
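/*  [Optional: 32-bit compatibility segments] */
/*  Note: a kernel that returns to user mode with SYSRET needs user data
    placed before 64-bit user code, because SYSRET derives SS and CS from
    fixed offsets (+8 and +16) of a single base selector. */
 
#include <stdint.h>
 
/* Legacy 8-byte descriptor, fields in initializer order */
struct gdt_entry64 {
    uint16_t limit_low;    /* Limit[15:0] */
    uint16_t base_low;     /* Base[15:0] */
    uint8_t  base_middle;  /* Base[23:16] */
    uint8_t  access;       /* P, DPL, S, type */
    uint8_t  granularity;  /* G, D, L, AVL, Limit[19:16] */
    uint8_t  base_high;    /* Base[31:24] */
} __attribute__((packed));
 
struct gdt_entry64 gdt64[] = {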
    /* 0x00: Null */
    { 0, 0, 0, 0, 0, 0 },
    
    /* 0x08: Kernel Code (64-bit) */
    { 0xFFFF, 0, 0, 0x9A, 0xAF, 0 },  /* L=1, D=0 */
    
    /* 0x10: Kernel Data */
    { 0xFFFF, 0, 0, 0x92, 0xCF, 0 },
    
    /* 0x18: User Code (64-bit) */
    { 0xFFFF, 0, 0, 0xFA, 0xAF, 0 },  /* L=1, D=0, DPL=3 */
    
    /* 0x20: User Data */
    { 0xFFFF, 0, 0, 0xF2, 0xCF, 0 },  /* DPL=3 */
    
    /* 0x28-0x30: TSS (16 bytes) - filled at runtime */
    { 0, 0, 0, 0, 0, 0 },
    { 0, 0, 0, 0, 0, 0 },
};
 
/* That's it! 7 slots (56 bytes), including the two-slot TSS descriptor */
/* Compare to elaborate 32-bit GDTs with call gates, etc. */

Security Research Implications:

The segmentation simplification affects security research and exploitation:

Easier Exploitation: Without segment limit checks, buffer overflows have fewer hardware barriers
Simpler Analysis: Analysts don't need to worry about segment arithmetic and far pointers
FS/GS Abuse: The remaining functional segments (FS/GS) become attack targets—corrupting FS base can hijack TLS access
Compatibility Mode Tricks: Switching to compatibility mode can confuse 64-bit analysis tools
Mitigation Focus: Security mitigations focus on paging (NX, ASLR, SMEP, SMAP) rather than segmentation

Historical Segment-Based Protections

Some older security research explored using segment limits for overflow protection—if an array's segment limit matched its size, hardware would detect out-of-bounds access. Intel's MPX (Memory Protection Extensions) was a later attempt at hardware bounds checking. However, MPX was deprecated due to performance overhead and limited adoption. Modern approaches focus on software bounds checking and hardware features like Intel CET (Control-flow Enforcement Technology).

Summary: Modern x86-64 Changes

We've traced the evolution from protected mode's elaborate segmentation to x86-64's streamlined flat model, understanding what changed, what survived, and why.

Key Takeaways

•Segmentation was largely retired in 64-bit mode because mainstream OSes had already adopted flat models with paging
•Long mode has two sub-modes: 64-bit mode (CS.L=1) with simplified segments, and compatibility mode (CS.L=0) with full 32-bit segmentation
•In 64-bit mode, segment bases are fixed at 0 (except FS and GS) and limits are ignored
•Privilege levels survive intact—Ring 0/3 distinction remains fundamental
•FS and GS provide per-context data access—TLS (user) and per-CPU data (kernel)
•System segment descriptors (TSS, LDT) expanded to 16 bytes to support 64-bit base addresses
•The TSS gained the Interrupt Stack Table (IST) for dedicated interrupt stacks
•Compatibility mode enables legacy 32-bit applications on 64-bit operating systems
•All memory protection now comes from paging—enhanced with SMEP, SMAP, KPTI, and more

Module Conclusion:

This concludes our deep dive into the Intel x86 Memory Model. We've journeyed from the foundations of protected mode through the GDT, LDT, segment selectors, and the evolution to x86-64. You now possess the knowledge to understand bootloader code, kernel initialization, memory protection mechanisms, and the architectural decisions that shaped modern operating systems.

The x86 memory model—even in its simplified 64-bit form—remains the foundation on which billions of devices operate daily. Understanding this foundation empowers you to work at the deepest levels of systems software.

Module Complete

Congratulations! You've completed Module 6: Intel x86 Memory Model. You now understand protected mode architecture, descriptor tables, segment selectors, and the transition to x86-64. This knowledge forms essential groundwork for operating system development, security research, and deep systems programming.