The combined segmentation-paging model we've studied was the dominant memory management paradigm from the mid-1980s through the early 2000s. But if you examine a modern operating system—Linux, Windows, macOS—you'll find that segmentation has largely faded into the background. Modern systems present processes with a flat virtual address space managed almost entirely through paging.
This shift wasn't arbitrary. It reflects hard-won lessons about portability, performance, and complexity. Yet segmentation hasn't disappeared entirely. In x86-64 systems, segment registers still exist, certain segment-related features remain essential, and new uses have emerged for what was once a foundational memory management mechanism.
This final page traces this evolution, explains why the industry moved toward flat memory, documents what segmentation features persist in modern systems, and examines how contemporary operating systems bridge the gap between legacy and modern approaches.
By the end of this page, you will understand why segmentation gave way to flat memory models, how modern x86-64 handles (and largely ignores) segmentation, what segment-related features remain important in contemporary systems, and how operating systems like Linux and Windows use the surviving segment mechanisms.
The transition from segmented to flat memory models occurred gradually during the 1990s, driven by multiple converging factors. Understanding this history illuminates both the limitations of segmentation and the principles that guide modern OS design.
Factor 1: Portability Concerns
The rise of Unix and the C programming language created pressure for portable code. Segmentation, as implemented in x86, was architecturally specific:
Operating system developers chose to emulate a flat model on x86 rather than segmented models on flat architectures. This meant configuring x86 segments with base=0 and limit=max, effectively neutering segmentation while maintaining hardware compatibility.
| Architecture | Memory Model | Segmentation Support | Dominant OS |
|---|---|---|---|
| x86 (32-bit) | Segmented + Paging | Full hardware support | DOS, Windows, Linux |
| MIPS R4000 | Flat, Paged | None | IRIX, embedded |
| SPARC | Flat, Paged | None | Solaris |
| Alpha | Flat, Paged | None | Digital UNIX, VMS |
| PA-RISC | Flat, Paged | Limited | HP-UX |
| PowerPC | Flat, Paged | Optional | AIX, macOS |
| ARM | Flat, Paged | None | RISC OS, later Linux |
Factor 2: Complexity and Performance
Combined segmentation-paging added complexity with limited benefit:
Paging alone provided sufficient memory virtualization. Adding segmentation on top didn't solve problems that paging couldn't solve independently—it just added overhead.
Factor 3: Compiler and Language Evolution
Programming language and compiler improvements reduced segmentation's appeal:
Factor 4: 64-bit Transition
The move to 64-bit computing (AMD64/x86-64) provided a natural breaking point. With address spaces expanding from 4 GB to effectively unlimited, the 'protection' argument for segments weakened—you could simply use different address ranges. AMD/Intel took this opportunity to essentially disable segmentation in 64-bit mode.
It's worth noting that segmentation originated in MULTICS (1960s), an academic system where segments represented named objects with capability-based access control. x86 segmentation was a simplified version. The industry's rejection of x86 segmentation doesn't invalidate segmentation's theoretical elegance—it reflects that x86's implementation didn't justify its cost/complexity ratio in practice.
When AMD designed the x86-64 architecture (later adopted by Intel), they made deliberate choices to enforce flat memory while maintaining backward compatibility. Understanding these choices clarifies how modern x86 systems actually work.
Long Mode Segmentation Changes:
In 64-bit long mode, segmentation is dramatically simplified:
Segment base addresses are ignored (mostly):
Segment limits are ignored:
Only certain descriptor fields matter:
x86 32-bit Protected Mode vs x86-64 Long Mode Segmentation:

| Feature | 32-bit Protected | 64-bit Long Mode |
|---|---|---|
| CS base | Used | Forced to 0 |
| DS/ES/SS base | Used | Forced to 0 |
| FS base | Used | Used (from descriptor or MSR) |
| GS base | Used | Used (from MSR) |
| Segment limit | Enforced | Ignored (for code/data) |
| DPL checking | Active | Active (ring protection) |
| Type checking | Active | Active (code vs data) |
| L bit | N/A | Required for 64-bit code |
| GDT required | Yes | Yes (but minimal) |
| LDT supported | Yes | Mostly disabled |

Typical 64-bit OS GDT (minimal):

- Entry 0: Null descriptor (required)
- Entry 1: 64-bit kernel code (L=1, DPL=0)
- Entry 2: Kernel data (DPL=0)
- Entry 3: 64-bit user code (L=1, DPL=3)
- Entry 4: User data (DPL=3)
- Entry 5: TSS descriptor (64-bit TSS)
- Entry 6: (Optional 32-bit compatibility segments for running 32-bit apps)

Note: the base and limit fields in entries 1-4 are ignored by hardware. They're set to 0/max by convention but have no effect.

FS and GS: The Survivors
Interestingly, the FS and GS segment bases remain functional in 64-bit mode. This exception exists because operating systems found these segments useful for per-CPU and per-thread data:
FS Register:
GS Register:
Hardware Support:
Setting segment bases via MSRs (Model-Specific Registers) is faster than loading segment registers, which would require GDT access and descriptor parsing. Since FS and GS bases change frequently (on context switch), the MSR path reduces overhead. Modern processors also provide WRFSBASE/WRGSBASE instructions that are even faster.
Linux adopted a flat memory model early in its x86 port history. Examining how Linux uses (and avoids) segmentation provides practical insight into modern OS memory management.
Linux x86-64 Segment Setup:
Linux configures five essential GDT entries:
```c
// From Linux kernel arch/x86/include/asm/segment.h (simplified)

// GDT entry numbers
#define GDT_ENTRY_KERNEL_CS        1  // Selector 0x08
#define GDT_ENTRY_KERNEL_DS        2  // Selector 0x10
#define GDT_ENTRY_DEFAULT_USER_CS  3  // Selector 0x1B (with RPL=3)
#define GDT_ENTRY_DEFAULT_USER_DS  4  // Selector 0x23 (with RPL=3)
#define GDT_ENTRY_TSS              5  // Selector 0x28

// Segment selectors
#define __KERNEL_CS (GDT_ENTRY_KERNEL_CS * 8)            // 0x08
#define __KERNEL_DS (GDT_ENTRY_KERNEL_DS * 8)            // 0x10
#define __USER_CS   (GDT_ENTRY_DEFAULT_USER_CS * 8 | 3)  // 0x1B (RPL=3)
#define __USER_DS   (GDT_ENTRY_DEFAULT_USER_DS * 8 | 3)  // 0x23 (RPL=3)

// Kernel code segment descriptor
// In 64-bit mode, base and limit are ignored
// Key fields: L=1 (long mode), DPL=0, Type=0xA (execute/read)
static struct desc_struct gdt_entries[] = {
    [GDT_ENTRY_KERNEL_CS] = {
        .limit0 = 0xFFFF,
        .base0  = 0,
        .base1  = 0,
        .type   = 0xA,  // Execute/Read
        .s      = 1,    // Code/Data (not system)
        .dpl    = 0,    // Ring 0
        .p      = 1,    // Present
        .limit1 = 0xF,
        .avl    = 0,
        .l      = 1,    // Long mode (64-bit)
        .d      = 0,    // Must be 0 when L=1
        .g      = 1,    // Page granularity
        .base2  = 0,
    },
    // User code segment (DPL=3)
    [GDT_ENTRY_DEFAULT_USER_CS] = {
        // ... similar but DPL=3, L=1 for 64-bit user code
    },
    // Data segments have L=0, D=1 (32-bit operand size for compatibility)
};

/*
 * Per-CPU GDT with additional entries for:
 * - TLS (Thread Local Storage) descriptors
 * - Per-CPU data area
 * - 32-bit compatibility code segments (for running 32-bit binaries)
 */
```

Linux Use of FS and GS:
GS for Per-CPU Data (Kernel):
Linux kernel uses GS to access per-CPU data structures efficiently:
```c
// Access the current task quickly
current = this_cpu_read(current_task);
// Expands to a GS-relative load from the per-CPU area

// Get the CPU number without a function call
cpu = raw_smp_processor_id();  // Uses GS-relative access
```
The GS base is set to each CPU's per-CPU data area during boot. SWAPGS swaps between kernel GS and user GS on system call entry/exit.
FS for Thread-Local Storage (User Mode):
User-space programs use FS for TLS:
```c
// GCC/glibc thread-local variable
__thread int my_var = 17;

// Compiles to:
//   mov %fs:offset_of_my_var, %eax

// libc sets FS.base = &thread_control_block for each thread
```
The arch_prctl() system call allows setting FS.base for thread-local storage.
| Register | Kernel Mode | User Mode |
|---|---|---|
| CS | 0x08 (kernel code) | 0x1B (user code, ring 3) |
| DS | 0x10 (kernel data) | 0x23 (user data, ring 3) |
| ES | 0x10 (same as DS) | 0x23 (same as DS) |
| SS | 0x10 (kernel stack) | 0x23 (user stack) |
| FS | Unused | TLS base (set by libc/glibc) |
| GS | Per-CPU data base | Usually unused (or for custom TLS) |
The SWAPGS instruction swaps GS.base with IA32_KERNEL_GS_BASE MSR atomically. It's executed on every kernel entry (syscall, interrupt). Failing to execute SWAPGS correctly was the source of security vulnerabilities—if kernel code runs with user's GS or vice versa, information leaks or privilege escalation could occur.
Windows takes a slightly different approach, driven by its strong commitment to backward compatibility and its specific thread management architecture.
The Thread Environment Block (TEB):
Windows uses segment registers to provide fast access to thread-specific data:
32-bit Windows (x86):
64-bit Windows (x64):
```c
// Windows Thread Environment Block access patterns

// 32-bit Windows (FS-relative)
// Structure at FS:0 (simplified; offsets from the real TEB layout)
struct TEB_32 {
    /* 0x00 */ void* ExceptionList;       // SEH chain
    /* 0x04 */ void* StackBase;           // Top of stack
    /* 0x08 */ void* StackLimit;          // Bottom of stack
    /* 0x0C */ void* SubSystemTib;
    /* ... */
    /* 0x18 */ void* Self;                // Linear address of this TEB
    /* ... */
    /* 0x2C */ void* ThreadLocalStorage;  // TLS array
    /* 0x30 */ void* PEB;                 // Process Environment Block
    /* 0x34 */ DWORD LastErrorValue;
};

// User-mode access to TEB (32-bit)
__declspec(naked) struct TEB_32* NtCurrentTeb_x86() {
    __asm mov eax, fs:[0x18]   // TEB self-pointer at offset 0x18
    __asm ret
}

// Get last error efficiently
DWORD GetLastError_inline() {
    return __readfsdword(0x34);
    // Compiles to: mov eax, fs:[0x34]
}

// 64-bit Windows (GS-relative)
// Structure at GS:0
struct TEB_64 {
    /* 0x00 */ void* ExceptionList;
    /* 0x08 */ void* StackBase;
    /* 0x10 */ void* StackLimit;
    /* ... */
    /* 0x30 */ void* Self;                // Linear address of this TEB
    /* ... */
    /* 0x58 */ void* ThreadLocalStorage;
    /* 0x60 */ void* PEB;
    /* 0x68 */ DWORD LastErrorValue;
};

// User-mode access to TEB (64-bit)
struct TEB_64* NtCurrentTeb_x64() {
    return (struct TEB_64*)__readgsqword(0x30);
    // Compiles to: mov rax, gs:[0x30] (self-pointer)
}

// TLS access pattern (Windows)
__declspec(thread) int myTlsVar = 42;

// Compiled code accesses TLS via the TEB:
// 1. Load the TLS array pointer from the TEB
// 2. Index by TLS slot number
// 3. Access the variable at its offset within the slot
```

Windows GDT Structure:
Windows maintains a more complex GDT than Linux to support various compatibility modes:
| Entry | Purpose |
|---|---|
| 0 | Null descriptor |
| 1 | Kernel mode code (Ring 0) |
| 2 | Kernel mode data (Ring 0) |
| 3 | 32-bit user code (Ring 3) |
| 4 | User mode data (Ring 3) |
| 5-6 | 64-bit TSS descriptor (16 bytes, occupies two slots) |
| 7 | 64-bit user code (Ring 3) |
| 8 | TEB descriptor (per-thread, updated on switch) |
| ... | Additional entries for WoW64, VDM, etc. |
Windows and Compatibility:
Windows supports running 32-bit applications on 64-bit (WoW64 - Windows-on-Windows 64). This requires:
Windows Structured Exception Handling (SEH) relies on FS/GS segment access. The exception handler chain is rooted at FS:[0] (32-bit) or GS:[0] (64-bit). When an exception occurs, the handler walks this chain to find an appropriate handler. This deep integration with segments explains why Windows cannot simply eliminate segment usage.
Despite the flat memory trend, several segmentation-related features remain important in modern systems:
1. Ring-Based Protection
Segment DPL still enforces the ring protection model:
However, protection within rings comes from paging, not segments.
2. Hardware Task Switching (TSS)
The TSS remains essential even though hardware task switching is unused:
Every x86-64 OS must maintain a valid TSS for interrupts to work correctly.
```c
// 64-bit Task State Segment (TSS) Structure
// Used for interrupt stack switching, NOT hardware task switching

struct tss64 {
    uint32_t reserved1;

    // Ring 0, 1, 2 stack pointers (RSPn)
    uint64_t rsp0;       // Kernel stack pointer (used on ring 3→0 transition)
    uint64_t rsp1;       // Ring 1 stack (unused in most OSes)
    uint64_t rsp2;       // Ring 2 stack (unused in most OSes)

    uint64_t reserved2;

    // Interrupt Stack Table (IST1-7)
    // Used for specific interrupts that need dedicated stacks
    uint64_t ist1;       // Double fault handler stack
    uint64_t ist2;       // NMI handler stack
    uint64_t ist3;       // Machine check handler stack
    uint64_t ist4;       // Available for OS use
    uint64_t ist5;       // Available for OS use
    uint64_t ist6;       // Available for OS use
    uint64_t ist7;       // Available for OS use

    uint64_t reserved3;
    uint16_t reserved4;
    uint16_t iomap_base; // Offset to I/O permission bitmap

    // I/O permission bitmap (up to 8 KB, variable size)
    // Each bit controls access to one I/O port
    // uint8_t iomap[8192]; // Optional, follows the TSS
} __attribute__((packed));

/*
 * Critical uses in a modern OS:
 *
 * 1. RSP0: When an interrupt occurs in ring 3, the CPU loads RSP from
 *    tss->rsp0, giving the kernel a valid stack before handling it.
 *
 * 2. IST1-7: IDT entries can specify an IST index (1-7); the CPU then
 *    uses ist[n] as the stack regardless of the current stack.
 *    Essential for double faults and NMIs, where the current stack may be bad.
 *
 * 3. IOMAP_BASE: If set, the CPU checks the permission bitmap for IN/OUT
 *    in ring 3, allowing user-mode drivers controlled I/O port access.
 */
```

3. Thread-Local Storage (TLS)
FS and GS segment bases enable efficient TLS:
```
mov rax, fs:[offset]
```

4. Kernel/Per-CPU Data Access
Linux GS-relative addressing for per-CPU data:
```
this_cpu_read(variable)  → mov %gs:offset, %reg
this_cpu_write(variable) → mov %reg, %gs:offset
```
No atomic operations needed—each CPU has its own GS base.
5. ASLR and Stack Canaries
Some security features leverage segment-relative addressing:
```
cmp fs:[canary_offset], expected
```

| Feature | Still Used? | Purpose | Mechanism |
|---|---|---|---|
| Segment base (DS/ES/SS/CS) | No | N/A (forced to 0) | — |
| Segment base (FS) | Yes | User TLS | MSR or WRFSBASE |
| Segment base (GS) | Yes | Kernel per-CPU | MSR + SWAPGS |
| Segment limit checking | No | N/A (ignored) | — |
| Segment DPL (privilege) | Yes | Ring protection | CPL vs DPL on load |
| TSS ring stacks (RSP0) | Yes | Interrupt handling | CPU reads on int |
| TSS IST entries | Yes | Critical interrupts | IDT IST field |
| Task gates | No | N/A (obsolete) | — |
| Call gates | Rare | Some VMs use | CALL FAR |
| LDT | Rare | Some VMs, Wine | LLDT |
Virtual machine monitors sometimes use segmentation features for isolation. For example, some versions of Xen used ring 1 for paravirtualized guests, leveraging the segment-based ring protection. VMware and hardware-assisted VT-x use different approaches but still interact with GDT/LDT for guest OS compatibility.
What does the future hold for segmentation in x86 and beyond? The trend is clear, but the legacy persists.
x86-64 Trajectory:
Intel and AMD are unlikely to remove segment registers entirely—doing so would break backward compatibility with 32-bit applications and existing operating systems. However:
Alternative Protection Mechanisms:
Modern CPUs add protection features via paging extensions rather than segments:
Memory Protection Keys (PKU):
Control-flow Enforcement (CET):
ARM and RISC-V:
| Mechanism | Architecture | Purpose | Relation to Segments |
|---|---|---|---|
| Page protection bits | All | R/W/X permissions | Replaces segment protection |
| NX (No-Execute) | All | Prevent code injection | Page-level, not segment |
| SMEP/SMAP | x86 | Kernel can't exec/access user | Via paging, independent of segments |
| Memory Protection Keys | x86 | Domain-based protection | Via paging, independent of segments |
| CET Shadow Stack | x86 | Return address protection | Via paging, new stack type |
| Pointer Auth (PAC) | ARM | Pointer integrity | No segments involved |
| MTE (Memory Tagging) | ARM | Use-after-free detection | No segments involved |
| PMP (Physical Mem Prot) | RISC-V | Physical memory access control | No segments involved |
Lessons for System Architecture:
The segmentation story offers valuable lessons for system designers:
Simplicity wins at scale. Pure paging outcompeted combined segmentation-paging partly because it was simpler to understand, implement, and optimize.
Portability drives abstraction. The desire for portable C programs pushed OSes toward the simplest common memory model—flat address spaces.
Hardware features need software adoption. Segmentation's theoretical elegance didn't translate to practical benefit because software ecosystems built around flat memory.
Backward compatibility constrains evolution. x86-64 couldn't simply remove segments; it had to keep them but reduce their impact.
Protection mechanisms evolve. New threats require new mitigations. Modern CPUs add protections through paging extensions rather than resurrecting segments.
Segmentation's legacy in x86 serves as a reminder that architectures accumulate history, and understanding that history is essential for working effectively with modern systems.
Interestingly, capability-based addressing (the academic ancestor of segmentation) is experiencing a renaissance through projects like CHERI (Capability Hardware Enhanced RISC Instructions). CHERI extends pointers to include bounds and permissions, providing fine-grained memory safety. This suggests that while x86 segmentation is dead, the underlying concepts of bounded, protected memory regions remain valuable—just implemented differently.
We've traced the evolution from active segmentation to flat memory and examined what remains of segmentation in modern systems. Let's consolidate the key insights:
Module Complete:
This concludes our exploration of Segmentation with Paging. We've examined:
You now possess comprehensive knowledge of how segmentation and paging can work together, why this approach dominated an era of computing, and how modern systems have evolved beyond it while maintaining necessary compatibility.
You've completed Module 5: Segmentation with Paging. This deep understanding of combined memory management schemes—their mechanics, their history, and their modern remnants—provides essential context for understanding contemporary operating systems, where the ghosts of segmentation still influence design even as paging dominates implementation.